# Glossary of Terms

> Definitions of key Chkk terms to help you navigate our platform and docs

## Account

A grouping of Chkk resources, such as clusters and subscriptions.
*Accounts* often map to a specific **billable entity boundary** in your organization.

## Actions

An action is a single, side‑effect‑bearing operation executed by the Workflow—e.g., invoking a REST API, calling an internal micro‑service, running a shell command, or posting to Slack.

## Project Upgrade Plan

Similar to a [Cluster Upgrade Plan](#cluster-upgrade-plan) but specific to a particular [Cloud Native Project](#cloud-native-project),
[Application Service](#application-service), or [Operator](#kubernetes-operator) instance in a given cluster.

Created from an approved [Project Upgrade Template](#project-upgrade-template).

## Project Upgrade Template

An agentic workflow for a specific [Cloud Native Project](#cloud-native-project), [Application Service](#application-service),
or [Operator](#kubernetes-operator) across multiple clusters.

While simpler Projects and Application Services can be upgraded as part of the Cluster Upgrade Template,
stateful or datapath Projects and Application Services have their own templates for specialized handling.

## AI Agent

A software entity that perceives its environment, reasons about what to do next, and autonomously executes actions (often by calling tools) to achieve a user-specified goal.

## Application

An *Application* is a type of [Project](#project) that **implements business logic or end-user-facing functionality** and is deployed on Cloud Native platforms as part
of an [Application Stack](#application-stack).  It typically consists of one or more services and depends on both [Application Services](#application-service)
and [Cloud Native Projects](#cloud-native-project).

## Application Service

An **Application Service** is a type of Project that provides essential
services to the rest of an application stack. An application stack is one or
more Projects that provide related non-Kubernetes functionality -- in other
words, application-specific functionality.

Application Services do *not* extend or implement cluster-level Kubernetes
functionality.

Examples of Application Services include:

* `MySQL`,
  `PostgreSQL` and
  `Redis`

  These provide database services to Application Stacks.

* `Kafka` and
  `NATS`

  These provide message queueing and event streaming services to Application
  Stacks.

* `ArgoCD`,
  `FluxCD`

  Both are [Kubernetes Custom Controllers](/resources/glossary#kubernetes-custom-controller) that
  provide GitOps and CI automation services to Application Stacks running on
  Kubernetes.

* `ArgoRollouts` and
  `Flagger`

  Both are [Kubernetes Custom Controllers](/resources/glossary#kubernetes-custom-controller) that provide
  progressive delivery services for Application Stacks.

* `Crossplane` and
  `ACK S3 Controller`

  Both are [Kubernetes Custom Controllers](/resources/glossary#kubernetes-custom-controller) that provide
  cloud resources and service integrations to Application Stacks.

## Application Stack

An *Application Stack* is one or more [Projects](#project) that provide related
non-Kubernetes functionality -- in other words, application-specific
functionality.

## Blast Radius

The **potential scope of impact** an error, failure, or disruption may have on your system.
It represents how far negative consequences can propagate within your environment.

## Chkk Cloud Connector

*Chkk Cloud Connector* is a secure, read-only integration that fetches relevant resource data from your
cloud environment and correlates it with your [Kubernetes Clusters](#kubernetes-cluster). By focusing on resources that
affect — or are affected by — your clusters (e.g., security groups, IAM roles, networking settings), the Connector
facilitates a unified view of your infrastructure. Cloud Connector metadata is used to detect [Operational Risks](#operational-risk-or) that
span beyond the K8s layers. Examples of such cross-layer risks include security group misconfigurations, risks that
are only latent for certain kernel versions, risks that only trigger when using certain AWS services
(e.g. Route53 records with external-dns), etc.

Details of Chkk Cloud Connector can be found [here](/connectors/cloud).

## Chkk Dashboard

Chkk Dashboard is a UI for you to interact with Chkk.

Details of Chkk Dashboard can be found [here](/overview/understanding-chkk).

## Chkk Integrations

A set of pre-built connectors which allow Chkk to fit smoothly with your existing tools.
Integrations include operational tools (GitHub, Jira, Slack, etc.) and SSO (Okta, PingIdentity).

## Chkk Kubernetes Connector

The *Chkk Kubernetes Connector* is composed of two main components:

* Chkk Operator
* Chkk Agent

Working together, these components periodically (or on-demand) extract cluster metadata and ingest it
into the Chkk SaaS platform. Once ingestion is complete, Chkk scans and analyzes your environment for
potential risks or helpful insights (e.g., Cloud Native Project instances running in your cluster).

Details of Chkk Kubernetes Connector can be found [here](/connectors/kubernetes).

## Chkk Proxy Filter

Redacts any private or sensitive data before it leaves your cluster. It automatically excludes Kubernetes
secrets and can be configured to exclude additional data as needed.

## Chkk Research Team

A dedicated team of Kubernetes experts that reviews and curates [Operational Risks](#operational-risk-or) to make them actionable.

## Cluster Metadata

Information about a [Kubernetes Cluster](#kubernetes-cluster)—versions, [Pods](#kubernetes-pod), [Cloud Native Projects](#cloud-native-project), cloud-provided components, node configs, etc.
Secrets and config maps are redacted by default, and additional filters can be configured.

## Cluster Upgrade Plan

An instantiation of a [Cluster Upgrade Template](#cluster-upgrade-template), customized for a specific [Kubernetes Cluster](#kubernetes-cluster).
The instantiated Upgrade Plans inherit all the information present in Upgrade Templates, plus additional cluster-specific information such as:

* The cluster's name and region
* Node Group details
* Which [Applications](#application) are using deprecated or removed APIs
* Any Application client changes in [Cloud Native Projects](#cloud-native-project)
* Any Application misconfigurations, such as incorrect Pod Disruption Budgets (PDBs), that can cause the upgrade to fail

Upgrade Plans move through these statuses:

1. **Waiting for Plan** - Being generated.
2. **In Progress** - Ready for execution.
3. **Completed** - Cluster upgraded successfully.
4. **Canceled** - User-initiated cancellation.

[More details here](/overview/understanding-chkk)

## Cluster Upgrade Template

A *Cluster Upgrade Template* is an agentic workflow containing a tested and structured sequence of steps and stages to
safely upgrade your [Kubernetes Clusters](#kubernetes-cluster). A Cluster Upgrade Template is generated on-demand and is scoped to an
Environment (e.g. dev, staging or prod). Cluster Upgrade Templates support three commonly-used upgrade patterns: In-Place, Blue-Green, and Rolling/Surge.

A template includes steps for each stage of an upgrade, preverified via a [Digital Twin](#digital-twin). Once an Upgrade Template is generated, you can perform
the following actions:

* **Action**: Add custom markdown steps before or after any step.
* **Action**: Request regenerations for refined versions.
* **Action**: Approve the template so other team members can instantiate it as an Upgrade Plan for specific
  clusters.

Upgrade Templates move through these statuses:

1. **Waiting for Template** - Being generated.
2. **Available** - Ready for review/customization.
3. **Approved For Use** - May now be used to create Upgrade Plans.
4. **Environment Upgraded** - All clusters in the environment have used this template to upgrade.

[More details here](/overview/understanding-chkk)

## Collective Learning

Collective Learning comprises a suite of technologies that **codify operational wisdom to prevent incidents,
breakages, and disruptions.**

Collective Learning has two main parts:

1. [Operational Risk Signature Database (RSig DB)](#operational-risk-signature-database-rsig-db)
2. [Knowledge Graph](#knowledge-graph)

These components are defined on the [Understanding Chkk](/overview/understanding-chkk) page, with additional
details provided on the [Technology](https://chkk.io/technology) page.

## Deactivated Clusters

[Kubernetes Clusters](#kubernetes-cluster) explicitly offboarded from Chkk via a **"Deactivate Cluster"** action in the
[Chkk Dashboard](#chkk-dashboard). Deactivation in the dashboard doesn't remove all Chkk components; you can find instructions for
complete removal in the [Troubleshooting](/resources/troubleshooting) page.

## Digital Twin

A *Digital Twin* is a virtual replica of your infrastructure, simulating how it runs and interacts.
There are four levels of Digital Twins:

* **Level 1**: Basic replica using the same cluster version, node versions, and [Cloud Native Project](#cloud-native-project) versions with default config.
* **Level 2**: Extends Level 1 by including custom configs of all [Cloud Native Projects](#cloud-native-project).
* **Level 3**: Adds dummy [Applications](#application) to emulate functionality on top of Level 2.
* **Level 4**: Fully functioning staging environment, an exact replica of your cluster (including real Applications), typically running within your own cloud account.

## Disconnected Clusters

[Kubernetes Clusters](#kubernetes-cluster) that have not been [Deactivated](#deactivated-clusters) but are not sending metadata to Chkk. They appear with an alert
icon in the [Chkk Dashboard](#chkk-dashboard) so you can diagnose why the [Chkk Kubernetes Connector](#chkk-kubernetes-connector) isn't sending data.

## Grounding

Grounding constrains an AI system’s outputs to verifiable facts and policies.

## Grounding Layer

A curated corpus that integrates all authoritative sources consumed by the Chkk Knowledge Engine. AI pipelines ingest and normalize each source, while the Chkk Research Team continuously reviews and validates the content. The layer models clouds, Cloud Native Projects, and application services so that agents, workflows, and tools can reference the same trusted schema, avoid hallucinations, and maintain provable accuracy for every knowledge attribute.

## Guardrails

Rules and policies from a cloud provider, Cloud Native Project vendor, distribution, or the open source community.
Whether or not to follow a Guardrail is a business/team decision.

## Helm Chart

A *Helm Chart* is a type of [Package](#package) that uses the `helm` [Package System](#package-system) to format its artifacts (called "charts").

## Knowledge Base Article (KBA)

A single page explaining an [RSig](#operational-risk-signature-rsig) or [Guardrail](#guardrails), covering its severity, impact,
trigger conditions, remediation steps, and potentially code snippets. All RSigs and Guardrails have an associated KBA.

KBAs support multiple actions on Operational Risks and Guardrails, such as:

1. **Action: Create Ticket** - Generates a Jira ticket for tracking.
2. **Action: Mark** - Mark as False Positive, By Design, or other reasons if not fixing.
3. **Action: Ignore** - Stop receiving notifications for this risk (optionally ignoring only specific resources).

## Knowledge Graph

*Knowledge Graph* models and stores AI-curated data and relationships across hundreds of [Cloud Native Projects](#cloud-native-project)
in the ecosystem, modeling their impact and identifying the safest upgrade paths.
The Chkk Research Team provides oversight for the AI-curated data and relationships.

Knowledge Graph covers Kubernetes releases of all major clouds and distributions:
EKS, GKE, AKS, VMware Tanzu, OpenShift, Rancher RKE1/RKE2, Nutanix. We also support DIY and Self-Hosted Kubernetes clusters.
Chkk also covers 300+ Projects, and coverage for a new Project can be extended within 48 hours.

## Cloud Native Project

A **Cloud Native Project** is a type of Project that **extends Kubernetes cluster
functionality but is not part of the Kubernetes core**. Cloud Native Projects
typically run inside the cluster as regular workloads (e.g., Deployments,
DaemonSets) and provide services like networking, monitoring, logging, and DNS.

Cloud Native Projects typically modify or enhance *cluster-level behavior*.

Examples of Cloud Native Projects include:

* `CoreDNS` and `kube-dns`

  They provide cluster DNS services, a critical but optional functionality of
  the Kubernetes cluster used for service discovery.

* `Amazon VPC CNI`, `Cilium` and
  `Calico`

  They provide implementations of Kubernetes data plane networking and
  Container Networking Interface (CNI) plugins.

* `Amazon EBS CSI Driver` and
  `AzureDisk CSI Driver`

  They provide implementations of the Container Storage Interface (CSI)
  plugin for plumbing Kubernetes Volumes to a cloud provider-specific storage
  backend.

* `Ingress NGINX controller` and
  `AWS Load Balancer Controller`

  They provide implementations of Kubernetes Service and Ingress
  functionality, which are core Kubernetes Resources.

* `Kubernetes Metrics Server` and
  `kube-state-metrics`

  They provide cluster-level aggregation of resource usage and resource health
  metrics that are used by other Kubernetes components like Horizontal Pod
  Autoscaler and Vertical Pod Autoscaler.

* `External Secrets Operator`

  Despite being called an "Operator", the External Secrets Operator is *not* a
  [Kubernetes Operator](#kubernetes-operator) because it does
  not install or manage the lifecycle of some *other* Cloud Native Project or
  Application Service. However, it *is* a Cloud Native Project because it implements
  storage functionality for Kubernetes Secrets and therefore extends the
  Kubernetes cluster's functionality.

## Kubernetes Cluster

A *Kubernetes Cluster* (or just "Cluster") is a group of physical or virtual
machines ([Kubernetes Nodes](#kubernetes-node)) that run containerized
applications.

## Kubernetes Controller

A *Kubernetes Controller* is software that follows the [controller design
pattern][kube-ctrl-pattern]. This design pattern features a control loop (also
called a "reconciliation loop") that repeatedly attempts to make the actual
state of some resource match the desired state of that resource.
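
The control loop described above can be sketched in a few lines. This is a simplified illustration and not code from Kubernetes or Chkk: the `reconcile` and `control_loop` names, and the use of plain dictionaries of replica counts to stand in for desired and actual state, are assumptions made for the example.

```python
from typing import Dict


def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> bool:
    """Move `actual` one step toward `desired`; return True once they match."""
    for name, want in desired.items():
        have = actual.get(name, 0)
        if have < want:
            actual[name] = have + 1  # e.g. start one more replica
            return False
        if have > want:
            actual[name] = have - 1  # e.g. stop one surplus replica
            return False
    for name in list(actual):
        if name not in desired:
            del actual[name]  # remove a resource nobody asked for
            return False
    return True


def control_loop(desired: Dict[str, int], actual: Dict[str, int],
                 max_passes: int = 100) -> int:
    """Call reconcile() until the states match; return the number of passes."""
    for passes in range(1, max_passes + 1):
        if reconcile(desired, actual):
            return passes
    raise RuntimeError("actual state did not converge to desired state")
```

A real controller runs this loop indefinitely and reacts to change events rather than stopping once the states match; the bounded loop here only keeps the sketch easy to run.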

## Kubernetes Custom Controller

A *Kubernetes Custom Controller* is a [Kubernetes
Controller](#kubernetes-controller) that tracks and reconciles [Kubernetes
Custom Resources](#kubernetes-custom-resource).

## Kubernetes Custom Resource

A *Kubernetes Custom Resource* is a special type of [Kubernetes
Resource](#kubernetes-resource) that falls outside of the core Kubernetes API
groups.

## Kubernetes DaemonSet

A *Kubernetes DaemonSet* (or just "DaemonSet") is a type of [Kubernetes
Resource](#kubernetes-resource) that describes a related set of
[Pods](#kubernetes-pod) that will run on every [Node](#kubernetes-node) in the
[Kubernetes Cluster](#kubernetes-cluster).

## Kubernetes Deployment

A *Kubernetes Deployment* (or just "Deployment") is a type of [Kubernetes
Resource](#kubernetes-resource) that describes a related set of
[Pods](#kubernetes-pod) that run an application workload.

## Kubernetes Deployment System

A *Kubernetes Deployment System* is something that manages the rollout of
[Kubernetes Resources](#kubernetes-resource) like
[Deployments](#kubernetes-deployment), [DaemonSets](#kubernetes-daemonset) and
[StatefulSets](#kubernetes-statefulset).

## Kubernetes Operator

A **Kubernetes Operator** is a type of Project that is responsible for
installing and **managing the lifecycle of another Cloud Native Project or
Application Service**.

Generally, Kubernetes Operators encode domain-specific knowledge to manage
complex stateful software on Cloud Native platforms.

<Tip>
  A Kubernetes Operator is often confused with a [Kubernetes
  Controller](#kubernetes-controller). Almost all Kubernetes
  Operators use the Kubernetes Controller design pattern, but not all software
  Projects that use the Kubernetes Controller design pattern are Kubernetes
  Operators!
</Tip>

A Kubernetes Operator is sometimes mistakenly defined as a Kubernetes
Controller that uses Kubernetes Custom Resources. The more accurate term for
*that* concept is [Kubernetes Custom
Controller](#kubernetes-custom-controller). What
differentiates a Kubernetes Operator is the focus on **managing complex
lifecycle operations for some *other* piece of software**.

Examples of Kubernetes Operators include:

* `Postgres Operator`

  A [Kubernetes Custom Controller](#kubernetes-custom-controller)
  that manages the installation and lifecycle of PostgreSQL database servers,
  database schemas and database users.

* `Prometheus Operator`

  Manages the installation and lifecycle of Prometheus metrics database and
  associated Application Services like Prometheus Alertmanager.

* `OpenTelemetry (OTEL) Operator`

  Manages the installation, lifecycle and configuration of
  OpenTelemetry collectors and auto-instrumentation libraries.

## Kubernetes Pod

A *Kubernetes Pod* (or just "Pod") is a type of [Kubernetes
Resource](#kubernetes-resource) that describes a group of containers with
shared storage and network resources.

## Kubernetes Node

A *Kubernetes Node* (or just "Node") is a physical or virtual machine that
comprises part of a [Kubernetes Cluster](#kubernetes-cluster).

## Kubernetes Resource

A *Kubernetes Resource* is a representation of the desired and observed state
of some object. Common Kubernetes Resources are [Deployments](#kubernetes-deployment),
[DaemonSets](#kubernetes-daemonset) and [StatefulSets](#kubernetes-statefulset).

## Kubernetes StatefulSet

A *Kubernetes StatefulSet* (or just "StatefulSet") is a type of [Kubernetes
Resource](#kubernetes-resource) that describes a related set of
[Pods](#kubernetes-pod) that run an application workload. The difference
between a StatefulSet and a [Deployment](#kubernetes-deployment) is that the
Pods in a StatefulSet typically have an ordered initialization and the
application workload maintains some form of persistent state.

## Mitigation

Mitigation is a short‑term workaround that lowers the probability or impact
of a risk until full remediation can occur. Typical mitigations include disabling
a feature flag, throttling traffic, or rolling back a change.

## Mitigation Workflow

A mitigation workflow is a durable workflow that automates or guides the application of mitigations.

## Notifications

Notifications inform you about team invites, cluster onboarding, newly detected [Operational Risks](#operational-risk-or), or published
[Project Upgrade Templates](#project-upgrade-template), [Cluster Upgrade Templates](#cluster-upgrade-template),
[Project Upgrade Plans](#project-upgrade-plan), and [Cluster Upgrade Plans](#cluster-upgrade-plan).

Chkk supports Email, Slack, and in-app notifications.

## Operational Risk (OR)

*Operational Risk* refers to any known or potential defect, misconfiguration, or incompatibility in Cloud Native
infrastructure that can cause incidents, disruptions, or breakages. These risks, which may include known
defects or issues stemming from unsupported versions, deprecated APIs, and software nearing end-of-life, are
categorized by severity—Critical, High, Medium, or Low. An Operational Risk is detected by scanning for at-risk components,
identifying trigger conditions, and assessing availability impact, root cause, remediation steps, and possible mitigations.
In Chkk, these risks are codified as [Risk Signatures (RSigs)](#operational-risk-signature-rsig) that continuously scan customer environments to proactively
uncover and address Operational Risks before they cause breakages or outages.

## Operational Risk Signature (RSig)

An *RSig* is the logic used to detect the presence of a specific [Operational Risk](#operational-risk-or) in your environment.

## Operational Risk Signature Database (RSig DB)

*RSig DB* takes inspiration from cybersecurity, where security vulnerabilities are reported publicly in the CVE Database.
We extended this idea to operational safety: If there's an Operational Risk (e.g. an error, failure, or disruption)
that has happened anywhere in the world, Chkk AI aggregators and data connectors learn about it, convert it into an
[Operational Risk Signature](#operational-risk-signature-rsig) (similar to a virus signature), and store it in the RSig DB.
Any new Operational Risk Signature is streamed to all our customers, where it is scanned in their environments.
That way, our customers can proactively detect, identify, and remediate Operational Risks before they cause breakages and
disruptions, much like antivirus software detects and removes viruses before they start causing harm.

## Organization

A grouping of multiple Chkk Accounts under a common ownership, typically matching an entire company or large department.

## Package

A *Package* is a named bundle of software that is bundled (and identified) in
the format of a specific [Package System](#package-system).

## Package System

A *Package System* identifies a packaging ecosystem, such as `helm`.

## Preverification Engines

Preverification Engines create a [Digital Twin](#digital-twin) of your Cloud Native environment and simulate a
[Project Upgrade Template](#project-upgrade-template) or [Cluster Upgrade Template](#cluster-upgrade-template) before it's published to you,
ensuring that all steps are verified to execute without errors.

## Project

A *Project* is **software that provides some functionality**.

## Project Release Series

A [Project](#project) may have one or more *Project Release Series*. A Project
Release Series is a single release *series* for the Project.

Project Release Series are identified by a simple string that must be unique for the
Project. Typically the Project Release Series name will be a `major` or `major.minor`
version series, e.g. "4" or "1.28", however this is not universally true. Some
Projects use date-based names for the release series, like "2024.04".
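
As a rough illustration of the common `major` / `major.minor` case, the sketch below derives a series name from a full version string. The `release_series` helper and its `parts` parameter are hypothetical and not part of Chkk; as noted above, some Projects name their series differently, so this is only a heuristic.

```python
def release_series(version: str, parts: int = 2) -> str:
    """Return the first `parts` dot-separated components of a version string."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    core = version.lstrip("v")  # tolerate a leading "v", as in "v1.28.3"
    return ".".join(core.split(".")[:parts])


print(release_series("1.28.3"))          # 1.28
print(release_series("4.2.1", parts=1))  # 4
print(release_series("2024.04"))         # 2024.04
```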

## Project Release

A [Project](#project) may have one or more *Project Releases*. A Project
Release is the coordinated publication of one or more [Release
Artifacts](#release-artifact) for the Project having the same Version string.

[kube-ctrl-pattern]: https://kubernetes.io/docs/concepts/architecture/controller/#controller-pattern

## Remediation

Remediation is the permanent correction of a defect or risk, aimed at eliminating the root cause rather than merely reducing impact. Remediation is the long-term fix.

## Remediation Workflow

A sequence of actions stitched together into a workflow that rolls out the remediation of a risk or defect (for instance, code changes or package upgrades) and validates success via post-conditions and metrics.

## Revalidation

*Revalidation* starts as soon as a new cluster is onboarded. The first pass of Revalidation leverages multiple
[Classifiers](#classifiers) (part of [Collective Learning](#collective-learning)) to catalog what's running in your fleet and where.

Once Classifiers finish, a secondary pass is kicked off to extend coverage and remove false positives.

This process can take up to 24 hours. You can also report false positives via the **Action: Feedback**.

[Details of Chkk's coverage dimensions here](/overview/understanding-chkk)

## Risk Scan (aka RSig Scan)

An *RSig Scan* matches your running Cloud Native environment against relevant [RSigs](#operational-risk-signature-rsig) in the
[RSig DB](#operational-risk-signature-database-rsig-db).

Scanning an RSig has two stages:

1. **Contextualize** - Check if the RSig's components/versions are present in your environment.
2. **Test** - If the context is relevant, compare version numbers and conditions to determine if the RSig is present.
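
The two stages can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `RSig` fields, the version-range condition, and the component names are hypothetical, and real RSigs may evaluate richer trigger conditions than a version range.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn "1.10.1" into (1, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class RSig:
    name: str
    component: str       # component the signature applies to
    first_affected: str  # inclusive lower bound of affected versions
    fixed_in: str        # exclusive upper bound of affected versions


def contextualize(rsig: RSig, environment: Dict[str, str]) -> bool:
    """Stage 1: is the signature's component present in the environment?"""
    return rsig.component in environment


def matches_conditions(rsig: RSig, environment: Dict[str, str]) -> bool:
    """Stage 2: does the running version fall in the affected range?"""
    running = parse_version(environment[rsig.component])
    return parse_version(rsig.first_affected) <= running < parse_version(rsig.fixed_in)


def scan(rsig: RSig, environment: Dict[str, str]) -> bool:
    """Run stage 2 only when stage 1 says the signature is relevant."""
    return contextualize(rsig, environment) and matches_conditions(rsig, environment)
```

Here `environment` maps component names to running versions, so `scan(RSig("example", "example-dns", "1.10.0", "1.11.0"), {"example-dns": "1.10.1"})` reports a match, while an environment without `example-dns` is skipped at the first stage.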

RSig Scans can be periodic (default: every 12 hours), on-demand, or event-driven (triggered by CI/CD).

You can also set the scan frequency; see the instructions [here](/connectors/kubernetes#chkk-agent-helm-configuration).

## Scan Engines

Scan Engines identify [Operational Risks](#operational-risk-or) at various layers of Kubernetes and the underlying cloud infrastructure.
Multiple engines run in parallel, each focusing on a subset of risks.

## Source‑Grounded AI

Source‑grounded AI produces answers that link directly back to the underlying asset record or source, enabling users to verify every factual claim and trace the agent’s chain of thought.

## Task‑Specific AI Agent

A task‑specific AI agent is an agent pre‑loaded with exactly one core skill and an intentionally narrow toolset, making its behavior easier to predict, govern, and audit.

## Team

A group of users or Cloud Identities (such as AWS Accounts, IAM users, and IAM roles) within a [Chkk Organization](#organization).
Team members may be assigned ownership of certain resources, like clusters or certain Cloud Native Projects across the fleet of clusters.

## Tool

A tool is any external capability (a software function, API endpoint, database query, CLI command, or workflow) that an AI Agent can invoke to extend its own reasoning and perception.

## Trigger

A trigger is the initiating event, schedule, or external signal that instantiates a workflow run. It can be time‑based (cron, interval), state‑based (configuration drift detected), or externally invoked.

## Upgrade Agent Scratchpad

A **Scratchpad** is a hidden, repo-scoped workspace where the agent stages transient artifacts during an upgrade—diffs, CRD archives, rendered manifests, logs, and small caches. It's **excluded from version control** and can be safely deleted at any time.

**Why the Scratchpad exists (server + agent perspective)**

* **Keep PRs focused.** Reasoning artifacts (diffs, logs, bundles) never land in commits—only the intended edits do.
* **Speed & reproducibility.** Per-run manifests and caches enable faster retries and leave an inspectable trail.
* **Right-sized context.** The MCP server streams structured context; large/ephemeral files live locally instead of over-stuffing prompts.

**Resolution order (where the agent looks)**

1. `CHKK_SCRATCHPAD_DIR` environment variable
2. `.chkk/scratchpad/<agent>/` at the repository root
3. If neither exists, the agent creates:

```text theme={"dark"}
.chkk/scratchpad/upgrade-agent/
```
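
The resolution order above could be implemented roughly as follows. This is a sketch and not the agent's actual code: the `resolve_scratchpad` function and its parameters are hypothetical, and it assumes the `CHKK_SCRATCHPAD_DIR` override is honored whenever it is set, without checking that the directory already exists.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_AGENT = "upgrade-agent"


def resolve_scratchpad(repo_root: Path, agent: str = DEFAULT_AGENT,
                       env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the Scratchpad directory using the documented lookup order."""
    env = os.environ if env is None else env

    # 1. An explicit override wins (assumption: used as-is when set).
    override = env.get("CHKK_SCRATCHPAD_DIR")
    if override:
        return Path(override)

    # 2. An existing per-agent directory at the repository root.
    candidate = repo_root / ".chkk" / "scratchpad" / agent
    if candidate.is_dir():
        return candidate

    # 3. Otherwise create the default location.
    fallback = repo_root / ".chkk" / "scratchpad" / DEFAULT_AGENT
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback
```

Passing `env` explicitly keeps the function easy to test without modifying the real process environment.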

**Typical contents**

* `*_diff.*` (e.g., `helm_values.diff`, template diffs)
* `crd_bundle.(json|tar.gz)` (when relevant)
* `APPLIED_CHANGES.md` and per-run logs
* Small caches for faster iterative runs

<Info>
  The MCP server streams context; your agent performs edits locally and uses the Scratchpad to stage transient state.
</Info>

## Upgrade Assessment

An **Upgrade Assessment** is a high-level report that maps the scope, impact, and dependencies of upgrading a Kubernetes cluster—along with its Cloud Native Projects and application services—to the next minor version.
It surfaces required version hops, deprecated APIs, and other potential blockers across platform and application layers so teams can gauge readiness and remediate issues early.

## Workflow

A workflow is a sequence of coordinated steps that binds AI agents, tools, humans, and even nested workflows into a single deterministic process to achieve a defined operational outcome. Each step’s intent, inputs, outputs, and side-effects are durably recorded, so the entire flow can be replayed, resumed, or audited—guaranteeing that every decision and external action is applied exactly once, in order, and with full lineage.

## Workflow Engine

Workflow Engine persists state for long-running multi-step operations such as Upgrade Templates and Upgrade Plans.

Upgrade paths are pre-verified on a [Digital Twin](#digital-twin) of the underlying cluster, ensuring predictable and safe execution before changes are applied.\
After pre-verification, Workflow Engine generates long-running, durable upgrade workflows that are highly contextualized and pre-verified to prevent failures.