Coverage Matrix

  • Chkk Curated Release Notes: v23.1 to latest
  • Private Registry: Supported
  • Custom Built Images: Supported
  • Safety, Health, and Readiness Checks: v24.2 to latest
  • Supported Packages: Helm, Kustomize, Kube
  • EOL Information: Available
  • Version Incompatibility Information: Available
  • Upgrade Templates: In-Place, Blue-Green
  • Preverification: Available

CockroachDB Overview

CockroachDB is a distributed, horizontally scalable SQL database designed for resilience and high availability. It automatically shards and replicates data across nodes using Raft consensus, enabling seamless recovery from node or datacenter failures. CockroachDB provides PostgreSQL compatibility, supporting ACID transactions and consistent, low-latency reads globally. Its geo-distribution capabilities, including geo-partitioning and follower reads, allow a single logical database across multiple regions or clouds, enhancing user proximity and performance. The architecture simplifies operational complexity, eliminating manual sharding or complex failover setups.

Chkk Coverage

Curated Release Notes

Chkk monitors CockroachDB’s official release notes, highlighting relevant features, breaking changes, and configuration deprecations that affect your clusters. Platform teams receive contextual summaries focused on operational impact, such as changes in default GC settings or newly required operator flags. Chkk filters out the noise, surfacing only the adjustments that matter for keeping your clusters running smoothly.

Preflight & Postflight Checks

Chkk’s preflight checks validate cluster state and prerequisites before an upgrade: nodes are healthy, on a supported upgrade path, and free of under-replicated ranges. They also flag deprecated SQL features or configurations that need updating first. Postflight checks verify that all nodes are running the new version smoothly, confirm that cluster finalization succeeded, and quickly surface any performance or replication anomalies.
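As a rough illustration (not Chkk’s actual implementation), a preflight gate of this kind can be sketched as a set of checks over reported node state. The node fields, the version-path table, and the check names below are assumptions for the sketch; the general rule that CockroachDB supports rolling upgrades only between consecutive major versions is what the path check models.

```python
# Illustrative preflight-style checks (assumed data shapes, not Chkk's API).
# CockroachDB generally supports upgrades only between consecutive major
# versions, which the SUPPORTED_PATHS table below models for a few releases.

SUPPORTED_PATHS = {("v23.1", "v23.2"), ("v23.2", "v24.1"), ("v24.1", "v24.2")}

def preflight(nodes, current, target):
    """Return a list of blocking issues; an empty list means safe to proceed."""
    issues = []
    if (current, target) not in SUPPORTED_PATHS:
        issues.append(f"unsupported upgrade path {current} -> {target}")
    for n in nodes:
        if not n["live"]:
            issues.append(f"node {n['id']} is not live")
        if n["under_replicated_ranges"] > 0:
            issues.append(f"node {n['id']} reports under-replicated ranges")
    return issues

nodes = [
    {"id": 1, "live": True, "under_replicated_ranges": 0},
    {"id": 2, "live": True, "under_replicated_ranges": 0},
    {"id": 3, "live": False, "under_replicated_ranges": 2},
]
print(preflight(nodes, "v24.1", "v24.2"))
```

In a real cluster the node state would come from monitoring endpoints or `crdb_internal` tables rather than hand-built dictionaries; the point is that every issue found here blocks the upgrade before any node is touched.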

Version Recommendations

Chkk continuously tracks CockroachDB’s version lifecycle, alerting when your version approaches end-of-life or poses known risks. Recommendations are based on official support timelines, community feedback, and documented compatibility issues. Chkk explicitly suggests stable, vetted upgrade targets to balance feature urgency with cluster stability, helping avoid problematic releases.

Upgrade Templates

Chkk provides detailed templates for both rolling in-place upgrades and blue-green migrations, incorporating CockroachDB’s best practices. Rolling upgrade templates guide node-by-node updates with careful quorum management, ensuring minimal disruption. Blue-green templates detail parallel cluster setups, controlled workload migrations, and built-in rollback points, reducing upgrade risks and downtime.
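The in-place pattern above can be sketched as a node-by-node loop. The `drain`, `upgrade`, and `is_healthy` callables here are hypothetical stand-ins for real cluster operations; the quorum arithmetic is the standard majority rule.

```python
# Sketch of a rolling in-place upgrade loop (hypothetical helpers, not a
# Chkk template). Only one node is down at a time, and the loop refuses to
# proceed if taking a node offline would drop the cluster below quorum
# (a majority of nodes must stay live).

def rolling_upgrade(nodes, target, drain, upgrade, is_healthy):
    quorum = len(nodes) // 2 + 1
    for node in nodes:
        live = len(nodes) - 1  # nodes remaining up while this one restarts
        if live < quorum:
            raise RuntimeError("upgrading this node would break quorum")
        drain(node)            # move leases and ranges off the node
        upgrade(node, target)  # restart the node on the new binary
        if not is_healthy(node):
            raise RuntimeError(f"node {node} unhealthy after upgrade; halt")

log = []
rolling_upgrade(
    ["n1", "n2", "n3"], "v24.2",
    drain=lambda n: log.append(("drain", n)),
    upgrade=lambda n, v: log.append(("upgrade", n, v)),
    is_healthy=lambda n: True,
)
print(log)
```

Note that the quorum guard makes a two-node cluster refuse to upgrade at all, which mirrors the real-world caution about maintenance in small clusters: with two nodes, losing either one loses the majority.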

Preverification

Preverification rehearses CockroachDB upgrades in isolated test environments mirroring your production setup. Chkk identifies issues such as incompatible schema changes, performance regressions, and resource constraints before they impact your live environment. This “dry run” approach ensures adjustments can be made proactively, significantly reducing risks during actual upgrades.

Supported Packages

Chkk supports deployment via CockroachDB Kubernetes Operator, Helm charts, and standard Kubernetes YAML manifests. It integrates seamlessly with existing customizations, including private registries and bespoke builds. Chkk intelligently identifies and recommends only necessary configuration changes for each package format, preserving consistency across deployments and simplifying upgrades.

Common Operational Considerations

  • Quorum Requirements: Always maintain quorum (majority of nodes alive) to ensure cluster availability. Plan maintenance carefully to avoid accidental quorum loss, especially in smaller clusters.
  • Clock Sync: Keep node clocks tightly synchronized, e.g., with ntpd or chrony. A node whose clock drifts beyond the default maximum offset (500 ms) shuts itself down to prevent inconsistent reads.
  • Disk Capacity Management: Proactively monitor and maintain sufficient disk space, ideally below 85% usage. Disk exhaustion triggers automatic node shutdown; regular monitoring prevents unexpected downtime.
  • Time-Series Data Overhead: Internal telemetry storage consumes significant disk space over time. Regularly review and tune retention settings to manage this overhead effectively.
  • MVCC Garbage Collection: Deleted data isn’t immediately reclaimed due to MVCC version retention. Adjust garbage collection TTL or manually trigger compaction to quickly free space after bulk deletions.
  • Proper Node Decommissioning: Always use the official decommissioning procedure to safely remove nodes. This process ensures data is replicated properly elsewhere, maintaining cluster health and availability.
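Two of the thresholds above, disk usage and clock offset, are easy to watch mechanically. The 85% and 500 ms figures below come from this list (500 ms being CockroachDB’s default maximum clock offset); the check functions themselves are an illustrative sketch, not part of any official tooling.

```python
import shutil

DISK_USAGE_LIMIT = 0.85   # stay below ~85% usage per the guidance above
MAX_CLOCK_OFFSET_S = 0.5  # CockroachDB's default max clock offset is 500 ms

def disk_usage_ok(path="/"):
    """True if the filesystem holding `path` is below the usage limit."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total < DISK_USAGE_LIMIT

def clock_offset_ok(offset_seconds):
    """True if a measured offset (e.g., reported by NTP) is within bounds."""
    return abs(offset_seconds) < MAX_CLOCK_OFFSET_S

print(clock_offset_ok(0.120))  # 120 ms of drift is within the default limit
print(clock_offset_ok(0.750))  # 750 ms would exceed it
```

Wiring checks like these into routine monitoring gives early warning well before a node hits the point where CockroachDB shuts it down automatically.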

Additional Resources