Coverage Matrix
| Capability | Coverage |
| --- | --- |
| Chkk Curated Release Notes | v23.1 to latest |
| Private Registry | Supported |
| Custom Built Images | Supported |
| Safety, Health, and Readiness Checks | v24.2 to latest |
| Supported Packages | Helm, Kustomize, Kube |
| EOL Information | Available |
| Version Incompatibility Information | Available |
| Upgrade Templates | In-Place, Blue-Green |
| Preverification | Available |
CockroachDB Overview
CockroachDB is a distributed, horizontally scalable SQL database designed for resilience and high availability. It automatically shards and replicates data across nodes using Raft consensus, enabling seamless recovery from node or datacenter failures. CockroachDB is PostgreSQL-compatible, supporting ACID transactions and consistent, low-latency reads globally. Its geo-distribution capabilities, including geo-partitioning and follower reads, allow a single logical database to span multiple regions or clouds, keeping data close to users and improving performance. The architecture reduces operational complexity, eliminating manual sharding and complex failover setups.
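To make the PostgreSQL compatibility concrete, here is a minimal sketch that connects to a CockroachDB node with the standard psycopg2 driver and runs ordinary SQL inside an ACID transaction. The host, credentials, and table are placeholders for illustration.

```python
# Minimal sketch: CockroachDB speaks the PostgreSQL wire protocol, so standard
# Postgres drivers work unchanged. Connection details below are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="localhost",      # assumption: a local single-node cluster
    port=26257,            # CockroachDB's default SQL port
    user="root",
    dbname="defaultdb",
    sslmode="disable",     # insecure mode for local testing; use verify-full in production
)

with conn:  # psycopg2 commits the transaction on clean exit
    with conn.cursor() as cur:
        # DDL and DML run inside ordinary ACID transactions.
        cur.execute("CREATE TABLE IF NOT EXISTS accounts (id INT PRIMARY KEY, balance DECIMAL)")
        cur.execute("UPSERT INTO accounts (id, balance) VALUES (1, 100.00), (2, 250.00)")
        cur.execute("SELECT id, balance FROM accounts ORDER BY id")
        for row in cur.fetchall():
            print(row)

conn.close()
```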
Chkk Coverage
Curated Release Notes
Chkk monitors CockroachDB’s official release notes, highlighting relevant features, breaking changes, and configuration deprecations that affect your clusters. Platform teams receive contextual summaries focused on operational impact, such as changes to default GC settings or newly required operator flags. Chkk keeps you informed about essential upgrades without unnecessary detail, surfacing only the critical adjustments needed for smooth operations.
Preflight & Postflight Checks
Chkk’s preflight checks validate cluster state and prerequisites, ensuring nodes are healthy, the current version is on a supported upgrade path, and no ranges are under-replicated. They also detect deprecated SQL features or configurations that need updating before the upgrade. Postflight checks verify that all nodes are running the new version smoothly, confirming that the cluster finalized the upgrade successfully and quickly surfacing any performance or replication anomalies.
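For reference, a hand-rolled preflight along these lines (a sketch, not Chkk’s implementation) might probe each node’s HTTP endpoints before an upgrade: the /health?ready=1 readiness check and the Prometheus metrics at /_status/vars, which include an under-replicated-ranges gauge. The node addresses and the exact metric name are assumptions to verify against your CockroachDB version.

```python
# Minimal preflight sketch: check node readiness and look for under-replicated
# ranges via each node's HTTP endpoints. Node addresses and the metric name
# (ranges_underreplicated) are assumptions; verify them for your version.
import urllib.request

NODES = ["http://cockroach-0:8080", "http://cockroach-1:8080", "http://cockroach-2:8080"]

def is_ready(base_url: str) -> bool:
    # /health?ready=1 returns 200 only when the node is live and accepting SQL.
    try:
        with urllib.request.urlopen(f"{base_url}/health?ready=1", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False

def underreplicated_ranges(base_url: str) -> float:
    # /_status/vars exposes Prometheus-format metrics for the node's stores.
    with urllib.request.urlopen(f"{base_url}/_status/vars", timeout=5) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith("ranges_underreplicated"):
                return float(line.split()[-1])
    return 0.0

if __name__ == "__main__":
    for node in NODES:
        ready = is_ready(node)
        under = underreplicated_ranges(node) if ready else float("nan")
        print(f"{node}: ready={ready} under_replicated_ranges={under}")
```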
Version Recommendations
Chkk continuously tracks CockroachDB’s version lifecycle, alerting you when your version approaches end-of-life or carries known risks. Recommendations are based on official support timelines, community feedback, and documented compatibility issues. Chkk suggests stable, vetted upgrade targets that balance feature urgency with cluster stability, helping you avoid problematic releases.
Upgrade Templates
Chkk provides detailed templates for both rolling in-place upgrades and blue-green migrations, incorporating CockroachDB’s best practices. Rolling upgrade templates guide node-by-node updates with careful quorum management, ensuring minimal disruption. Blue-green templates cover parallel cluster setup, controlled workload migration, and built-in rollback points, reducing upgrade risk and downtime.
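One well-documented step inside a rolling in-place upgrade is controlling auto-finalization: pinning cluster.preserve_downgrade_option to the current major version before the node-by-node restart, then resetting it once every node runs the new binary. The sketch below shows only that step; connection details and the version string are placeholders, and the surrounding node restarts are left to your tooling.

```python
# Sketch of the finalization-control step in a rolling in-place upgrade.
# It does NOT restart nodes; it only pins and later releases auto-finalization.
# Connection details and the version string are placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", port=26257, user="root",
                        dbname="defaultdb", sslmode="disable")
conn.autocommit = True  # cluster settings cannot be changed inside a transaction
cur = conn.cursor()

# Before the rolling restart: block auto-finalization so the cluster can still
# be rolled back to the current major version if a node misbehaves.
cur.execute("SET CLUSTER SETTING cluster.preserve_downgrade_option = '23.1'")

# ... perform the node-by-node binary upgrade here (kubectl / Helm / Operator) ...

# After every node runs the new binary and postflight checks pass, allow the
# cluster to finalize the upgrade.
cur.execute("RESET CLUSTER SETTING cluster.preserve_downgrade_option")

cur.execute("SHOW CLUSTER SETTING version")
print("cluster version:", cur.fetchone()[0])

cur.close()
conn.close()
```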
Preverification
Preverification rehearses CockroachDB upgrades in isolated test environments mirroring your production setup. Chkk identifies issues such as incompatible schema changes, performance regressions, and resource constraints before they impact your live environment. This “dry run” approach ensures adjustments can be made proactively, significantly reducing risk during actual upgrades.
Supported Packages
Chkk supports deployment via the CockroachDB Kubernetes Operator, Helm charts, and standard Kubernetes YAML manifests. It integrates with existing customizations, including private registries and bespoke builds. Chkk identifies and recommends only the configuration changes each package format actually needs, preserving consistency across deployments and simplifying upgrades.
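As one example of the Helm path (a sketch, not Chkk’s workflow), upgrading a chart-based deployment usually amounts to bumping the image tag on the official cockroachdb/cockroachdb chart. The release name, namespace, target tag, and the image.tag value key below are assumptions to verify against the chart version you run.

```python
# Sketch: drive a Helm-based CockroachDB upgrade by pinning a new image tag.
# Release name, namespace, target tag, and the image.tag value key are
# assumptions to check against the chart version in use.
import subprocess

RELEASE = "crdb"            # placeholder Helm release name
NAMESPACE = "databases"     # placeholder namespace
TARGET_TAG = "v24.2.3"      # placeholder target CockroachDB version

cmd = [
    "helm", "upgrade", RELEASE, "cockroachdb/cockroachdb",
    "--namespace", NAMESPACE,
    "--reuse-values",                      # keep existing chart values
    "--set", f"image.tag={TARGET_TAG}",    # bump only the image tag
]

print("running:", " ".join(cmd))
subprocess.run(cmd, check=True)
```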
Common Operational Considerations
- Quorum Requirements: Always maintain quorum (a majority of replicas available for each range) to keep the cluster available. Plan maintenance carefully to avoid accidental quorum loss, especially in smaller clusters.
- Clock Sync: Ensure strict clock synchronization across nodes, ideally using NTP or Chrony. Clock drift beyond the default maximum offset (500ms) causes a node to shut itself down to prevent inconsistency.
- Disk Capacity Management: Proactively monitor and maintain sufficient disk space, ideally below 85% usage. Disk exhaustion triggers automatic node shutdown; regular monitoring prevents unexpected downtime.
- Time-Series Data Overhead: Internal telemetry storage consumes significant disk space over time. Regularly review and tune retention settings to manage this overhead effectively.
- MVCC Garbage Collection: Deleted data isn’t immediately reclaimed because older MVCC versions are retained. Lower the garbage collection TTL or manually trigger compaction to free space quickly after bulk deletions (see the sketch after this list).
- Proper Node Decommissioning: Always use the official decommissioning procedure to safely remove nodes. This process ensures data is replicated properly elsewhere, maintaining cluster health and availability.
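As a concrete example of the retention and garbage-collection tuning mentioned above, the following sketch lowers gc.ttlseconds on a single table after a bulk deletion and trims the built-in time-series retention. Table name, TTL values, and connection details are illustrative only; shorter GC TTLs also shrink the window available to AS OF SYSTEM TIME queries and backups with revision history, so pick values deliberately.

```python
# Sketch: tune MVCC garbage collection and internal time-series retention.
# Table name, TTL values, and connection details are illustrative placeholders.
import psycopg2

conn = psycopg2.connect(host="localhost", port=26257, user="root",
                        dbname="defaultdb", sslmode="disable")
conn.autocommit = True  # zone configs and cluster settings run outside transactions
cur = conn.cursor()

# Temporarily lower the GC TTL on one table so space from a bulk DELETE is
# reclaimed sooner than the default (several hours). Restore it afterwards.
cur.execute("ALTER TABLE defaultdb.public.events CONFIGURE ZONE USING gc.ttlseconds = 600")

# Shorten retention of the built-in time-series (DB Console) metrics to cap
# their disk overhead. Setting names may vary by version; verify before use.
cur.execute("SET CLUSTER SETTING timeseries.storage.resolution_10s.ttl = '120h'")
cur.execute("SET CLUSTER SETTING timeseries.storage.resolution_30m.ttl = '360h'")

cur.close()
conn.close()
```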