1 of 17

CEP-37 Auto Repairs

Streamlining Repair Operations

2 of 17

Motivation

Why Auto-Repairs in Cassandra?

  • Anti-entropy repairs are essential for fixing data inconsistencies.
  • Frequent deletions and node downtime increase data inconsistency.
  • Existing custom repair solutions create confusion in the community.
  • Integrated, automated repair, much like compaction, is crucial for a complete Cassandra solution.

3 of 17

Proposal Overview

CEP-37 Auto-Repair Proposal

  • Consolidates existing custom solutions into core Cassandra functionality.
  • Supports multiple repair types:
    • Full Repair
    • Incremental Repair
    • Preview Repair
    • Paxos Repair
  • Minimal operator intervention and fully automated repair orchestration.

4 of 17

Design of the Repair Scheduler

  • Dedicated thread pool manages repair scheduling.
  • Tracks repair history using new replicated tables in the system_distributed keyspace.
  • Fully automated with retries for transient failures.
  • Reduces dependency on external orchestration and control planes.
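
To make the design concrete, here is a minimal sketch of such a scheduler built on a plain JDK scheduled executor; the class and method names are invented for illustration and are not the CEP-37 implementation.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class AutoRepairSchedulerSketch {
    // Dedicated pool so repair scheduling never competes with request-handling threads.
    private final ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor(r -> new Thread(r, "AutoRepairScheduler"));

    public void start() {
        // Re-evaluate repair eligibility on a fixed cadence (5 minutes in the proposal).
        pool.scheduleWithFixedDelay(this::tick, 5, 5, TimeUnit.MINUTES);
    }

    private void tick() {
        try {
            // Read the replicated repair-history tables, decide whether it is this
            // node's turn, and submit repair sessions if so.
        } catch (RuntimeException e) {
            // Transient failure: log and let the next tick retry instead of giving up.
        }
    }
}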

5 of 17

Maintaining a Global Repair View

Two key tables:

  • auto_repair_history: Tracks the repair status of each node.
  • auto_repair_priority: Allows admins to prioritize specific nodes.

Scheduler tracks node availability and repair progress.

Ensures a globally consistent repair view across nodes.
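
As an illustration only, a node could use that replicated view to decide whose turn it is roughly as follows; the ordering rule (priority hosts first, then the host whose last repair finished longest ago) and all names here are assumptions for this sketch, not the exact CEP-37 algorithm.

import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public final class RepairTurnSketch {
    // Decide locally, from the replicated view, whether this node should repair next.
    public static boolean isMyTurn(UUID myHostId,
                                   Map<UUID, Instant> lastRepairFinishedAt, // from auto_repair_history
                                   Set<UUID> priorityHosts)                 // from auto_repair_priority
    {
        // Admin-prioritized hosts jump the queue.
        if (!priorityHosts.isEmpty())
            return priorityHosts.contains(myHostId);

        // Otherwise the host whose last repair finished longest ago goes first.
        return lastRepairFinishedAt.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(oldest -> oldest.getKey().equals(myHostId))
                .orElse(true); // empty history: any node may start
    }
}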

6 of 17

Support for Multiple Repair Types

  • Each repair type (full, incremental, paxos, preview) runs independently.
  • Repair sessions are driven by dedicated threads that check for work every 5 minutes.
  • Each repair type’s lifecycle tracked in metadata tables, ensuring no interference.
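
A minimal sketch of that separation, assuming each repair type keeps its own lifecycle entry; the enum and in-memory map below stand in for the replicated metadata tables and are not CEP-37 classes.

import java.util.EnumMap;
import java.util.Map;

public final class RepairTypeStateSketch {
    enum RepairType { FULL, INCREMENTAL, PAXOS, PREVIEW }
    enum Phase { IDLE, RUNNING, FINISHED }

    // In CEP-37 this lifecycle state lives in replicated system tables; an in-memory
    // map stands in for it here. Each repair type owns exactly one entry.
    private final Map<RepairType, Phase> phases = new EnumMap<>(RepairType.class);

    public RepairTypeStateSketch() {
        for (RepairType type : RepairType.values())
            phases.put(type, Phase.IDLE);
    }

    // A scheduler tick for one repair type reads and updates only that type's entry,
    // so an in-flight incremental repair never touches the full/paxos/preview state.
    public void markRunning(RepairType type) {
        phases.put(type, Phase.RUNNING);
    }

    public void markFinished(RepairType type) {
        phases.put(type, Phase.FINISHED);
    }
}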

7 of 17

[Architecture diagram of a Cassandra node: Auto-Repair Config (default scheduler config, full repair config overrides, incremental config overrides), Auto-Repair System Tables (repair_history, repair_priority), and the Auto-Repair Scheduler running Full Repairs and Incremental Repairs.]

8 of 17

Repair Flow

  1. Periodic check every 5 minutes.
  2. Node decides if it’s its turn to repair.
  3. Announce repair start in auto_repair_history.
  4. Repair keyspaces and tables sequentially.
  5. Retry failed sessions up to 3 times.
  6. Announce repair completion.
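
The flow above, expressed as a short Java sketch; all helper names are illustrative assumptions rather than the CEP-37 API, and the system-table reads/writes are stubbed out.

import java.util.List;

public final class RepairFlowSketch {
    private static final int MAX_RETRIES = 3;

    // Invoked by the scheduler roughly every 5 minutes (step 1).
    void runOneCycle(String myHostId) {
        if (!isMyTurn(myHostId))                 // step 2: consult the global repair view
            return;

        announceStart(myHostId);                 // step 3: record start in auto_repair_history

        for (String table : tablesToRepair()) {  // step 4: repair keyspaces/tables sequentially
            boolean succeeded = repairTable(table);
            for (int retry = 0; !succeeded && retry < MAX_RETRIES; retry++)
                succeeded = repairTable(table);  // step 5: retry failed sessions up to 3 times
        }

        announceCompletion(myHostId);            // step 6: record completion in auto_repair_history
    }

    // The helpers below stand in for system-table reads/writes and repair-session submission.
    private boolean isMyTurn(String hostId) { return true; }
    private void announceStart(String hostId) { }
    private void announceCompletion(String hostId) { }
    private List<String> tablesToRepair() { return List.of(); }
    private boolean repairTable(String qualifiedTableName) { return true; }
}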

9 of 17

Configuration

  • YAML configuration to enable/disable repair types
  • Control over:
    • Number of repair threads
    • Repair retry limits
    • Incremental repair disk usage
  • Admins can tune settings dynamically through nodetool
  • Table options to control which tables are repaired

10 of 17

Configuration - Table Properties

CREATE TABLE test.test (
    key text PRIMARY KEY,
    value blob
) WITH additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p'
    AND automated_repair_full = {'enabled': 'true'}
    AND automated_repair_incremental = {'enabled': 'true'};

11 of 17

Configuration - YAML

auto_repair:
  enabled: true
  repair_type_overrides:
    full:
      enabled: true
      number_of_repair_threads: 2
      repair_max_retries: 2
      repair_primary_token_range_only: true
    incremental:
      enabled: true
      number_of_repair_threads: 1
      repair_max_retries: 3
      repair_primary_token_range_only: false

12 of 17

Observability & Metrics

Monitoring Repair Progress

  • Metrics tracked in JMX:
    • Repairs in progress
    • Node and cluster repair time
    • Token ranges successfully repaired or skipped
  • Metrics enable dashboards and alarms to monitor automated repair activity
  • Can use repair virtual tables
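
As an example of wiring these metrics into a dashboard, a standard JMX client can poll them; the MBean name pattern below is an assumption for illustration (the metric names shipped with CEP-37 may differ), and Cassandra's default JMX port 7199 is assumed.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public final class AutoRepairMetricsProbe {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes metrics over JMX; 7199 is the default JMX port.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            // Hypothetical name pattern for the auto-repair metrics; the exact
            // type/scope names in the final patch may differ.
            ObjectName pattern = new ObjectName("org.apache.cassandra.metrics:type=AutoRepair,*");
            for (ObjectName name : mbeans.queryNames(pattern, null)) {
                // Gauges typically expose a "Value" attribute.
                System.out.println(name + " = " + mbeans.getAttribute(name, "Value"));
            }
        } finally {
            connector.close();
        }
    }
}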

13 of 17

Observability & Metrics

> nodetool getautorepairconfig

repair scheduler configuration:
  repair eligibility check interval: 5m
  TTL for repair history for dead nodes: 2h
  max retries for repair: 3
  retry backoff: 30s
configuration for repair type: incremental
  enabled: true
  minimum repair interval: 15m
  repair threads: 1
  number of repair subranges: 16
  priority hosts:
  sstable count higher threshold: 10000
  table max repair time in sec: 6h
  ignore datacenters:
  repair primary token-range: true
  number of parallel repairs within group: 3
  percentage of parallel repairs within group: 3
  mv repair enabled: false
  initial scheduler delay: 5m
  repair session timeout: 3h

14 of 17

Incremental Repair

Reliable Incremental Repair Onboarding

  • No need for restarts during incremental repair onboarding/offboarding.
  • Ensures smooth migration for large clusters.

15 of 17

Incremental Repair

Migration

  • Dynamic setting of repaired_at
    1. nodetool sstablerepairedset
    2. Un-incremental repairing
    3. sstablerepairedset
  • UnrepairedBytesBasedTokenRangeSplitter
  • New mechanism to prevent disk overfill (automatically stops incremental repair if disk usage exceeds 80%), sketched below.
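
A minimal sketch of that guard, assuming a simple free-space check on the data directory; the class name and threshold handling are illustrative, not the CEP-37 implementation.

import java.io.File;

public final class DiskUsageGuardSketch {
    private static final double MAX_DISK_USAGE = 0.80; // stop scheduling new work above 80%

    // Returns true only while the data directory is below the usage threshold.
    public static boolean canScheduleIncrementalRepair(File dataDirectory) {
        long total = dataDirectory.getTotalSpace();
        long used = total - dataDirectory.getUsableSpace();
        return total > 0 && (double) used / total < MAX_DISK_USAGE;
    }

    public static void main(String[] args) {
        File dataDir = new File(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        System.out.println("ok to schedule incremental repair: " + canScheduleIncrementalRepair(dataDir));
    }
}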

16 of 17

Ship It! Currently in Use

  • Uber
  • Netflix
    • Deployed in test
  • Pull request available for review.

17 of 17

Questions