1 of 17

CEP-37 Auto Repairs

Streamlining Repair Operations

2 of 17

Motivation

Why Auto-Repairs in Cassandra?

  • Anti-entropy repairs are essential for fixing data inconsistencies.
  • Frequent deletions and node downtime increase data inconsistency.
  • Existing custom repair solutions create confusion in the community.
  • Integrated, automated repair, much like compaction, is crucial for a complete Cassandra solution.

3 of 17

Proposal Overview

CEP-37 Auto-Repair Proposal

  • Consolidates existing custom solutions into core Cassandra functionality.
  • Supports multiple repair types:
    • Full Repair
    • Incremental Repair
    • Preview Repair
    • Paxos Repair
  • Minimal operator intervention and fully automated repair orchestration.

4 of 17

Design of the Repair Scheduler

  • Dedicated thread pool manages repair scheduling.
  • Tracks repair history using new replicated tables in the system_distributed keyspace.
  • Fully automated with retries for transient failures.
  • Reduces dependency on external orchestration and control planes.
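
To make the design concrete, here is a minimal sketch of such a scheduler built on a plain JDK scheduled executor; the class and method names are invented for illustration and are not the CEP-37 implementation.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class AutoRepairSchedulerSketch {
    // Dedicated pool so repair scheduling never competes with request-handling threads.
    private final ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor(r -> new Thread(r, "AutoRepairScheduler"));

    public void start() {
        // Re-evaluate repair eligibility on a fixed cadence (5 minutes in the proposal).
        pool.scheduleWithFixedDelay(this::tick, 5, 5, TimeUnit.MINUTES);
    }

    private void tick() {
        try {
            // Read the replicated repair-history tables, decide whether it is this
            // node's turn, and submit repair sessions if so.
        } catch (RuntimeException e) {
            // Transient failure: log and let the next tick retry instead of giving up.
        }
    }
}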

5 of 17

Maintaining a Global Repair View

Two key tables:

  • auto_repair_history: Tracks the repair status of each node.
  • auto_repair_priority: Allows admins to prioritize specific nodes.

Scheduler tracks node availability and repair progress.

Ensures a globally consistent repair view across nodes.
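
As an illustration only, a node could use that replicated view to decide whose turn it is roughly as follows; the ordering rule (priority hosts first, then the host whose last repair finished longest ago) and all names here are assumptions for this sketch, not the exact CEP-37 algorithm.

import java.time.Instant;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

public final class RepairTurnSketch {
    // Decide locally, from the replicated view, whether this node should repair next.
    public static boolean isMyTurn(UUID myHostId,
                                   Map<UUID, Instant> lastRepairFinishedAt, // from auto_repair_history
                                   Set<UUID> priorityHosts)                 // from auto_repair_priority
    {
        // Admin-prioritized hosts jump the queue.
        if (!priorityHosts.isEmpty())
            return priorityHosts.contains(myHostId);

        // Otherwise the host whose last repair finished longest ago goes first.
        return lastRepairFinishedAt.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(oldest -> oldest.getKey().equals(myHostId))
                .orElse(true); // empty history: any node may start
    }
}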

6 of 17

Support for Multiple Repair Types

  • Each repair type (full, incremental, paxos, preview) runs independently.
  • Repair sessions are driven by dedicated threads that check for work every 5 minutes.
  • Each repair type’s lifecycle tracked in metadata tables, ensuring no interference.
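
A minimal sketch of that separation, assuming each repair type keeps its own lifecycle entry; the enum and in-memory map below stand in for the replicated metadata tables and are not CEP-37 classes.

import java.util.EnumMap;
import java.util.Map;

public final class RepairTypeStateSketch {
    enum RepairType { FULL, INCREMENTAL, PAXOS, PREVIEW }
    enum Phase { IDLE, RUNNING, FINISHED }

    // In CEP-37 this lifecycle state lives in replicated system tables; an in-memory
    // map stands in for it here. Each repair type owns exactly one entry.
    private final Map<RepairType, Phase> phases = new EnumMap<>(RepairType.class);

    public RepairTypeStateSketch() {
        for (RepairType type : RepairType.values())
            phases.put(type, Phase.IDLE);
    }

    // A scheduler tick for one repair type reads and updates only that type's entry,
    // so an in-flight incremental repair never touches the full/paxos/preview state.
    public void markRunning(RepairType type) {
        phases.put(type, Phase.RUNNING);
    }

    public void markFinished(RepairType type) {
        phases.put(type, Phase.FINISHED);
    }
}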

7 of 17

[Architecture diagram of a Cassandra node: Auto-Repair Config (default scheduler config, full repair config overrides, incremental config overrides), Auto-Repair System Tables (repair_history, repair_priority), and the Auto-Repair Scheduler running Full Repairs and Incremental Repairs.]

8 of 17

Repair Flow

  1. Periodic check every 5 minutes.
  2. Node decides if it’s its turn to repair.
  3. Announce repair start in auto_repair_history.
  4. Repair keyspaces and tables sequentially.
  5. Retry failed sessions up to 3 times.
  6. Announce repair completion.
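
The flow above, expressed as a short Java sketch; all helper names are illustrative assumptions rather than the CEP-37 API, and the system-table reads/writes are stubbed out.

import java.util.List;

public final class RepairFlowSketch {
    private static final int MAX_RETRIES = 3;

    // Invoked by the scheduler roughly every 5 minutes (step 1).
    void runOneCycle(String myHostId) {
        if (!isMyTurn(myHostId))                 // step 2: consult the global repair view
            return;

        announceStart(myHostId);                 // step 3: record start in auto_repair_history

        for (String table : tablesToRepair()) {  // step 4: repair keyspaces/tables sequentially
            boolean succeeded = repairTable(table);
            for (int retry = 0; !succeeded && retry < MAX_RETRIES; retry++)
                succeeded = repairTable(table);  // step 5: retry failed sessions up to 3 times
        }

        announceCompletion(myHostId);            // step 6: record completion in auto_repair_history
    }

    // The helpers below stand in for system-table reads/writes and repair-session submission.
    private boolean isMyTurn(String hostId) { return true; }
    private void announceStart(String hostId) { }
    private void announceCompletion(String hostId) { }
    private List<String> tablesToRepair() { return List.of(); }
    private boolean repairTable(String qualifiedTableName) { return true; }
}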

9 of 17

Configuration

  • YAML configuration to enable/disable repair types
  • Control over:
    • Number of repair threads
    • Repair retry limits
    • Incremental repair disk usage
  • Admins can tune settings dynamically through nodetool
  • Table options to control which tables are repaired

10 of 17

Configuration - Table Properties

CREATE TABLE test.test (
    key text PRIMARY KEY,
    value blob
) WITH additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND memtable = 'default'
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p'
    AND automated_repair_full = {'enabled': 'true'}
    AND automated_repair_incremental = {'enabled': 'true'};

11 of 17

Configuration - YAML

auto_repair:
  enabled: true
  repair_type_overrides:
    full:
      enabled: true
      number_of_repair_threads: 2
      repair_max_retries: 2
      repair_primary_token_range_only: true
    incremental:
      enabled: true
      number_of_repair_threads: 1
      repair_max_retries: 3
      repair_primary_token_range_only: false

12 of 17

Observability & Metrics

Monitoring Repair Progress

  • Metrics tracked in JMX:
    • Repairs in progress
    • Node and cluster repair time
    • Token ranges successfully repaired or skipped
  • Metrics enable dashboards and alarms to monitor automated repair activity
  • Can use repair virtual tables
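
As an example of wiring these metrics into a dashboard, a standard JMX client can poll them; the MBean name pattern below is an assumption for illustration (the metric names shipped with CEP-37 may differ), and Cassandra's default JMX port 7199 is assumed.

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public final class AutoRepairMetricsProbe {
    public static void main(String[] args) throws Exception {
        // Cassandra exposes metrics over JMX; 7199 is the default JMX port.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            // Hypothetical name pattern for the auto-repair metrics; the exact
            // type/scope names in the final patch may differ.
            ObjectName pattern = new ObjectName("org.apache.cassandra.metrics:type=AutoRepair,*");
            for (ObjectName name : mbeans.queryNames(pattern, null)) {
                // Gauges typically expose a "Value" attribute.
                System.out.println(name + " = " + mbeans.getAttribute(name, "Value"));
            }
        } finally {
            connector.close();
        }
    }
}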

13 of 17

Observability & Metrics

> nodetool getautorepairconfig

repair scheduler configuration:
  repair eligibility check interval: 5m
  TTL for repair history for dead nodes: 2h
  max retries for repair: 3
  retry backoff: 30s
configuration for repair type: incremental
  enabled: true
  minimum repair interval: 15m
  repair threads: 1
  number of repair subranges: 16
  priority hosts:
  sstable count higher threshold: 10000
  table max repair time in sec: 6h
  ignore datacenters:
  repair primary token-range: true
  number of parallel repairs within group: 3
  percentage of parallel repairs within group: 3
  mv repair enabled: false
  initial scheduler delay: 5m
  repair session timeout: 3h

14 of 17

Incremental Repair

Reliable Incremental Repair Onboarding

  • No need for restarts during incremental repair onboarding/offboarding.
  • Ensures smooth migration for large clusters.

15 of 17

Incremental Repair

Migration

  • Dynamic setting of repaired_at
    1. nodetool sstablerepairedset
    2. Un-incremental repairing
    3. sstablerepairedset
  • UnrepairedBytesBasedTokenRangeSplitter
  • New mechanism to prevent disk overfill (automatically stops incremental repair if disk usage exceeds 80%), sketched below.
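
A minimal sketch of that guard, assuming a simple free-space check on the data directory; the class name and threshold handling are illustrative, not the CEP-37 implementation.

import java.io.File;

public final class DiskUsageGuardSketch {
    private static final double MAX_DISK_USAGE = 0.80; // stop scheduling new work above 80%

    // Returns true only while the data directory is below the usage threshold.
    public static boolean canScheduleIncrementalRepair(File dataDirectory) {
        long total = dataDirectory.getTotalSpace();
        long used = total - dataDirectory.getUsableSpace();
        return total > 0 && (double) used / total < MAX_DISK_USAGE;
    }

    public static void main(String[] args) {
        File dataDir = new File(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        System.out.println("ok to schedule incremental repair: " + canScheduleIncrementalRepair(dataDir));
    }
}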

16 of 17

Ship It! Currently in Use

  • Uber
  • Netflix
    • Deployed in test
  • Pull request available for review.

17 of 17

Questions