The user sees some nodes crash with out-of-memory (OOM) errors. The user was only updating (writing to) columns that were indexed.
resource leak
Critical
Yes. OOM.
no
yes
Clients who are using those nodes.
no
1.0.10
basic configuration
1
start cassandra
1. indexed column (feature start)
2. many updates to such columns (file write)
Eventually you will see OOM
2
In this order
yes
Yes. INFO [FlushWriter:1] 2012-06-07 10:52:09,078 Memtable.java (line 246) Writing Memtable-LocationInfo@91455740(29/36 serialized/live bytes, 1 ops)
1
OOM. There is even a warning message before the final OOM:
WARN [ScheduledTasks:1] 2012-06-07 12:09:40,671 GCInspector.java (line 145) Heap is 0.9158073167593992 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
The database (now at 1.0.10) is in a state in which it runs out of memory with hardly any activity at all: a key slice, nothing more. Then the client times out.
An out-of-memory problem was found. The pattern shows up only during compaction of tombstones. We also see the user doing many overwrites of indexed columns, which generate “deletes” in the indexed column family.
Each write to an indexed column is translated internally into a delete and a create (storage is log-structured). Compaction is supposed to remove the deleted data (marked by tombstones). However, in the buggy version, compaction removes tombstones only if they are older than gc_grace_seconds, even on a purely local table, and this value is 10 days by default. Compaction therefore has no effect on these tombstones until they are 10 days old, and they keep accumulating in the meantime.
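The purge rule described above can be sketched as follows. This is a minimal illustration, not Cassandra's actual code: the class name, the isPurgeable helper, and the timestamp used in main are hypothetical, and the comparison is a simplified model of how compaction decides whether a tombstone is old enough to drop. The constant is the 10-day default expressed in seconds.

```java
public class TombstonePurgeSketch {
    // 10 days in seconds: the default grace period mentioned in the report.
    static final int DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 60 * 60;

    // Hypothetical helper (not a Cassandra API): a tombstone may be dropped
    // only once its deletion time is older than the grace period.
    static boolean isPurgeable(long deletedAtSeconds, long nowSeconds, int gcGraceSeconds) {
        long gcBefore = nowSeconds - gcGraceSeconds;
        return deletedAtSeconds < gcBefore;
    }

    public static void main(String[] args) {
        long now = 1_339_000_000L;     // arbitrary "current time" in seconds
        long oneHourAgo = now - 3600;  // tombstone written an hour ago

        // With the inherited 10-day grace period the tombstone must be kept.
        System.out.println(isPurgeable(oneHourAgo, now, DEFAULT_GC_GRACE_SECONDS));
        // With a grace period of 0 the same tombstone can be dropped.
        System.out.println(isPurgeable(oneHourAgo, now, 0));
    }
}
```

Under this model an hour-old tombstone is kept with the 10-day grace period (the first call returns false) and is droppable with a grace period of 0 (the second call returns true).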
semantic
no
yes
Overrode the garbage collection grace period to 0 for purely local tables instead of inheriting the value from the parent. This way, tombstones become eligible for removal immediately and are dropped at the next compaction.
--- a/src/java/org/apache/cassandra/config/CFMetaData.java
+++ b/src/java/org/apache/cassandra/config/CFMetaData.java
@@ -251,7 +251,7 @@ public final class CFMetaData
         .keyValidator(info.getValidator())
         .keyCacheSize(0.0)
         .readRepairChance(0.0)
-        .gcGraceSeconds(parent.gcGraceSeconds)
+        .gcGraceSeconds(0)
         .minCompactionThreshold(parent.minCompactionThreshold)
         .maxCompactionThreshold(parent.maxCompactionThreshold);
     }
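To illustrate the effect of this one-line change, here is a toy simulation in Java. It is a sketch under stated assumptions, not Cassandra code: the class and both methods are hypothetical, each overwrite of an indexed column is assumed to leave exactly one tombstone, the parent's grace period is assumed to be the 10-day default (864000 seconds), and compaction is reduced to a single filter over deletion times.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOverwriteSketch {
    // Hypothetical model: every overwrite of an indexed column leaves one
    // tombstone for the old index entry, stamped with its deletion time.
    static List<Long> overwrite(int count, long startSeconds) {
        List<Long> tombstones = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            tombstones.add(startSeconds + i); // one overwrite per second
        }
        return tombstones;
    }

    // Simplified compaction: keep only tombstones that are not yet older
    // than the grace period; everything else is dropped.
    static List<Long> compact(List<Long> tombstones, long nowSeconds, int gcGraceSeconds) {
        long gcBefore = nowSeconds - gcGraceSeconds;
        List<Long> kept = new ArrayList<>();
        for (long deletedAt : tombstones) {
            if (deletedAt >= gcBefore) {
                kept.add(deletedAt);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long start = 0L;
        List<Long> tombstones = overwrite(100_000, start);
        long now = start + 100_000; // compaction runs right after the last write

        // Assumed inherited value: the 10-day default.
        System.out.println("grace 864000 s, tombstones kept: "
                + compact(tombstones, now, 864_000).size());
        // Patched value for the purely local table.
        System.out.println("grace 0 s, tombstones kept: "
                + compact(tombstones, now, 0).size());
    }
}
```

In this model all 100,000 tombstones survive compaction under the inherited grace period, while none survive with a grace period of 0. This matches the behaviour the patch is after: a purely local table has no replicas that still need to see the deletes, so there is nothing to gain from retaining them.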