When running nodetool repair on the cas01 node, the repair gets stuck at some point. The following exceptions appear in cas01's system.log:
ERROR [Streaming to /10.10.45.60:28] 2013-04-02 09:03:55,353 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.10.45.60:28,5,main]
java.lang.RuntimeException: java.io.EOFException
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:193)
at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:114)
at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 3 more
ERROR [Thread-2076] 2013-04-02 09:07:12,261 CassandraDaemon.java (line 132) Exception in thread Thread[Thread-2076,5,main]
java.lang.AssertionError: incorrect row data size 130921 written to /var/lib/cassandra/data/EDITED/content_list/footballsite-content_list-tmp-ib-3660-Data.db; correct is 131074
at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:285)
at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
The other machines also log exceptions:
ERROR [Thread-1424] 2013-04-02 09:07:12,248 CassandraDaemon.java (line 132) Exception in thread Thread[Thread-1424,5,main]
java.lang.AssertionError: incorrect row data size 130921 written to /var/lib/cassandra/data/EDITED/content_list/footballsite-content_list-tmp-ib-2268-Data.db; correct is 131074
at org.apache.cassandra.io.sstable.SSTableWriter.appendFromStream(SSTableWriter.java:285)
at org.apache.cassandra.streaming.IncomingStreamReader.streamIn(IncomingStreamReader.java:179)
at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:122)
at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:238)
at org.apache.cassandra.net.IncomingTcpConnection.handleStream(IncomingTcpConnection.java:178)
at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)
ERROR [Streaming to /10.10.45.58:55] 2013-04-02 09:07:12,263 CassandraDaemon.java (line 132) Exception in thread Thread[Streaming to /10.10.45.58:55,5,main]
java.lang.RuntimeException: java.io.EOFException
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(Unknown Source)
at org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:193)
at org.apache.cassandra.streaming.compress.CompressedFileStreamTask.stream(CompressedFileStreamTask.java:114)
at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:91)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
... 3 more
After that, nodetool netstats shows the streams as frozen and the repair never completes.
When running nodetool repair on any node in the Cassandra cluster, the repair process gets stuck.
On both the node that initiates the repair and its peers the error logs are similar: a java.io.EOFException on the sending side and a java.lang.AssertionError on the receiving side.
This occurs when the user is doing lots of deletes against the same column family.
From the stack traces alone we only know that the assertion error fires while reading an incoming stream; without domain knowledge, the logs are not sufficient to identify the root cause.
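The two stack traces do point at the same check, though: SSTableWriter.appendFromStream (SSTableWriter.java:285 in both traces) re-serializes each streamed row and then asserts that the bytes it wrote match the row size advertised in the stream header; the EOFException on the sending side is just the socket dying when the receiver aborts. Reading the message that way, a row advertised as 130921 bytes came out as 131074 bytes on disk, 153 bytes too long. A rough reconstruction of the check (paraphrased from the log message; the variable names are assumptions, not the verbatim 1.2 source):
// Reconstruction of the check behind the AssertionError; names are
// assumptions inferred from the log message, not the actual source.
long dataStart = dataFile.getFilePointer();
// ... read each on-disk atom from the stream and re-serialize it
//     through a fresh ColumnIndex.Builder ...
long writtenSize = dataFile.getFilePointer() - dataStart;
assert dataSize == writtenSize
    : "incorrect row data size " + dataSize + " written to " + dataFile.getPath()
    + "; correct is " + writtenSize;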
“The assertion is caused by an element being written twice on a ColumnIndexer block boundary. But column_index_size_in_kb is the same on every node and set to the default 64 KB.”
Root cause: duplication of columns on an index block boundary when appending from a stream.
At first glance it is hard to connect ColumnIndex with the error log.
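The link is the range-tombstone tracker inside ColumnIndex.Builder. Whenever a row grows past column_index_size_in_kb, the builder starts a new index block and re-writes an "opened" marker for every range tombstone still in effect, so one marker can be serialized once per block it spans. The sender already did this when its own SSTable was built, so the repeated markers are part of the streamed bytes and of the advertised row size; the pre-patch receiver then replays those atoms through its own builder and adds markers again at its own block boundaries. Schematically (condensed from the hunks below; illustrative, not compilable as-is):
// Schematic of the pre-patch receive path, condensed from ColumnIndex.add
// (see the hunks below for the real code).
for (OnDiskAtom atom : atomsFromStream)   // already contains the sender's repeated markers
{
    if (blockSize >= DatabaseDescriptor.getColumnIndexSize())
    {
        // New index block: re-open live range tombstones. On the receive
        // path this writes bytes the advertised dataSize never counted.
        endPosition += tombstoneTracker.writeOpenedMarker(atom, output, atomSerializer);
        blockSize = 0;
    }
    long size = atom.serializedSizeForSSTable();
    atomSerializer.serializeForSSTable(atom, output);
    tombstoneTracker.update(atom);
    endPosition += size;
    blockSize += size;
}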
The fix corrects the incorrect semantics rather than the symptom; the main logic to be fixed does not even appear in the exception stack traces.
Avoid duplicating columns on index block boundaries when appending from a stream (the source stream has already duplicated them): the receiver should write only what it gets from the stream.
• /src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -99,7 +108,7 @@ public class ColumnIndex
public int writtenAtomCount()
{
- return atomCount + tombstoneTracker.writtenAtom();
+ return tombstoneTracker == null ? atomCount : atomCount + tombstoneTracker.writtenAtom();
}
/**
@@ -153,11 +162,11 @@ public class ColumnIndex
{
firstColumn = column;
startPosition = endPosition;
- // TODO: have that use the firstColumn as min + make sure we
- // optimize that on read
- endPosition += tombstoneTracker.writeOpenedMarker(firstColumn, output, atomSerializer);
+ // TODO: have that use the firstColumn as min + make sure we optimize that on read
+ if (tombstoneTracker != null)
+ endPosition += tombstoneTracker.writeOpenedMarker(firstColumn, output, atomSerializer);
blockSize = 0; // We don't count repeated tombstone marker in the block size, to avoid a situation
- // where we wouldn't make any problem because a block is filled by said marker
+ // where we wouldn't make any progress because a block is filled by said marker
}
long size = column.serializedSizeForSSTable();
@@ -177,7 +186,8 @@ public class ColumnIndex
atomSerializer.serializeForSSTable(column, output);
// TODO: Should deal with removing unneeded tombstones
- tombstoneTracker.update(column);
+ if (tombstoneTracker != null)
+ tombstoneTracker.update(column);
lastColumn = column;
}
• /src/java/org/apache/cassandra/io/sstable/SSTableWriter.java
@@ -240,7 +240,7 @@ public class SSTableWriter extends SSTable
ColumnFamily cf = ColumnFamily.create(metadata, ArrayBackedSortedColumns.factory());
cf.delete(deletionInfo);
- ColumnIndex.Builder columnIndexer = new ColumnIndex.Builder(cf, key.key, columnCount, dataFile.stream);
+ ColumnIndex.Builder columnIndexer = new ColumnIndex.Builder(cf, key.key, columnCount, dataFile.stream, true);
OnDiskAtom.Serializer atomSerializer = cf.getOnDiskSerializer();
for (int i = 0; i < columnCount; i++)
{
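The hunks above pass a new fifth argument to ColumnIndex.Builder, but the matching constructor hunk is not part of this excerpt. Presumably it adds an overload that skips creating the tombstone tracker on the streaming path, which is what makes the tombstoneTracker != null guards above sufficient. A sketch of what that overload likely looks like (the fromStream parameter name and the Tracker constructor call are assumptions):
// Sketch of the constructor overload implied by the hunks above; not part
// of the excerpt, so names and the Tracker call are assumptions.
public Builder(ColumnFamily cf, ByteBuffer key, int columnCount, DataOutput output, boolean fromStream)
{
    // ... same field initialization as the existing four-argument constructor ...
    // A streamed row already carries the repeated markers, so no tracker is
    // created and the receiver writes only what it gets from the stream.
    this.tombstoneTracker = fromStream ? null : new RangeTombstone.Tracker(cf.getComparator());
}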