HBase-4078 Report
https://issues.apache.org/jira/browse/Hbase-4078
Data loss (loss of a column family) when an HDFS error occurs right after the column family has been compacted.
Blocker
Yes: an IOException is thrown, and the data is later not found.
Yes
One or a few column families.
0.90.3
Standard
1. Compact a memstore (feature start, long running)
2. HDFS failure (data corruption)
In this order.
No. The HDFS failure must occur right after the compaction and before “completeCompaction” is called.
Yes.
2 (RS + HDFS)
The exception:
java.io.IOException: java.lang.IllegalArgumentException: Invalid HFile version: 2162721 (expected to be between 1 and 2)
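The reported value 2162721 is 0x00210021 in hexadecimal, which looks like arbitrary file bytes read as an integer rather than a real version number. The following is a minimal sketch of such a range check, assuming a simplified layout in which the version is the last four bytes of the file as a big-endian int. The class and method names are hypothetical and this is not HBase's actual trailer parser.

```java
import java.nio.ByteBuffer;

public class TrailerVersionCheck {
    // Bounds taken from the exception message above.
    public static final int MIN_VERSION = 1;
    public static final int MAX_VERSION = 2;

    // Assumption: the version is the trailing 4-byte big-endian int.
    public static int readTrailingInt(byte[] data) {
        if (data.length < 4) {
            throw new IllegalArgumentException("File too short: " + data.length + " bytes");
        }
        return ByteBuffer.wrap(data, data.length - 4, 4).getInt();
    }

    // Rejects any value outside the accepted version range.
    public static void checkVersion(int version) {
        if (version < MIN_VERSION || version > MAX_VERSION) {
            throw new IllegalArgumentException("Invalid HFile version: " + version
                + " (expected to be between " + MIN_VERSION + " and " + MAX_VERSION + ")");
        }
    }

    public static void main(String[] args) {
        // One byte pattern that decodes to the reported value (illustrative only).
        byte[] corruptTail = {0x00, 0x21, 0x00, 0x21};
        int version = readTrailingInt(corruptTail);
        System.out.println(version); // prints 2162721
        try {
            checkVersion(version);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```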
The stack trace shows that, during completeCompaction, the rename from
.tmp/REGION_ID to colfamily11/8dc5109d70a240e7887c81bd934dbc16 failed, and the IOException was not handled at all.
  StoreFile completeCompaction(final Collection<StoreFile> compactedFiles,
                               final StoreFile.Writer compactedFile)
      throws IOException {
    // 1. Moving the new files into place -- if there is a new file (may not
    // be if all cells were expired or deleted).
    StoreFile result = null;
    if (compactedFile != null) {
+     validateStoreFile(compactedFile.getPath());
      Path p = null;
      try {
        p = StoreFile.rename(this.fs, compactedFile.getPath(),
      } catch (IOException e) {
        LOG.error("Failed move of compacted file " + compactedFile.getPath(), e);
        return null;
      }
--- They should have re-validated the file at the compacted file's path before moving it into place.
Incorrect error handling: the exception was handled, but the file should have been validated first.
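The validate-before-rename idea can be sketched on a local file system with java.nio.file. This is an illustration under stated assumptions, not the HBase implementation: the class, the validate helper, and the trailing-version file layout are all hypothetical. The point is only that a corrupt compacted file is rejected before the old files are removed.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Collections;
import java.util.List;

public class ValidateBeforeRename {
    // Hypothetical stand-in for validateStoreFile: assumes the last four
    // bytes hold a big-endian version number that must be 1 or 2.
    public static void validate(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        if (data.length < 4) {
            throw new IOException("Compacted file too short: " + file);
        }
        int version = ByteBuffer.wrap(data, data.length - 4, 4).getInt();
        if (version < 1 || version > 2) {
            throw new IOException("Invalid version " + version + " in " + file);
        }
    }

    // Validates first, so a failure leaves both the old files and the
    // temporary file untouched; only then moves and cleans up.
    public static Path completeCompaction(Path compacted, Path target,
                                          List<Path> oldFiles) throws IOException {
        validate(compacted);
        Path moved = Files.move(compacted, target, StandardCopyOption.ATOMIC_MOVE);
        for (Path old : oldFiles) {
            Files.deleteIfExists(old);
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("store");
        Path old = Files.write(dir.resolve("old"), new byte[]{0, 0, 0, 1});
        Path corrupt = Files.write(dir.resolve("tmp"), new byte[]{0x00, 0x21, 0x00, 0x21});
        try {
            completeCompaction(corrupt, dir.resolve("new"), Collections.singletonList(old));
        } catch (IOException e) {
            System.out.println("Compaction aborted: " + e.getMessage());
        }
        System.out.println("Old file still present: " + Files.exists(old));
    }
}
```

With the validation placed first, the corrupt temporary file causes the compaction to abort while the pre-compaction file remains readable.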