HBASE-8732

1. Symptom

Opening a scanner on an HFile that has no data block encoding fails because the region server finds a cached block with the wrong encoding.

1.1 Severity

Critical

1.2 Was there exception thrown?

Yes

1.2.1 Were there multiple exceptions?

Yes. A RemoteWithExtrasException wrapping a java.io.IOException ("Could not seek StoreFileScanner"), which is in turn caused by a second java.io.IOException ("Cached block ... has wrong encoding").

1.3 Scope of the failure

Single request

2. How to reproduce this failure

2.0 Version

0.95.1

2.1 Configuration

2.2 Reproduction procedure

1. Open a scanner on an HFile that has no data block encoding (see the client-side sketch below).
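
For reference, a minimal client-side sketch of this step, assuming the 0.95-era HTable API. The table and column-family names are taken from the stack trace in section 3.1 and are only placeholders for any table written without data block encoding.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class OpenScannerRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Placeholder table/family, written without data block encoding.
    HTable table = new HTable(conf, "IntegrationTestModifyColumns");
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("test_cf"));
    ResultScanner scanner = table.getScanner(scan);  // the failure surfaces here
    try {
      for (Result r : scanner) {
        // iterate; the seek that fails happens while the region server opens the scanner
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}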

2.2.1 Timing order

Single event

2.2.2 Events order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

No

2.4 How many machines needed?

2 (1 region server + 1 client)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: java.io.IOException: Could not seek StoreFileScanner[HFileScanner for reader reader=hdfs://localhost:57053/user/eclark/hbase/IntegrationTestModifyColumns/d2c63aa3399aaf7e40bf7d045c0bb1ca/test_cf/d020ed015d9b4c73b08b06192095e4be, compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false], firstKey=1115ec15a4637bb614390d16c81ea881-105445/test_cf:0/1370980134228/Put, lastKey=221cdbd49831660e254edeb0c4b51109-102317/test_cf:0/1370980122463/Put, avgKeyLen=59, avgValueLen=100, entries=6441, length=1089866, cur=null] to key 1a860448b5d2824f0a7163839fe04f6e-109693/test_cf:/LATEST_TIMESTAMP/DeleteFamily/vlen=0/mvcc=0
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:154)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:160)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:1623)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:3507)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1705)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1697)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1674)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4452)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4427)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2743)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20926)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)
Caused by: java.io.IOException: Cached block under key d020ed015d9b4c73b08b06192095e4be_590914_FAST_DIFF has wrong encoding: null (expected: FAST_DIFF)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:319)
    at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:469)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:490)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:222)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:142)
    ... 12 more

    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1336)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1540)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1597)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:21331)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.get(ProtobufUtil.java:1233)
    ... 8 more


4. Root cause

The problem is that HFileBlockDefaultEncodingContext is not thread-safe. Its onDiskBytesWithHeader, uncompressedBytesWithHeader, and blockType fields are shared between threads without any protection, so concurrent encoding calls mangle the block headers.
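
This kind of interference can be shown with a stripped-down sketch. The class below is a stand-in, not the real HFileBlockDefaultEncodingContext; its only point is to show why keeping the intermediate result in an instance field breaks when several request-handler threads share one context object.

import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedContextRace {

  /**
   * Stand-in for a non-thread-safe encoding context: like onDiskBytesWithHeader,
   * uncompressedBytesWithHeader and blockType in HFileBlockDefaultEncodingContext,
   * the intermediate result lives in an instance field.
   */
  static class SharedEncodingContext {
    private byte[] bytesWithHeader;   // shared, unprotected state

    byte[] encode(String blockId) throws InterruptedException {
      bytesWithHeader = ("HDR:" + blockId).getBytes(StandardCharsets.UTF_8);
      Thread.sleep(1);                // widen the race window
      return bytesWithHeader;         // may now hold a header written by another thread
    }
  }

  public static void main(String[] args) throws Exception {
    SharedEncodingContext ctx = new SharedEncodingContext();  // one context, many threads
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 8; i++) {
      final String blockId = "block-" + i;
      pool.submit(() -> {
        String header = new String(ctx.encode(blockId), StandardCharsets.UTF_8);
        if (!header.equals("HDR:" + blockId)) {
          System.out.println("mangled header for " + blockId + ": " + header);
        }
        return null;                  // Callable, so encode()'s checked exception is allowed
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}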

4.1 Category:

Concurrency Bug

5. Fix

Index: hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
===================================================================
--- hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java    (revision 1503919)
+++ hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java    (working copy)
@@ -41,7 +41,7 @@
 public class HFileDataBlockEncoderImpl implements HFileDataBlockEncoder {
   private final DataBlockEncoding onDisk;
   private final DataBlockEncoding inCache;
-  private final HFileBlockEncodingContext inCacheEncodeCtx;
+  private final byte[] dummyHeader;

   public HFileDataBlockEncoderImpl(DataBlockEncoding encoding) {
     this(encoding, encoding);
@@ -75,16 +75,7 @@
         onDisk : DataBlockEncoding.NONE;
     this.inCache = inCache != null ?
         inCache : DataBlockEncoding.NONE;
-    if (inCache != DataBlockEncoding.NONE) {
-      inCacheEncodeCtx =
-          this.inCache.getEncoder().newDataBlockEncodingContext(
-              Algorithm.NONE, this.inCache, dummyHeader);
-    } else {
-      // create a default encoding context
-      inCacheEncodeCtx =
-          new HFileBlockDefaultEncodingContext(Algorithm.NONE,
-              this.inCache, dummyHeader);
-    }
+    this.dummyHeader = dummyHeader;

     Preconditions.checkArgument(onDisk == DataBlockEncoding.NONE ||
         onDisk == inCache, "on-disk encoding (" + onDisk + ") must be " +
@@ -166,7 +157,7 @@
       }
       // Encode the unencoded block with the in-cache encoding.
       return encodeDataBlock(block, inCache, block.doesIncludeMemstoreTS(),
-          inCacheEncodeCtx);
+          createInCacheEncodingContext());
     }

     if (block.getBlockType() == BlockType.ENCODED_DATA) {
@@ -256,6 +247,22 @@
     return encodedBlock;
   }

+  /**
+   * Returns a new encoding context given the inCache encoding scheme provided in the constructor.
+   * This used to be kept around but HFileBlockDefaultEncodingContext isn't thread-safe.
+   * See HBASE-8732
+   * @return a new in cache encoding context
+   */
+  private HFileBlockEncodingContext createInCacheEncodingContext() {
+    return (inCache != DataBlockEncoding.NONE) ?
+        this.inCache.getEncoder().newDataBlockEncodingContext(
+            Algorithm.NONE, this.inCache, dummyHeader)
+        :
+        // create a default encoding context
+        new HFileBlockDefaultEncodingContext(Algorithm.NONE,
+            this.inCache, dummyHeader);
+  }
+

5.1 How?

By re-creating the HFileBlockDefaultEncodingContext each time a block read from disk is encoded for the block cache, instead of sharing one long-lived instance (the old inCacheEncodeCtx field) across request-handler threads.
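
Structurally, the patch swaps the cached context field for a per-call factory, so every caller works on a context confined to its own thread. A simplified, standalone illustration of that shape (the class and method names below are stand-ins, not the actual HBase classes):

/** Standalone sketch of the shape of the fix; the classes are simplified stand-ins. */
public class PerCallContextSketch {

  /** Trivial stand-in for HFileBlockEncodingContext with its mutable buffer. */
  static class EncodingContext {
    byte[] bytesWithHeader;   // mutable state, now confined to a single caller
  }

  private final byte[] dummyHeader;   // retained as a plain field, as in the patch

  PerCallContextSketch(byte[] dummyHeader) {
    this.dummyHeader = dummyHeader;
  }

  /** Encodes a block for the cache using a context no other thread can see. */
  byte[] encodeForCache(byte[] rawBlock) {
    EncodingContext ctx = createInCacheEncodingContext();   // fresh instance every call
    ctx.bytesWithHeader = new byte[dummyHeader.length + rawBlock.length];
    System.arraycopy(dummyHeader, 0, ctx.bytesWithHeader, 0, dummyHeader.length);
    System.arraycopy(rawBlock, 0, ctx.bytesWithHeader, dummyHeader.length, rawBlock.length);
    return ctx.bytesWithHeader;   // safe: ctx never escapes this call
  }

  /** Mirrors the createInCacheEncodingContext() helper introduced by the patch. */
  private EncodingContext createInCacheEncodingContext() {
    return new EncodingContext();
  }
}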