HBase-5606 Report

1. Symptom

Calls to Mutation.getFamilyMap() crash downstream applications.

1.1 Severity

Critical.

1.2 Was there exception thrown?

Yes: an IOException (in this case its subtype EOFException).

1.2.1 Were there multiple exceptions?

No.

2. How to reproduce this failure

2.0 Version

0.92.1

2.1 Configuration

A downstream application that uses getFamilyMap() and depends on the order of the columns (written to run against any HBase version up to 0.94.x), connected to an HBase 0.95.1 instance.

In my case, I wrote two simple Java applications, a server and a client, connected to each other. The server sends the map using the new type, as returned by public NavigableMap<byte[],List<? extends Cell>> getFamilyMap(), while the client expects the old type, as returned by public Map<byte[],List<KeyValue>> getFamilyMap().
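The two signatures differ both in the map type and in the element type of the value lists. The Java sketch below uses hypothetical stand-ins for Cell and KeyValue (not the real HBase classes) to show two consequences: a NavigableMap keyed by byte[] iterates in comparator order rather than insertion order (unsigned lexicographic byte order is assumed here), and client code written for List<KeyValue> cannot use List<? extends Cell> values without copying and casting each element.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class FamilyMapTypes {
    // Hypothetical stand-ins for HBase's Cell interface and KeyValue class.
    interface Cell {}

    static final class KeyValue implements Cell {}

    // byte[] is not Comparable, so a TreeMap keyed by byte[] needs an explicit comparator.
    // Unsigned lexicographic order is an assumption of this sketch.
    static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Shape of the new signature: a sorted map whose values are lists of some Cell subtype.
    static NavigableMap<byte[], List<? extends Cell>> newStyleFamilyMap() {
        NavigableMap<byte[], List<? extends Cell>> map = new TreeMap<>(BYTE_ORDER);
        map.put(bytes("cf2"), List.of(new KeyValue()));
        map.put(bytes("cf1"), List.of(new KeyValue()));
        return map;
    }

    // Family names in the map's iteration order (sorted by the comparator).
    static List<String> familyOrder() {
        List<String> names = new ArrayList<>();
        for (byte[] family : newStyleFamilyMap().keySet()) {
            names.add(new String(family, StandardCharsets.UTF_8));
        }
        return names;
    }

    // Old-style client code wants Map<byte[], List<KeyValue>>. List<? extends Cell> is not
    // assignable to List<KeyValue>, so every element has to be copied and cast; the cast
    // throws ClassCastException for any cell that is not a KeyValue.
    static Map<byte[], List<KeyValue>> asOldStyle(NavigableMap<byte[], List<? extends Cell>> map) {
        Map<byte[], List<KeyValue>> old = new TreeMap<>(BYTE_ORDER);
        for (Map.Entry<byte[], List<? extends Cell>> entry : map.entrySet()) {
            List<KeyValue> kvs = new ArrayList<>();
            for (Cell cell : entry.getValue()) {
                kvs.add((KeyValue) cell);
            }
            old.put(entry.getKey(), kvs);
        }
        return old;
    }

    public static void main(String[] args) {
        System.out.println(familyOrder()); // [cf1, cf2]
        System.out.println(asOldStyle(newStyleFamilyMap()).size()); // 2
    }
}
```

Note that "cf2" is inserted before "cf1" but iteration returns "cf1" first, which matters for a client that depends on column order.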

2.2 Reproduction procedure

Start the server, then start the client and fetch the family map through the getFamilyMap() method.

2.2.1 Timing order

Simple.

2.2.2 Events order externally controllable?

Yes.

2.3 Can the logs tell how to reproduce the failure?

Yes.

2.4 How many machines needed?

One.

3. Diagnosis procedure

The diagnosis is straightforward: the client crashes with an IOException right after receiving a map of the wrong type.

3.1 Detailed Symptom (where you start)

The client crashes right after fetching a FamilyMap from the server.

3.2 Backward inference

The client crashed right after fetching the FamilyMap object, and the exception was clear: EOFException. This indicated that something was wrong with the data received from the server. Checking the corresponding source code on the server showed that the return type of getFamilyMap() had changed and no longer matched the type the client expected.
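How a type mismatch between writer and reader can surface as an EOFException can be sketched with a made-up wire format. This is an assumption for illustration only, not HBase's actual serialization: the hypothetical new writer emits only family names, while the hypothetical old reader still expects each name to be followed by a cell count, so it reads past the end of the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WireMismatch {
    // Hypothetical "new" writer: a family count, then each family name only.
    static byte[] writeNewFormat(String... families) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(families.length);
            for (String family : families) {
                out.writeUTF(family);
            }
        }
        return buf.toByteArray();
    }

    // Hypothetical "old" reader: expects every family name to be followed by a cell count.
    static int readOldFormat(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int families = in.readInt();
            int totalCells = 0;
            for (int i = 0; i < families; i++) {
                in.readUTF();               // family name
                totalCells += in.readInt(); // field the new writer never sends
            }
            return totalCells;
        }
    }

    // True if the old reader fails with EOFException on data produced by the new writer.
    static boolean oldReaderHitsEof(String... families) {
        try {
            readOldFormat(writeNewFormat(families));
            return false;
        } catch (EOFException e) {
            return true; // the reader ran out of bytes before the layout it expected was complete
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(oldReaderHitsEof("cf1", "cf2")); // true
    }
}
```

With two families, the old reader first misinterprets bytes of the second name as a cell count and only then hits the end of the stream, which is why such mismatches can fail some distance away from the field that actually changed.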

4. Root cause

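The return type of Mutation.getFamilyMap() was changed between HBase-0.94.11 and HBase-0.95.0, from Map<byte[],List<KeyValue>> to NavigableMap<byte[],List<? extends Cell>>.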

4.1 Category:

Semantic Error

5. Fix

5.1 How?

Restore the old return type, mark the method as deprecated, and add a new method that has the new return type.
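A minimal sketch of this fix pattern, assuming hypothetical Cell and KeyValue stand-ins and a hypothetical name, getFamilyCellMapSketch(), for the new method: the old getFamilyMap() signature is kept, marked @Deprecated, and adapts the internal cell map to the old type, while the new method exposes the new type.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CompatibleMutation {
    // Hypothetical stand-ins for HBase's Cell interface and KeyValue class.
    interface Cell {}

    static final class KeyValue implements Cell {}

    // byte[] keys need an explicit comparator; unsigned lexicographic order is assumed.
    private static final Comparator<byte[]> BYTE_ORDER = Arrays::compareUnsigned;

    // Internal storage uses the new, more general element type.
    private final NavigableMap<byte[], List<Cell>> familyCells = new TreeMap<>(BYTE_ORDER);

    void add(byte[] family, Cell cell) {
        familyCells.computeIfAbsent(family, f -> new ArrayList<>()).add(cell);
    }

    /**
     * Old return type restored for callers written against 0.94.x.
     *
     * @deprecated use {@link #getFamilyCellMapSketch()} instead
     */
    @Deprecated
    public Map<byte[], List<KeyValue>> getFamilyMap() {
        Map<byte[], List<KeyValue>> old = new TreeMap<>(BYTE_ORDER);
        for (Map.Entry<byte[], List<Cell>> entry : familyCells.entrySet()) {
            List<KeyValue> kvs = new ArrayList<>();
            for (Cell cell : entry.getValue()) {
                kvs.add((KeyValue) cell); // assumes every stored cell is a KeyValue
            }
            old.put(entry.getKey(), kvs);
        }
        return old;
    }

    // New method carrying the new return type (the name is hypothetical).
    public NavigableMap<byte[], List<? extends Cell>> getFamilyCellMapSketch() {
        return Collections.<byte[], List<? extends Cell>>unmodifiableNavigableMap(familyCells);
    }

    public static void main(String[] args) {
        CompatibleMutation m = new CompatibleMutation();
        m.add("cf1".getBytes(StandardCharsets.UTF_8), new KeyValue());
        m.add("cf1".getBytes(StandardCharsets.UTF_8), new KeyValue());
        System.out.println(m.getFamilyMap().size()); // 1
        System.out.println(m.getFamilyCellMapSketch().firstEntry().getValue().size()); // 2
    }
}
```

Keeping both methods lets code written against the old signature keep working while new code migrates to the new one. The cast in the deprecated method assumes every stored cell is a KeyValue; a real implementation would need a policy for other Cell types.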