[hadoop]MAPREDUCE-5166 Report

1. Symptom

ConcurrentModificationException in LocalJobRunner

1.1 Severity


1.2 Was an exception thrown?


[junit] java.util.ConcurrentModificationException

[junit]         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)

1.2.1 Were there multiple exceptions?


1.3 Scope of the failure

Affects users who run mapper tasks through LocalJobRunner.

2. How to reproduce this failure

2.0 Version

1.2.0, 2.0.4-alpha

2.1 Configuration

1 LocalJobRunner

2.2 Reproduction procedure

1. Run a map-reduce job locally using LocalJobRunner (feature start)

2.2.1 Timing order

The mapper runs on a single machine under LocalJobRunner.

The exception is thrown when the task tries to update its status.

2.2.2 Events order externally controllable?


2.3 Can the logs tell how to reproduce the failure?


2.4 How many machines needed?

1 (LocalJobRunner runs on a single machine)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

LocalJobRunner throws a ConcurrentModificationException when running a job locally.

3.2 Backward inference

When running locally, LocalJobRunner modifies a shared static constant, which triggers the exception.

4. Root cause

In LocalJobRunner's getCurrentCounters function, counters are added onto the static value EMPTY_COUNTERS. Modifying this shared object leads to the exception.
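
The failure mode can be illustrated with a minimal sketch. This is not Hadoop code: the class SharedCountersDemo and its plain HashMap are hypothetical stand-ins for the shared EMPTY_COUNTERS object, and the modification is done from the same thread purely to make the outcome deterministic. It assumes the real failure has the same shape as the stack trace suggests, namely a HashMap iterator that finds the map was structurally modified while it was being traversed.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class SharedCountersDemo {
    // Hypothetical stand-in for a shared static "empty" counters object.
    // Declaring it final protects the reference, not the contents.
    static final Map<String, Long> EMPTY_COUNTERS = new HashMap<>();

    // Returns true if adding to the shared map while iterating over it
    // raises ConcurrentModificationException.
    static boolean triggersCme() {
        EMPTY_COUNTERS.clear();
        EMPTY_COUNTERS.put("a", 1L);
        EMPTY_COUNTERS.put("b", 1L);
        try {
            for (String key : EMPTY_COUNTERS.keySet()) {
                // Structural modification during iteration: the iterator's
                // next call detects the changed modification count.
                EMPTY_COUNTERS.put(key + "-copy", 1L);
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + triggersCme());
    }
}
```

The point of the sketch is that a static constant whose contents are mutable can be changed by any caller that receives it, so every user of the "empty" value ends up sharing one growing map.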

4.1 Category:


5. Fix

5.1 How?

Stop re-using the shared EMPTY_COUNTERS; create a new instance with 'new Counters()' instead.

Implement a read-only version of EMPTY_COUNTERS that throws an exception on any modification.

Change LJR.statusUpdate (LocalJobRunner) to deep-copy the Counters using Writable.write and Writable.readFields.
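
The three changes above can be sketched in plain Java. This is an illustration under stated assumptions, not the actual patch: SimpleCounters is a hypothetical, simplified stand-in for Hadoop's Counters class, and its write/readFields methods only borrow the method names of the Writable interface so the deep-copy idea can be shown without a Hadoop dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CountersFixSketch {
    // Hypothetical stand-in for Hadoop's Counters (not the real class).
    static class SimpleCounters {
        final Map<String, Long> values = new HashMap<>();

        void increment(String name, long delta) {
            values.merge(name, delta, Long::sum);
        }

        // Same method names as Writable, reimplemented here for the sketch.
        void write(DataOutput out) throws IOException {
            out.writeInt(values.size());
            for (Map.Entry<String, Long> e : values.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeLong(e.getValue());
            }
        }

        void readFields(DataInput in) throws IOException {
            values.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                String name = in.readUTF();
                long value = in.readLong();
                values.put(name, value);
            }
        }
    }

    // Idea 1: start from a fresh instance instead of a shared constant.
    static SimpleCounters newAccumulator() {
        return new SimpleCounters();
    }

    // Idea 2: a read-only "empty" value; any put() throws
    // UnsupportedOperationException instead of silently mutating it.
    static final Map<String, Long> READ_ONLY_EMPTY =
            Collections.unmodifiableMap(new HashMap<String, Long>());

    // Idea 3: deep copy by serializing and deserializing, so the stored
    // snapshot no longer shares state with the object the task keeps updating.
    static SimpleCounters deepCopy(SimpleCounters src) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            src.write(new DataOutputStream(buf));
            SimpleCounters copy = new SimpleCounters();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SimpleCounters live = newAccumulator();
        live.increment("records", 2);
        SimpleCounters snapshot = deepCopy(live);
        live.increment("records", 1); // does not affect the snapshot
        System.out.println(snapshot.values + " vs " + live.values);
    }
}
```

A read-only constant makes an accidental write fail at the point of the mistake, while the deep copy protects readers from later updates by the running task; the two address different sides of the same sharing problem.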