A Comparative Study of Synthetic Data Generation Methods for Grammatical Error Correction
Max White, Alla Rozovskaya
Queens College, City University of New York
Grammatical Error Correction (GEC)
Original Sentence:
If any problems faced , i 'm went to my parents and say the problem , there are solve the problem .
Corrected:
If any problems arise , I go to my parents and tell them the problem , and they solve the problem .
The BEA 2019 Shared Task on GEC
Unrestricted Track: any resources allowed
Restricted Track: limited to publicly available learner corpora
Low Resource Track: significantly limited use of annotated data
Contributions
The Inverted Spellchecker Method (UEDIN-MS)
Original Sentence:
I started with Kopi last year and I played one game with him and that was it .
Noisified:
I started with Kopi last year any I player one game with him and than wad it .
Word | Confusion set |
and | ans an ad sand rand land hand band wand end aid ant add any |
was | saw wad as wars wast wads wags wasp wash ways wan war wag gas mas |
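A minimal sketch of confusion-set substitution, as illustrated by the example above. The two confusion sets are copied from the table. The function name `noisify`, the `rate` parameter, and uniform sampling within a set are assumptions made for illustration, not details of the UEDIN-MS system.

```python
import random

# Confusion sets copied from the table above (only the two words shown there).
CONFUSION_SETS = {
    "and": ["ans", "an", "ad", "sand", "rand", "land", "hand", "band",
            "wand", "end", "aid", "ant", "add", "any"],
    "was": ["saw", "wad", "as", "wars", "wast", "wads", "wags", "wasp",
            "wash", "ways", "wan", "war", "wag", "gas", "mas"],
}


def noisify(tokens, confusion_sets, rate, rng):
    """Replace each token that has a confusion set with probability `rate`.

    `rate` and the uniform choice among candidates are hypothetical settings.
    """
    noisy = []
    for token in tokens:
        candidates = confusion_sets.get(token)
        if candidates and rng.random() < rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(token)
    return noisy


clean = ("I started with Kopi last year and I played one game "
         "with him and that was it .").split()
print(" ".join(noisify(clean, CONFUSION_SETS, rate=0.5, rng=random.Random(0))))
```

Substitution keeps the sentence length unchanged, so the clean sentence and its noisified version stay token-aligned as a source/target pair.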
The Patterns+POS Method (Kakao&Brain)
Original Sentence:
I started with Kopi last year and I played one game with him and that was it .
Noisified:
I started in Kopi last year and I play one game with him , that is it .
Token | Error patterns |
with | with [removed] in on w/ |
played | played play plays |
was | was is [removed] got 's |
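A minimal sketch of the token-pattern substitution shown in the table above. The pattern lists are copied from the table. The function name `apply_patterns` and uniform sampling among patterns are assumptions, and the POS-based part of the method is not sketched here.

```python
import random

REMOVED = "[removed]"  # marker used in the table for deleting the token

# Error patterns copied from the table above (only the three tokens shown there).
ERROR_PATTERNS = {
    "with": ["with", REMOVED, "in", "on", "w/"],
    "played": ["played", "play", "plays"],
    "was": ["was", "is", REMOVED, "got", "'s"],
}


def apply_patterns(tokens, patterns, rng):
    """Rewrite each token that has error patterns; drop it if REMOVED is drawn.

    Uniform sampling is a hypothetical choice made for this sketch.
    """
    noisy = []
    for token in tokens:
        options = patterns.get(token)
        if options is None:
            noisy.append(token)  # no pattern known: keep the token
            continue
        choice = rng.choice(options)
        if choice != REMOVED:
            noisy.append(choice)
    return noisy


clean = ("I started with Kopi last year and I played one game "
         "with him and that was it .").split()
print(" ".join(apply_patterns(clean, ERROR_PATTERNS, random.Random(0))))
```

Because each pattern list includes the token itself, a token can be left unchanged even when it has known error patterns.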
Fair Comparison of Methods
Results
Noising Method | W&I+L dev P | W&I+L dev R | W&I+L dev F0.5 | FCE test P | FCE test R | FCE test F0.5 |
Inverted Spellchecker | 31.30 | 16.24 | 26.41 | 35.31 | 19.48 | 30.37 |
Patterns+POS | 42.96 | 20.00 | 34.94 | 41.55 | 19.94 | 34.15 |
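For reference, F0.5 is the standard F-beta score with beta = 0.5, which weights precision more heavily than recall. The sketch below recomputes it from the P and R values in the table. The function name `f_beta` is illustrative, and recomputed values may differ from the table in the last digit because P and R are already rounded.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# (method, dataset, P, R) copied from the results table above.
rows = [
    ("Inverted Spellchecker", "W&I+L dev", 31.30, 16.24),
    ("Inverted Spellchecker", "FCE test", 35.31, 19.48),
    ("Patterns+POS", "W&I+L dev", 42.96, 20.00),
    ("Patterns+POS", "FCE test", 41.55, 19.94),
]
for method, dataset, p, r in rows:
    print(f"{method} | {dataset} | F0.5 = {f_beta(p, r):.2f}")
```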
Error-Type Analysis
Inverted Spellchecker Method Strengths:
Patterns+POS Method Strengths:
Conclusion