Comp Ling II (Boyd-Graber)
Assignment 1: Machine Translation
Due: 15. February 2012
In this homework assignment, you will implement two basic machine translation models: IBM Model 1 and IBM Model 2. Check this page often, as I will update it as people ask questions.
What to Download
You can download data, sample output, and stub source code from this location:
http://terpconnect.umd.edu/~ying/cl2/hw1/
Unit Tests
Unit testing is a good way to develop code. Make sure that your code passes the (basic) unit tests provided. You can check your progress by running the command:
python hw1/ibm_trans_test.py
Note that many of the tests will fail before you’ve started modifying the code; as you modify the code, more and more of the tests should go from failing to passing. It’s possible that different assumptions might lead to correct code not passing a unit test. If so, bring up the issue on the class mailing list.
The tests (and example alignments) defined in the unit tests depend on the translation direction (as should be obvious). You may need to tweak the tests to make them consistent with your assumptions. This isn't part of the assignment, but it might be helpful during development.
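To see why direction matters, here is a minimal sketch of a direction-sensitive test. It is not taken from the provided test file: the toy table, the helper reverse_direction, and the class DirectionTest are all hypothetical names made up for illustration. It assumes the translation table is stored as t[e][f] = P(f | e), so each row for a word e should sum to one; the same numbers read in the other direction generally will not.

```python
import unittest

# Hypothetical toy table (an assumption, not course data):
# TOY_TABLE[e][f] = P(f | e), i.e. the probability of f given e.
TOY_TABLE = {
    "the": {"das": 0.7, "Haus": 0.2, "Buch": 0.1},
    "house": {"das": 0.25, "Haus": 0.75},
}


def reverse_direction(table):
    """Re-index table[e][f] as flipped[f][e] without renormalizing."""
    flipped = {}
    for e, row in table.items():
        for f, p in row.items():
            flipped.setdefault(f, {})[e] = p
    return flipped


class DirectionTest(unittest.TestCase):
    def test_rows_normalize_in_assumed_direction(self):
        # Under the assumed direction, every row is a distribution over f.
        for row in TOY_TABLE.values():
            self.assertAlmostEqual(sum(row.values()), 1.0)

    def test_rows_do_not_normalize_when_flipped(self):
        # Reading the same numbers in the other direction breaks normalization.
        flipped = reverse_direction(TOY_TABLE)
        self.assertNotAlmostEqual(sum(flipped["das"].values()), 1.0)
```

If you save something like this in its own file, python -m unittest on that file should run both tests. A test of this shape passes or fails depending only on which direction you assumed, which is the kind of mismatch to look for before concluding your model is wrong.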
Data
The data are gzipped samples from Europarl: http://www.statmt.org/europarl/
Each line is an aligned sentence. There is a Python class that iterates through the aligned sentences.
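If you want to inspect the data outside the provided class, the following is a rough sketch of how such an iterator could work. It is not the class from the stub code. It assumes the corpus comes as two gzipped, UTF-8, whitespace-tokenized files in which line i of one file is the translation of line i of the other; the function name aligned_sentences and both path arguments are hypothetical.

```python
import gzip


def aligned_sentences(source_path, target_path, encoding="utf-8"):
    """Yield (source_tokens, target_tokens) pairs from two gzipped files.

    Assumption: the two files are line-aligned, so the i-th lines of each
    file are translations of one another.
    """
    with gzip.open(source_path, "rt", encoding=encoding) as src, \
            gzip.open(target_path, "rt", encoding=encoding) as tgt:
        # zip stops at the shorter file, so trailing unpaired lines are ignored.
        for src_line, tgt_line in zip(src, tgt):
            src_tokens = src_line.split()
            tgt_tokens = tgt_line.split()
            # Skip pairs with an empty side; they give the models nothing to align.
            if src_tokens and tgt_tokens:
                yield src_tokens, tgt_tokens
```

Because it is a generator, it reads one sentence pair at a time, so you can loop over the corpus once per EM iteration without holding the whole file in memory.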
What to Do
What to Turn In
Grading
This homework (and all homeworks) will be graded out of 100 possible points, allocated as follows.
Points | Aspect |
40 | Correctness of Algorithm |
30 | Description of what you did |
10 | Documentation/readability of the code |
10 | Example outputs (as required above) |
10 | Good programming practice |
Questions
Extra Credit
It’s possible to get more than the total number of points by attempting these extra credit problems. I think they’re doable, but it’s possible that there are hidden subtleties. Be sure to document which extra credit you attempt (and how you verified that you did them correctly). The number of points of extra credit will depend on how well your extensions work and how much effort you put into implementing, verifying, and explaining them.
References
This type of assignment is a rite of passage in NLP graduate courses. There are many versions of this assignment out there. You may find their explanations useful or more detailed than mine.
You may find the following paper useful for the extra credit: