WMT16 Metric Task Tracks

Colors:
- Blue (and green) cells are distributed in per-language-pair packages, because they include hybrid systems and are very big.
- Yellow (and green) cells are distributed as one package, for segment-level participants only.
- So if you download the blue packages, you will have everything.
Language pairs: each cell specifies the domains included (WMT news task, WMT IT task, HUME medical task).
Columns (one row per track): Track short name; New?; Task long description; Texts; Systems; Hybrids?; one cell per language pair (cs2en, de2en, ro2en, fi2en, ru2en, tr2en, en2cs, en2de, en2ro, en2fi, en2ru, en2tr, en2bg, en2es, en2eu, en2nl, en2pl, en2pt); Training data (optional); Input; Output; Golden data; Evaluation; New in 2016.
RRsysNews (modified)
  Texts: newstest2016
  Systems: news task systems + tuning task systems (en<->cs)
  Hybrids: yes
  Language pairs: cs2en T3+F3; de2en T3+F1; ro2en T3+F1; fi2en T3+F1; ru2en T3+F2; tr2en T3+F2; en2cs T4+F4; en2de T4+F5; en2ro T4+F6; en2fi T4+F6; en2ru T4+F2; en2tr T4+F6
  Training data (optional): past years of the metrics tasks
  Input: system outputs + references of the whole test set
  Output: your metric score for the test set
  Golden data: TrueSkill interpretation of the RR judgements
  Evaluation: Pearson correlation of your metric score against the TrueSkill score, for the real primary submissions only (not across the ~10,000 additional synthetic systems)
  New in 2016: you will get ~10,000 MT systems to score, not just the ~20 per language pair; these systems will be generated automatically by randomly taking translation candidates (along with the corresponding manual judgements) of each sentence
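A minimal sketch of how such synthetic "hybrid" systems can be built by sentence-level sampling from the real systems (Python; the function name and data layout are illustrative assumptions, not the official generation script):

    import random

    def make_hybrid_systems(system_outputs, n_hybrids=10000, seed=16):
        """Build synthetic systems: for every sentence, take the candidate
        translation of a randomly chosen real system.

        system_outputs: list of systems, each a list of sentence strings
        (system_outputs[s][i] = translation of sentence i by system s).
        Hypothetical layout; returns n_hybrids systems in the same format.
        """
        rng = random.Random(seed)
        n_sents = len(system_outputs[0])
        hybrids = []
        for _ in range(n_hybrids):
            # The manual judgement collected for each picked candidate
            # travels with it, so hybrids come with judgements "for free".
            hybrids.append([rng.choice(system_outputs)[i]
                            for i in range(n_sents)])
        return hybrids

Sampling whole existing candidates (rather than generating new text) means every sentence of a hybrid already has real manual judgements attached.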
RRsysIT (new)
  Texts: it-test2016
  Systems: IT task systems
  Hybrids: yes
  Language pairs (the seven IT-task en2X pairs, in column order): en2cs T6; en2de T6; en2bg T6; en2es T6+F7; en2eu T6; en2nl T6+F7; en2pt T6+F7
  Training data (optional): as above
  Input: as above
  Output: as above
  Golden data: as above
  Evaluation: as above
  New in 2016: the standard track, but on a brand new domain, plus the 10k synthetic systems for confidence estimation
DAsysNews (new)
  Texts: newstest2016
  Systems: as RRsysNews
  Hybrids: yes
  Language pairs: cs2en T3+F3+T5; de2en T3+F1+T5; ro2en T3+F1+T5; fi2en T3+F1+T5; ru2en T3+F2+T5; tr2en T3+F2+T5; en2cs no; en2de no; en2ro no; en2fi no; en2ru T4+F2+T5; en2tr no
  Training data (optional): none, since the language pairs Yvette has DA data for (es-en, en-es) are not in the translation task
  Input: as above
  Output: your metric score for the test set
  Golden data: WMT16 DA judgements
  Evaluation: Pearson correlation of your metric score against the DA sys-level score, for the real primary submissions only (not across the ~10,000 additional synthetic systems)
  New in 2016: you will get ~10,000 MT systems to score, not just the ~20 per language pair; generated as above
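Both system-level tracks are scored the same way in the end: a plain Pearson correlation between your per-system metric scores and the gold system-level scores (TrueSkill for RRsysNews/RRsysIT, DA for DAsysNews), restricted to the real primary submissions. A minimal sketch, assuming scores are kept in dicts keyed by system name (an illustrative layout):

    from statistics import mean

    def pearson(xs, ys):
        """Plain Pearson correlation coefficient of two equal-length lists."""
        mx, my = mean(xs), mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    def sys_level_correlation(metric_scores, gold_scores, primary_systems):
        """Correlate metric vs. gold system-level scores over the primary
        submissions only; the ~10k synthetic systems are scored by the
        metric but excluded from this correlation."""
        xs = [metric_scores[s] for s in primary_systems]
        ys = [gold_scores[s] for s in primary_systems]
        return pearson(xs, ys)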
RRsegNews (unchanged)
  Texts: newstest2016
  Systems: as RRsysNews
  Hybrids: no
  Language pairs: cs2en T7; de2en T7; ro2en T7; fi2en T7; ru2en T7; tr2en T7; en2cs T8; en2de T8; en2ro T8; en2fi T8; en2ru T8; en2tr T8
  Training data (optional): past years of the metrics tasks
  Input: system outputs + reference for each sentence
  Output: your metric score for the sentence
  Golden data: the set of simulated pairwise judgements as collected in the RR judgements
  Evaluation: the WMT14 variant of Kendall's tau between your metric scores and the manual pairwise judgements
  New in 2016: no change
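A sketch of that Kendall's tau variant, under the WMT14 conventions as we read them: pairs where the humans tied are discarded beforehand, and a metric tie on a human-decided pair counts as discordant. The input format is an illustrative assumption:

    def kendall_tau_wmt14(pairs):
        """pairs: iterable of (metric_score_better, metric_score_worse)
        tuples, one per human pairwise judgement in which 'better' beat
        'worse' (human ties already discarded).
        tau = (concordant - discordant) / (concordant + discordant)."""
        concordant = discordant = 0
        for better, worse in pairs:
            if better > worse:
                concordant += 1
            else:  # metric disagrees with the human, or ties: penalized
                discordant += 1
        return (concordant - discordant) / (concordant + discordant)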
DAsegNews (new)
  Texts: newstest2016
  Systems: 2016 news task systems (excluding tuning task systems)
  Hybrids: no
  Language pairs: cs2en T7+F8; de2en T7+F8; ro2en T7+F8; fi2en T7+F8; ru2en T7+F8; tr2en T7+F8; en2ru T8+F9
  Training data (optional): new devsets prepared by Yvette (a random sample of the WMT'15 cs-en, de-en, fi-en and ru-en data, 500 translations per language pair)
  Input: as above (500 translations per language pair)
  Output: as above
  Golden data: DA judgements for candidate translations (no relative comparison, only absolute judgements); candidates will be sampled so that each candidate in the set gets a reliable average absolute score (scores are standardized per human assessor)
  Evaluation: Pearson correlation of your metric score against the DA seg-level score across all annotated sentences (500 per language pair) of all primary submissions
  New in 2016: brand new
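A sketch of how such seg-level gold scores can be derived from the raw judgements: each assessor's raw scores are standardized to z-scores (to remove individual scoring scales), then the standardized scores are averaged per candidate translation. The data layout and names here are illustrative assumptions:

    from collections import defaultdict
    from statistics import mean, pstdev

    def da_segment_scores(judgements):
        """judgements: list of (assessor_id, segment_id, raw_score) triples
        (hypothetical layout). Returns one gold score per segment."""
        # Per-assessor mean and standard deviation.
        by_assessor = defaultdict(list)
        for assessor, _, score in judgements:
            by_assessor[assessor].append(score)
        stats = {a: (mean(s), pstdev(s)) for a, s in by_assessor.items()}

        # Standardize per assessor, then average per candidate segment.
        by_segment = defaultdict(list)
        for assessor, segment, score in judgements:
            m, sd = stats[assessor]
            by_segment[segment].append((score - m) / sd if sd else 0.0)
        return {seg: mean(scores) for seg, scores in by_segment.items()}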
HUMEseg (new)
  Texts: himltest
  Systems: HimL year 1 systems
  Hybrids: no
  Language pairs (the four HimL en2X pairs, in column order): en2cs T9+F10; en2de T9+F10; en2ro T9+F10; en2pl T9+F10
  Training data (optional): none
  Input: as above
  Output: as above
  Golden data: HUME manual annotation, collapsed automatically into one score per segment
  Evaluation: Pearson correlation of your metric score against the HUME aggregate seg-level score for all annotated sentences of all systems
  New in 2016: brand new
Each 'yes' indicates that we will provide some systems producing translations of the given test set (row) and language pair (column).
The expected output from participants is described on the metrics task web page.