Reading-bound data as inline secondary tags
Reading-bound data is best transported as inline secondary tags, a conclusion supported both by practical experience and by theoretical complexity considerations.
The Apertium Mission
The Apertium bylaws (https://wiki.apertium.org/wiki/Bylaws) include goals beyond machine translation:
What do Apertium devs want?
Of those who replied to the apertium-stuff threads (1, 2, 3), the majority expressed a preference for
"...we're stating that we're OK having crappy monodixes because we *fix* that later on with trimming. I'm sure that's where we are now, but as a project that focuses a lot on provided free (as in speech) language resources that are later used for many other use cases, I don't feel comfortable with that status. I think we should aim to have as correct as possible dictionaries." [#]
Why?
Uses
Note that this is not about LU-bound data (data attached to the whole lexical unit rather than to one of its readings); the LU implementation is already settled.
Potential uses, non-exhaustive list:
High-level examples
In the stream
^There/there<adv><id:1><ip:2>$
^is/be<vbser><pres><p3><sg><vf:exist><id:2><ip:0>$
^world/world<n><sg><s:Lstar><r:th><id:4><ip:2>$
^dogs/dog<n><pl><s:Adom><r:atr><id:7><ip:5>$
^and/and<cnjcoo><id:8><ip:7>$
^people/people<n><sg><s:H><r:atr><id:9><ip:7>$
Tags like these are easy to grep for, easy to write scripts for, and easy to make filters for.
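To illustrate the "easy to write scripts and filters for" point, here is a minimal Python sketch that parses stream lines like the ones above, splits each reading's tags into ordinary tags and `key:value` secondary tags, and strips secondary tags for a module that does not understand them. It rests on several assumptions: a secondary tag is any tag containing a colon; `id` and `ip` presumably denote a token's own index and its head's index (with 0 as root), which the text does not define; and there are no escaped `^ $ / < >` characters or superblanks. The names `parse_stream`, `strip_secondary` and `heads` are hypothetical and not part of any Apertium tool.

```python
import re
from dataclasses import dataclass, field

# One lexical unit ^...$ (assumption: no escaped ^, $ or / inside it).
LU_RE = re.compile(r"\^([^$]*)\$")
# One tag <...>.
TAG_RE = re.compile(r"<([^>]*)>")
# A secondary tag: a tag whose body contains a colon, e.g. <id:7> or <vf:exist>.
SECONDARY_RE = re.compile(r"<[^<>:]*:[^<>]*>")


@dataclass
class Reading:
    lemma: str
    tags: list = field(default_factory=list)       # ordinary tags, e.g. "n", "pl"
    secondary: dict = field(default_factory=dict)  # key:value tags, e.g. {"id": "7"}


@dataclass
class LexicalUnit:
    surface: str
    readings: list


def parse_reading(text: str) -> Reading:
    """Split one analysis such as 'dog<n><pl><id:7>' into lemma, tags and secondary tags."""
    reading = Reading(text.split("<", 1)[0])
    for tag in TAG_RE.findall(text):
        if ":" in tag:
            key, value = tag.split(":", 1)
            reading.secondary[key] = value
        else:
            reading.tags.append(tag)
    return reading


def parse_stream(stream: str) -> list:
    """Parse every ^surface/reading1/reading2...$ unit in the stream."""
    units = []
    for body in LU_RE.findall(stream):
        surface, *analyses = body.split("/")
        units.append(LexicalUnit(surface, [parse_reading(a) for a in analyses]))
    return units


def strip_secondary(stream: str) -> str:
    """Filter: drop every <key:value> tag and leave the rest of the stream untouched."""
    return SECONDARY_RE.sub("", stream)


def heads(units: list) -> dict:
    """Map id -> ip using each unit's first reading (assumed: ip is the head's id)."""
    result = {}
    for unit in units:
        sec = unit.readings[0].secondary
        if "id" in sec and "ip" in sec:
            result[int(sec["id"])] = int(sec["ip"])
    return result


STREAM = (
    "^There/there<adv><id:1><ip:2>$ "
    "^is/be<vbser><pres><p3><sg><vf:exist><id:2><ip:0>$ "
    "^dogs/dog<n><pl><s:Adom><r:atr><id:7><ip:5>$"
)

if __name__ == "__main__":
    units = parse_stream(STREAM)
    print(units[2].readings[0].secondary)  # {'s': 'Adom', 'r': 'atr', 'id': '7', 'ip': '5'}
    print(heads(units))                    # {1: 2, 2: 0, 7: 5}
    print(strip_secondary(STREAM))
```

Because secondary tags share the ordinary `<...>` syntax, a module that does not care about them can simply pass them through unchanged, and a one-line regex filter can remove them where needed.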
Practical experience
In VISL, GramTrans, and the Greenlandic pipelines, we already use free-form inline secondary tags, and they work very well in practice.
VISL's tag style is fairly ugly and inscrutable, but Apertium's does not need to be.
Alternatives