1 of 12

PARSEME MWE Surveys

2 of 12

First survey (Federico & Gyri)

LREC paper

Survey designed using Google Forms

Answers collected as “free text”

Several repositories crawled for the search terms “mwe”, “mwu”, “multi word”, “multiword” and “multi-word”: META-SHARE, ELRA, SIGLEX-MWE
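
The keyword variants listed above can be covered by one case-insensitive pattern. This is a minimal sketch, assuming the crawl matched on resource titles or descriptions; the name `mentions_mwe` and the pattern itself are illustrative, not the script actually used for the survey.

```python
import re

# Covers "mwe", "mwu", "multi word", "multiword" and "multi-word"
# (plus the plural abbreviations). The word boundaries keep the
# abbreviations from matching inside longer tokens.
MWE_PATTERN = re.compile(
    r"\b(?:mwes?\b|mwus?\b|multi[- ]?word)", re.IGNORECASE
)


def mentions_mwe(text: str) -> bool:
    """Return True if a free-text description contains one of the search terms."""
    return MWE_PATTERN.search(text) is not None
```

A pattern like this only finds candidate resources; each hit would still need a manual check before it is listed.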

3 of 12

First survey (Federico & Gyri)

First section:

  • Name
  • Link to the LR
  • Type of the LR
  • Contact information
  • Language(s)
  • Size of the LR
  • Maximum length of MWEs
  • Whether non-contiguous expressions are present
  • License and accessibility policies

4 of 12

First survey (Federico & Gyri)

Second section:

  • Relevant publications
  • Special MWE features
  • Grammatical or lexical formalism (if any)
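
The fields of the two form sections could be held in one record type. This is a sketch only: the class name `SurveyEntry`, the attribute names and all types are assumptions, since the form itself collected most answers as free text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SurveyEntry:
    """One language resource (LR) as described in the survey form."""

    # First section
    name: str
    link: str
    lr_type: str
    contact: str
    languages: List[str]
    size: str                           # kept as a string (assumed free text)
    max_mwe_length: Optional[int]       # assumed unit: tokens; None if not reported
    has_non_contiguous: Optional[bool]  # None if not reported
    license: Optional[str]              # None if the LR has no stated license
    # Second section
    publications: List[str] = field(default_factory=list)
    special_features: str = ""
    formalism: Optional[str] = None     # "if any" in the form
```

Making the second-section fields optional mirrors the form, where they are supplementary to the basic description of the resource.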

5 of 12

First survey (Federico & Gyri)

  • Over 100 LRs gathered by means of the survey
  • Results publicly available
  • Non-uniform and non-comparable data
  • Metadata issues: free text vs. closed vocabularies
  • Licensing issues: not all LRs have a license!
  • Cataloging issues: need to add information about MWEs in repositories (e.g. CLARIN, LT-Observe)
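
To illustrate the "free text vs. closed vocabularies" issue, free-text answers can be mapped onto a small controlled list. This is a hypothetical sketch: the vocabulary `LICENSE_VOCAB` and the function `normalize_license` are invented for illustration and do not reflect the actual survey answers.

```python
from typing import Optional

# Hypothetical closed vocabulary: normalized free-text answer -> controlled label.
LICENSE_VOCAB = {
    "cc by": "CC-BY",
    "cc-by": "CC-BY",
    "creative commons attribution": "CC-BY",
    "gpl": "GPL",
    "gnu gpl": "GPL",
}


def normalize_license(answer: str) -> Optional[str]:
    """Map a free-text license answer to a controlled label.

    Returns None when the answer is empty or not in the vocabulary,
    so such entries can be flagged for manual review.
    """
    key = " ".join(answer.lower().split())  # lowercase and collapse whitespace
    return LICENSE_VOCAB.get(key)
```

Answers that come back as None are the ones that make the data non-comparable; a closed list in the form would avoid them at collection time.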

10 of 12

Second survey: multilingual MWE LRs

http://goo.gl/forms/X12Yi9Zid8I1tDzH2

11 of 12

Second survey: multilingual MWE LRs

https://awesome-table.com/-KMxGtOyp8q3fqjwlR3w/view

  • 67 resources, of which 24 are Thamus dictionaries and 13 are INCYTA dictionaries
  • 13 new resources not included in the previous survey

12 of 12

Next steps

  • Convert the “old” survey into a standardized database and merge both surveys
  • Publish all gathered resources on the PARSEME website / SIGLEX-MWE?
  • Lobby for adding an MWE metadata component to infrastructure initiatives?
  • Lobby to add information about MWEs in repositories?
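
The merge step could be sketched as follows, assuming both surveys have already been exported as lists of records with a `name` key. Matching on a normalized name is an assumption made for illustration; real deduplication would probably also need the link or another identifier.

```python
from typing import Dict, Iterable, List


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def merge_surveys(
    first: Iterable[Dict[str, str]],
    second: Iterable[Dict[str, str]],
) -> List[Dict[str, str]]:
    """Merge two lists of records; on a name clash the first survey's record is kept."""
    merged: Dict[str, Dict[str, str]] = {}
    for record in list(first) + list(second):
        key = normalize_name(record["name"])
        merged.setdefault(key, record)  # first occurrence wins
    return list(merged.values())
```

Records from the second survey that survive the merge are the ones not already present in the first.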