Call for Volunteers: Please fill out the form below to express your interest in the following topics:
Background:
O'Brien et al. (2019) "Directions for the future of technology in pronunciation research and teaching," Journal of Second Language Pronunciation 4(2):182-207:
jbe-platform.com/content/journals/10.1075/jslp.17001.obr
Excerpts: p. 186: "pronunciation researchers are primarily interested in improving L2 learners’ intelligibility and comprehensibility, but they have not yet collected sufficient amounts of representative and reliable data (speech recordings with corresponding annotations and judgments) indicating which errors affect these speech dimensions and which do not." p. 192: "Collecting data through crowdsourcing...."
Contextual situation:
Chinese CELST and its vocabulary: "Chinese National College English Test," e.g. 全国大学英语四六级考试 (National College English Test, Bands 4 and 6), being automated in 2020.
Example efforts:
Robertson et al. (2018) "Designing Pronunciation Learning Tools: the Case for Interactivity against Over-Engineering"
researchgate.net/publication/324664555 -- video:
youtube.com/watch?v=RPSPXCLWKK0
Guo et al. (2017) "Spoken English Intelligibility Remediation with PocketSphinx Alignment and Feature Extraction Improves Substantially over the State of the Art"
arxiv.org/abs/1709.01713
Wiktionary "Say in phrase" button:
youtube.com/watch?v=8Euhu4Q7HF4&t=38m
Demo:
youtube.com/watch?v=Bof5sJWZ100&t=103s
Technical update:
docdroid.net/gvbP0Jc/paslides.pdf
Example schemata:
www.talknicer.com/dataset-schema.py
Hair et al. (2018) "Apraxia world: a speech therapy game for children with speech sound disorders," in Proceedings of the 17th ACM Conference on Interaction Design and Children (pp. 119-131)
psi.engr.tamu.edu/wp-content/uploads/2018/04/hair2018idc.pdf
Yilmaz et al. (2018) "Articulatory Features for ASR of Pathological Speech," Proc. Interspeech 2018:2958-2962
arxiv.org/pdf/1807.10948.pdf
See also:
arxiv.org/abs/1905.06533
Franco et al. (2014) "Adaptive and Discriminative Modeling for Improved Mispronunciation Detection," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
sri.com/sites/default/files/publications/franco_ferrer_bratt_is_adept_camera_ready.pdf
Kibishi et al. (2015) "A statistical method of evaluating the pronunciation proficiency/intelligibility of English presentations by Japanese speakers," ReCALL, 27(1), 58-83
www.slp.cs.tut.ac.jp/old_2013/Material_for_Our_Studies/Papers/shiryou_last/e2014-Paper-01.pdf
Li et al. (2018) "A study of Assessment model of Oral English Imitation Reading in College Entrance Examination," 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
docdroid.net/GGf4a8W/china-celst.pdf
Bernstein et al. (1989) "Automatic evaluation and training in English pronunciation," First International Conference on Spoken Language Processing (Kobe, Japan)
researchgate.net/publication/221485930 -- please see section 2.2.2, "Intelligibility."
Example goal:
Enhance Mozilla Common Voice to accept transcriptions of marginal speech instead of just boolean proofreading of ideally exemplary speech.
voice.mozilla.org/en
Alternative goal:
Avoid the approximately 80% false negative rate that results from prioritizing accent, intonation, or stress over intelligibility.
The Key Performance Indicators for this Work will include:
Validity. Baseline: Ratio of accuracy variance to instructor inter-rater agreement variance. Relative: “System-Human agreement” of 58.4% of ETS (Chen et al., 2018, p. 24).
docdroid.net/31XXwZI/chen-et-al-2018-ets.pdf#page=26
Components:
The number of speaking exercises assigned and performed;
The number of listening-typing exercises assigned and performed;
The number of minutes each type of exercise has been performed;
Intelligibility-assessable words and phrases, for each language;
Number of branching scenario interactions, for each language;
Spoken recordings in total, and per assessable words and phrases;
Transcripts collected in total, and per spoken recordings;
Exemplar pronunciation recordings identified per assessable prompts;
Spoken remediation responses provided; and
Observed intelligibility difference each student has achieved on their assigned groups of words and phrases.
Ease of use: Sub-components include: (1) the median duration required to complete assignments achieving a specific level of validity, (2) the proportion of assignments completed, and (3) the median numbers of (a) clicks, (b) keystrokes, and (c) utterances required to complete assignments.
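The three ease-of-use sub-components could be computed from per-assignment logs roughly as follows. This is only a sketch: the AssignmentLog record, its field names, and the min_validity threshold are hypothetical and are not taken from any existing schema.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class AssignmentLog:
    """One assigned exercise (hypothetical record layout)."""
    completed: bool
    duration_s: float           # seconds spent on the assignment
    validity: Optional[float]   # validity level reached, None if not measured
    clicks: int
    keystrokes: int
    utterances: int


def ease_of_use(logs: List[AssignmentLog], min_validity: float) -> dict:
    """Summarize the ease-of-use sub-components over a set of assignments."""
    if not logs:
        raise ValueError("at least one assignment log is required")
    done = [log for log in logs if log.completed]
    # (1) durations of completed assignments that reached the validity level
    valid_durations = [
        log.duration_s
        for log in done
        if log.validity is not None and log.validity >= min_validity
    ]

    def med(values):
        # median of a list, or None when there is nothing to summarize
        return median(values) if values else None

    return {
        "median_duration_s": med(valid_durations),
        # (2) proportion of assigned exercises that were completed
        "proportion_completed": len(done) / len(logs),
        # (3) median interaction counts over completed assignments
        "median_clicks": med([log.clicks for log in done]),
        "median_keystrokes": med([log.keystrokes for log in done]),
        "median_utterances": med([log.utterances for log in done]),
    }
```

Medians are taken over completed assignments only, which is one possible reading of "required to complete assignments"; abandoned attempts still count in the denominator of the completion proportion.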
Example learner analytics: diphthong-free diphones (~650 out of a few times that), instead of words (hundreds of thousands), prompts, or phonemes (dozens).
CMUBET:
cmusphinx.github.io/wiki/cmubet
Diphones:
cmusphinx.github.io/wiki/diphones
Evaluation questions:
User Experience: How easy is it to use the system? How accessible is it to users with disabilities? On which mobile and desktop devices can the system be used? Can it be used by anyone? Is the system suitable for self study with spoken remediation feedback? Is it suitable for marketing efforts?
Engagement: Are students engaged in assigned system activities? Does the experience add value to the student experience? Does it add value to Customer’s user experience?
Operationally: Does the system solve the problem of assessing students quickly and without excessive demands on instructor time and effort? Does it provide effective remediation interactions? Are its sequencing choices optimized for efficient learning?
Quality: How closely does the system's oral fluency skill placement match that of a human instructor? How accurate is it, relative to inter-rater instructor variability? How does it compare to human instructors in general? How much data collection is required for the system to be as accurate as human instructor assessment and remediation?
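One way to approach the quality question of accuracy relative to inter-rater variability is to compare system-human agreement with human-human agreement on the same recordings. The sketch below assumes exact-match agreement on discrete scores from two instructors; that statistic, and the function names, are assumptions for illustration, and the figures cited from Chen et al. (2018) may have been computed differently.

```python
from typing import Optional, Sequence


def exact_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of items on which two score lists match exactly."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def relative_agreement(
    system: Sequence[int],
    rater_1: Sequence[int],
    rater_2: Sequence[int],
) -> dict:
    """Compare system-human agreement with human-human agreement."""
    human_human = exact_agreement(rater_1, rater_2)
    # average the system's agreement with each instructor
    system_human = (
        exact_agreement(system, rater_1) + exact_agreement(system, rater_2)
    ) / 2
    # ratio near 1.0: the system agrees with instructors about as often
    # as instructors agree with each other
    ratio: Optional[float] = (
        system_human / human_human if human_human > 0 else None
    )
    return {
        "system_human": system_human,
        "human_human": human_human,
        "ratio": ratio,
    }


if __name__ == "__main__":
    # made-up scores for four recordings, for illustration only
    print(relative_agreement([3, 2, 4, 1], [3, 2, 3, 1], [3, 3, 3, 1]))
```

A chance-corrected statistic could be substituted for exact agreement without changing the overall comparison.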
Alternative goal:
Send a request to accomplish either, both, or some combination of the other goals to Aaron Halfaker and the Wikimedia Foundation's AI mailing list:
lists.wikimedia.org/mailman/listinfo/ai
Patent licensing statement: all participants are asked to agree to provide reasonable and customary licensing terms for patent rights, if any, as a firm condition of membership to which all members must adhere in perpetuity for their membership to be valid. Corporate sponsors will have the opportunity to pledge mutual cross-licensing in return for support of development efforts. Moral rights, including exclusion of anti-competitive, exclusionary, criminal, or defective individual, corporate, or state actors, are reserved.
This is the Spoken Language Interest Group Call for Volunteers:
bit.ly/sligcfv currently also
j.mp/slig
Questions? Email Jim Salsman
jim@talknicer.com