Undergraduate and Graduate Projects
The Althingi Bell
Speeches at Althingi are sometimes interrupted by the speaker’s bell, which indicates that the allotted time is over. The project focuses on 1) developing an algorithm that detects the bell, 2) evaluating how well that algorithm performs (by calculating false positives and false negatives on labelled data), 3) re-labelling these utterances with a "bell" phoneme to see whether the ASR can learn to detect it, and 4) attenuating the bell using filtering and evaluating the result with listening tests and ASR performance.
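The evaluation in step 2 could be sketched as follows. This is a minimal illustration, not the project's method: it assumes the detector outputs bell onset times in seconds, that the labelled data gives reference onset times, and that a detection counts as correct when it falls within a hypothetical tolerance of 0.5 s of an unmatched reference. The function names are made up for this example.

```python
def match_events(predicted, labelled, tolerance=0.5):
    """Greedily match predicted bell times (seconds) to labelled ones.

    Returns (true_positives, false_positives, false_negatives).
    The tolerance is an assumed value, not one taken from the project.
    """
    unmatched = sorted(labelled)
    true_pos = 0
    false_pos = 0
    for t in sorted(predicted):
        # Closest labelled time that has not been matched yet.
        best = min(unmatched, key=lambda ref: abs(ref - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            unmatched.remove(best)
            true_pos += 1
        else:
            false_pos += 1
    # Labelled bells that no prediction matched are false negatives.
    return true_pos, false_pos, len(unmatched)


def precision_recall(tp, fp, fn):
    """Precision and recall, defined as 0.0 when the denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    tp, fp, fn = match_events([1.0, 5.2, 9.0], [1.1, 5.0, 12.0])
    print(tp, fp, fn, precision_recall(tp, fp, fn))
```

Matching on onset times rather than on fixed frames keeps the counts interpretable as "bells found" and "bells missed", which fits the way the task is described.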
ASR for underperforming speaker
The specific way a person pronounces certain words and phonemes, as well as the choice of words and their concatenation into sentences, influences the performance of speech recognition. The task is to build a custom ASR for people who experience subpar recognition performance with one of our existing ASR systems. Possible causes include mumbling, trailing off at the end of sentences, incorrect grammar, and incorrect pronunciation, among others. The project will focus on training and adapting acoustic and language models based on observed speaker characteristics.
Robert Kjaran, Jon Gudnason
Foreign named entities in Icelandic texts
Foreign words in Icelandic texts do not follow the same letter-to-sound rules as Icelandic words. Names like ‘George’ or ‘New York’ therefore cannot be automatically transcribed using a model trained on Icelandic words and pronunciations. The task of this project is to manually transcribe common named entities from parliamentary speeches and/or news articles, and further to examine how unknown foreign entities can be detected and automatically transcribed.
Anna Nikulasdottir, Jon Gudnason
Risamálheild phone distribution
Using patterns of diphones and triphones in Icelandic, compute the distribution of diphones and triphones (grapheme based) in the corpus "Risamálheild". Extract sentences with good di-/triphone coverage to use in TTS and ASR training and testing.
Anna Nikulasdottir, Jon Gudnason
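For the Risamálheild phone distribution project, the two steps (computing a grapheme-based distribution and extracting sentences with good coverage) could be sketched as below. This is an illustration under assumptions: sentences are plain strings, the space character is kept as a word-boundary symbol, and "good coverage" is approximated by a simple greedy selection. None of the function names come from the project.

```python
from collections import Counter


def ngrams(sentence, n):
    """Grapheme n-grams of a sentence (n=2 for diphones, n=3 for triphones).

    Spaces are kept, so they act as a word-boundary symbol (an assumption).
    """
    text = sentence.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_distribution(sentences, n):
    """Relative frequency of each grapheme n-gram over all sentences."""
    counts = Counter()
    for s in sentences:
        counts.update(ngrams(s, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}


def select_sentences(sentences, n, max_sentences):
    """Greedy selection: repeatedly take the sentence adding most unseen n-grams."""
    covered = set()
    chosen = []
    remaining = list(sentences)
    while remaining and len(chosen) < max_sentences:
        best = max(remaining, key=lambda s: len(set(ngrams(s, n)) - covered))
        gain = set(ngrams(best, n)) - covered
        if not gain:
            break  # nothing left that adds coverage
        chosen.append(best)
        covered |= gain
        remaining.remove(best)
    return chosen, covered


if __name__ == "__main__":
    corpus = ["góðan daginn", "góða nótt", "takk fyrir"]
    print(ngram_distribution(corpus, 2))
    print(select_sentences(corpus, 3, max_sentences=2))
```

The greedy step rescans all remaining sentences on every iteration, which is fine for a small example but would need a more efficient formulation for a corpus the size of Risamálheild.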
Detect silence and breathy voice activity
The main objective is to distinguish between loud breathing and breathy voice events in recordings of pathological speech and in audio recordings containing snoring. The assumption is that snoring is mainly a harmonic signal, while heavy breathing is mostly a noisy signal. The model will be utilized for snoring and voice quality assessment.
Michal Borsky, Jon Gudnason
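The harmonic-versus-noisy assumption in this project can be illustrated with one simple measure, the peak of the normalised autocorrelation of a frame. This is a sketch of a generic harmonicity feature, not the model the project intends to build; the lag range and the synthetic test signals are assumptions chosen for the example.

```python
import math
import random


def harmonicity(frame, min_lag, max_lag):
    """Peak of the normalised autocorrelation over a lag range.

    Values near 1 suggest a periodic (harmonic) frame, values near 0 a noisy one.
    """
    mean = sum(frame) / len(frame)
    x = [v - mean for v in frame]
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0  # silent frame: no periodicity to measure
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        best = max(best, r / energy)
    return best


if __name__ == "__main__":
    # Assumed setup: 8 kHz sampling, 100 ms frames, lags covering 50-400 Hz.
    tone = [math.sin(2 * math.pi * i / 80) for i in range(800)]  # 100 Hz tone
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(800)]
    print(harmonicity(tone, 20, 160), harmonicity(noise, 20, 160))
```

A frame-level score like this could serve as one input feature to a classifier; on real recordings the separation will be less clean than on a pure tone and white noise.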
Improved Automatic QC for Eyra/Voice
Fine-tune the quality control (QC) on Eyra so that it either works with the Mozilla Voice project or involves some human evaluation after the first QC pass. QC can currently be started with the touch of a button. However, work needs to be done to make its results usable for accurate language modeling with minimal effort.
Judy Fong, Jon Gudnason
End-to-end speech recognition
An 800-hour data set of Alþingi speeches and transcripts is available in a format suitable for TensorFlow, and multiple end-to-end algorithms exist online. Train an automatic speech recognizer (ASR) and compare its quality and speed with a more traditionally trained ASR for the Alþingi dataset.
Inga Run, Jon Gudnason
Code-switching occurs when a speaker switches between languages or dialects, sometimes multiple times and often in mid-sentence. An important example of this is spoken Icelandic, which is increasingly used with English words and phrases interjected where needed. Language technology for Icelandic needs to be able to handle code-switching if it is to be useful in practice. The objective of the project is therefore to implement a bilingual code model using state-of-the-art methods and to investigate how these approaches need to be adapted to accommodate Icelandic peculiarities.
Analysing EEG, Cardiovascular and speech for CWM
The project focuses on analysing electroencephalography (EEG) recordings made during reading and relating the outcome to cognitive workload. Ten participants undertook a cognitive workload task by reading text aloud and in silence, during which an EEG recording was made. The objective is to identify patterns in the EEG signal that indicate cognitive workload related to the tasks and the reading process of the participants.
Estimating the difficulty level in ATC
The job of air traffic controllers is to direct flights safely and efficiently across a designated sector. The state of the sector can be characterized by the location, type and velocity vector of each aircraft as well as extrinsic variables such as weather. The state can be considered easy if, for example, it has few aircraft flying in parallel directions, or difficult if it has many aircraft with many crossing paths. The objective of the project is to map the characterized sector onto a difficulty level using machine learning. The results can be used either in operation to control workload or in simulation to achieve a set of pedagogical goals.
Rescoring ASR output using a word-tag language model
Icelandic words can take different forms depending on gender, number and case. Automatic speech recognizers (ASR) often get the word correct, but not the word ending. However, grammatical rules can help determine the correct form of a word in a sentence. The project investigates whether including grammatical knowledge, e.g. morphological features, improves the correctness of an ASR.
Anna Nikulasdottir, Inga Run, Jon Gudnason
Language model adaptation by topic
Language models are statistical models based on word frequency in text. In the case of the Althingi speech recognizer, the language model is trained on a large text corpus of speech transcripts. In Althingi there is, however, often a schedule of the upcoming discussions available. Develop a language model adaptation procedure and check whether the speech recognition transcripts improve when the language model is adapted to the relevant discussion topic. Also report on the time the language model adaptation takes.
Anna Nikulasdottir, Inga Run, Jon Gudnason
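One common way to adapt a language model to a topic is linear interpolation between a general model and a model trained on topic text. The sketch below illustrates that idea with toy unigram models; it is an assumption about what an adaptation procedure might look like, not the procedure the project prescribes, and a real system would use higher-order n-gram or neural models. The example texts and the interpolation weight are invented.

```python
import math
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(general, topic, lam):
    """Linear interpolation: lam * topic + (1 - lam) * general."""
    return {w: lam * topic[w] + (1.0 - lam) * general[w] for w in general}


def perplexity(model, tokens):
    """Perplexity of a token sequence under a unigram model."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


if __name__ == "__main__":
    general_text = "the budget the law the vote".split()
    topic_text = "fish quota fish quota fish".split()
    vocab = set(general_text) | set(topic_text)
    general = unigram_model(general_text, vocab)
    topic = unigram_model(topic_text, vocab)
    adapted = interpolate(general, topic, lam=0.5)  # weight is an assumed value
    held_out = "fish quota the fish".split()
    print(perplexity(general, held_out), perplexity(adapted, held_out))
```

Perplexity on held-out text from the scheduled topic gives a quick check before running the full recognizer, and wrapping the adaptation step in `time.perf_counter()` calls would cover the timing part of the task.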
Machine translation with deep learning
The purpose of this project is to learn the concepts of deep learning in the context of machine translation. The student uses an existing deep learning toolkit (e.g. TensorFlow) and parallel corpora for training and testing.
Full parsing with machine learning
Greynir (https://greynir.is/) has been developed using a context-free grammar containing thousands of rules. In this project, Greynir is used to construct a training corpus from which a parser then learns. The underlying machine learning technique could be based on statistics or deep learning.
Error detection in corpus
Risamálheild (http://malfong.is/?pg=rmh) contains about 1.3 billion tokens of texts stored in a standard format. Each token is tagged with a word class and morphological features. In this project, the student develops methods for detecting (and possibly correcting) tagging errors in the corpus.
The Gold (http://malfong.is/?pg=gull) is a tagged corpus containing about 1 million tokens. The words in the texts have been tagged with automatic methods and hand-corrected afterwards. In this project, several taggers are trained and tested on the Gold and accuracies compared.
Airline ticket ordering using voice commands in Icelandic
The expected outcome of this project is an extension to Icelandair’s booking system. The user would activate voice search on www.icelandair.is and utter a command such as: "Reykjavík til Boston 16. júní 2018. Til baka 22. júní 2018." The utterance is converted to text using Icelandic ASR and analysed using natural language processing. A successful analysis results in the boxes on the website being filled out correctly. This project is funded by Icelandair.
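The analysis step for a command like the one above could be sketched with a regular expression. This is a minimal illustration only: it assumes the ASR output already contains digits and punctuation in the form shown in the example, and the field names (`origin`, `destination`, `departure`, `return`) are hypothetical, not those of Icelandair's booking system.

```python
import re
from datetime import date

# Icelandic month names mapped to month numbers.
MONTHS = {
    "janúar": 1, "febrúar": 2, "mars": 3, "apríl": 4, "maí": 5, "júní": 6,
    "júlí": 7, "ágúst": 8, "september": 9, "október": 10,
    "nóvember": 11, "desember": 12,
}

# Day, month name, year, e.g. "16. júní 2018".
DATE = r"(\d{1,2})\.\s+(\w+)\s+(\d{4})"
PATTERN = re.compile(
    rf"^(?P<origin>.+?)\s+til\s+(?P<destination>.+?)\s+{DATE}\.?"
    rf"(?:\s+til baka\s+{DATE}\.?)?\s*$",
    re.IGNORECASE,
)


def to_date(day, month_name, year):
    """Convert matched strings to a date, or None for an unknown month."""
    month = MONTHS.get(month_name.lower())
    if month is None:
        return None
    return date(int(year), month, int(day))


def parse_booking(utterance):
    """Return the form fields for an utterance, or None if it cannot be parsed."""
    m = PATTERN.match(utterance.strip())
    if m is None:
        return None
    departure = to_date(m.group(3), m.group(4), m.group(5))
    if departure is None:
        return None
    fields = {
        "origin": m.group("origin"),
        "destination": m.group("destination"),
        "departure": departure,
        "return": None,  # stays None for one-way requests
    }
    if m.group(6) is not None:
        fields["return"] = to_date(m.group(6), m.group(7), m.group(8))
    return fields


if __name__ == "__main__":
    print(parse_booking(
        "Reykjavík til Boston 16. júní 2018. Til baka 22. júní 2018."
    ))
```

A fixed pattern like this only covers one phrasing; handling inflected place names, spelled-out numbers and free word order is where the natural language processing part of the project would come in.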