Voice-controlled Information Delivery (VCID)

The goal of the project is to design a system that enables voice-driven delivery of web content, such as content from news sites, blogs or radio programs. The idea is to have an environment designed specifically for audio interaction and not tied to the visual layout of a web page. The user chooses content with voice commands, and the content is then presented as audio, e.g. recorded speech, generated speech, a radio episode or a podcast. A pure audio interface is useful in situations where hands and eyes are busy, such as driving, cooking or running, as well as for people with disabilities.

Contributor/s: Kristján Rúnarsson, Róbert Kjaran

Free and Open Speech Recognition for Icelandic

Free and Open Speech Recognition for Icelandic is a 12-month project conducted at Reykjavík University and funded by The Icelandic Language Technology Fund. The project is about building a free and accessible environment in which Icelandic speech recognition systems can be developed and customised.

Contributor/s: Anna Björk Nikulásdóttir

ASR for parliament speeches

The Icelandic Parliament Building

All parliament speeches have to be stored in a text format and kept accessible to the public.

At present, considerable time is spent on transcribing the speeches. Automatic speech recognition can considerably reduce the manual labor needed to transcribe each speech.

In this project we are developing an automatic speech recognizer for the Icelandic parliament, with the aim of having it fully take over the transcription process in 2018.

Contributor/s: Inga Rún Helgadóttir, Judy Fong

Eyra – speech data acquisition

A screenshot from the Eyra software recording screen where the prompts are read

Eyra is a free and open source project designed to provide tools to gather speech data for languages.

Speech data acquisition is particularly important for under-resourced languages. Data gathering is the most labour-intensive part of developing speech technologies such as automatic speech recognizers and synthesizers.

Eyra aims to make this task cheaper (and better) by providing a free platform that handles the data acquisition process. Eyra analyzes incoming data and can give feedback on its quality, with the aim of improving the quality of the data collected.
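
As a rough illustration of the kind of automatic quality feedback described above, the following Python sketch flags recordings that are clipped or too quiet. It is not taken from the Eyra code base: the function name assess_recording, the choice of checks and all threshold values are assumptions made for this example.

```python
import math


def assess_recording(samples, clip_level=0.99, max_clip_ratio=0.001, min_rms_db=-40.0):
    """Return (ok, issues) for a mono recording given as floats in [-1.0, 1.0].

    Hypothetical checks, chosen for illustration only:
    - "clipping": too many samples sit at or near full scale
    - "too quiet": the overall RMS level is below min_rms_db (dB re full scale)
    """
    if not samples:
        return False, ["empty recording"]

    issues = []

    # Fraction of samples at or above the clipping level.
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    if clipped / len(samples) > max_clip_ratio:
        issues.append("clipping")

    # Root-mean-square level in decibels relative to full scale.
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    rms_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
    if rms_db < min_rms_db:
        issues.append("too quiet")

    return (not issues), issues
```

A recording tool could run such a check right after each prompt is read and ask the speaker to repeat the prompt when any issue is reported.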

Designed as a web app, Eyra remains flexible: it works with any compatible browser and can also be used offline, with a laptop acting as the server. It is open source, so you can contribute, use only parts of it, or modify it to suit your needs (e.g. if you want to use pictures instead of prompts).

An article about the software was presented at SLTU 2016. This project was funded, and some guidance offered, by Google.

Contributor/s: Matthías Pétursson, Róbert Kjaran, Simon Klüpfel

Voice patterns in air-traffic control

Air traffic controller (ATCO) in Reykjavik Airport Tower.

The aim of this project is to build a system that monitors the voice patterns of air traffic controllers and assesses their cognitive workload in real time. The project is a collaboration between CADIA, the Icelandic air traffic control provider Isavia, Tern Systems and Icelandair.

Contributor/s: Eydís Huld Magnúsdóttir

Objective Assessment of Voice Quality

A labeled anatomical diagram of the vocal folds or cords.

The human voice can assume various qualities, such as breathy voice, modal voice and pressed voice. The assessment of voice quality has been used clinically for the diagnosis and management of voice disorders. However, professional rating of voice quality can be labor-intensive and subjective.

This project aims to automate this rating procedure. The primary focus is on the use of signal processing algorithms to extract suitable features from different speech-related signals.

Our current research uses an inverse filtering algorithm to obtain glottal flow estimates from speech signals, from which features can be extracted and used in the training and testing of a voice quality predictor.

The secondary focus is on the use of machine learning schemes to construct assistant systems for medical practice. Our current research in this area uses Gaussian mixture models and deep neural networks for speech modality classification.
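
To give a feel for the inverse filtering idea mentioned above, here is a minimal Python sketch based on plain linear prediction (LPC). This is a textbook simplification and not necessarily the algorithm used in the project: an all-pole filter is fitted to the speech signal with the autocorrelation method, and the signal is then passed through the inverse of that filter, leaving a residual that only crudely approximates the glottal excitation. All function names are chosen for this example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err), where a[0] == 1.0 and the prediction error is
    e[n] = sum_k a[k] * x[n - k], and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Apply the FIR inverse filter A(z) to x, giving the LPC residual."""
    return [
        sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(x))
    ]


def lpc_residual(frame, order):
    """Fit an all-pole model to one frame and return (coefficients, residual)."""
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    return a, inverse_filter(frame, a)
```

In practice the signal would be processed in short, windowed frames, and dedicated glottal inverse filtering methods also model the glottal source and lip radiation separately. The sketch leaves all of that out.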

This work was supported in part by The Icelandic Centre for Research (RANNIS) under the project Model-Based Speech Production Analysis and Voice Quality Assessment, under Grant 152705-051.


Contributor/s: Dr. Yu-Ren Chien, Michal Borský


Cognitive workload monitoring using voice

Recording a database of people performing the Stroop task.

The focus of this project is to discover the relationship between cognitive workload, task performance, voice patterns and physiological signals. Cognitive workload experiments are set up to obtain voice and physiological data recordings. Speech signal processing and pattern recognition algorithms are used to discover how cognitive workload affects the voice. The project is a collaboration between The Speech Pattern Recognition Group and The Cognitive Signals Group.

Contributor/s: Eydís Huld Magnúsdóttir