First milestone in the Language Technology for Icelandic project

The LVL team celebrating the first milestone in the Language Technology for Icelandic project. Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir and Steinþór Steingrímsson are missing from the picture.

Last week we celebrated achieving the first milestone in the Language Technology for Icelandic project with a cake!

After a lot of hard work over the past few months, we achieved the first milestone in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and Machine Translation (MT).

In ASR, the focus has mostly been on creating and gathering data. So far, 55,000 utterances donated by adults have been collected via the crowd-sourcing platform samromur.is (based on Common Voice), with plans to reach 100,000 utterances by the next milestone. The process is being extended to include younger voices in collaboration with schools and authorities. Today we started working with Öldutúnsskóli in Hafnarfjörður. The goal is to reach 80,000 utterances from young voices by the next milestone. Additionally, data has been gathered from RÚV (audio, video and subtitles) and CreditInfo (transcriptions). Along with data gathering, the team is also developing tools to post-process Icelandic ASR text for better readability.

In TTS, we successfully created a voice recording client (LOBE) and three reading scripts in order to collect high-quality speech and corresponding text data. The reading scripts were created from Risamálheild and seek to maximize diphone coverage. So far, 20 hours have been collected from two speakers, one male and one female. The aim is to finish collecting 20 hours from each speaker early this year. From the collected data, two TTS prototypes have been created in Ossian, which extends the Merlin back-end. The current prototypes are quite naive, but we have integrated a grapheme-to-phoneme model for Icelandic into them.
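To illustrate what "maximizing diphone coverage" can mean in practice, here is a minimal sketch of a greedy selection strategy. It is not the procedure used to build the LOBE reading scripts; the function names, the `budget` parameter and the toy phoneme sequences are all assumptions made for illustration. A real pipeline would obtain the phoneme sequences from a grapheme-to-phoneme model.

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phonemes: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phoneme pairs in one utterance."""
    return set(zip(phonemes, phonemes[1:]))


def greedy_select(candidates: Dict[str, List[str]], budget: int) -> List[str]:
    """Pick up to `budget` sentences, always taking the one that adds
    the most diphones not yet covered (hypothetical helper)."""
    covered: Set[Diphone] = set()
    chosen: List[str] = []
    remaining = {sent: diphones(phons) for sent, phons in candidates.items()}
    while remaining and len(chosen) < budget:
        # Sentence with the largest number of new diphones.
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining[best] - covered
        if not gain:
            break  # nothing left that improves coverage
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen


# Toy example with made-up phoneme sequences.
toy = {
    "a": ["p", "a", "t"],
    "b": ["p", "a"],
    "c": ["t", "a", "p", "a"],
}
selected = greedy_select(toy, budget=3)
```

Greedy selection does not guarantee an optimal script, but it is a common and simple heuristic for coverage problems of this kind.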

In MT, we successfully created a phrase-based statistical machine translation system using the open-source tool Moses. Our collaborators at Miðeind created neural machine translation systems based on BiLSTMs and Transformers. The models were trained on the newly available English-Icelandic parallel corpus, ParIce. The systems were then evaluated with respect to training time, throughput and BLEU score. The code and systems are freely available but are still under development for milestone two, in which we will continue to develop the systems and adapt them to the specific needs of the Icelandic language.
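For readers unfamiliar with BLEU, the following is a minimal sketch of how a corpus-level score can be computed. It assumes one reference per hypothesis, pre-tokenized input and no smoothing, and it is not the evaluation code used for the systems above; the function names are hypothetical. Published results are normally computed with an established evaluation tool rather than a hand-written function like this one.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100), single reference, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Counter intersection clips each n-gram count to the reference count.
            matches[n - 1] += sum((h & r).values())
            totals[n - 1] += sum(h.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
score = corpus_bleu([hyp], [ref])
```

The score is the geometric mean of the clipped 1-gram to 4-gram precisions, scaled by the brevity penalty.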

Language Technology Seminar this Saturday

The cooperation between LVL and other leading Icelandic organizations is increasing. Tomorrow, Reykjavik University and Societas Scientiarum Islandica (Vísindafélag Íslendinga) are holding a seminar and panel discussion on the current progress and the future of implementing language technologies for Icelandic.

It will be held in room M105 at Reykjavik University. Hrafn Loftsson of LVL will moderate the seminar, which starts at 13:30. It will consist of talks from a professor at the University of Iceland, the chairman of Almannarómur, Jón Guðnason of LVL, and the director of Miðeind ehf. A panel discussion will follow.

We welcome everyone to attend the lively Saturday afternoon discussion!

Researchers’ Night

This Friday is Researchers’ Night (Vísindavaka Rannís 2018). It is an all-ages event on the 28th of September, 2018, from 16:30 to 22:00 at Laugardalshöllin, Reykjavik.

We will be there with Reykjavik University demonstrating the possibilities of speech technology: evaluating collected speech data (Eyra), testing the accuracy of an automatic speech recognizer (ASR) at https://tal.ru.is, listening to a text-to-speech synthesizer, and telling your phone to read the news to you. Come try out the state of the art in Icelandic speech technology, and tell us what you think!

Researcher by Nick Youngson CC BY-SA 3.0 ImageCreator10

Student Projects Available

For students of Reykjavik University and summer exchange students, we now have a list of student projects available. They are listed at https://lvl.ru.is/student-projects/ and can also be reached from the menu of the LVL website under Student Projects. They range from straightforward to difficult and are suitable for undergraduate final projects, Master's students, and PhD students. If you want to work on one, please contact the people listed in the contact column, and they can give you more details to get you started. We look forward to hearing from you!

 

Using language technology to assist the hard of hearing

The Nordic Association of the Hard of Hearing (Nordiska Hörselskadades Samarbetskommitté, NHS) held a seminar at Hotel Selfoss last week. On Friday, Anna gave a talk there on how language technology might help people who are hard of hearing to communicate and access information in a predominantly hearing world. Automatic transcription of live communication and automatic captioning of video material already work for English and some other languages, and the Nordic participants of the seminar were eager to see this technology advance in their languages. At LVL, we are working on open ASR systems, making the development of technology like this possible for Icelandic.

The rest of the slides can be viewed by selecting the first slide below.

NHS_cover_icon

Meeting with Mycroft

This week LVL sat down with Mycroft to discuss the possibility of collaborating to bring more speech technology to Iceland. We discussed using Mozilla’s Common Voice to create another open-source Icelandic speech dataset, and possibly an Icelandic voice assistant. The Mozilla project requires just 5,000 phrases, which anyone can contribute, even you!

LVL meets with Joshua Montgomery of Mycroft

RANNIS Infrastructure grant 2018 – Awarded.

The complexity of neural network models increases every year, and it takes considerable effort to keep computational hardware fast enough to train them efficiently. Earlier this year, we applied to the RANNIS Infrastructure Fund for funding to expand our current HPC cluster. We are happy that our proposal “Deep Learning Infrastructure for Speech and Language Technology” was selected for funding. Only three grants were awarded to Reykjavik University, and ours was one of them. This money will allow us to buy a fully equipped SuperMicro 4028GR-TR2 server with NVIDIA 1080Ti GPUs. We hope to sign the grant contract within several weeks and then order the machine. Next comes the process of assembling it and integrating it into our current cluster. I can say that several group members can’t wait to have more power available to them.

 

List of awarded grants – RANNIS