First milestone in the Language Technology for Icelandic project

The LVL team celebrating the first milestone in the Language Technology for Icelandic project. Ólafur Helgi Jónsson, Sunneva Þorsteinsdóttir and Steinþór Steingrímsson are missing from the picture.

Last week we celebrated achieving the first milestone in the Language Technology for Icelandic project with a cake!

After a lot of hard work over the past few months, we achieved the first milestone in Automatic Speech Recognition (ASR), Text-to-Speech (TTS) and Machine Translation (MT).

In ASR, the focus has mostly been on data creation and gathering. 55,000 utterances have been collected (donated by adults) via the crowd-sourcing platform samromur.is (based on Common Voice), with plans to reach 100,000 utterances for the next milestone. The process is being extended to include younger voices in collaboration with schools and authorities. Today we started working with Öldutúnsskóli in Hafnarfjörður. The goal is to reach 80,000 young-voice utterances for the next milestone. Additionally, data has been gathered from RÚV (audio, video and subtitles) and CreditInfo (transcriptions). Along with data gathering, the team is also developing tools to post-process Icelandic ASR text for better readability.

In TTS, we successfully created a voice recording client (LOBE) and three reading scripts in order to collect high-quality speech and corresponding text data. The reading scripts were created from Risamálheild and seek to maximize diphone coverage. So far 20 hours have been collected from two speakers, one male and one female. The aim is to finish collecting 20 hours from each speaker early this year. From the collected data, two TTS prototypes have been created in Ossian, which extends the Merlin back-end. The current prototypes are quite naive, but we have integrated a grapheme-to-phoneme model for the Icelandic language into them.
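To illustrate what "maximizing diphone coverage" can mean in practice, here is a minimal sketch of a greedy sentence selector. It is not the method used to build the LOBE reading scripts, which is not described here. The function names, the toy phoneme symbols and the `candidates` data are all hypothetical, and the sketch assumes each candidate sentence already comes with a phoneme sequence (for example from a grapheme-to-phoneme model).

```python
from typing import Dict, List, Set, Tuple

Diphone = Tuple[str, str]


def diphones(phonemes: List[str]) -> Set[Diphone]:
    """Return the set of adjacent phoneme pairs in one utterance."""
    return set(zip(phonemes, phonemes[1:]))


def greedy_script(candidates: Dict[str, List[str]], max_sentences: int) -> List[str]:
    """Greedily pick sentences, each time taking the one that adds the most unseen diphones.

    Stops when the sentence budget is used up or no remaining sentence adds a new diphone.
    """
    covered: Set[Diphone] = set()
    remaining = {text: diphones(phones) for text, phones in candidates.items()}
    script: List[str] = []
    while remaining and len(script) < max_sentences:
        # Ties are broken by insertion order of the candidates.
        best = max(remaining, key=lambda text: len(remaining[text] - covered))
        gain = remaining[best] - covered
        if not gain:
            break
        script.append(best)
        covered |= gain
        del remaining[best]
    return script


# Hypothetical toy data: placeholder phoneme symbols, not real Icelandic transcriptions.
candidates = {
    "sentence A": ["a", "b", "c", "d"],
    "sentence B": ["a", "b", "c"],
    "sentence C": ["d", "e", "a"],
}
selected = greedy_script(candidates, max_sentences=2)
print(selected)
```

With this toy input, "sentence B" is never chosen because all of its diphones are already covered by "sentence A". A greedy pass like this is a common heuristic for coverage problems, but it does not guarantee an optimal script.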

In MT, we successfully created a phrase-based statistical machine translation system using the open source tool Moses. Our collaborators at Miðeind created neural machine translation systems based on BiLSTMs and Transformers. The models were trained on the newly available English-Icelandic parallel corpus, ParIce. The systems were then evaluated w.r.t. training time, throughput and BLEU score. The code and systems are freely available but are still under development for milestone two. In milestone two we will continue to develop the systems further and adjust them to specific needs of the Icelandic language.
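For readers unfamiliar with BLEU, the following is a simplified sketch of how a corpus-level score can be computed. It is not the evaluation code used by LVL or Miðeind. It assumes one reference per sentence and whitespace tokenization, applies no smoothing, and uses made-up English sentences. Published scores are normally produced with an established tool such as sacreBLEU, whose tokenization differs from this sketch.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100) with one reference per hypothesis and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Made-up example sentences, for illustration only.
score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(f"BLEU = {score:.2f}")
```

Because the score is a geometric mean over 1- to 4-gram precisions, a single wrong word in a short sentence lowers it sharply. This is one reason BLEU is usually reported over a whole test set rather than per sentence.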

Donate your voice to bring Icelandic and technology together!

Hello everyone!

We are collecting audio clips to create a new, more robust Icelandic dataset. We plan to use them for automatic speech recognition and text-to-speech applications. This could lead to many applications that you can use in your everyday life, like an Icelandic voice assistant, screen readers, and searchable audio and video content. The dataset will soon be available directly on the Samromur website for users and developers alike to download and use to create exciting new technology. But to do this, we need your voices, Icelandic speakers.

You can get started by going to Samromur.is and choosing Tala ("Speak"). You only need to donate 5 recordings per session, but more is always welcome. If you want to do more, you can also evaluate clips made by other users with Hlusta ("Listen").

We are also organizing data gathering competitions within companies. So, please contact us if you and your company would be interested in holding a competition.

This is a joint effort with Deloitte and Almannaromur. If you want to learn more about this project, visit the Um Verkefnið ("About the project") tab on the Samromur site.

Conference – Er íslenskan góður „bisness“?

Tomorrow, 16th of October 2019, there will be a conference on Icelandic language technology. The conference will take place at Veröld – hús Vigdísar and starts at 8:00.

A number of people affiliated (past and present) with the LVL will be giving talks there, including:

  • David Erik Mollberg, Ólafur Helgi Jónsson, Viktor Sveinsson and Sunneva Þorsteinsdóttir – students at HÍ and RU, who will launch an open speech data collection initiative for Icelandic.
  • Anna Björk Nikulásdóttir – project manager at SÍM (Samstarf um íslenska máltækni – Collaboration on Icelandic Language Technology) and CEO of Grammatek, who will talk about tools in language technology.
  • Hrafn Loftsson – docent at the School of Computer Science at RU, who will talk about automatic text summarization for Icelandic.

The conference focuses on the importance of Icelandic language technology for academia and industry and is open to anyone.

For more details on the conference and the speaker list, see the Facebook event (in Icelandic).

Tune in this weekend as Rás 1 interviews ASR expert Inga Rún and Althingi editor Steinunn about the Icelandic Parliament’s automated transcription system

September 2019 marks the end of the LVL automatic speech recognition (ASR) project with Althingi, the Icelandic parliament. To close it out, the radio station Rás 1 is airing an interview on September 8 at 9:30pm (21:30). The interview is conducted by the head of the Althingi speech department, Berglind Steinsdóttir. In the interview, Berglind talks to both Inga Rún, our ASR expert, and Steinunn, an Althingi editor. They discuss both sides of the project: software development and user experience. This broadcast will hopefully give our Icelandic readers and listeners a deeper understanding of the specifics involved in ASR. We invite you to tune in this Sunday at 21:30.

Practical information about the radio program is below:

Date: Sunday, September 8th at 21:30 (re-airing Saturday, September 14th at 20:45)
Location: Rás 1 website or the radio station
Duration: 30 minutes
Title: “Háttvirtur þingmaður tekur til máls” (“The honourable member takes the floor”)
Topic: Automation. Artificial intelligence. The fourth industrial revolution. What does this have to do with the speeches of members of parliament? A speech recognizer that transcribes parliamentary speeches has been put into use, and the program features interviews with Inga Rún Helgadóttir, a physicist who took part in developing it, and Steinunn Haraldsdóttir, an Icelandic-language specialist who has used the speech recognizer.
Language of Interview: Icelandic
Interviewees: Inga Rún Helgadóttir, ASR developer; Steinunn Haraldsdóttir, Icelandic-language specialist who uses the ASR system
Interviewer: Berglind Steinsdóttir
Supervisor: Ásdís Emilsdóttir Petersen.

During the interview, go to https://ruv.is/ras1, click Í BEINNI ("Live"), select Rás 1 and press the play button.

Acceptances Galore

This has been a great summer for LVL. We have many conference acceptances: 8 papers, 3 conferences. It will also be a busy autumn, as all the conferences are in September.

Our first is Recent Advances in Natural Language Processing, a very competitive NLP conference. This year it is in Varna, Bulgaria. Steinþór, Örvar and Hrafn’s paper, “Augmenting a BiLSTM tagger with a Morphological Lexicon and a Lexical Category Identification Step” and Hrafn’s paper “A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System” will both be presented.

Next is Interspeech in Graz, Austria. We have four papers:

  • Yu-Ren Chien – “F0 Variability Measures Based on Glottal Closure Instants”
  • Inga Helgadóttir – “The Althingi ASR System”
  • Anna Rúnarsdóttir – “Lattice re-scoring during manual editing for automatic error correction of ASR transcripts”
  • Anna Nikulásdóttir – “Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case”

Poster describing “The Althingi ASR System”

To wrap up the month, we will be attending the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) in Turku, Finland. We will be presenting two papers by Svanhvít Ingólfsdóttir: “Nefnir: A high accuracy lemmatizer for Icelandic” and “Towards High Accuracy Named Entity Recognition for Icelandic.”

All these acceptances exemplify our successful push to increase language technology for Icelandic.

Note: For more details about the papers, please go to our Publications page.

2019 Rannis Grants

Rannis (The Strategic Research and Development Programme for Language Technology) has awarded Hrafn two grants this year. Congratulations! The first project, Automatic Text Summarization (ATS) for Icelandic, will be carried out by a post-doctoral researcher and an Icelandic linguist in collaboration with mbl.is, Morgunblaðið’s news website. The second is Named Entity Recognition (NER) for Icelandic. Svanhvít Lilja Ingólfsdóttir and Ásmundur Guðjónsson, two students from the Language Technology (Máltækni) master's program, will work on the NER project in collaboration with the Icelandic Stock Exchange. Welcome to LVL!

Anna Björk has also been awarded a grant, for her company, Grammatek ehf., in cooperation with the city of Akranes. Congratulations and we wish you all the best with your new endeavor!

More information regarding the ATS post-doctoral research position can be found at https://lvl.ru.is/jobs.

Can stress be measured?

For the past few years, Eydís has been looking into this question. She carried out a cross analysis of different measurements of cognitive workload in the Icelandic flight industry. Cognitive workload, for the purposes of this article, is synonymous with stress. The research indicates there are clear differences in the measurements when an individual is rested versus stressed. While the research focused on individuals in the aviation industry, the results can be applied to all people.

So, the answer to “Can stress be measured?” is yes, most definitely. For a more in-depth explanation of the indicators of stress, please read Eydís’ PhD thesis.

For more on the topic, read the article in Icelandic: https://www.ru.is/haskolinn/frettir/maeldi-alagseinkenni-i-tali

This research has been done in collaboration with ISAVIA and Icelandair.