We now have a Facebook page for our spoken Icelandic collection effort, Samrómur. We’re working on enabling school children to participate as well. Follow Samrómur to be notified when that platform goes live: https://www.facebook.com/samromur/
We are collecting audio clips to create a new, more robust Icelandic dataset. We plan to use them for automatic speech recognition and text-to-speech applications. This could lead to many applications that you can use in your everyday life, like an Icelandic voice assistant, screen readers, and searchable audio and video content. The dataset will soon be available directly on the Samromur website for users and developers alike to download and use to create exciting new technology. But to do this, we need your voices, Icelandic speakers.
You can get started by going to Samromur.is and choosing Tala. You only need to donate 5 recordings per session, but more is always welcome. If you want to do more, you can also evaluate clips made by other users with Hlusta.
We are also organizing data gathering competitions within companies. So, please contact us if you and your company would be interested in holding a competition.
This is a joint effort with Deloitte and Almannaromur. If you want to learn more about this project, visit the Um Verkefnið tab on the Samromur site.
September 2019 marks the end of LVL's automatic speech recognition (ASR) project with Althingi, the Icelandic parliament. To mark the close, the radio station Rás 1 is airing an interview on September 8 at 9:30pm (21:30). The interview is conducted by the head of the Althingi speech department, Berglind Steinsdóttir. In the interview, Berglind talks to both Inga Rún, our ASR expert, and Steinunn, an Althingi editor. They discuss both sides of the project: software development and user experience. We hope this broadcast will give our Icelandic readers and listeners a deeper understanding of the specifics involved in ASR, and we invite you to tune in this Sunday at 21:30.
Practical information about the radio program is below:
Date: Sunday, September 8th at 21:30 (re-airing Saturday, September 14th at 20:45)
Location: Rás 1 website or the radio station
Duration: 30 minutes
Title: “Háttvirtur þingmaður tekur til máls”
Topic: Automation. Artificial intelligence. The fourth industrial revolution. What does this have to do with the speeches of members of parliament? A speech recognizer that transcribes parliamentary speeches has been put into use, and the program features conversations with Inga Rún Helgadóttir, a physicist who took part in developing it, and Steinunn Haraldsdóttir, an Icelandic language specialist who has used the speech recognizer.
Language of Interview: Icelandic
Interviewees: Inga Rún Helgadóttir, ASR developer
Steinunn Haraldsdóttir, Icelandic specialist who uses the ASR
Interviewer: Berglind Steinsdóttir
Supervisor: Ásdís Emilsdóttir Petersen.
During the interview, go to https://ruv.is/ras1 and click on Í BEINNI. Then select Rás 1 and press the play button.
This has been a great summer for LVL. We have many conference acceptances: 8 papers, 3 conferences. It will also be a busy autumn, as all the conferences are in September.
Our first is Recent Advances in Natural Language Processing, a very competitive NLP conference. This year it is in Varna, Bulgaria. Two papers will be presented: “Augmenting a BiLSTM tagger with a Morphological Lexicon and a Lexical Category Identification Step” by Steinþór, Örvar and Hrafn, and “A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System” by Hrafn.
Next is Interspeech in Graz, Austria. We have four papers:
- Yu-Ren Chien – “F0 Variability Measures Based on Glottal Closure Instants”
- Inga Helgadóttir – “The Althingi ASR System”
- Anna Rúnarsdóttir – “Lattice re-scoring during manual editing for automatic error correction of ASR transcripts”
- Anna Nikulásdóttir – “Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case”
To wrap up the month, we will be attending the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) in Turku, Finland. We will be presenting two papers by Svanhvít Ingólfsdóttir: “Nefnir: A high accuracy lemmatizer for Icelandic” and “Towards High Accuracy Named Entity Recognition for Icelandic.”
All these acceptances exemplify our successful push to increase language technology for Icelandic.
Note: For more details about the papers, please go to our Publications page.
Rannis (The Strategic Research and Development Programme for Language Technology) has awarded Hrafn two grants this year. Congratulations! The first project, Automatic Text Summarization (ATS) for Icelandic, will be carried out by a post-doctoral researcher and an Icelandic linguist in collaboration with mbl.is, Morgunblaðið’s news website. The second is Named Entity Recognition (NER) for Icelandic. Svanhvít Lilja Ingólfsdóttir and Ásmundur Guðjónsson, two students from the Language Technology (Máltækni) master's program, will work on the NER project in collaboration with the Icelandic Stock Exchange. Welcome to LVL!
Anna Björk has also been awarded a grant, for her company, Grammatek ehf., in cooperation with the city of Akranes. Congratulations and we wish you all the best with your new endeavor!
More information regarding the ATS post-doctoral research position can be found at https://lvl.ru.is/jobs.
For the past few years, Eydís has been investigating whether stress can be measured. She did a cross analysis of different measurements of cognitive workload in the Icelandic flight industry. Cognitive workload, for the purposes of this article, is synonymous with stress. The research indicates there are clear differences in the measurements depending on whether an individual is rested or stressed. While the research focused on individuals in the aviation industry, the results can be applied to all people.
So, the answer to “Can stress be measured?” is yes, most definitely. For a more in-depth explanation of the indicators of stress, please read Eydís’ PhD thesis.
For more on the topic, read the article in Icelandic: https://www.ru.is/haskolinn/frettir/maeldi-alagseinkenni-i-tali
This research has been done in collaboration with ISAVIA and Icelandair.
This year we’re branching out and attending a smaller conference, IEEE SLT 2018!
We will be presenting the results from our paper, “An Icelandic Pronunciation Dictionary for TTS.” The work was done in collaboration with the linguist and Icelandic specialist, Eiríkur Rögnvaldsson from the University of Iceland.
The paper describes an Icelandic pronunciation dictionary for use in a text-to-speech system for Icelandic. Procedures were implemented to create a consistent training set for grapheme-to-phoneme (g2p) conversion modeling, needed for automatic extensions of the dictionary. The experiments show a clear benefit of using clean data for training, both in terms of phoneme error rate (PER) and in terms of the categories of errors made by the g2p algorithm. The results of the dictionary processing were also used to create an initial version of an open source database for Icelandic speech applications. The scripts used in the experiments are available via our GitHub repository: https://github.com/cadia-lvl/SLT2018.
We hope to see you there. If you see Anna or Jón, please stop by and say hello.
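For readers unfamiliar with PER (phoneme error rate), the measure mentioned above, here is a minimal sketch of how it is commonly computed: the edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. This is not taken from the paper's scripts. The function names and the toy phoneme symbols are assumptions made for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edit errors divided by total reference phonemes, over all words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_phonemes = sum(len(ref) for ref in references)
    if total_phonemes == 0:
        raise ValueError("references contain no phonemes")
    total_errors = sum(edit_distance(ref, hyp)
                       for ref, hyp in zip(references, hypotheses))
    return total_errors / total_phonemes


# Toy data: made-up symbols, not real dictionary entries.
references = [["h", "a", "l", "o"], ["t", "a", "l", "a"]]
hypotheses = [["h", "a", "l", "o"], ["t", "a", "l"]]
per = phoneme_error_rate(references, hypotheses)
print(f"PER: {per:.3f}")
```

In this toy case the second prediction drops one phoneme, giving one error over eight reference phonemes.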