Current Projects

Language Technology for Icelandic 2018-2022

This project is part of a collaborative effort funded by the Icelandic government to make Icelandic available for use in today’s technological environment. Automatic Speech Recognition, Text-to-Speech and Machine Translation are three of the five core projects defined in the report Language Technology for Icelandic 2018-2022 (Máltækni fyrir íslensku 2018-2022). The collaboration, called Samstarf um íslenska máltækni (SÍM), consists of nine companies and organizations specializing in linguistics and natural language processing: Reykjavik University, the University of Iceland, the Árni Magnússon Institute for Icelandic Studies, Blindrafélagið (BIAVI), Ríkisútvarpið (RÚV, national radio), Creditinfo (media monitoring), Tiro ehf., Grammatek ehf. and Miðeind ehf.
Work on the project formally started on 1 October 2019, with a total duration of five years approved by the Icelandic parliament.

Funding: Language Technology for Icelandic 2018-2022 (Máltækni fyrir íslensku 2018-2022)

Projects:
“Machine Translation (MT) of Icelandic and English”
“Automatic speech recognition (ASR) for Icelandic”
“Text-to-Speech (TTS) for Icelandic speech synthesis”

Timeline: October 2019 – October 2024.

Paper: Language Technology Programme


Automatic Text Summarization for Icelandic

ATS: extracting sentences for a summary

Automatic Text Summarization (ATS) is the task of creating a concise and fluent summary of a given source text. The summary preserves the main content and overall meaning of the original. Because the number of articles published online grows every day, there is an increasing need for robust ATS systems. The aim of this project is to develop the first ATS systems for Icelandic based on machine learning methods, and to create the first corpus of human-generated summaries of Icelandic news articles. Three different systems will be developed, and the best-performing system will be selected for deployment and testing on an Icelandic news site, mbl.is.
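
To illustrate what "extracting sentences for a summary" can mean in practice, here is a minimal sketch of a frequency-based extractive summarizer. It is not one of the systems developed in the project, which are based on machine learning; it is a simple stand-in technique, and the function names (split_sentences, tokenize, summarize) are hypothetical. It scores each sentence by the average frequency of its words in the whole text and keeps the top-scoring sentences in their original order.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # \w is Unicode-aware in Python 3, so letters such as þ, ð and æ are kept.
    return re.findall(r"\w+", sentence.lower())


def summarize(text, max_sentences=2):
    """Return up to max_sentences sentences with the highest average word frequency."""
    sentences = split_sentences(text)
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        toks = tokenize(sentence)
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # Rank sentence indices by score, then restore document order for readability.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

A real system would replace the scoring function with a trained model and would need a sentence splitter that handles Icelandic abbreviations; the select-and-reorder step would stay much the same.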

Funding: Strategic Research and Development Programme for Language Technology (Markáætlun í tungu og tækni)

Contributors: Jón Friðrik Daðason, Hrafn Loftsson, Salome Lilja Sigurðardóttir, Þorsteinn Björnsson


Text-to-Speech (TTS) for Icelandic speech synthesis

Speech data

The Language and Voice Lab (LVL) is responsible for developing text-to-speech synthesis for Icelandic in a way that makes it possible to produce multiple different voices. LVL will create an environment and language resources that will be released so that players in the market can quickly and simply build synthetic voices for end users. LVL will ensure that the speech synthesis solutions developed can be integrated into software that needs, for example, automatic reading aloud or voice answering.

Funding: Language Technology for Icelandic 2018-2022 (Máltækni fyrir íslensku 2018-2022)

Contributors: Atli Thor Sigurgeirsson, Þorsteinn Daði Gunnarsson

Timeline: First iteration: October 2019 – October 2020.

Research paper: Manual Speech Synthesis Data Acquisition

Code: LOBE


Automatic speech recognition (ASR) for Icelandic

The Language and Voice Lab is responsible for developing automatic speech recognition for Icelandic within the Language Technology for Icelandic project. The aim of developing ASR within the project is to enable people who design and develop voice-based user interfaces to add Icelandic easily. An open environment will be established for the development of speech recognition systems, and recipes for common usage will be made open and accessible.

Funding: Language Technology for Icelandic 2018-2022 (Máltækni fyrir íslensku 2018-2022)

Contributors: Helga Svala Sigurðardóttir, Jón Guðnason, Judy Fong, Þorsteinn Daði Gunnarsson, Michal Borský, Ragnheiður Kr. Þórhallsdóttir, Carlos Mena, Caitlin Richter, Ragnar Pálsson

Timeline: October 2019 – October 2022.

News, Voice Donation Platform, Code: Gáfu 1.500 raddsýni fyrir hádegi (Icelandic), Samromur.is, Broad Data Prep repository, Samromur paper, Punctuation models, Speaker Diarization recipes


Machine Translation (MT) of Icelandic and English

The goal of machine translation is to translate text or speech between two or more natural languages. In this project the goal is to implement a baseline statistical machine translation system between Icelandic and English, in both directions. The project is part of the core machine translation project (V3) within the Icelandic National Language Technology Programme, as defined in the Language Technology for Icelandic 2018-2022 project plan. We leverage the newly released ParIce corpus, a parallel corpus of 3.5M Icelandic and English translation segments.

Funding: Language Technology for Icelandic 2018-2022 (Máltækni fyrir íslensku 2018-2022)

Contributors: Steinþór Steingrímsson, Haukur Páll Jónsson, Hrafn Loftsson, Luke O’Brien

Timeline: October 2019 – Summer 2020.

Code: Moses SMT, ParIce


Microservices at your service: bridging the gap between NLP research and industry

This project aims to increase inclusiveness and accessibility for the EU languages by making natural language processing (NLP) tools freely and openly available on the European Language Grid (ELG) platform. The project will make the NLP tools more accessible to a larger audience of software developers through:

  • identifying relevant and interesting NLP tools, both via a bottom-up search of software platforms and by contacting research institutions;
  • conducting a survey and collecting standard or available test data sets for NLP tasks;
  • testing the collected tools on the existing test data and selecting among them based on metric performance and language coverage;
  • dockerising the tools and exposing an industry-standard API to each service;
  • sharing the Docker images via the ELG platform.
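
As an illustration of the "dockerise and expose an API" step above, the following minimal sketch wraps a toy text-processing function in a JSON-over-HTTP service using only the Python standard library. Everything in it is an assumption for illustration: tag_tokens is a hypothetical stand-in for a real NLP tool, the request and response shapes are invented, and the sketch does not follow the ELG API specification or reflect the code in the project's repositories.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def tag_tokens(text):
    # Hypothetical stand-in for a real NLP tool: flags capitalised tokens.
    return [{"token": t, "capitalised": t[:1].isupper()} for t in text.split()]


def handle_payload(payload):
    # Pure request logic, kept separate from HTTP so it can be tested directly.
    text = payload.get("text")
    if not isinstance(text, str):
        return 400, {"error": "expected a JSON object with a string field 'text'"}
    return 200, {"tokens": tag_tokens(text)}


class ToolHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # malformed JSON or invalid UTF-8
            payload = None
        status, body = handle_payload(payload if isinstance(payload, dict) else {})
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host="0.0.0.0", port=8080):
    # Blocks forever; a container's entry point would call this.
    HTTPServer((host, port), ToolHandler).serve_forever()
```

Inside a Docker image, the entry point would call serve() and the container would publish the chosen port. Keeping handle_payload free of HTTP details makes it straightforward to run the same tool against collected test data before packaging it.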

The project targets the following languages: Finnish, Swedish, Norwegian, Spanish, Portuguese, Icelandic, Faroese, Lithuanian, Latvian and Estonian.

Funding:

Contributors: Bjarni Bjarkason, Jökull Snær Gylfason, the University of Tartu (Estonia) and Gradient (Spain).

Timeline: Mar. 2021 – Feb. 2023

Code: https://github.com/cadia-lvl/Icelandic-NER-API (any of the API repos there)


National Language Technology Platform (NLTP)

In this project, the most advanced language technology (LT) tools and solutions will be united in a novel, artificial-intelligence-driven National Language Technology Platform (NLTP). By tightly integrating mature, state-of-the-art LT technologies and services developed in CEF AT and other European and national programmes, the NLTP will give public administrations, SMEs and the general public an efficient way to ensure multilingual access to online services, websites, documents and information, removing language barriers, increasing accessibility and fostering cross-border services.

The translation and speech processing services available in the platform will give public administration entities, their employees, SMEs and the public convenient and secure access to high-quality tools for translating and making accessible a wide array of content, including confidential documents, across all the languages of the Digital Single Market. This will help realize the vision of language parity and the full multilingualism enshrined in the European Charter of Fundamental Rights in an efficient, cost-effective and equitable manner.

Funding:

This project is in collaboration with Culture Information Systems Centre (Latvia), Malta Information Technology Agency, Office of the State Advocate (Malta), University of Malta, University of Tartu (Estonia), Central State Office for the Development of Digital Society (Croatia), and University of Zagreb (Croatia).

Timeline: April 2021 – March 2023


Spoken Dialogue Framework for Icelandic

The spoken dialogue framework enables users to communicate with computers and other devices by voice in Icelandic. The goal of this project is to develop and provide an open development framework for Icelandic spoken dialogue. The framework will feature automatic speech recognition (ASR), language understanding questions, text-to-speech synthesis (TTS), as well as several other language modules. Several of these modules are already in development as part of the five-year Language Technology Programme for Icelandic, while others will be new developments or areas for end users. This project will be developed and tested in collaboration with industry partners (Grammatek ehf. and Tiro ehf.) as well as the open sector.

Funding: Rannis

Contributors: Caitlin Richter, Ragnar Pálsson, Tiro, Grammatek


Using Machine Learning Models for Clinical Diagnoses

The goal is to examine the feasibility of using automatic models for clinical analyses. The project consists of two sub-goals. The first is to develop a model based on deep neural networks that uses data from the Icelandic healthcare system. The second is to develop a prediction model for clinical diagnoses. The dataset will come from the capital region’s healthcare clinics, and a portion of it will be labeled by hand by clinical experts. This project will be developed jointly by LVL and Heilsugæsla, the health clinics.

Contributors: Hlynur Davíð Hlynsson, Hrafn Loftsson


Computer-Assisted Pronunciation Training in Icelandic

Language technology can be used to make teaching easier and more fun. For small languages like Icelandic it is important to gain more users, and language learning and teaching are an important step toward that. Computer-assisted pronunciation training (CAPT) makes it easier to teach more students simultaneously and automatically. This training will be integrated with the Icelandic Online system used in the Icelandic as a second language program at the University of Iceland.

Funding: Rannis

Contributors: Caitlin Richter, Ragnar Pálsson, Þorsteinn Daði Gunnarsson, Tiro ehf., the Árni Magnússon Institute, and the University of Iceland.
