A large milestone in Named Entity Recognition for Icelandic!

Progress in NER celebrated with a suitable cake.

Last Tuesday, Svanhvít and Ásmundur completed the first stage of the Named Entity Recognition project for Icelandic. They finished the daunting task of labeling all named entities in a text corpus of 1 million tokens (MIM-GOLD) with the following categories: Person, Location, Organization, Miscellaneous, Money, Percent, Time, and Date.

To assist with the task, they first preprocessed the corpus using regular expressions to catch some cases, and then verified and completed the labeling using the brat rapid annotation tool. Their next task will be to create a few baseline NER tagging systems using the labeled dataset.
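For readers curious what regex-based pre-annotation might look like, here is a minimal sketch in Python. The patterns, the function names `pre_annotate` and `to_brat_standoff`, and the example sentence are all hypothetical and are not the rules used in the project. The sketch also assumes the candidates are handed to brat as text-bound annotations in its standoff format, which this post does not confirm.

```python
import re

# Hypothetical patterns for illustration only; the project's actual rules
# are not described in this post.
MONTHS = (
    "janúar|febrúar|mars|apríl|maí|júní|júlí|ágúst|"
    "september|október|nóvember|desember"
)
PATTERNS = [
    ("Percent", re.compile(r"\d+(?:[.,]\d+)?\s?%")),
    ("Money", re.compile(r"\d+(?:[.,]\d+)*\s?(?:kr\.|krónur|ISK)")),
    ("Date", re.compile(r"\d{1,2}\.\s?(?:" + MONTHS + r")(?:\s\d{4})?")),
]


def pre_annotate(text):
    """Return non-overlapping (start, end, label, surface) candidate spans."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label, match.group()))

    # Sort by start offset; on ties, prefer the longer match.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))

    kept = []
    last_end = 0
    for span in spans:
        if span[0] >= last_end:  # skip anything overlapping a kept span
            kept.append(span)
            last_end = span[1]
    return kept


def to_brat_standoff(spans):
    """Format spans as brat text-bound annotation lines (T-lines)."""
    lines = []
    for i, (start, end, label, surface) in enumerate(spans, start=1):
        lines.append(f"T{i}\t{label} {start} {end}\t{surface}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "Verðið hækkaði um 5,5% og var 1.200 kr. þann 17. júní 2019."
    print(to_brat_standoff(pre_annotate(sample)))
```

Rules like these can only propose candidates for the more regular categories (Money, Percent, Date); a human annotator still has to verify every span and add the Person, Location, Organization, and Miscellaneous entities by hand.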

The dataset will be publicly available this spring.

Acceptances Galore

This has been a great summer for LVL. We have had many conference acceptances: eight papers at three conferences. It will also be a busy autumn, as all the conferences are in September.

Our first is Recent Advances in Natural Language Processing, a very competitive NLP conference. This year it is in Varna, Bulgaria. Steinþór, Örvar and Hrafn’s paper, “Augmenting a BiLSTM tagger with a Morphological Lexicon and a Lexical Category Identification Step” and Hrafn’s paper “A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System” will both be presented.

Next is Interspeech in Graz, Austria. We have four papers:

  • Yu-Ren Chien – “F0 Variability Measures Based on Glottal Closure Instants”
  • Inga Helgadóttir – “The Althingi ASR System”
  • Anna Rúnarsdóttir – “Lattice re-scoring during manual editing for automatic error correction of ASR transcripts”
  • Anna Nikulásdóttir – “Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case”

Poster describing “The Althingi ASR System”.

To wrap up the month, we will be attending the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa) in Turku, Finland. We will be presenting two papers by Svanhvít Ingólfsdóttir: “Nefnir: A high accuracy lemmatizer for Icelandic” and “Towards High Accuracy Named Entity Recognition for Icelandic.”

All these acceptances exemplify our successful push to advance language technology for Icelandic.

Note: For more details about the papers, please go to our Publications page.