Last Tuesday, Svanhvít and Ásmundur completed the first stage of the Named Entity Recognition (NER) project for Icelandic. They finished the daunting task of labeling all named entities in MIM-GOLD, a text corpus of 1 million tokens, with the following categories: Person, Location, Organization, Miscellaneous, Money, Percent, Time, and Date.
To assist with the task, they first preprocessed the corpus with regular expressions that caught some of the entities automatically, and then verified and completed the labeling using the brat rapid annotation tool. Their next task will be to build a few baseline NER tagging systems using the labeled dataset.
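The post does not say which regular expressions were used. As a rough illustration of how regex pre-labeling can feed into brat, here is a minimal sketch that emits brat standoff annotation lines (`T<id>`, tab, `<type> <start> <end>`, tab, covered text) for a few pattern-friendly categories. The patterns and the `pre_annotate` function are hypothetical and far from complete; the annotators would still verify and extend every label in brat.

```python
import re

# Hypothetical patterns for illustration only; the expressions actually used
# on MIM-GOLD are not described in the post.
MONTHS = ("janúar|febrúar|mars|apríl|maí|júní|júlí|ágúst|"
          "september|október|nóvember|desember")
PATTERNS = [
    ("Percent", re.compile(r"\d+(?:,\d+)?\s?%")),
    ("Money", re.compile(r"\d+(?:\.\d{3})*(?:,\d+)?\s?(?:kr\.|krónur)")),
    ("Date", re.compile(r"\d{1,2}\.\s(?:" + MONTHS + r")")),
]


def pre_annotate(text):
    """Return brat standoff lines for every regex match, ordered by offset.

    Offsets are character offsets into `text`, as brat expects.
    Overlapping matches from different patterns are all kept; resolving
    them is left to the human annotators.
    """
    matches = []
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), label, m.group()))
    matches.sort()
    return [
        f"T{i}\t{label} {start} {end}\t{covered}"
        for i, (start, end, label, covered) in enumerate(matches, start=1)
    ]


if __name__ == "__main__":
    for line in pre_annotate("Verðbólga var 3,5 % þann 12. mars."):
        print(line)
```

Writing the lines to a `.ann` file next to the matching `.txt` file lets brat load them as pre-existing annotations for review.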
The dataset will be publicly available this spring.
This has been a great summer for LVL. We have had eight papers accepted at three conferences. It will also be a busy autumn, as all three conferences take place in September.
The first is Recent Advances in Natural Language Processing (RANLP), a very competitive NLP conference, held this year in Varna, Bulgaria. Two papers will be presented there: Steinþór, Örvar and Hrafn’s “Augmenting a BiLSTM tagger with a Morphological Lexicon and a Lexical Category Identification Step” and Hrafn’s “A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System”.
Next is Interspeech in Graz, Austria. We have four papers:
Yu-Ren Chien – “F0 Variability Measures Based on Glottal Closure Instants”
Inga Helgadóttir – “The Althingi ASR System”
Anna Rúnarsdóttir – “Lattice re-scoring during manual editing for automatic error correction of ASR transcripts”
Anna Nikulásdóttir – “Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case”