The Language and Voice Lab has specialists and doctoral students working in many different fields within speech and natural language processing, and can accommodate students doing independent projects, final-year BSc projects, MSc projects and doctoral projects. The following is a broad description of the areas of expertise within which students can define their projects.

Text-to-speech synthesis

Contact: Jón

The group is creating Icelandic voices using state-of-the-art deep neural network techniques. For example, four male and four female voices are being built with the deep architecture called FastSpeech2, each from recordings of a single voice donor (eight donors in total). An additional 40 voice donors have contributed smaller amounts of data to the collection.

Various research questions remain:

  • Is it possible to create a TTS voice, trained on many different voice donors, that is completely novel and does not sound like any other voice?
  • What is the best way to control prosody for new TTS voices, so that we can emphasise certain words in the sentence, make the voice ask questions or express surprise, for example?
  • Can we generate TTS voices in real time, eliminating lag and making it possible to use the voices in spontaneous conversations?
  • Can better signal processing methods applied in the back-end vocoding architecture improve voice quality in the synthesis?
  • How can we integrate TTS voices in applications that assist the blind and/or dyslexic?

Related research questions are welcomed and the group is happy to look at other languages if the resources can be obtained (which should be true for English in the right context, for example).

Automatic speech recognition

Contact: Jón, Hannes

Various automatic speech recognisers for Icelandic are being developed within the group. Large data collections have been gathered for Icelandic from adults, adolescents, children and second-language learners, and domain-specific speech has been collected from broadcasting, university lectures, online conversations and queries, for example. The group is experimenting with four different state-of-the-art deep neural network architectures targeted at the adult speech data, while children's voices remain challenging.

Project students could consider some of the following questions:

  • Can we adapt front-end feature extraction techniques to improve generative adversarial networks for speech recognition?
  • How can we make speech recognition more robust to noise?
  • What is the best way to build custom-made language models for topic-specific speech recognition or voice command systems?
  • How does a natural language understanding unit handle errors from the speech recognition unit?
  • Can a speech recogniser learn on-the-fly during conversational speech processing?
  • How can speech recognition be used in applications that teach children to read?

The researchers in the group are happy to consider alternative research questions.  The focus of the group has mainly been on Icelandic but English and other languages are also of interest to the group.

Paralinguistic Speech Processing

Contact: Jón

Speech contains much more information than its linguistic content. Researchers at the Language and Voice Lab have developed speech processing algorithms that can, for example, assess the cognitive workload of the speaker and detect voice qualities such as breathiness and hoarseness, so as to help people use their voice properly. Furthermore, the lab has created voice processing algorithms that create phenotypes that are matched against genetic information, thus relating the voice to potential underlying conditions or illnesses.

Students might consider the following research questions:

  • What phonemes are best to use when determining cognitive workload in speech?
  • How is emotion expressed in Icelandic spoken language?
  • What features in specific phoneme types (e.g. fricatives, stops or diphthongs) can be used to characterise a speaker trait, thus defining a phenotype for genetic research?
  • Can voice quality be tracked in continuous speech, thus making it possible to monitor the voice for vocal retraining?

Students are encouraged to propose similar research questions if none of the above are of interest.

Machine Learning and Signal Processing

Contact: Jón

Many breakthroughs in machine learning and signal processing have happened in the context of speech and language processing. The interplay between theory and practice works in both directions, so practical problems sometimes provide the insight needed to progress theory. The following examples characterise a larger set of unanswered questions in machine learning and signal processing:

  • What is the best way to represent a non-stationary signal like speech for machine learning?
  • What criteria are appropriate for an embedding space when representing the speech signal?
  • What is the role of ensemble learning in speech processing?
  • Can phonetic information be used to stabilise back-propagation of the gradient in deep architectures?

This field is very open and students are encouraged to hone their research questions with the researchers in the group. 

Natural Language Processing

Contact: Hrafn

We work on several different areas of natural language processing: machine translation, automatic summarization, creating parallel corpora, and question answering. The field is very expansive and students are encouraged to hone their research ideas within the lab.

If you are interested in working on any of the projects, email the people listed.
