
A Semantic Search Engine for the Retrieval of Similar Patterns in Luxembourgish Texts

Funding: University of Luxembourg External Organisation Funding
Start date: 15 January 2018
End date: 14 January 2021

Description

What is it about?

The aim of STRIPS is to develop a toolbox of semantic search algorithms for Luxembourgish. We want to implement search algorithms that retrieve and monitor, for example, temporal patterns of named entities in Luxembourgish texts. Here, the term "semantic" does not refer only to keywords or bag-of-words features such as names or geographic identifiers; it also covers more complex structures, such as concepts (e.g., topics or themes) and a document’s sentiment (e.g., positive or negative polarity). The main focus of STRIPS lies in three areas: the linguistic processing of texts written in Luxembourgish (particularly stemming, the use of phonetic dictionaries and tagged word lists for Luxembourgish, and a part-of-speech-tagged text corpus); similarity learning, to allow fuzziness in search queries; and the identification of temporal cross-dependencies within the Luxembourgish text corpus. To validate the project, RTL has provided us with heterogeneous text sources (official news items and user-contributed comments).
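To make the idea of fuzziness in search queries concrete, here is a minimal sketch of approximate keyword matching. It is not the STRIPS implementation: it swaps in plain character-level similarity from Python's standard `difflib` module in place of any learned similarity measure, and the function name `fuzzy_search`, the `threshold` parameter, and the sample sentences in `DOCS` are all hypothetical, made up for illustration.

```python
from difflib import SequenceMatcher

# Made-up sample sentences (an assumption for illustration, not project data).
DOCS = [
    "Lëtzebuerg ass schéin.",
    "Ech wunnen zu Letzebuerg.",
    "Paris ass grouss.",
]


def fuzzy_search(query, documents, threshold=0.8):
    """Return (doc_index, token, score) for every token similar to the query.

    Similarity is the SequenceMatcher ratio between the lowercased query
    and each lowercased token; hits are sorted by descending score.
    """
    hits = []
    q = query.lower()
    for index, doc in enumerate(documents):
        for raw in doc.lower().split():
            # Drop surrounding punctuation so "schéin." compares as "schéin".
            token = raw.strip(".,;:!?\"'()")
            if not token:
                continue
            score = SequenceMatcher(None, q, token).ratio()
            if score >= threshold:
                hits.append((index, token, score))
    return sorted(hits, key=lambda hit: -hit[2])
```

Calling `fuzzy_search("Letzebuerg", DOCS)` matches both the spelling with and without the diacritic, while the unrelated third sentence produces no hit. A character-level ratio like this only captures spelling variation; the phonetic dictionaries and similarity learning mentioned above would address variation that surface string comparison cannot.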

Project Members

  • Prof Dr Peter Gilles
  • Prof Dr Christoph Schommer
  • Dr Joshgun Sirajzade
  • Dr Christoph Purschke
  • MSc. Daniela Gierschek
  • Thanks to the students of the 1GSO final-year class at the Lycée Nic-Biever, Dudelange.
  • Thanks to the students of the École Privée Sainte-Sophie, Luxembourg-Kirchberg.

Prospective students: Anna Felix (Master), Rosito Gerbo (Erasmus Mundus, Torino, Italy).

Former participants: Elisabeth Joy (Department of Computer Science), Elida van Nierop (Department of Mathematics), Rik Lamesch (Department of Mathematics).

Publications

  • Joshgun Sirajzade, Christoph Schommer. The LuNa Open Toolbox for the Luxembourgish Language. In Conference Proceedings Advances in Data Mining, Applications and Theoretical Aspects. New York (2019).
  • Joshgun Sirajzade, Daniela Gierschek, Christoph Schommer, Peter Gilles. Component analysis of adjectives in Luxembourgish for detecting sentiments. Computational Linguistics in the Netherlands (CLIN 29) (2019).
  • Daniela Gierschek. Automatic Detection of Sentiment in Luxembourgish User Comments. CL-Postersession at the 41st Annual Conference of the German Linguistic Society (2019).
  • Daniela Gierschek, Peter Gilles, Christoph Purschke, Christoph Schommer, Joshgun Sirajzade. A Temporal Warehouse for Modern Luxembourgish Text Collections. DH Benelux (2019).
  • Elida van Nierop. Improving LDA Topic Modelling using word embeddings. Master Thesis (2018).
  • Joshgun Sirajzade, Christoph Schommer. Mind and Language. AI in an Example of Similar Patterns of Luxembourgish Language. Proceedings International Conference on Artificial Intelligence and Humanities. Seoul, Korea (2018).
  • Daniela Gierschek. Automatic Detection of Emotions in Luxembourgish User Comments. PhD Forum at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD) (2018).
  • Ekaterina Kamlovskaya, Christoph Schommer, Joshgun Sirajzade. A Dynamic Associative Memory for Distant Reading. Proceedings International Conference on Artificial Intelligence and Humanities. Seoul, Korea (2018).
  • Joshgun Sirajzade. Korpusbasierte Untersuchung der Wortbildungsaffixe im Luxemburgischen. Technische Herausforderungen und linguistische Analyse am Beispiel der Produktivität. Zeitschrift für Wortbildung = Journal of Word Formation (2018), 2(1).

In the press

  • Wéi si se geduecht: Positiv? Negativ? Neutral? RTL Kultur news (16 December 2019).
  • Birgit Pfaus-Ravida. Luxemburgish ganz Digital: Schnëssen und Strips: So funktioniert moderne Sprachforschung an der Universität Luxemburg. Luxemburger Wort (24 April 2019).

Members