Contact details

name: Leon Derczynski

email: (public, subject to FOI requests)

twitter: @leonderczynski

telephone: +45 5157 4948

post: IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen, Denmark

Available Research projects

Working with me as a grad student

Grad students (cand.) working with me are expected to join the research team and do real research. This keeps supervision efficient and is the best way to make sure you do well.

I am a resource for you to manage. I will keep an eye on the big picture and help you get things in place on time. You should manage the day-to-day, be sure you're organised, and ask me questions that get the most out of your supervision.

For the autumn semester research project, I expect a literature review, a prototype MVP, and a week-by-week plan of how you will address the problem. I hope to meet roughly weekly and to hear about progress each week.

You can read about the team & our work at

Advice for students doing research projects in ML/NLP

NLP: Danish Question Answering.

Question answering (QA) is where we ask a program a natural language question (like How high is the Eiffel tower?) and the program returns an answer (hopefully correct). This is done pretty well for English but not at all well for Danish. This project develops QA resources and models for Danish, using data harvesting and neural language models.

ML: Drift-independent Entity Recognition.

It's useful to recognise names of people, places, organisations etc. in text; this task is called "Named Entity Recognition". Unfortunately, machine learning can only ever use training data in the past, so detecting new entities is tough -- for example, a system using data from 2018 might have a tough time recognising COVID as a special name. This project uses technology like adversarial learning to develop time-independent recognisers.

NLP: Identifying targets of abuse.

Neural language models can detect abusive content reasonably well, but they aren't always good at working out who is being abused. That makes it harder to work out where community problems are, in cases where abuse damages the conversation. This project focuses on identifying targets of abuse, particularly in Danish.

NLP: Low-resource Machine Translation.

Automatic translation depends a lot on the amount of data available for the translation pair. This makes the technology hard to access for people using languages that have fewer speakers. The project investigates data-efficient machine translation for smaller languages, using existing parallel datasets. Case study languages include Faroese or Bornholmsk.

NLP: Air Traffic Control Speech Recognition.

Aircraft communication pushes human understanding to the limit. Information is highly contextual, sometimes given out in a broad range of accents, and often sent over very noisy radio channels. Air traffic control communication is also very important to get right, with misunderstandings having potentially lethal impact. This project combines machine learning, natural language processing and automatic speech recognition to develop systems for automatic transcription of air traffic control audio.

NLP: Lightweight abusive language detection.

Online abuse harms individuals, breaks the law, and damages communities. It is difficult to moderate; recently, Facebook paid $52 million in a lawsuit about their content moderators developing PTSD; 68% of online Denmark avoids discussions due to the hard tone. This project combines contrastive example generation and lightweight transformer models to develop general and fast abusive language detection tools (language of your choice).

NLP: Cross-lingual Fact Verification.

With billions of individual pages on the web providing information on almost every conceivable topic, we should have the ability to collect facts that answer almost every conceivable question. However, only a small fraction of this information is contained in structured sources. This thesis project addresses this problem, with an application to fake news.

This is linked to a challenge run by Amazon Research in Cambridge, FEVER.

Prior research projects

NLP: Identifying and analysing political quotes related to climate change

Data and tools are built to detect and analyse quotes from politicians related to climate change. A manually labelled dataset is used to train the climate-classifier, KlimaBERT, achieving an F1 of 0.97. This is used in combination with historical voting-data to train a model able to predict how the politicians vote on climate-related bills, achieving a macro F1 of 0.72.

Completed by: Jonathan Kristensen 2022

Survey: klimabert.pdf

NLP: Automatic Quote Selection in News Production

Selecting the most relevant quote to include in an article is an important part of the process of journalistic news writing. It is a process done by humans, and the result relies on a number of factors, including intuition and professional experience, knowledge of context, stylistic preference and narrative function among many others. This thesis introduces definition, data, annotations, model, and an interactive tool for journalist quote selection from transcripts.

Completed by: Lasse Funder Andersen 2021

Survey: news-quotes.pdf

NLP: Automatic Text Summarization For Danish Using BERT

Information overload is a prevailing feature in modern society. Automatic text summarization is a useful tool to address this by enabling the reduction of longer texts down to their most important content. For lower resource languages like Danish, this is not the case. In this thesis, we explore the use of the popular BERT architecture for improving research in automatic summarization for Danish. Furthermore, we introduce TSAuBERT (Textual Similarity Assessment using BERT), a BERT based evaluation metric used for assessing the quality of automatically generated summaries, attempting to address the pitfalls of the current summarization evaluation protocol.

Completed by: Lukas Christian Nielsen, Sebastian Lindegaard Veile 2020

Survey: danish-summarisation.pdf

NLP: Low-Carbon NLP

As state-of-the-art models within the field of Natural Language Processing push boundaries for better performance, their energy consumption grows. This study develops 154 different Transformer models, based on the pre-trained RoBERTa model with a masked language modeling head for this specific downstream task. The model hyperparameters are chosen through Bayesian optimisation using the Hyperopt framework, with a loss function given as energy consumption multiplied by perplexity, dubbed energy loss, that the optimiser will attempt to minimise. We then analyse the hyperparameters chosen by the optimiser for all the models, and the corresponding energy consumption and perplexity for these.
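
As a rough illustration of the energy loss idea, here is a minimal Python sketch. It is not the thesis code: the trial names, the numbers, and the choice of kWh as the energy unit are all made up for illustration, and a plain minimum over a few trials stands in for the Bayesian optimiser.

```python
import math


def perplexity(mean_cross_entropy_nats: float) -> float:
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy_nats)


def energy_loss(energy_kwh: float, ppl: float) -> float:
    """Energy consumption multiplied by perplexity; lower is better."""
    return energy_kwh * ppl


# Hypothetical trial results: (name, energy used in kWh, mean cross-entropy in nats).
# These values are invented for the example and are not from the study.
TRIALS = [
    ("small", 0.5, 2.0),
    ("medium", 1.0, 1.5),
    ("large", 4.0, 1.2),
]


def best_trial(trials):
    """Return the trial with the lowest energy loss."""
    return min(trials, key=lambda t: energy_loss(t[1], perplexity(t[2])))


if __name__ == "__main__":
    for name, energy, ce in TRIALS:
        print(name, round(energy_loss(energy, perplexity(ce)), 2))
    print("lowest energy loss:", best_trial(TRIALS)[0])
```

Because the two terms are multiplied, a model that halves its energy use can afford a perplexity up to twice as high before its energy loss gets worse.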

Completed by: Mads Guldborg Kjeldgaard Kongsbak, Lucas Høyberg Puvis de Chavannes, Timmie Mikkel Rantzau Lagermann 2021

Survey: low-carbon-nlp.pdf

NLP: Featherweight NLP: Tolerable Lower Limits for Size of Distilled Transformer Language Models

To examine the extent to which models can be compressed, we conduct experiments with knowledge distillation, pruning, and quantization. We distill the behavior from the RoBERTa language model to Long Short-Term Memory (LSTM), Recurrent Neural Network, and Feed-Forward Network models. For each, we experiment with different embedding types and sizes. We conduct experiments on three tasks from the General Language Understanding Evaluation (GLUE) benchmark. Results show that using task-specific distillation allows for small models. One distilled and quantized bidirectional LSTM model surpasses the baseline performance presented by GLUE. Compared to the size of the RoBERTa teacher, this model achieves size reductions of 3528x for SST-2 and 963x for QQP and MNLI.

Completed by: Mikkel Hooge Sørensen, Magnus Malthe Jacobsen 2021

Survey: featherweight-nlp.pdf

ML: Smart-charging e-scooter fleets using marginal emission factors

E-scooters used by micromobility providers emit notably more CO2 while charging during times where electricity is mainly produced by fossil fuels. Smart-charging allows e-scooter batteries to be charged during times where electricity is relatively less carbon-intensive by shifting the charging session to a different time. This requires a forecast of the Marginal Emission Factors (MEFs). Coupled with forecasted MEFs we run simulations to determine savings of various charging strategies with various parameters. Results showed that for a fleet in Copenhagen, it is possible to reduce charging emissions by up to 13 percent over a month of operations with a surplus of batteries.
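
To make the idea of shifting a charging session concrete, here is a small Python sketch. It is only an illustration under stated assumptions, not one of the strategies simulated in the report: the hourly MEF forecast values are invented, the battery is assumed to draw the same energy in every charging hour, and charging is allowed to be split across non-adjacent hours.

```python
def immediate_emissions(mef_forecast, hours_needed, kwh_per_hour=1.0):
    """Emissions when charging starts right away and runs for consecutive hours."""
    if hours_needed > len(mef_forecast):
        raise ValueError("not enough forecast hours to finish charging")
    return sum(mef_forecast[:hours_needed]) * kwh_per_hour


def smart_emissions(mef_forecast, hours_needed, kwh_per_hour=1.0):
    """Emissions when charging only in the hours with the lowest forecast MEF."""
    if hours_needed > len(mef_forecast):
        raise ValueError("not enough forecast hours to finish charging")
    return sum(sorted(mef_forecast)[:hours_needed]) * kwh_per_hour


def saving_percent(mef_forecast, hours_needed, kwh_per_hour=1.0):
    """Relative emission saving of smart charging over immediate charging."""
    base = immediate_emissions(mef_forecast, hours_needed, kwh_per_hour)
    smart = smart_emissions(mef_forecast, hours_needed, kwh_per_hour)
    return 100.0 * (base - smart) / base


if __name__ == "__main__":
    # Hypothetical hourly MEF forecast in gCO2 per kWh (invented numbers).
    forecast = [500, 450, 300, 200, 250, 400]
    print(round(saving_percent(forecast, hours_needed=2), 1), "percent saved")
```

A real simulation would also need to respect when scooters must be back in service, which is presumably where a surplus of batteries helps.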

Completed by: Markus Killendahl, Felix Qvist, 2021

Report: smart-charging-mef.pdf

ML: Algorithmic Fairness in Automatization of the Public Sector

Automated decision making (ADM) is quickly finding its way into public sectors around the world, where machines replace caseworkers. A lot of these caseworkers deal with cases containing a high level of discretion and they cooperate with their colleagues in making decisions. In automating decisions, a lot of ethical questions arise in relation to the algorithmic fairness (AF) of these models and how fairness can be enhanced. We here present a literature review investigating the intersections between the two concepts discretion and AF. The investigation is divided into three sections: one explaining and identifying our literature, one analyzing the literature to define the terms discretion and AF, and lastly one discussing the overlaps and differences between discretion and AF in order to identify knowledge gaps for future research.

Completed by: Johanne Engel Aaen, Katrine Boberg Knudsen, Simone Etwil-Meyland, 2021

Survey: algorithmic-fairness-public-sector.pdf

NLP: Danish Clinical Event Extraction: Developing a clinical event extraction system for electronic health records using deep learning and active learning

Danish electronic health records contain valuable information that is not fully utilized today. An active learning system is implemented for sentence querying to explore how the cost of annotation can be reduced. We find that both strategies lead to a higher F1-score than random sampling when trained on the same amount of sentences. However, in terms of the amount of annotated words, we find random sampling to be less costly than both active learning strategies.

Completed by: Frederik Wonsild, Mathias Giovanni Møller, 2020


Report: clin-events_frwo_mgmo.pdf

NLP: Danish Fact Verification: An End-to-End Machine Learning System for Automatic Fact-Checking of Danish Textual Claims

In this thesis we address the challenge of automatic fact verification for Danish by developing the first end-to-end solution for this task. Accordingly, we present a new data set and a trained inference model for Danish-language fact verification. We assess the presence of unintended signals in our data set by adapting a neural probing method to the fact verification task. Further, we demonstrate our data set's feasibility for Danish fact verification by developing an end-to-end machine learning system that retrieves relevant evidence for a claim and predicts its veracity. This system achieves a micro F1 of 58.6% and macro F1 of 53.4% on our test set.
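
For readers unsure how micro and macro F1 differ, here is a short Python sketch. The labels and predictions are invented for illustration and are not taken from the thesis data set.

```python
from collections import Counter


def micro_macro_f1(gold, pred):
    """Return (micro F1, macro F1) for single-label classification."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed

    def f1(true_pos, false_pos, false_neg):
        denom = 2 * true_pos + false_pos + false_neg
        return 2 * true_pos / denom if denom else 0.0

    # Macro: average the per-label F1 scores, so every label counts equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all labels first, so every example counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


if __name__ == "__main__":
    # Hypothetical gold and predicted labels (invented for this example).
    gold = ["Supported"] * 4 + ["Refuted", "NotEnoughInfo"]
    pred = ["Supported"] * 4 + ["Supported", "NotEnoughInfo"]
    micro, macro = micro_macro_f1(gold, pred)
    print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy example the rare "Refuted" label is never predicted, which drags the macro score well below the micro score. A macro F1 lower than micro F1 often points to weaker performance on less frequent labels.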

Completed by: Henri Schulte and Julie Christine Binau, 2020


Report: fever-da_jubi_hens.pdf

NLP: Developing a Cross-Lingual Named Entity Recognition Model

This paper explores to what extent language-specific features are incorporated in Transformer weights. We group languages by the absence and presence of certain features as well as arrange language-pairs by varying levels of similarity between fine-tuned and evaluation language. We find four features which, if present in training, improve results for evaluation languages containing them, indicating that those features are indeed learnt.

Completed by: Philine Zeinert and Jowita Julia Podołak, 2020


Report: xling-ner_jopo_phze.pdf

NLP: Multilingual Hate Speech Detection

This thesis aims to investigate abuse in online media, specializing in social media platforms. We set a special focus on experimenting with various multilingual hate speech detection settings and conduct five categories of experiments to tackle hate speech detection, namely: monolingual, bilingual, multilingual, knowledge transfer and zero shot or cross-lingual experiments. We also contribute with the creation of the largest annotated Albanian dataset for offensive and hate speech, which was annotated following the OffensEval schema.

Completed by: Jorgel Këci and Erida Nurçe, 2020


Report: xling-ablang_ernu_joke.pdf

NLP: Multilingual Detection Of Offensive Speech In Social Media

Offensive speech and its implications have become a growing concern for society, and systems that can reliably identify this type of content are in huge demand. While the task has been solved to a certain extent for languages where sufficient annotated data is available, the problem becomes harder to solve for languages where annotated data is scarce. We explore three different approaches by training models relying on multi-lingual embeddings, contextual embeddings and adversarial learning.

Completed by: Morten Laursen and Kasper Friis, 2020


Report: adversarial-hsd_mdla_kalo.pdf

NLP: Fact Extraction and Verification in Danish

Fact extraction and verification has gained interest due to the vast amount of accessible online information surrounding people on an everyday basis and the increasingly important task of navigating between correct and incorrect information, also known as information disorder. For this task, we fine-tuned a BERT model and created a data set of Danish claim and evidence entities to be used in the model. Our model achieved a weighted F1 score of 67.30 % and a weighted accuracy score of 63.18 %. With this work, we hope to create a basis for further development of fact extraction and verification models for the Danish language.

Completed by: Sidsel Latsch Jespersen and Mikkel Ekenberg Thygesen, 2020


Report: fever-da_slje_mekt.pdf

NLP: Discriminating between similar Nordic Languages using Machine Learning

This 7.5 ECTS project investigated the Discriminating between Similar Languages (DSL) task. It developed a machine learning based pipeline for automatic language identification for the Nordic languages. Concretely, it focused on discrimination between six similar Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic. Multiple neural and non-neural approaches were evaluated for this novel framing of a difficult task, across genres, leading to good results.

Completed by: René Haas, 2019


Report: NordicLang-ReneHaas.pdf

NLP: Offensive & hate speech detection

Catching offensive speech helps us measure the tone of a dialog. This is useful in many contexts, assisting content moderation, thus avoiding legal action and maintaining uptime. This project involved building an offensiveness detection tool for Danish. Use of the technology was also featured in Mandag Morgen and Politiken.

Completed by: Guðbjartur Ingi Sigurbergsson, 2019


Masters' Thesis: Multilingual hate speech detection.pdf

Publication: LREC 2020, Marseilles pdf

NLP: Stance detection and veracity prediction for Danish

Fake news detection currently relies on knowing the attitude that people talking on social media are expressing towards an idea. Figuring this out is called stance detection. This project built a stance detection system for Danish over social media data and used the results to predict the veracity of claims on Reddit with over 80% accuracy.

Completed by: Anders Edelbo Lillie and Emil Refsgaard Middelboe, 2019


Masters' Thesis: arXiv:1907.01304 arXiv:1907.00181

Publication: NODALIDA 2019, Turku pdf

NLP: Political Stance detection

Knowing the attitudes that people express towards ideas, events, organisations and other targets helps us automatically measure their preferences and behaviour. This thesis project investigated how to measure those attitudes, or stances, towards current issues, using Danish politicians as the case. The result was a tool for automatically monitoring political stance as well as an annotated dataset.

Completed by: Rasmus Lehmann, 2019


Masters' Thesis: Stance Detection in Danish Politics.pdf

Publication: NODALIDA 2019, Turku pdf

NLP: Clinical Information Extraction for Danish

Clinical records contain patient information, and that's often stored on a computer system. But not every fact about a patient has its own field on a form; the rest of the information gets written up in a clinical note. It's estimated that about 40% of patient information is stored only in the text. However, there were no tools for processing this information for Danish. This thesis project built a toolkit for Danish NLP, as well as developing condition mention detection and linking to SKS, the Danish clinical ontology.

Completed by: Nichlas Berggrein and Mathias Rasmussen, 2019


Masters' Thesis: Named_Entity_Recognition_and_Disambiguation_MSc_Thesis 2019.pdf

NLP: Scalable Speech Recognition

This project describes an implementation of an Automatic Speech Recognition (ASR) system converting speech to text. It extracts Mel features, Log Mel features, and Mel-Frequency Cepstral Coefficients (MFCC) from sound and uses them to train an Acoustic Model (AM) Deep Neural Network (DNN). The models are trained on two different hardware systems with four GPUs. The training process is benchmarked and optimized. The throughput, latency, and accuracy of the models are evaluated and compared to other ASR systems. The best model implemented has a Word Error Rate (WER) of 10.5 and a latency shorter than the duration of the input, making it appropriate for real-time applications.
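
For context on the metric, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the system output and a reference transcript, divided by the number of reference words. The Python sketch below shows that standard definition. It is not the evaluation code from the thesis, and the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Invented example: one of six reference words is dropped by the recogniser.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER = {100 * wer:.1f}%")
```

WER is commonly reported as a percentage, and it can exceed 100% when the hypothesis contains many inserted words.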

Co-supervised with Pınar Tözün

Completed by: Sebastian Benjamin Wrede and Sebastian Baunsgaard, 2019


Masters' Thesis: Scalable speech recognition.pdf

© Leon Strømberg-Derczynski