Contact details
Funded projects

name: Leon Derczynski
status: Assistant Professor of Computer Science

email: (public, subject to FOI requests)
twitter: @leonderczynski
telephone: +45 5157 4948
post: IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen, Denmark

ClinRead: 2020-2021, Novo Nordisk Foundation, 543K DKK. Rapid Clinical Note Mining for New Languages.
Role: PI (sole applicant).

FaDa: 2020-2021, UFM, 146K DKK. Enabling language technology for the Faroe Islands and Denmark.
Role: official participant.

Verif-AI: 2020-2022, DFF, 2.9M DKK. Automatic multilingual misinformation detection and fact verification support.
Role: PI (sole applicant).

ErhvervsForsker: 2019-2022, InnovationsFonden PhD, 1.07M DKK. Deep Learning Generative Models for Content Structuring.
Role: PI for ITU.

NLPL: 2017-2020, NordForsk. Nordic Language Processing Laboratory. A cross-Nordic collaboration of high-performance computing resources and natural language processing resources, between universities and e-infrastructure organisations.
Role: PI for ITU.


COMRADES: 2016-2018, EC H2020 IA €2.0M. Collective platform for community resilience & social innovation during crises.
Role: Co-I for U.Sheffield.

PHEME: 2014-2017, EC FP7 CP €4.3M. Computing Veracity – the Fourth Challenge of Big Data. Pheme built technology for assessing the veracity of claims made online. The project was rated "excellent" at final evaluation.
Role: co-author, scientific co-ordinator.

uComp: 2013-2016, EC CHIST-ERA €1.25M. Embedded Human Computation for Knowledge Extraction and Evaluation. uComp built extensive resources for crowd-sourcing and social media processing, including an easy corpus construction tool integrated with GATE.
Role: named researcher.

TrendMiner: 2011-2014, EC FP7 CP €3.7M. Trendminer on CORDIS. Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams.

Upcoming & recent talks

Leon Derczynski

also: Leon Strømberg-Derczynski

View CV

Google Scholar: user=d8iwqa8AAAAJ

ORCID: 0000-0002-8656-3431

DBLP: Derczynski:Leon

3230 citations

24 h-index

50 i10-index

as of 2020.09.03

2018.08.2x: Program co-chair at COLING 2018, Santa Fe

2018.11.01: Dimensions of Variation in User-generated Text at the Workshop on Noisy User-generated Text (W-NUT), Brussels

2018.11.08: Fake News and Troll Detection at SLTC, Stockholm

2019.Q1: Guest lecturing ML & NLP at Innopolis University, Kazan, Russian Federation

2019.05.06: Opening keynote at Nordic Disinformation conference

2019.05.23: Automatic Detection of Fake News at PET, the Danish Security and Intelligence Service

2019.11.28: Sociolinguistics from data on the back of an envelope, at DIGHUMLAB's AI workshop, Aarhus University

2020.05.22: Plenary talk at workshop on Trolling, Aggression and Cyberbullying

Selected Publications

See full publication list


PLoS One: Bertie Vidgen, Leon Derczynski. 2020. Directions in Abusive Language Training Data (to appear)

Nat Sci Rep: Anna Kolliakou, Ioannis Bakolis, David Chandran, Leon Derczynski, Nomi Werbeloff, David PJ Osborn, Kalina Bontcheva, Robert Stewart. 2020. Mental Health-Related Conversations on Social Media and Crisis Episodes: A Time-Series Regression Analysis


TTO: Leon Derczynski, Torben Oskar Albert-Lindqvist, Marius Venø Bendsen, Nanna Inie, Viktor Due Pedersen, Jens Egholm Pedersen. 2019. Misinformation on Twitter During the Danish National Election: A Case Study

NAACL: Manuel Ciosici, Leon Derczynski, Ira Assent. 2019. Quantifying the morphosyntactic content of Brown Clusters


COLING: Emily M. Bender, Leon Derczynski, Pierre Isabelle. 2018. Proceedings of the 27th International Conference on Computational Linguistics

ISCRAM: Leon Derczynski, Kenny Meesters, Kalina Bontcheva, Diana Maynard. 2018. Helping Crisis Responders Find the Informative Needle in the Tweet Haystack


book: Leon Derczynski. 2017. Automatically ordering events and times in text

CSL: Isabelle Augenstein, Leon Derczynski, Kalina Bontcheva. 2017. Generalisation in Named Entity Recognition: A Quantitative Analysis


COLING: Leon Derczynski, Kalina Bontcheva, Ian Roberts. 2016. Broad Twitter Corpus: A Diverse Named Entity Recognition Resource

COLING: Leon Derczynski. 2016. Representation and Learning of Temporal Relations


AAAI: Leon Derczynski, Sean Chester. 2016. Generalised Brown Clustering and Roll-up Feature Generation


IPM: Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke van Erp, Genevieve Gorrell, Raphaël Troncy, Johann Petrak, Kalina Bontcheva. 2015. Analysis of Named Entity Recognition and Linking for Tweets

SemEval: Steven Bethard, Leon Derczynski, Guergana Savova, James Pustejovsky, Marc Verhagen. 2015. SemEval-2015 Task 6: Clinical TempEval

WEBIST: Martin Leginus, Leon Derczynski, Peter Dolog. 2015. Enhanced Information Access to Social Streams through Word Clouds with Entity Grouping - best paper


EACL: Leon Derczynski, Kalina Bontcheva. 2014. Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets

EACL: Leon Derczynski, Kenneth S. Bøgh. 2014. DKIE: Open Source Information Extraction for Danish

EACL: Kalina Bontcheva, Ian Roberts, Leon Derczynski, Dominic Rout. 2014. The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy


ICTIR: Leon Derczynski, Robert Gaizauskas. 2013. Information Retrieval for Temporal Bounding

ACL: Leon Derczynski, Robert Gaizauskas. 2013. Temporal Signals Help Label Temporal Relations

SemEval: Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen, James Pustejovsky. 2013. SemEval-2013 Task 1: TempEval-3: Evaluating Events, Time Expressions, and Temporal Relations.

Hypertext: Leon Derczynski, Diana Maynard, Niraj Aswani, Kalina Bontcheva. 2013. Microblog-Genre Noise and Impact on Semantic Annotation Accuracy

EDBT: Leon Derczynski, Bin Yang, Christian S. Jensen. 2013. Towards Context-Aware Search and Analysis on Social Media Data


LREC: Hector Llorens, Leon Derczynski, Robert Gaizauskas, Estela Saquete. 2012. TIMEN: An Open Temporal Expression Normalization Resource


Student projects

I'm always open to supervising motivated and capable students, for thesis or other project work.

Open projects are described on a dedicated page:

View open research projects.


2020.02 We received global press coverage for our work on mental health analysis through Twitter

2019.11 Our work on Bornholmsk language technology received national press coverage (Danish): »Ijn bruner katt«: Kunstig intelligens skal redde truet dansk dialekt

2019.05 I wrote a series of data-driven articles on the Danish national election for the press (Danish): Mandag Morgen

2018.10 Dagbladenes Bureau interviewed me on using AI to hire people (Danish): Ansat af en maskine

2018.08 I was program co-chair for COLING 2018 (we had 1018 full paper submissions)

2018.07 Read an interview with me in "Alt om Data" (Danish): Nye sandheder om falske nyheder

Mutual information
Group information

brown (generalised)

\[ MI(C_i,C_j)= p(\left< C_i,C_j\right>)\ \log_2{\frac{p(\left<\ C_i,C_j\right>)}{p(\left< C_i,*\right>)\ p(\left<*,C_j\right>)}} \]

\[ AMI(C) = \sum_{C_i,C_j\in C}{MI(C_i,C_j)} \]

\[ C_{i\leftarrow j} = \left( C \setminus \left\{C_i,C_j\right\} \right) \cup \left\{C_i \cup C_j \right\} \]

\[ 0 < a \leq ||C|| \]

\[ i,j \in [1..a] \]

\[ \DeclareMathOperator*{\argmax}{arg\,max} \hat{\pi}(C) = \argmax_{C_i,C_j\in C,i\neq j}{\ AMI(C) - AMI(C_{i\leftarrow j})}. \]
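
The quantities above can be sketched in a few lines of Python. This is a minimal illustration and not the released implementation: the function names, the token list, the word-to-cluster mapping `assignment`, and the choice of the first `a` cluster ids (in sorted order) as the active window are all assumptions made for the example. Probabilities are estimated from adjacent-token counts, and the selection step follows the displayed equation literally.

```python
from collections import Counter
from itertools import combinations
from math import log2


def class_bigram_probs(tokens, assignment):
    """Estimate p(<C_i, C_j>) from adjacent token pairs (hypothetical helper)."""
    counts = Counter(
        (assignment[x], assignment[y]) for x, y in zip(tokens, tokens[1:])
    )
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def ami(p):
    """AMI(C): sum of MI(C_i, C_j) over all observed cluster pairs."""
    left, right = Counter(), Counter()
    for (ci, cj), v in p.items():
        left[ci] += v   # p(<C_i, *>)
        right[cj] += v  # p(<*, C_j>)
    # Unobserved pairs have probability 0 and contribute nothing to the sum.
    return sum(v * log2(v / (left[ci] * right[cj])) for (ci, cj), v in p.items())


def merged(assignment, ci, cj):
    """C_{i<-j}: relabel every word in cluster cj as belonging to ci."""
    return {w: (ci if c == cj else c) for w, c in assignment.items()}


def merge_deltas(tokens, assignment, a):
    """AMI(C) - AMI(C_{i<-j}) for each pair among the first `a` clusters.

    Assumption: the active window is the first `a` cluster ids in sorted order.
    """
    active = sorted(set(assignment.values()))[:a]
    base = ami(class_bigram_probs(tokens, assignment))
    return {
        (ci, cj): base - ami(class_bigram_probs(tokens, merged(assignment, ci, cj)))
        for ci, cj in combinations(active, 2)
    }


def select_merge(deltas):
    """Pick the pair exactly as the displayed equation is written (arg max)."""
    return max(deltas, key=deltas.get)
```

For example, with `tokens = "a b a b c d c d a b".split()` and each word in its own cluster, `merge_deltas(tokens, assignment, a=3)` scores the three candidate pairs inside the window. Recomputing AMI from scratch for every candidate keeps the sketch short but is far slower than an incremental update.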

more: paper , code

I co-ordinate "NLP South" at the IT University of Copenhagen, and am also affiliated with the Machine Learning group.

Available RA, PhD, postdoc and faculty positions are announced via the ITU job board, where you can also create alerts for new positions.

See more about the university at - also in [english]

Research interests
Awards and recognition

Natural Language Processing / Text Mining

Misinformation & stance - how do we determine the truth of claims on the web? What behaviours exist around false news? How do we know that the data your system is processing is genuine and accurate?

Processing sparse + noisy data - including social media data, clinical text, and minority languages

Clinical text mining - and pre-clinical public health. Continuing previous associations with Mayo Clinic, NHS SLaM, and Harvard Children's hospital

Danish NLP - Improve the environment that you live in

keywords: natural language processing, machine learning, veracity, clinical nlp, social media, artificial intelligence, dansk

  • 2016: University of Sheffield Engineering Development Opportunities grant (visiting NTNU Trondheim).
  • 2015: Martin Leginus wins WEBIST Best Student Paper award, co-authored and supervised.
  • 2014: University of Sheffield Exceptional Contribution Award (4% award rate).
  • 2013: University of Sheffield Exceptional Contribution Award (4% award rate).
  • 2013: Pheme, FP7 project, 36 months (co-author, named investigator); €4.3M.
  • 2013: Shortlisted for Ted Nelson ACM SIGWEB prize.
  • 2008: EPSRC Enhanced Doctoral Training Grant.
  • 2008: Douglas Lewin prize for best final-year exam performance (equiv. dux litterarum).
  • 2006, 2007: Nanjing University of Aeronautics and Astronautics' Best English language teacher.
5e / AL
© Leon Strømberg-Derczynski