Leon Derczynski
Contact details
Funded projects

name: Leon Derczynski

email: ld@itu.dk

twitter: @leonderczynski

telephone: +45 5157 4948

post: ITU Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen, Denmark

NLPL: 2017-2020, NordForsk. www.nlpl.eu. Nordic Language Processing Laboratory. A cross-Nordic collaboration between universities and e-infrastructure organisations, pooling high-performance computing and natural language processing resources.
Role: PI for ITU.

COMRADES: 2016-2018, EC H2020 IA €2.0M. www.comrades-project.eu. Collective platform for community resilience & social innovation during crises.
Role: Co-I for U.Sheffield.

PHEME: 2014-2017, EC FP7 CP €4.3M. www.pheme.eu. Computing Veracity – the Fourth Challenge of Big Data. Pheme built technology for assessing how true the claims made online are. This timely project was rated "excellent" at its final evaluation.
Role: co-author, scientific co-ordinator.

uComp: 2013-2016, EC CHIST-ERA €1.25M. www.ucomp.eu. Embedded Human Computation for Knowledge Extraction and Evaluation. uComp built extensive resources for crowdsourcing and social media processing, including an easy-to-use corpus construction tool integrated with GATE.
Role: named researcher.


TrendMiner: 2011-2014, EC FP7 CP €3.7M. Large-scale, Cross-lingual Trend Mining and Summarisation of Real-time Media Streams.

Upcoming & recent talks

Leon Derczynski

also: Leon Strømberg-Derczynski

Google Scholar: user=d8iwqa8AAAAJ

ORCID: 0000-0002-8656-3431

DBLP: Derczynski:Leon

1881 citations

h-index: 18

i10-index: 29

(as of 2018-10-01)

2018.08.2x: Program co-chair at COLING 2018, Santa Fe

2018.11.01: Dimensions of Variation in User-generated Text at the Workshop on Noisy User-generated Text (W-NUT), Brussels

2018.11.08: Fake News and Troll Detection at SLTC, Stockholm

2019.03.xx: OpenITU evening on Clinical AI, Copenhagen

2019.Q1: Guest lecturing ML & NLP at Innopolis University, Kazan, Russian Federation

Selected Publications


COLING: Emily M. Bender, Leon Derczynski, Pierre Isabelle. 2018. Proceedings of the 27th International Conference on Computational Linguistics

ISCRAM: Leon Derczynski, Kenny Meesters, Kalina Bontcheva, Diana Maynard. 2018. Helping Crisis Responders Find the Informative Needle in the Tweet Haystack


book: Leon Derczynski. 2017. Automatically ordering events and times in text

CSL: Isabelle Augenstein, Leon Derczynski, Kalina Bontcheva. 2017. Generalisation in Named Entity Recognition: A Quantitative Analysis


COLING: Leon Derczynski, Kalina Bontcheva, Ian Roberts. 2016. Broad Twitter Corpus: A Diverse Named Entity Recognition Resource

COLING: Leon Derczynski. 2016. Representation and Learning of Temporal Relations

AAAI: Leon Derczynski, Sean Chester. 2016. Generalised Brown Clustering and Roll-up Feature Generation


IPM: Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke van Erp, Genevieve Gorrell, Raphaël Troncy, Johann Petrak, Kalina Bontcheva. 2015. Analysis of Named Entity Recognition and Linking for Tweets

SemEval: Steven Bethard, Leon Derczynski, Guergana Savova, James Pustejovsky, Marc Verhagen. 2015. SemEval-2015 Task 6: Clinical TempEval

WEBIST: Martin Leginus, Leon Derczynski, Peter Dolog. 2015. Enhanced Information Access to Social Streams through Word Clouds with Entity Grouping (best student paper)


EACL: Leon Derczynski, Kalina Bontcheva. 2014. Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets

EACL: Leon Derczynski, Kenneth S. Bøgh. 2014. DKIE: Open Source Information Extraction for Danish


EACL: Kalina Bontcheva, Ian Roberts, Leon Derczynski, Dominic Rout. 2014. The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy


ICTIR: Leon Derczynski, Robert Gaizauskas. 2013. Information Retrieval for Temporal Bounding

ACL: Leon Derczynski, Robert Gaizauskas. 2013. Temporal Signals Help Label Temporal Relations

SemEval: Naushad UzZaman, Hector Llorens, Leon Derczynski, James Allen, Marc Verhagen, James Pustejovsky. 2013. SemEval-2013 Task 1: TempEval-3: Evaluating Events, Time Expressions, and Temporal Relations

HT: Leon Derczynski, Diana Maynard, Niraj Aswani, Kalina Bontcheva. 2013. Microblog-Genre Noise and Impact on Semantic Annotation Accuracy

EDBT: Leon Derczynski, Bin Yang, Christian S. Jensen. 2013. Towards Context-Aware Search and Analysis on Social Media Data


LREC: Hector Llorens, Leon Derczynski, Robert Gaizauskas, Estela Saquete. 2012. TIMEN: An Open Temporal Expression Normalization Resource

Masters projects

NLP: Polysynthetic morphology extraction. Greenlandic is a difficult language to work with because it forms words by joining many parts (called morphemes) together. This project builds a tool for identifying the morphemes within each word of a sentence. It might use rules or a deep learning based method, perhaps with character-level convolutional neural nets or neural attention.

NLP: Clinical event recognition. Lots of information about patients and their health is kept in clinical notes. Clinical text processing technology is advanced for English but not for Danish, which is a shame, because Danish healthcare is highly digitalised. This mini-project helps close that gap by building tools for automatically extracting clinical events (like surgeries, heart attacks, and medication changes) from Danish clinical notes.

ML: Gemstone valuation. Gemstones are tough to value because of their huge variance. This project involves collecting a lot of data on one or two gem types and building a model of gemstone value from features of the stone, e.g. color richness, cut depth, cut quality, mine of origin, weight, and so on. A working model will estimate gem value automatically, an otherwise expensive service.

2018.08: I was program co-chair for COLING 2018 (we had 1018 full paper submissions)

2018.07: Read an interview with me in "Alt om Data" (Danish): Nye sandheder om falske nyheder ("New truths about fake news")


NLP: Time recognition. We should be able to connect dates and day mentions to calendars. This is tougher to automate than it looks (e.g. on what day was pinse, i.e. Whitsun, in 2014?), and it needs to be done for each language. Here we'll use ISO-TimeML to annotate Danish (or another non-English language) text, and then build a neural network for recognizing times automatically.
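To show why a holiday like pinse is harder than a fixed date, here is a minimal Python sketch. It assumes "pinse" resolves to Whit Sunday (pinsedag), which falls 49 days after Easter Sunday, and computes Easter with the well-known anonymous Gregorian (Meeus/Jones/Butcher) algorithm. The function names `easter` and `pentecost` are hypothetical, for illustration only. A real time recogniser would also have to decide whether a mention means Whit Sunday or Whit Monday (2. pinsedag), which this sketch ignores.

```python
from datetime import date, timedelta


def easter(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous (Meeus/Jones/Butcher) algorithm."""
    a = year % 19                 # position in the 19-year lunar cycle
    b, c = divmod(year, 100)      # century, and year within the century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30    # lunar term
    i, k = divmod(c, 4)
    w = (32 + 2 * e + 2 * i - h - k) % 7  # weekday correction
    m = (a + 11 * h + 22 * w) // 451
    month, day = divmod(h + w - 7 * m + 114, 31)
    return date(year, month, day + 1)


def pentecost(year: int) -> date:
    """Whit Sunday (pinsedag), assumed here to be Easter Sunday + 49 days."""
    return easter(year) + timedelta(days=49)


print(pentecost(2014))  # 2014-06-08
```

So "pinse 2014" needs a lunar calculation before it can be placed on a calendar at all, which is why such expressions must be handled per language and per culture.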

NLP: Stance detection. Fake news detection currently relies on knowing the attitude that people on social media express towards an idea. Figuring this out is called stance detection. This project builds a stance detection system for Danish, or a non-English language of your choice, over social media data.

NLP: Neural Eliza. There's an old program, Eliza, that lets you talk to the computer. It was far ahead of its time. This project builds a neural network that functions as Eliza does.

NLP: Offensive & hate speech detection. Catching potentially offensive speech helps us measure the tone of a dialog. This is useful in many contexts, including assisting content moderation, thus avoiding legal action and maintaining uptime. This mini-project starts from existing tools for English and experiments with building an offensiveness (or politeness) detector for Danish.

Group information

brown (generalised)

\[ MI(C_i,C_j)= p(\left< C_i,C_j\right>)\ \log_2{\frac{p(\left< C_i,C_j\right>)}{p(\left< C_i,*\right>)\ p(\left<*,C_j\right>)}} \]

\[ AMI(C) = \sum_{C_i,C_j\in C}{MI(C_i,C_j)} \]

\[ C_{i\leftarrow j} = \left( C \setminus \left\{C_i,C_j\right\} \right) \cup \left\{C_i \cup C_j \right\} \]

\[ 0 < a \leq ||C|| \]

\[ i,j \in [1..a] \]

\[ \DeclareMathOperator*{\argmin}{arg\,min} \hat{\pi}(C) = \argmin_{C_i,C_j\in C,i\neq j}{\ AMI(C) - AMI(C_{i\leftarrow j})}. \]

more: paper, code
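For readers who prefer code to notation, here is a deliberately naive Python sketch of the greedy merge step defined by the equations above. It is an illustration under simplifying assumptions, not the linked implementation: every word type starts as its own cluster, p is estimated from adjacent-token bigram counts, the candidate window is taken to be the a most frequent current clusters, and each step merges the pair whose merge loses the least AMI (the classic Brown criterion). The function names are hypothetical, and AMI is recomputed from scratch for every candidate, so this is far too slow for real corpora.

```python
import math
from collections import Counter


def ami(seq):
    """Sum of MI(Ci, Cj) over adjacent cluster pairs, estimated from bigram counts."""
    bigrams = Counter(zip(seq, seq[1:]))
    total = sum(bigrams.values())
    left, right = Counter(), Counter()
    for (ci, cj), n in bigrams.items():
        left[ci] += n    # counts for p(<Ci, *>)
        right[cj] += n   # counts for p(<*, Cj>)
    score = 0.0
    for (ci, cj), n in bigrams.items():
        p = n / total
        # unseen pairs are skipped: their term p log p tends to 0
        score += p * math.log2(p / ((left[ci] / total) * (right[cj] / total)))
    return score


def merge(seq, ci, cj):
    """C_{i<-j}: relabel every occurrence of cluster cj as ci."""
    return [ci if c == cj else c for c in seq]


def best_merge(seq, a):
    """Among the a most frequent clusters, find the pair whose merge loses least AMI."""
    window = [c for c, _ in Counter(seq).most_common(a)]
    base = ami(seq)
    best = None
    for x in range(len(window)):
        for y in range(x + 1, len(window)):
            loss = base - ami(merge(seq, window[x], window[y]))
            if best is None or loss < best[0]:
                best = (loss, window[x], window[y])
    return best


def brown(tokens, a, k):
    """Merge greedily until k clusters remain; return word->cluster map and merge history."""
    if a < 2 or k < 1:
        raise ValueError("need a >= 2 (a pair to compare) and k >= 1")
    tokens = list(tokens)
    seq = list(tokens)   # each word type starts as a cluster labelled by itself
    history = []
    while len(set(seq)) > k:
        loss, ci, cj = best_merge(seq, a)
        history.append((ci, cj, loss))
        seq = merge(seq, ci, cj)
    return dict(zip(tokens, seq)), history


tokens = "the cat sat on the mat the dog sat on the rug".split()
clusters, history = brown(tokens, a=4, k=3)
for ci, cj, loss in history:
    print(f"merge {cj!r} into {ci!r}, AMI loss {loss:.3f}")
```

The recorded merge history is what yields the Brown hierarchy (and bit-string paths) over words. The window of a clusters bounds how many pairs each step compares; a practical implementation would also update the MI terms incrementally rather than recomputing AMI.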

NLP group at the IT University of Copenhagen

Four faculty members

The group leads the Data Science program

For details and research specialisations, visit the webpage: nlp.itu.dk

ITU is an agile, non-traditional, expanding university. I think of it as the startup of unis. Projects are easy to take on, and novel ideas get plenty of institutional support. It has one of the highest rates of external funding per faculty member in the country, and the highest ratio of female to male student applicants (29% of applicants to the Bachelor in Software Development were female in 2018). The building sits between KUA (the University of Copenhagen's South Campus) and DR (the Danish Broadcasting Corporation), and has a non-academic, light and spacious feel.

See more about the university at itu.dk (also available in English).

Research interests
Awards and recognition

Natural Language Processing / Text Mining

Fake news, veracity & stance - continuing work from the PHEME project: how do we determine the veracity of claims made on the web? What behaviours exist around news? How do we know that the data your system is processing is genuine and accurate?

Noisy text processing - including social media text and clinical text (medics' notoriously bad handwriting sometimes carries over, in aggregate, to their typing)

Clinical text mining - and pre-clinical public health. Continuing previous associations with Mayo Clinic, NHS SLaM, and Harvard Children's

Information extraction - who did what to whom, and where and when? This information is stated in text but is hard to extract and reason about automatically, even though being able to do so enables a great deal.

Spatio-temporal language - extracting timelines from text

keywords: artificial intelligence, machine learning, natural language processing, social media, time

  • 2016: NVIDIA Hardware Grant.
  • 2016: University of Sheffield Engineering Development Opportunities grant (visiting NTNU Trondheim).
  • 2016: Comrades, H2020 project, 36 months (co-investigator); €2.0m.
  • 2015: Martin Leginus wins the WEBIST Best Student Paper award (paper co-authored and supervised).
  • 2014: University of Sheffield Exceptional Contribution Award (4% award rate).
  • 2013: University of Sheffield Exceptional Contribution Award (4% award rate).
  • 2013: Pheme, FP7 project, 36 months (co-author, named investigator); €4.3m.
  • 2013: Shortlisted for Ted Nelson ACM SIGWEB prize.
  • 2008: EPSRC Enhanced Doctoral Training Grant.
  • 2008: Douglas Lewin prize for best final-year exam performance (equiv. dux litterarum).
  • 2006, 2007: Nanjing University of Aeronautics and Astronautics' Best English language teacher.
© Leon Strømberg-Derczynski