I left the University of Sheffield in February 2018, for the IT University of Copenhagen (ITU); this site is an archive. Find me at www.derczynski.com.
Leon Derczynski
Pronounced "der-CHIN-ski"
Research interests
- Social media processing
- Information extraction
- Spatio-temporal language
Contact
I prefer a phone call or email: +45 5157 4948, or leonderczynski@gmail.com
You may also find me on Twitter: @LeonDerczynski.
My Academic CV.
By post:
Dr. Leon Derczynski
Department of Computer Science
211 Portobello
Sheffield
S1 4DP
United Kingdom
At home: Langelandsgade 209, 2th, 8200 Aarhus N, Denmark
News
- I will start a new faculty job at ITU Copenhagen on March 1, 2018. We both feel very lucky to have found each other 💕.
- Our IPM paper giving a roadmap of NER for social media is the most-cited paper in that journal for '16, '17, and over the past five years.
- A new shared task at WNUT 2017: NER for emerging, novel entities.
- I'll be teaching NLP & ML at Innopolis University again in Fall 2017.
- My article Pathology of a fake news story was Medium's lead frontpage story for the May Day weekend.
- Honoured to be Program Co-Chair for COLING 2018 in Santa Fe, with Emily M. Bender.
- Our shared task on helping spot fake news, RumourEval, has been a success.
- Get your library to buy my book on extracting temporal information from text (I mean, it's a lot for a personal purchase!). If you want to read a preprint copy for yourself, there's a PDF here.
- Matteo Magnani and I had our proposal accepted and so will give a course on "Networks and User-generated Content" at ESSLLI 2017.
- I'll be at UCSD in 2017, under the aegis of Julian McAuley.
- The AU Datalab has been founded at Aarhus University!
- In May 2016 I became permanent Research Fellow at the University of Sheffield.
- Our H2020 project, COMRADES, of which I'm Co-I for Sheffield, started in February 2016.
Talks
February 2018
- Digital Literacy Project, AU, Aarhus, Denmark.
October 2017
- Metric word similarity space in search, at Innopolis University, RF.
July 2017
- 27: Corpora need the crowd – and your social media, at the University of Washington, USA.
March 2017
- 23: Building in diversity makes better AI, at UC Irvine, USA.
December 2016
- 2nd: at a special workshop on Massive Cultural Data, Aarhus, Denmark.
- 1st: at Uppsala University, Sweden.
November 2016
- 30th: Social variable extraction lecture in Humanities Programming course, Aarhus, Denmark.
Current projects
- Comrades, H2020 (Co-I) - Collective Platform for Community Resilience and Social Innovation during Crises.
- GATE - General Architecture for Text Engineering, for NLP in Java
- NLTK - The Natural Language ToolKit, for NLP in Python
Selected Publications
See also my full publications list.
- Automatically ordering events and times in text. 2017. Book, in Studies in Computational Intelligence, Springer.
- Generalisation in Named Entity Recognition: A Quantitative Analysis. 2017. In Computer Speech and Language, Elsevier.
- Broad Twitter Corpus: A Diverse Named Entity Recognition Resource. 2016. In 26th International Conference on Computational Linguistics (COLING) (full paper, acceptance rate 25%).
- Representation and Learning of Temporal Relations. 2016. In 26th International Conference on Computational Linguistics (COLING) (full paper, acceptance rate 25%).
- Generalised Brown Clustering and Roll-up Feature Generation. 2016. In 30th AAAI Conference on Artificial Intelligence (AAAI) (full paper, acceptance rate 26%). [code] [sample data]
- Analysis of Named Entity Recognition and Linking for Tweets. 2015. In Information Processing and Management (IPM). [data]
- SemEval-2015 Task 6: Clinical TempEval. 2015. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015).
- Enhanced Information Access to Social Streams through Word Clouds with Entity Grouping. 2015. In Proceedings of the 11th International Conference on Web Information Systems and Technologies (WEBIST) (full paper, acceptance rate 14%). Winner of Best Student Paper award.
- Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets. 2014. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014) (short, acceptance rate 23.1%). [slides] [video]
- DKIE: Open Source Information Extraction for Danish. 2014. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014) (demo).
- The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy. 2014. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014) (demo). [docs] [code in GATE]
- Information Retrieval for Temporal Bounding. 2013. In Proceedings of the 4th International Conference on the Theory of Information Retrieval (ICTIR 2013) (acceptance rate 22%). [poster] [bib]
- Temporal Signals Help Label Temporal Relations. 2013. In Proceedings of the 51st meeting of the Association for Computational Linguistics (ACL 2013) (short, acceptance rate 24%). [bib] [poster]
- SemEval-2013 Task 1: TempEval-3: Evaluating Events, Time Expressions, and Temporal Relations. 2013. In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013). [slides] [bib]
- Microblog-Genre Noise and Impact on Semantic Annotation Accuracy. 2013. In Proceedings of the 24th ACM Conference on Hypertext and Social Media (HT 2013) (acceptance rate 16.7%). Shortlisted for SIGWEB Ted Nelson award. [slides] [bib]
- Towards Context-Aware Search and Analysis on Social Media Data. 2013. In Proceedings of the 16th Conference on Extending Database Technology (EDBT 2013) (track acceptance rate 13.6%). [slides] [bib]
- TIMEN: An Open Temporal Expression Normalization Resource. 2012. In Proceedings of the 8th Conference on International Language Resources and Evaluation (LREC 2012). [code] [slides] [bib]
Publications listed elsewhere
- Google Scholar: Leon Derczynski
- DBLP: Leon Derczynski
- ACM Digital Library: Leon Derczynski (ACM pubs only, laggy)
Awards and Grants
- 2016: NVIDIA Hardware Grant.
- 2016: University of Sheffield Engineering Development Opportunities grant (visiting NTNU Trondheim).
- 2016: Comrades, H2020 project, 36 months (co-investigator); €2.0m.
- 2015: Martin Leginus wins WEBIST Best Student Paper award, co-authored and supervised.
- 2014: University of Sheffield Exceptional Contribution Award (4% award rate).
- 2013: University of Sheffield Exceptional Contribution Award (4% award rate).
- 2013: Pheme, FP7 project, 36 months (co-author, named investigator); €4.3m.
- 2013: Shortlisted for Ted Nelson ACM SIGWEB prize.
- 2008: EPSRC Enhanced Doctoral Training Grant.
- 2008: Douglas Lewin prize for best final-year exam performance (equiv. dux litterarum).
- 2006, 2007: Nanjing University of Aeronautics and Astronautics' Best English language teacher.
Resources
- Emerging Entities dataset and source texts, on github: github.com/leondz/emerging_entities_17
- RCV corpus generalised Brown clusters, a/c=2560, min occur=1; with merge file rcv.a2560.tar.bz2
- Guide to tuning Brown clustering (including many sample clusterings)
- GHA Brown clusters: m=2000, 250M tokens of tweets. gha.250M-c2000.tar
- TB_sig: revised and extended TimeBank, including extra signals, and any events/links associated with them. tb_sig.tar.bz2
- Bootstrapped microblog part-of-speech dataset, 1.5M tokens. twitter_bootstrap_corpus.tar.gz
- Twitter entity linking dataset (and NE splits). ipm_nel.tar.gz
- High-accuracy Penn Treebank Twitter PoS tagger.
- Named Temporal Expressions: list, tools, annotations. named_timex.tar.bz2
- Szeged material: day 1 day 2
- Uppsala material: day 1
Past projects
- Pheme, FP7 (co-author) - finding and stopping rumours on the net. Press: BBC News, Times of London, Voice of Russia, Le Parisien, Berlingske, Bild
- uComp, CHIST-ERA - Closing the AI gap with unconventional computation
- Guest editor of an Information Processing and Management special issue on Time and Information Retrieval
- Generalised Brown Clustering
- TIMEN - Community temporal expression normalisation
- TempEval-3 - A SemEval/SIGSEM task for evaluation of the state of the art in temporal information extraction
- TrendMiner, FP7 - Large-scale cross-lingual trend mining and summarisation
- CAVaT - Corpus Analysis and Validation for TimeML (command-line tool)
- TERNIP - High-precision high-recall timex annotation and normalisation
- Temporal and Spatial Information Extraction reading group
Related links
- How not to review a paper: The tools and techniques of the adversarial reviewer.
- Some Applications of Information and Learning to Philosophy... Or, Barwise Inverse Relation Principle, Bayesian Surprise, Boosting, and Other Things that Begin with the Letter B
- A Few Useful Things to Know about Machine Learning
- Computational linguistics journals sorted by mean 5y/1y impact, Thomson JCR 2012. [updated Sep 2013]
- Training for peer reviewers, from the BMJ.
Quotes
"The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn." -- Alvin Toffler
"All models are wrong, but some are useful." -- George Box
"Peer review is boosting with three weak learners. Apply salt liberally." -- LD
"Human knowledge is expressed in language. So computational linguistics is very important." -- Mark Steedman
Personal
- ORCID 0000-0002-8656-3431
- Leon's gift wishlist, on Amazon UK.
- I have an Erdős number of 4.