- Social media processing
- Spatio-temporal information extraction
- Information retrieval
I prefer email: firstname.lastname@example.org
You may also find me on Twitter: @LeonDerczynski.
My Academic CV.
Dr. Leon Derczynski
Department of Computer Science
- In May 2016 I became a permanent Research Fellow at the University of Sheffield. This is an assistant-professor-level post, with no teaching duties and not a faculty position.
- As of January 2016, I am on the editorial board of the Information Processing & Management journal.
- I will teach a PhD-level course in natural language processing and machine learning at Aarhus University during the third academic quarter.
- I will give a keynote talk at LT4VarDial 2015.
- My thesis will be adapted into a book with Springer, to appear in the Studies in Computational Intelligence series.
- Work with Martin Leginus on tag cloud generation and evaluation for social streams won the Best Student Paper award at WEBIST!
- The special issue of IPM on Time and Information Retrieval has been published.
- 2nd: at the Interacting Minds Centre, Aarhus, Denmark.
- 10th: Handling and Mining Linguistic Variation in UGC, at the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects, Hissarya, Bulgaria. [slides]
- 9th: Efficient named entity annotation through pre-empting, at the conference on Recent Advances in Natural Language Processing, Hissarya, Bulgaria.
- 5th: NLP for Social Media, a tutorial at the conference on Recent Advances in Natural Language Processing, Hissarya, Bulgaria.
- 9th: GATE and Social Media, at the 8th GATE Text Mining Summer School, Sheffield, UK.
- 31st: Practical Annotation and Processing of Social Media with GATE, at the Extended Semantic Web Conference, Portorož, Slovenia.
- Comrades (Co-I) - Collective Platform for Community Resilience and Social Innovation during Crises.
- Pheme (co-author) - finding and stopping rumours on the net. Press: BBC News, Times of London, Voice of Russia, Le Parisien, Berlingske, Bild
- uComp - Closing the AI gap with unconventional computation
- GATE - General Architecture for Text Engineering, for NLP in Java
- NLTK - The Natural Language ToolKit, for NLP in Python
See also my full publications list.
- Generalised Brown Clustering and Roll-up Feature Generation.
- In 30th AAAI Conference on Artificial Intelligence (AAAI) (full paper, acceptance rate 26%). [code] [sample data]
- Analysis of Named Entity Recognition and Linking for Tweets.
- In Information Processing and Management (IPM). [data]
- SemEval-2015 Task 6: Clinical TempEval.
- In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015).
- Enhanced Information Access to Social Streams through Word Clouds with Entity Grouping.
- In Proceedings of the 11th International Conference on Web Information Systems and Technologies (WEBIST) (full paper, acceptance rate 14%). Winner of Best Student Paper award.
- Passive-Aggressive Sequence Labeling with Discriminative Post-Editing for Recognising Person Entities in Tweets.
- In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014) (short, acceptance rate 23.1%). [slides] [video]
- DKIE: Open Source Information Extraction for Danish.
- In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014) (demo).
- The GATE Crowdsourcing Plugin: Crowdsourcing Annotated Corpora Made Easy.
- In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2014) (demo). [docs] [code in GATE]
- Information Retrieval for Temporal Bounding.
- In Proceedings of the 4th International Conference on the Theory of Information Retrieval (ICTIR 2013) (acceptance rate 22%). [poster] [bib]
- Temporal Signals Help Label Temporal Relations.
- In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013) (short, acceptance rate 24%). [bib] [poster]
- SemEval-2013 Task 1: TempEval-3: Evaluating Events, Time Expressions, and Temporal Relations.
- In Proceedings of the 7th International Workshop on Semantic Evaluation (SemEval 2013). [slides] [bib]
- Microblog-Genre Noise and Impact on Semantic Annotation Accuracy.
- In Proceedings of the 24th ACM Conference on Hypertext and Social Media (HT 2013) (acceptance rate 16.7%). Shortlisted for the ACM SIGWEB Ted Nelson award. [slides] [bib]
- Towards Context-Aware Search and Analysis on Social Media Data.
- In Proceedings of the 16th Conference on Extending Database Technology (EDBT 2013) (track acceptance rate 13.6%). [slides] [bib]
- TIMEN: An Open Temporal Expression Normalization Resource.
- In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012). [code] [slides] [bib]
Publications listed elsewhere
- Google Scholar: Leon Derczynski
- DBLP: Leon Derczynski
- ACM Digital Library: Leon Derczynski (ACM publications only; updated with some delay)
Awards and Grants
- 2016: Comrades, H2020 project, 36 months (co-investigator); €2.0m.
- 2015: Martin Leginus wins WEBIST Best Student Paper award, co-authored and supervised.
- 2014: University of Sheffield Exceptional Contribution Award (4% award rate).
- 2013: University of Sheffield Exceptional Contribution Award (4% award rate).
- 2013: Pheme, FP7 project, 36 months (co-author, named investigator); €4.3m.
- 2013: Shortlisted for the ACM SIGWEB Ted Nelson prize.
- 2008: EPSRC Enhanced Doctoral Training Grant.
- 2008: Douglas Lewin prize for best final-year exam performance (equiv. dux litterarum).
- 2006, 2007: Best English Language Teacher, Nanjing University of Aeronautics and Astronautics.
- RCV corpus generalised Brown clusters, a/c=2560, min occur=1; with merge file rcv.a2560.tar.bz2
- Guide to tuning Brown clustering (including many sample clusterings)
- GHA Brown clusters: m=2000, 250M tokens of tweets. gha.250M-c2000.tar
- TB_sig: revised and extended TimeBank, including extra signals, and any events/links associated with them. tb_sig.tar.bz2
- Bootstrapped microblog part-of-speech dataset, 1.5M tokens. twitter_bootstrap_corpus.tar.gz
- Twitter entity linking dataset (and NE splits). ipm_nel.tar.gz
- High-accuracy Penn Treebank Twitter PoS tagger.
- Named Temporal Expressions: list, tools, annotations. named_timex.tar.bz2
- Szeged material: day 1, day 2
- Uppsala material: day 1
- Generalised Brown Clustering (incl. wcluster with working --threads option)
- TIMEN - Community temporal expression normalisation
- TempEval-3 - A SemEval/SIGSEM task for evaluation of the state of the art in temporal information extraction
- TrendMiner - Extracting and predicting trends from social media
- CAVaT - Corpus Analysis and Validation for TimeML (command-line tool)
- TERNIP - High-precision, high-recall timex annotation and normalisation
- Temporal and Spatial Information Extraction reading group
- How not to review a paper: The tools and techniques of the adversarial reviewer.
- Some Applications of Information and Learning to Philosophy... Or, Barwise Inverse Relation Principle, Bayesian Surprise, Boosting, and Other Things that Begin with the Letter B
- A Few Useful Things to Know about Machine Learning
- Computational linguistics journals sorted by mean 5y/1y impact, Thomson JCR 2012. [updated Sep 2013]
- Training for peer reviewers, from the BMJ.
Quotes
"The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn." -- Alvin Toffler
"All models are wrong, but some are useful." -- George Box
"Peer review is boosting with three weak learners. Apply salt liberally." -- LD
"Human knowledge is expressed in language. So computational linguistics is very important." -- Mark Steedman