The University of Sheffield
Department of Computer Science

Teaching and courses

Natural Language Processing with Machine Learning - Innopolis University

This course gives a tour of data-intensive natural language processing (NLP) and machine learning (ML). We will study the automated processing of social media, a big data source, from an artificial intelligence perspective, using statistical ML. The course introduces the field and demonstrates complete practical examples across a range of real-world tasks. Covering basic structures and theory, it will leave participants knowing how to match a machine learning tool to a problem in the context of language processing. The intricacies and nuances of social media text are also covered; it is a significantly different, and more interesting, text type than the newswire studied in prior research. Topics covered include entity recognition, sentiment analysis, discriminative and structured learning, and processing for indexing and retrieval. The machine learning skills required to complete the course (i.e. types of learner, representations, and evaluation) are included.

Raw data - the course directory: teaching/inno/

1. Intro to AI, NLP, and ML

Day 1: NLP & AI basics; tokenisation and IR. Practical exercise and data.

Day 2: N-grams, language modelling, n-gram analysis, HMMs and PoS tagging.

Day 3: Word sense disambiguation and semantics.

Assignment: Search engine or Senseval-2 system. Due October 7. To get a B, build the system so it operates according to specification. To get an A, extend the system beyond the description.

2. Machine learning (classification, structured, unsupervised) and data mining

SVM, RNN, convolution, self-driving cars.

3. Advanced NLP

Word representations, profiling, social media streams, lie detection.

4. Applied advanced ML

Synthesising human commentary, neural dialogue, learning with annoying data, measuring accuracy and overfitting.

Natural language processing for social media - European conference of the Association for Computational Linguistics

Course reference page:

From a business and government point of view, there is an increasing need to interpret and act upon information from large-volume social media streams, such as Twitter, Facebook, and forum posts. While natural language processing of newswire has been very well studied in the past two decades, understanding social media content has only recently been addressed in NLP research.

Social media poses three major computational challenges, dubbed by Gartner the 3Vs of big data: volume, velocity, and variety. NLP methods, in particular, face further difficulties arising from the short, noisy, and strongly contextualised nature of social media. To address the 3Vs of social media, novel language technologies have emerged, e.g. using locality-sensitive hashing to detect new stories in media streams (volume), predicting stock market movements from tweet sentiment (velocity), and recommending blogs and news articles based on users' own comments (variety).
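As an illustration of the volume example, the idea of using hashing to detect new stories can be sketched as follows. This is a minimal sketch under stated assumptions, not the method taught in the tutorial: the names `FirstStoryDetector`, `vectorise`, `NUM_BITS`, and the 0.5 similarity threshold are all hypothetical, tokens are split on whitespace, and a single hash table of sign-random-projection signatures is used where a practical system would likely use several.

```python
import hashlib
import math
from collections import Counter, defaultdict

NUM_BITS = 8  # signature length, i.e. number of pseudo-random hyperplanes (assumed)


def vectorise(text):
    """Bag-of-words counts over lowercased whitespace tokens (a simplification)."""
    return Counter(text.lower().split())


def plane_component(plane_index, token):
    """Deterministic +1/-1 hyperplane coordinate for a token, derived from a hash."""
    digest = hashlib.md5(f"{plane_index}:{token}".encode("utf-8")).digest()
    return 1.0 if digest[0] & 1 else -1.0


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b.get(token, 0) for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FirstStoryDetector:
    """Flags a document as new if no similar document shares its hash bucket."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = defaultdict(list)

    def signature(self, vector):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(
            sum(count * plane_component(p, token) for token, count in vector.items()) >= 0.0
            for p in range(NUM_BITS)
        )

    def is_new(self, text):
        vector = vectorise(text)
        key = self.signature(vector)
        # Only documents in the same bucket are compared, which keeps the
        # cost per incoming document small as the stream grows.
        best = max((cosine(vector, seen) for seen in self.buckets[key]), default=0.0)
        self.buckets[key].append(vector)
        return best < self.threshold


detector = FirstStoryDetector()
for post in ["earthquake hits city centre",
             "earthquake hits city centre",
             "football final tonight"]:
    print(post, "->", "new" if detector.is_new(post) else "seen")
```

Documents with similar vectors tend to receive the same signature, so each incoming post is compared only against a small bucket of candidates rather than the whole stream. The trade-off is that a near-duplicate can land in a different bucket and be missed, which is why multiple tables are commonly used.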

The tutorial takes a detailed view of key NLP tasks (corpus annotation, linguistic pre-processing, information extraction, and opinion mining) as applied to social media content. After a short introduction to the challenges of processing social media, we will cover key NLP algorithms adapted to processing such content, discuss available evaluation datasets, and outline remaining challenges.

Natural Language Processing for the Social Media - University of Szeged

Course reference page:

  • Szeged material: day 1, day 2
  • The University of Sheffield
    Western Bank
    Sheffield S10 2TN