RAU NLP
WEEK 2
Course Info + Contact
rau-nlp.github.io
goo.gl/74SKKF
goo.gl/BTEoBn
goo.gl/wBhydG #nlp
Who is missing?
Who did not join the mailing list, Telegram group or Slack channel?
Unix machines?
Linux or Mac
Python experience?
ML for NLP?
char-level RNN, word2vec, seq2seq
Other feedback?
How can we improve the lecture, website or communication?
HOMEWORK
norvig.com/spell-correct.html
How does it work?
Is it rules-based or is it learning?
How to improve it?
What are the pluses and minuses of Norvig’s approach?
Who is Norvig?
What is his contribution to Google? To natural language processing?
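For the "How does it work?" question, here is a condensed sketch in the spirit of the linked essay, not a copy of it. It generates every string one edit away from the input and picks the candidate seen most often in a corpus. The toy CORPUS string and the function names are assumptions for illustration. The essay itself trains on a large text file and also considers candidates two edits away.

```python
import re
from collections import Counter

# Toy corpus: an assumption for illustration only.
CORPUS = "the quick brown fox jumps over the lazy dog the dog barks spelling is hard"

# Word counts act as a very small unigram language model.
WORDS = Counter(re.findall(r"[a-z]+", CORPUS.lower()))
TOTAL = sum(WORDS.values())


def probability(word):
    """Relative frequency of `word` in the toy corpus."""
    return WORDS[word] / TOTAL


def edits1(word):
    """All strings one delete, transpose, replace or insert away from `word`."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def known(words):
    """Keep only the candidates that occur in the corpus."""
    return {w for w in words if w in WORDS}


def correction(word):
    """Prefer the word itself, then known one-edit neighbours, else give up."""
    candidates = known([word]) or known(edits1(word)) or [word]
    return max(candidates, key=probability)
```

Nothing here is a hand-written spelling rule, and the only thing "learned" is a table of word counts, which is one way to frame the rules-versus-learning question.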
Is language structured?
THE STRUCTURE OF LANGUAGE
Raw: text
Sequences of chars, sequences of tokens, ...
Lexical: words
Language models, Zipfian distribution, stems and lemmata, n-grams...
Syntactic: phrases
Part-of-speech tags, syntax trees...
Semantic: meaning
Sentiment, intents, ontologies...
Discourse: ...
???
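The raw and lexical levels above can be made concrete with a few lines of standard-library Python. This is a minimal sketch on a made-up sentence. The regex tokenizer and the ngrams helper are assumptions for illustration, not a recommended tokenizer.

```python
import re
from collections import Counter

# Made-up example text (an assumption for illustration).
text = "the cat sat on the mat and the dog sat on the log"

# Raw level: the text as a sequence of characters.
chars = list(text)

# Lexical level: the text as a sequence of word tokens.
tokens = re.findall(r"\w+", text.lower())


def ngrams(seq, n):
    """Return all contiguous n-grams of `seq` as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


bigrams = ngrams(tokens, 2)

# Tokens ranked by frequency, most common first.
ranked = Counter(tokens).most_common()
```

Even in this tiny sample one word accounts for a large share of the tokens. On large corpora that skew is what the Zipfian distribution describes.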
How can we represent language numerically?
REPRESENTATIONS
Audio
Spectrogram
[dense]
Images
Pixels
[dense]
Text
Word/sent/doc vec
[sparse]
[0, 0, 0.2, 0, 0, 0, 0.4, 0, 0, 0, 0.1, 0, ...]
If we use one-hot encoding to make every word a class...
Why not?
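One way to see the problem with one-hot word vectors is to build a few. This is a minimal sketch. The five-word vocabulary and the helper names are assumptions for illustration.

```python
# Toy vocabulary (an assumption for illustration); real vocabularies are far larger.
vocab = sorted({"cat", "dog", "car", "the", "sat"})
index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Vector with a single 1 at the word's index and 0 everywhere else."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec


def dot(u, v):
    """Dot product, used here as a crude similarity score."""
    return sum(a * b for a, b in zip(u, v))
```

Each vector is as long as the vocabulary and almost entirely zeros. The dot product of any two different words is 0, so "cat" is exactly as far from "dog" as it is from "car": the representation carries no notion of similarity.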
How is English different from other languages?
Is English fundamentally easier?
break
WHAT ARE EXAMPLES OF NLP PROBLEMS?
NLP Problems
Building blocks
Applications
How are we progressing?
Industry
Datasets
Libs + APIs
Funding + Respect
aiindex.org/2017-report.pdf
AI Index
Opportunity
Can we have an impact in NLP?
Which areas have high barriers to entry?
ML + 3 languages + 3 scripts
RESEARCH HORIZON
Sentence vectors
Syntactic and semantic, not just averaging word vecs
Context resolution
Co-reference resolution across sentences
Mixed modes
Text, images and reasoning for tasks
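The "not just averaging word vecs" point under sentence vectors can be illustrated with a small sketch. The two-dimensional TOY_VECTORS are hypothetical values chosen for illustration, not trained embeddings.

```python
# Hypothetical 2-d word vectors (an assumption for illustration, not trained embeddings).
TOY_VECTORS = {
    "dog": [1.0, 0.0],
    "bites": [0.0, 1.0],
    "man": [1.0, 1.0],
}


def average_vector(sentence):
    """Sentence vector as the element-wise mean of its word vectors."""
    vectors = [TOY_VECTORS[word] for word in sentence.lower().split()]
    dims = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / len(vectors) for d in range(dims)]
```

Averaging ignores word order, so "dog bites man" and "man bites dog" get the same vector even though they mean different things. A sentence representation that captures syntax and semantics has to do better than that.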
HOMEWORK 2
Break a parser
1. Install spaCy
Run pip install spacy to get the Python library, then download the data file (model) for English or the language of your choice
2. Break it
Find an example where the parse is incorrect
3. Send it
Send the string and a screenshot to the email list, and explain what went wrong.
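The steps above can be sketched as a small helper for inspecting a parse. The model name en_core_web_sm is an assumption: it is the small English model in current spaCy releases, and older releases used different names. The function names are hypothetical.

```python
def parse_rows(doc):
    """Return (text, part-of-speech, dependency label, head text) for each token."""
    return [(tok.text, tok.pos_, tok.dep_, tok.head.text) for tok in doc]


def show_parse(text, model_name="en_core_web_sm"):
    """Parse `text` with spaCy and print one tab-separated row per token."""
    import spacy  # imported lazily so parse_rows can be used without spaCy installed

    nlp = spacy.load(model_name)  # raises OSError if the model has not been downloaded
    for row in parse_rows(nlp(text)):
        print("\t".join(row))
```

After pip install spacy and python -m spacy download en_core_web_sm, calling show_parse("The old man the boat.") prints one row per token. That garden-path sentence is only one candidate. Whether a given model gets it wrong is something to check, not assume.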
OFFICE HOURS
SATURDAY @ ISTC