CS 6120
Lecture 1: Introduction to Natural Language Processing
Natural Language Processing
“Two eggs on a plate”
2,400,000 bytes (compressed JPEG)
50,000 bytes (compressed MP3)
18 bytes (uncompressed text)
Adoption Rates of ChatGPT in the United States
Usage Stats for ChatGPT
Natural Language Processing
Section 0: A brief introduction to the course
Section 1: Administrative and logistics
Section 2: A lab to get you started
Section 3: Some historical perspectives
What is NLP? Why study language and automate it?
Computer programs that analyze, understand and generate (in)formal human language
What can NLP do for you?
primer.ai
What can you do for NLP?
primer.ai
watsonx.ai
Verticals that use NLP extensively
E-Commerce
Healthcare
Education
Majors and fields of study
Computational linguistics - a case study on marketability
Computational linguistics (CL) is what powers anything in a machine or device that has to do with language—speaking, writing, reading, and listening. It is often linked with natural language processing (NLP), which is a subset of CL.
On the topic of chatbots
Why is this so hard?
Natural Language Processing
Section 0: A brief introduction to the course
Section 1: Administrative and logistics
Section 2: A lab to get you started
Section 3: Some historical perspectives
Welcome!
Survey of Courses Taken
Course Title | NEU Course Number |
Machine Learning | CS 6140 |
Deep Learning | CS 7150 |
(Advanced) Algorithms | CS 5800 / CS 7800 |
Bella Chen: NLP Teaching Assistant
Joy (Hsin-Yu) Guo: NLP Teaching Assistant
Raman: NLP Teaching Assistant
About this course
If ML is an advanced class, this is a very advanced class
Where can you find our material
Our class website (CS6120) is at:
You can find our syllabus, reading, homeworks, project templates, etc. there.
Course Format
You will do well in this class if:
Some suggestions for excelling in the class and beyond:
Class Artifacts
Objectives of the course
By the end of this course, you will build competencies in your:
knowledge-base
implementation fluency
industry/academic skill
By the end of this course, you will have …
… built your foundational skills
You’ll have
By the end of this course, you will have …
… improved your fluency and practical knowledge
You’ll be able to code with velocity by
By the end of this course, you will have …
… built your accomplishments and resume
Your track record will include either:
(We expect that roughly 90% of you will choose the product project over the academic contribution)
You will do this by:
knowledge-base
implementation fluency
industry/academic skill
On Practice-Oriented Problems
Structured towards industry practice of using Natural Language Processing
On Practice-Oriented Problems
Structured towards industry practice of using Natural Language Processing
Lectures build towards modern Large Language Modeling (LLM)
Class Artifacts
Literature and Reading
Open source, data proliferation, and compute ⇒ this field moves faster than textbooks can keep up
What are the required “keynote papers”?
Use your resources! There are so many of them
Keynote Reading Roles
How will we learn and discuss keynote papers?
Fill out your name as a facilitator
Fill out your name as a scribe
Expectations for everyone in the classroom
What to Read and How
Required keynote reading for the next week
Videos for keynote paper
Blogs for keynote paper
Topic in the next week related to keynote paper
Current list of keynote papers
Week | Paper | Notes | Summary | Lead 1 |
6 | Name & Link | Name & Link | Name & Link | |
7 | Name & Link | Name & Link | Name & Link | |
8 | Name & Link | Name & Link | Name & Link | |
9 | Name & Link | Name & Link | Name & Link | |
9 | Name & Link | Name & Link | Name & Link | |
11 | Name & Link | Name & Link | Name & Link | |
12 | Name & Link | Name & Link | Name & Link | |
12 | Name & Link | Name & Link | Name & Link | |
13 | Name & Link | Name & Link | Name & Link | |
13 | Name & Link | Name & Link | Name & Link | |
14 | Name & Link | Name & Link | Name & Link | |
Other things you can do to better understand the paper
The scribe role
The scribe document is due a week after the conversation
Purpose: helps absent students, prepares everyone, and creates a record of the discussion
What’s included in the scribe notes?
Class Artifacts
Two options for your project
Fully Packaged NLP Product
Academic Contribution
Product Delivery to be Scaled
Academic Paper to be Submitted
Grade Breakdown - Traditional Project
Option 1: Business App
Grade Breakdown - Paper Project
Option 2: Paper
Class Project - Route 1
Option 1: Business App
Example projects with front ends
Option 1: Business App
Class Project: Route 2
Option 2: Paper
Course Project(s)
Option 1: Business App
Course Project(s)
Option 2: Paper
Conferences
Option 2: Paper
Class Artifacts
Homeworks and Labs
Natural Language Processing
Section 0: A brief introduction to the course
Section 1: Administrative and logistics
Section 2: A lab to get you started
Section 3: Some historical perspectives
Some Available Tools
LaTeX
Google Colab
Google Cloud Platform
LaTeX and Overleaf
Benefits of using overleaf.com
Joint editing session
\begin{equation}
X \in \mathbb{R}^{10}
\end{equation}
Google Colab
Course Start Date: 9/1/2024
Students’ email domain(s): @northeastern.edu, @ccs.neu.edu
Students can request coupons from the URL and redeem them until: 6/1/2025
Coupons Valid Through: 1/1/2025
Number of Coupons: 30
Face Value of Coupon(s): USD 50.00
Check how many credits you have
Natural Language Processing
Section 0: A brief introduction to the course
Section 1: Administrative and logistics
Section 2: A lab to get you started
Section 3: Some historical perspectives
Where are we today?
We're currently in one of the longest periods of sustained interest in AI in history because:
Many still doubt whether AI can pass the Turing Test, that is, whether we can build systems that convincingly imitate human intelligence and behavior.
It's still an open question how far the technology can go…and how far you can push it.
How we got here?
Section 3: Some History and Where it Pertains to You
A Coarse Timeline
1916 | Saussure's Course in General Linguistics is published |
1950 | Turing publishes Computing Machinery and Intelligence |
1954 | The Georgetown-IBM experiment |
1966 | ELIZA is the first chatbot |
1974 | The First AI Winter begins |
1980 | Expert systems (rule-based if/then computing) revive AI research |
1987 | The Second AI Winter |
1990 | Statistical methods take the community by storm; SVMs are developed and become popular; VC theory is established |
2001 | The first neural language model is built |
2012 | ImageNet makes deep learning the de facto AI poster child |
2013 | Mikolov writes word2vec and uses skipgrams |
2014 | Sutskever et al. publish sequence-to-sequence models, popularizing RNNs |
2015 | Attention modeling is introduced |
2017 | Google researchers introduce the Transformer architecture |
2019 | OpenAI restructures as a capped-profit enterprise |
2022 | OpenAI releases ChatGPT |
2024 | LLMs proliferate throughout the world |
Almost Ending Before It Started (1916)
The Turing Test (1950)
The Imitation Game, a.k.a., the Turing Test
The Georgetown / IBM Experiments (1954)
“Within three or five years, machine translation will be a solved problem”
Purpose: attract governmental and public interest and funding by showing the possibilities of machine translation
The First Chatbot: ELIZA (1966)
Rogerian Psychotherapy as ELIZA's Model
The AI Winters
AI Winters: A Chill in AI Enthusiasm

An AI winter refers to a period of reduced funding and interest in artificial intelligence (AI) research.
Causes of AI Winter
The Three Booms
The Advent of the First AI Winter (1974-1980)
Lighthill Report
Revival (1980s)
The Second AI Winter
1984: John McCarthy criticizes expert systems for their lack of common sense and their inability to understand their own limitations.
1987: Apple and IBM produce general-purpose computers that solve more real-world problems, at a fraction of the cost of the expensive specialized AI systems.
John McCarthy
Late 1980s: DARPA and the Strategic Computing Initiative cut AI funding, not trusting the technology's ability to deliver results.
DARPA Director Schwarz: “… very limited success in particular areas, followed immediately by failure to reach the broader goal at which these initial successes seem at first to hint …”
By 1991, Japan's Fifth Generation Computer project had run for 10 years and spent $400 million, but had not met even one of its original expectations.
Intelligent Agents and Statistical Methods (1990s)
Business Applications of NLP (2011)
SRI International (now unaffiliated with Stanford)
Section 3: Some History and Where it Pertains to You
Modern NLP Approaches (2000+)
The First Neural “Language” Model (2001)
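The core idea can be sketched in a few lines: embed the previous n words, concatenate, and score the whole vocabulary. A minimal, untrained forward pass in the spirit of Bengio's model (all sizes and weights below are illustrative, not from the paper):

```python
import numpy as np

# Hedged sketch of a Bengio-style feed-forward neural language model.
# The vocabulary size and layer sizes are toy values for illustration.
rng = np.random.default_rng(0)

V, d, n, h = 10, 4, 3, 8            # vocab size, embedding dim, context length, hidden units
C = rng.normal(size=(V, d))          # shared word-embedding matrix
H = rng.normal(size=(n * d, h))      # hidden-layer weights
U = rng.normal(size=(h, V))          # output-layer weights

def nnlm_probs(context_ids):
    """P(next word | previous n words): embed, concatenate, tanh, softmax."""
    x = C[context_ids].reshape(-1)   # concatenated context embeddings, shape (n*d,)
    hidden = np.tanh(x @ H)
    logits = hidden @ U
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

p = nnlm_probs([1, 5, 2])            # a distribution over all V words
```

Training would fit C, H, and U by maximizing the likelihood of observed next words; the key novelty was learning the embedding table C jointly with the predictor.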
Multi-task Learning (2008)
Sharing
Word Embeddings (2013)
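Skip-gram training data is just (center, context) word pairs drawn from a sliding window. A minimal sketch (the window size and sentence are illustrative; real word2vec also subsamples frequent words and uses negative sampling):

```python
# Hedged sketch of word2vec-style skip-gram pair extraction.
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within +/- window of each position."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the cat sat on the mat".split(), window=1)
# includes e.g. ("cat", "the") and ("cat", "sat")
```

The model then learns embeddings so that a center word's vector predicts its context words; words appearing in similar contexts end up with similar vectors.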
Sequence to Sequence Modeling (2014)
Regains its footing for:
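The encoder-decoder idea reduces to: run an RNN over the source sequence and keep only its final hidden state as a fixed-size summary that conditions the decoder. A minimal, untrained encoder sketch (vanilla RNN, random weights, illustrative sizes):

```python
import numpy as np

# Hedged sketch of the 2014 encoder half of a sequence-to-sequence model.
# Weights are random (untrained); embedding and hidden sizes are toy values.
rng = np.random.default_rng(0)
d, h = 4, 6                              # embedding dim, hidden size
Wx = rng.normal(size=(d, h))             # input-to-hidden weights
Wh = rng.normal(size=(h, h))             # hidden-to-hidden weights

def rnn_encode(embeddings):
    """Run a vanilla RNN; return the final hidden state (the 'thought vector')."""
    state = np.zeros(h)
    for x in embeddings:
        state = np.tanh(x @ Wx + state @ Wh)
    return state

src = rng.normal(size=(5, d))            # 5 source-token embeddings
context = rnn_encode(src)                # fixed-size summary, shape (h,)
```

A decoder RNN would start from this context and emit target tokens one at a time; squeezing long inputs through one fixed vector is exactly the bottleneck attention was later invented to relieve.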
Attention Modeling
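Attention lets the decoder look back at all encoder states instead of a single fixed vector: each query takes a softmax-weighted average of the values. A minimal numpy sketch of scaled dot-product attention (shapes are illustrative):

```python
import numpy as np

# Hedged sketch of scaled dot-product attention:
# weights = softmax(Q K^T / sqrt(d_k)), output = weights V.
def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
out, w = attention(Q, K, V)   # out: (2, 4); w: (2, 3)
```

The scaling by sqrt(d_k) keeps the dot products from saturating the softmax as dimensionality grows.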
Wholesale Investments in Neural Language Processing (2015)
A major drawback of statistical methods is that they require elaborate feature engineering. Since about 2015, statistical approaches have largely been replaced by neural networks, which use semantic networks and word embeddings to capture the semantic properties of words.
The Transformer (2017)
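Because the Transformer drops recurrence entirely, it injects word order through sinusoidal positional encodings added to the token embeddings. A short sketch of that scheme (sequence length and model dimension below are illustrative):

```python
import numpy as np

# Hedged sketch of sinusoidal positional encodings:
# even dimensions use sin, odd dimensions use cos, with geometrically
# spaced wavelengths across the model dimension.
def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]        # (seq_len, 1)
    i = np.arange(d_model)[None, :]          # (1, d_model)
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

pe = positional_encoding(50, 16)             # shape (50, 16), values in [-1, 1]
```

Each position gets a unique, bounded fingerprint, and relative offsets correspond to fixed linear transformations, which is what lets attention layers use order without recurrence.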
Transformer Modeling Engagements (2017+)
Pre-Trained Language Models
OpenAI releases ChatGPT (2022)
Anthropic
OpenAI Evaluation Metrics
Claude’s Purported Performance Metrics
Claude’s Purported Performance Metrics
OpenAI o1 (evaluation comparisons between the 4o and o1 series)
Large Language Models
Large Language Models
LLMs proliferate throughout the world
Beyond ChatGPT, Gemini, Claude, and Llama in the United States:
QWEN - Alibaba’s Line of LLMs
Out of 81 large-scale AI models, 43 were developed by organizations based in the United States.
Around a quarter of these were from China
Progress of LLMs above 10²³ FLOPs
Section 3: Some History and Where it Pertains to You
Recent Presentations to Etsy