1 of 12

A Whirlwind Tour of Linguistics and Linguists in Tech

Hannah Van Brunt

AVL Digital Nomads Tech Connect

May 1, 2024

2 of 12

Outline

1. My background

2. Linguistics in a nutshell

3. A traditional profession vs. recent shifts

4. What do linguists do in tech?

5. Linguist contributions to NLP / AI: Examples

6. Closing and language blog

3 of 12

1. My background

  • BA in French, MA in Linguistics
  • Linguist / Researcher on FrameNet lexical database
    • (International Computer Science Institute in Berkeley, CA)
  • Short stints as university lecturer of Linguistics, English
  • Last 6 years, worked on virtual assistants in big tech:
    • Linguist at Samsung (on Bixby)
    • Analytical Linguist at Google (on Google Assistant)

4 of 12

2. Linguistics in a nutshell (1)

Linguistics can be subdivided into the following basic branches:

  • PHONETICS / PHONOLOGY – The study of the sounds and sound inventory of a language
    • Ex: The consonants ‘r’ and ‘l’ are distinct sounds in English, meaning they differentiate between words – but there’s no clear distinction in Japanese (‘r’ and ‘l’ are perceived as one sound)
  • MORPHOLOGY – The study of word formation and structure
    • Ex: “Unhappiness” can be broken into basic units: ‘un’ (not) + ‘happy’ (root word) + ‘ness’ (the state/condition of)
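
A minimal sketch of the “unhappiness” breakdown in Python. The affix tables and the `segment` function are hypothetical and cover only this toy case (the naive prefix match would wrongly split a word like “under”); real morphological analyzers are far more involved.

```python
# Toy affix inventory: an assumption for illustration, not a real lexicon.
PREFIXES = {"un": "not"}
SUFFIXES = {"ness": "the state/condition of"}


def segment(word: str) -> list[str]:
    """Split a word into prefix, root, and suffix using the toy inventory."""
    word = word.lower()
    parts = []

    # Strip at most one known prefix.
    for prefix in PREFIXES:
        if word.startswith(prefix):
            parts.append(prefix)
            word = word[len(prefix):]
            break

    # Strip at most one known suffix.
    suffix = next((s for s in SUFFIXES if word.endswith(s)), None)
    if suffix:
        word = word[: -len(suffix)]
        # Undo the y -> i spelling change (happy + ness = happiness).
        if word.endswith("i"):
            word = word[:-1] + "y"

    parts.append(word)  # whatever remains is treated as the root
    if suffix:
        parts.append(suffix)
    return parts


print(segment("Unhappiness"))
```

The spelling rule (root-final ‘y’ written as ‘i’ before ‘ness’) is the kind of detail a linguist would specify for such a system.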

5 of 12

2. Linguistics in a nutshell (2)

  • SYNTAX – The structure of phrases and sentences (i.e. “grammar”)
    • Ex: “The boy chased the dog” has a subject noun phrase (‘the boy’), a verb (‘chased’), and an object noun phrase (‘the dog’). The order of this sentence is SVO (Subject-Verb-Object)
  • SEMANTICS – The study of meaning in language
    • Ex: We understand that “dog” refers to a domesticated animal with four legs and a tail.
  • PRAGMATICS – The study of language use in discourse / context
    • Ex: Two people in a room with an open window, one says “It’s cold in here” – likely an implicit request for the other person to close the window.

6 of 12

7 of 12

3. A traditional profession vs. recent shifts

  • Traditionally
    • Linguists worked in academia as professors and researchers
    • …Or as field researchers (documenting endangered / dying languages around the world), or as lexicographers (dictionary writers)
    • Related disciplines: Psychology, Cognitive Science, Speech Language Pathology, Anthropology, Sociology, Neurology
  • Recent shift
    • Past 10-15 years, a shift for linguists from academia to industry (and Tech specifically)
      • Academia becoming more & more unsustainable
      • Real desire/need for linguists in Tech
      • NLP, NLU, ASR, TTS, ML, voice assistants (Alexa, Siri, Google), VUI (voice user interface)

8 of 12

4. What do linguists do in tech?

  • Analytical Linguist / Computational Linguist / NLP Linguist / Natural Language Analyst
    • Companies: Google, Meta, Amazon, Samsung, Grammarly
    • Tasks: Create semantics for intents; Clean, annotate & analyze natural language data; Train & evaluate ML models; NLP
  • Conversation (AI) Designer
    • Companies: Rasa, Amazon, OpenDialog AI, Sensely, LivePerson
    • Tasks: Design conversational flows for chatbots; Handle E2E dev / product lifecycle
  • Knowledge (Graph) Engineer / Ontologist / Taxonomist
    • Companies: LinkedIn, Microsoft, Walmart, Airbnb, data.world
    • Tasks: Build ontologies, taxonomies, and knowledge representations for businesses and AI
  • Learning/Curriculum Designer in EdTech
    • Companies: (Language learning software) Duolingo, Memrise, Babbel, Busuu, Rosetta Stone
    • Tasks: Plan syllabi; Design digital language courses / app content; Conduct user research

9 of 12

5. Linguist contributions to NLP / AI (1)

EXAMPLE 1: Ensuring quality data and annotation

  • Issue: High-quality natural language data and annotation are crucial for NLP / ML / LLMs / AI (results are only as good as the data – “garbage in, garbage out”)
  • Linguists understand how language works as a system and are experts in designing annotation pipelines that result in quality training datasets
  • Ex. (A): Emotions data for sentiment analysis
  • Ex. (B): Verbosity / Style / Tone / Register
    • Analytical Linguists at Grammarly work on register (“formality”) and tone (“domain”) for their AI writing assistant

10 of 12

5. Linguist contributions to NLP / AI (2)

EXAMPLE 2: Dealing with ambiguity - Grounding language in context

  • Issue: Natural language is inherently ambiguous and needs context to resolve meaning
  • Ex. (A): Linguists (as Knowledge Engineers and Ontologists) model semantic networks, knowledge graphs, and ontologies that serve as backbones for grounded NLP systems. Linguists also work on incorporating richer user context into NLP pipelines
    • For a voice assistant to disambiguate a user command like “Play ‘Message in a Bottle’” (→ could be the song by The Police, the song by Taylor Swift, the 1999 film, or an audiobook recording of Nicholas Sparks’s novel!), we need extra context:
      • (i) KG / ontology of music, movies, & other media;
      • (ii) metadata on user devices (screens for video vs. audio-only);
      • (iii) metadata (history) on user preferences (like Taylor Swift or The Police?)
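
A rough Python sketch of how the three context sources could combine to rank candidates for “Play ‘Message in a Bottle’”. The `Candidate` class, the `CATALOG` entries, and the `resolve` function are all hypothetical; a production assistant would query a real knowledge graph and use a learned ranker rather than these hand-written rules.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    media_type: str  # "song", "film", or "audiobook"
    creator: str


# (i) Stand-in for a knowledge graph / ontology of media (toy data).
CATALOG = [
    Candidate("Message in a Bottle", "song", "The Police"),
    Candidate("Message in a Bottle", "song", "Taylor Swift"),
    Candidate("Message in a Bottle", "film", "1999 film"),
    Candidate("Message in a Bottle", "audiobook", "Nicholas Sparks"),
]


def resolve(title: str, has_screen: bool, history: list[str]) -> list[Candidate]:
    """Return the candidates for a title, best guess first."""
    matches = [c for c in CATALOG if c.title.lower() == title.lower()]

    # (ii) Device metadata: an audio-only device cannot play a film.
    if not has_screen:
        matches = [c for c in matches if c.media_type != "film"]

    # (iii) User history: prefer creators the user has played more often.
    # sorted() is stable, so ties keep their catalog order.
    return sorted(matches, key=lambda c: history.count(c.creator), reverse=True)


history = ["Taylor Swift", "Taylor Swift", "The Police"]
best = resolve("Message in a Bottle", has_screen=False, history=history)[0]
print(best.media_type, "by", best.creator)
```

With no usable history, the ranking falls back to catalog order, which is where an assistant might instead ask the user a clarifying question.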

11 of 12

5. Linguist contributions to NLP / AI (3)

EXAMPLE 3: Internationalization and Translation

  • Issue: Many nuances do not translate across languages in straightforward ways
  • Linguists have a rich understanding of cross-linguistic generalities and differences, and can bring this knowledge to bear on:
    • Ex. (A): Lexical-semantic problems. Languages encode concepts differently – strict 1:1 word translations are often inaccurate
      • User commands for smart devices – “Open the dishwasher.” ‘Open’ & ‘turn on’ are the same word in some languages
    • Ex. (B): ML, LLM, and translation engine problems with lesser-spoken languages
      • African languages (e.g. those spoken in Eritrea); even more widely spoken languages like Arabic (+ dialects)
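
To make Ex. (A) concrete, here is a small Python sketch of why a 1:1 verb translation breaks for device commands. It uses Mandarin 开 (kāi), which covers both ‘open’ and ‘turn on’, as the illustrative verb; the device table and the `candidate_intents` function are assumptions made up for this example.

```python
# One verb, two possible intents. Mandarin 开 (kai) is used for
# both "open" (a door) and "turn on" (a light).
VERB_SENSES = {
    "开": ["open", "turn_on"],
}

# Hypothetical device capabilities (assumed for illustration).
DEVICE_ACTIONS = {
    "garage_door": {"open"},
    "light": {"turn_on"},
    "dishwasher": {"open", "turn_on"},
}


def candidate_intents(verb: str, device: str) -> list[str]:
    """Keep only the senses of the verb that the device supports."""
    senses = VERB_SENSES.get(verb, [])
    supported = DEVICE_ACTIONS.get(device, set())
    return [sense for sense in senses if sense in supported]


print(candidate_intents("开", "light"))       # one sense survives
print(candidate_intents("开", "dishwasher"))  # both survive: still ambiguous
```

For the light, device capabilities settle the meaning; for the dishwasher, both readings remain, so the system needs further context or a clarifying question.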

12 of 12

6. Closing and language blog

Personal project: Language / Linguistics blog since 2017, linguamonium.com

THANK YOU!