| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Date | Day | Topic | Short Overview | Readings/Prep | Assignment | Assignment | |||||||||||||||||||||
2 | 01/27/2026 | Tuesday | First day of class was cancelled – snow day! | |||||||||||||||||||||||||
3 | 01/29/2026 | Thursday | Yikes: second day of class was cancelled too! | |||||||||||||||||||||||||
4 | 02/03/2026 | Tuesday | What is “understanding”? | Course overview (what we'll cover and why, flavor of how we’ll approach things, syllabus, etc). Then: what do we mean by “understanding”? The readings ahead of the first class are intended to whet your appetite -- Sagan's piece is an inspiring reflection on what scientific understanding is about, and the Bisson short story is a brilliant, quirky opening for reflection on language and intelligence. | Carl Sagan, "Can We Know the Universe?: Reflections on a Grain of Salt;" from Broca's Brain: Reflections on the Romance of Science, New York: Random House, 1979, pp. 13-18. http://w3.inf.fu-berlin.de/lehre/WS05/19616-K/materials/KnowTheUniverse.pdf Terry Bisson. "They're Made out of Meat". Omni, 1990. https://web.archive.org/web/20190501130711/ OR http://www.terrybisson.com/theyre-made-out-of-meat-2/ | Here and below, the assignment each week is the readings/reactions assignment described in the syllabus *unless* otherwise specified. | Here and below, the assignment each week is the readings/reactions assignment described in the syllabus *unless* otherwise specified. | |||||||||||||||||||||
5 | 02/05/2026 | Thursday | Scientific understanding from a computational perspective: Marr | Scientific understanding. In this class we're talking about two kinds of systems: the human cognitive system and AI systems. Whatever "language understanding" means, we can agree that the human cognitive system does it; whether AI systems do or not, or to what extent, is a debate we'll be considering as the semester moves along. As a first question to ask: what does it even mean for us to understand what a system is doing, and what does it mean to improve our understanding of that system? There's an enormous literature in the philosophy of science on the question of what scientific explanations are, far too much to even reference. What we'll be doing is focusing on computational ways of thinking about what it means to understand from a scientific perspective, particularly for scientific questions that involve the processing of information. This begins with Marr’s levels of explanation, which, while imperfect, constitute a widely used framework in cognitive science. | Marr (1982) Chapter 1, Section 1.2 (up to but not including the subsection on Gibson, ~9 pages). https://direct.mit.edu/books/book/3299/chapter-standard/105803/The-Philosophy-and-the-Approach Section 1.2 of Marr is the widely cited primary source. Section 6.2 of The Computational Theory of Mind entry in the Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/computational-mind/ Section 6.2 is incredibly short, roughly a page or two. Benedikt Grothe, How the Barn Owl Computes Auditory Space, Trends in Neurosciences, Volume 41, Issue 3, 2018, Pages 115-117, ISSN 0166-2236, https://doi.org/10.1016/j.tins.2018.01.004. The way the barn owl figures out the location of its prey is a lovely example of scientific understanding taking place at each of Marr’s three levels. Optional: Herbert Simon, The Sciences of the Artificial, Chapter 1: Understanding the Natural and Artificial Worlds. 
| |||||||||||||||||||||||
6 | 02/10/2026 | Tuesday | Scientific understanding from a computational perspective: prediction | We’re focusing on two computationally motivated aspects of what it means to understand scientifically: the idea of *prediction*, and the idea of *generalization.* We’ll begin by thinking of prediction in terms of conditional probability, reviewing some basic probability concepts as necessary (e.g. joint probability, conditional probability, what “independent” means, marginalization). This is going to lead us to one of the most important and most deceptively simple tools in science, Bayes’ Rule (or Law, or Theorem, take your pick), which tells you how to improve your predictions when you get more evidence. Note that we’re actually going to be considering two different senses of “prediction” here, both of which are relevant when we’re thinking about scientific understanding. One is a sense of the word where we’re talking about what conclusions (hypotheses) you should predict are true (i.e. believe, and to what extent) given the evidence you’ve seen. The other is a sense of the word that involves the prediction of observable data in some scientific object of study, e.g., if I’ve got a scientific model of how people process sentences, what predictions do I make about when they’ll encounter processing difficulty? One of the themes you’ll see come out of the readings has to do with differences in the way we reason intuitively, as compared to the rigor of Bayes’ Theorem as a rational, well founded way of arriving at beliefs. This has important implications for the conduct of science. | Note that the topic this week involves a progression from core ideas in probability to Bayes’ Theorem. If you’re already familiar with the probability concepts and the first reading is just a refresher, great. If not, then you’ll probably want to focus more on Pinker Chapter 4 for Tuesday, rather than trying to load up on both Chapter 4 and Chapter 5 ahead of the first class this week. 
Either way, remember the “read lightly, go to class, re-read in an informed way” strategy that I emphasized in the syllabus. Chapter 4, “Probability and Randomness”. In Steven Pinker, *Rationality*, Viking, 2021. This is a very accessible intro/refresher that does a nice job introducing (or refreshing) the most fundamental ideas in probability. Chapter 5, “Beliefs and Evidence”. In Steven Pinker, *Rationality*, Viking, 2021. This is a very accessible intro/refresher to Bayes’ Theorem. Grant Sanderson, “Bayes’ Theorem” [15min video or text adaptation]. https://www.3blue1brown.com/lessons/bayes-theorem. Grant Sanderson’s 3Blue1Brown (3B1B) is an absolutely fantastic resource for intuitive, visual explanations of mathematical concepts. Optional readings: Chapter 10, “Are You There, God? It’s Me, Bayesian Inference”. In Jordan Ellenberg, *How Not to Be Wrong: The Power of Mathematical Thinking*, Penguin, 2014. Chapter 4, “In All Probability”. In Anil Ananthaswamy, *Why Machines Learn: The Elegant Math Behind Modern AI*, Dutton, 2024. If you are comfortable with ideas presented in a math-y kind of way, you might like this. It’s written in an engaging way for a math-comfortable general audience, and it’s articulating the same basic concepts with more explicit mathematical notation and more direct connections to the machine learning ideas underlying current AI. | |||||||||||||||||||||||
7 | 02/12/2026 | Thursday | Scientific understanding from a computational perspective: prediction | This will be a continuation of the previous class. | No new readings | |||||||||||||||||||||||
8 | 02/17/2026 | Tuesday | Scientific understanding from a computational perspective: generalization (the rationalist version) | The ability to predict well is necessary for scientific understanding: if your theory isn’t making the right predictions about relevant observations, it’s either wrong or incomplete or both. But that’s not enough. Scientific theories are not about individual datapoints, they’re about *phenomena* – that is, scientific understanding requires generalizations across different observables and the ability to make predictions at that level. In classical approaches to science, dating back to antiquity, generalization involves rational consideration of how categories should be defined. Modern “data driven” approaches take an empiricist angle, which also dates to antiquity, that argues for bottom up discovery (induction) of generalizations. Over the next few classes we’re going to look at both approaches. In this process, we’ll begin to see that this dichotomy has a mirror image in the ways that computer scientists have approached the goal of creating machines that “understand”, i.e. AI. | Borges, Jorge Luis. "Funes the Memorious." Translated by James E. Irby, *Labyrinths: Selected Stories and Other Writings*, New Directions, 1962, pp. 149-154. https://vigeland.caltech.edu/ist4/lectures/funes%20borges.pdf This short piece of fiction nicely captures today’s topic: “To think is to forget a difference, to generalize, to abstract.” Chapter 10, *A Priori* Philosophical Languages. In Umberto Eco, *The Search for the Perfect Language*, trans. James Fentress (London: Fontana Press, 1997). This chapter, while dense and historical, does a nice job capturing classical ideas about what it meant to organize information in a structured system that would capture abstractions, generalizations in our understanding of the world. 
As such it’s also going to serve as a useful predecessor for our later discussions of “Good Old Fashioned AI”, which is to say symbolic, knowledge-based approaches. DEFINITELY DO NOT read this chapter in full for detail; focus on the setup (pp. 209-210); read lightly for a general sense of what the specific philosophers like Bacon, Descartes, etc. were after (pp. 211-221), and the most interesting/relevant discussion for our interests will be in “Primitives and Organization of Content” (pp. 221-227). Miller, George A. "Nouns in WordNet: a lexical inheritance system." *International journal of Lexicography* 3.4 (1990): 245-264. See https://cs.brown.edu/courses/csci2952d/readings/lecture4-miller.pdf pages 10-25/. Miller’s WordNet was the first computer-readable organization of (lexicalized) conceptual knowledge to be employed in tandem with data-driven approaches to natural language processing (Resnik 1998, WordNet and class-based probabilities, in C. Fellbaum (Ed.), WordNet: An Electronic Lexical Database, MIT Press). Borges, Jorge Luis. "El idioma analítico de John Wilkins." *Otras inquisiciones* 2 (1952). https://sites.evergreen.edu/wp-content/uploads/sites/226/2016/09/jorge-luis-borges-the-analytical-language-of-john-wilkins-1.pdf. This is a short piece in which Borges rather scathingly highlights flaws in the approach discussed in the Eco chapter, and it’s worth thinking about whether and how the Miller chapter represents progress. Optional: Chapter 14, From Leibniz to the *Encyclopédie*. In Umberto Eco, *The Search for the Perfect Language*, trans. James Fentress (London: Fontana Press, 1997). Most people today know about Leibniz primarily as the “other inventor” of the calculus. But among his many other intellectual endeavors, he aspired to the creation of a formal system for encoding and expressing all of human knowledge (cf. 
Eco, Chapter 10) – one in which it would be possible to infer new knowledge from existing knowledge and literally impossible to express things that are not true. | |||||||||||||||||||||||
9 | 02/19/2026 | Thursday | Scientific understanding from a computational perspective: generalization (an empiricist version where we fit models) | It’s probably not fair to say “empiricist version” here, nor to say that this is just about generalization, because the paper we’re going to cover today is really talking about prediction *and* generalization, and it’s also not as purely empiricist as some of the other work we’ll talk about when we move to neural networks. Still, it’s a good way to introduce a transition from a strictly symbolic, rationalist way of approaching scientific understanding to an approach – common in psychology and many other sciences – where you have a “rational” element to scientific explanation that involves your choice of a space of possible models (machine learning people, you should be thinking about the term “inductive bias” here!), where a particular instantiation in that space is realized by “fitting” the model. The Yarkoni and Westfall paper is chock-full of important concepts and, perhaps most important, it captures the idea of a *tradeoff* between the goal of prediction and the goal of generalization/explanation. | [WordNet discussion may slip to this class] Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. *Perspectives on Psychological Science*, *12*(6), 1100-1122. https://talyarkoni.org/pdf/Yarkoni_PPS_2017.pdf | |||||||||||||||||||||||
10 | 02/24/2026 | Tuesday | Continuation | I’m leaving this date clear for the moment anticipating the possibility that we won’t have gotten as far as originally planned owing to winter weather, to spending more time in class than originally planned on the topics so far, or both. If not, I’ll move things up in the schedule. [Update: we did indeed have Yarkoni and Westfall discussion slip to this class] | ||||||||||||||||||||||||
11 | 02/26/2026 | Thursday | Scientific understanding from a computational perspective: generalization (the empiricist version, with neural networks) | In contrast to the idea of rationally and manually constructing generalizations, there’s a philosophically very different angle on how to go about it. An empiricist approach to generalization involves representing individual observations in a way that allows generalizations to *emerge*. We’re going to use the famous *word2vec* as a springboard for introducing vector-space representations in general and neural network approaches in particular. | Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. “Distributed Representations of Words and Phrases and their Compositionality.” *Advances in Neural Information Processing Systems 26 (NeurIPS 2013)*, 2013. https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf. Word2vec was a true milestone in the transition from older AI approaches to what we have today. It rocked many people’s worlds by showing that, using vector-space representations and a very simple learning model, you could capture not only category-based generalizations (already accomplished decades earlier, e.g. via Hinrich Schuetze’s WordSpace model), but also logical relationships, like the fact that the terms *man* and *king* are related to each other in the same way as *woman* and *queen*. Grant Sanderson, “But what is a Neural Network?” [18min video or text adaptation]. https://www.3blue1brown.com/?v=neural-networks Grant Sanderson, “Gradient descent, how neural networks learn” [20min video or text adaptation]. https://www.3blue1brown.com/?v=gradient-descent | |||||||||||||||||||||||
12 | 03/03/2026 | Tuesday | Empirically-driven generalization, continued | We’ll continue making sure we understand the fundamental concepts behind the neural networks underlying current AI, with particular attention to the ways in which they generalize from data. We’ll also touch on the ideas – but not the detailed math – involved in other data-driven approaches to discovering generalizations. | Readings to be determined | |||||||||||||||||||||||
13 | 03/05/2026 | Thursday | Evaluating scientific hypotheses: statistical significance | We’ve seen enough to realize that the essence of scientific progress involves competition of ideas, and we’ve discussed a couple of ways of thinking about this. We’ve talked about Bayesian approaches, where you update your belief (probability) that something is true based on what you believed before plus new evidence. We’ve also discussed the idea of scientific hypotheses expressed as models, and evaluating how well a model fits observed data. Neither of these, by itself, is actually the dominant paradigm in science for evaluating whether a proposed claim should be thought of as true or false. The most widespread way of doing so is called “null hypothesis significance testing” – if you’ve ever seen “p < .05” in a research paper, or simply a claim that some finding is “statistically significant”, that’s what they’re talking about. This paradigm is widely misunderstood, misused, and even abused, so it’s worth taking some time to discuss it. | Chapter 6, “The Baltimore Stockbroker and the Bible Code”. In Jordan Ellenberg, *How Not to Be Wrong: The Power of Mathematical Thinking*, Penguin, 2014. This and the following chapter provide a very engaging discussion of the role of probability – and the pitfalls – when it comes to deciding what claims to believe or not believe. Chapter 7, “Dead Fish Don’t Read Minds”. In Jordan Ellenberg, *How Not to Be Wrong: The Power of Mathematical Thinking*, Penguin, 2014. This chapter continues the discussion of probability and inference. Optional reading: Chapter 7, “Hits and False Alarms”. In Steven Pinker, *Rationality*, Viking, 2021. Pinker is covering some of the same ground as Ellenberg here, but in a slightly more formal (but still not deeply math-y) way. 
One thing that he’s adding here is the importance of thinking about the *consequences* of coming to the wrong conclusion, and about the *magnitude* of the results you’re describing – the difference between “statistically significant” and “meaningful”. | |||||||||||||||||||||||
14 | 03/10/2026 | Tuesday | The behavioral perspective on understanding: the Turing test and chatbots (Part 1: the Turing Test) | Class up to this point has given us a computationally oriented grounding for what it means to “understand”, in the sense of understanding the world or observable phenomena in it. Now we’ll shift to the question of what it means to understand *someone*, with linguistic communication at the heart of the matter. In one of the most influential papers in AI (indeed, in intellectual history), in 1950, Alan Turing proposed replacing the ill-defined question “Can machines think?” with a different question. Roughly, the idea is this: we can presumably agree that *people* think, so if a machine is behaving indistinguishably from a person, we must conclude that it is “thinking”, for all practical purposes, at least. So the ill-defined question is replaced with a structured exercise – the “imitation game” – where the question is well defined: if a machine and a person are both being interrogated by someone to figure out which is which, will the interrogator be able to tell them apart? Turing’s approach is focusing not at all on the *how* but entirely on the *what* of what a system is doing, a radically extensional view of Marr's computational level focused entirely on input and output behavior. | Turing, A. M. (1950). Computing Machinery and Intelligence. *Mind*, 59(236), 433-460. https://turing.academicwebsite.com/publications/21-computing-machinery-and-intelligence. We will focus on Sections 1-2 and the "Imitation Game" concept to establish the behavioral criterion for intelligence, though his comments on learning are interesting and often ignored. | |||||||||||||||||||||||
15 | 03/12/2026 | Thursday | Interlude: Planning Class Projects | Discussion of what class projects could look like. | ||||||||||||||||||||||||
16 | 03/17/2026 | Tuesday | Spring Break | Have fun! | Have fun! | |||||||||||||||||||||||
17 | 03/19/2026 | Thursday | Spring Break | Have fun! | Have fun! | |||||||||||||||||||||||
18 | 03/24/2026 | Tuesday | The behavioral perspective on understanding: the Turing test and chatbots (Part 2: ELIZA and chatbots) | The Turing Test turns out to be more than a matter for philosophers. We’re going to consider Turing’s behavioral approach in light of two important stories. The first is the story of ELIZA, the first chatbot. The second is the story of Sewell Setzer III, a teenager who developed an obsessive and tragic attachment to a Character.AI chatbot modeled on Daenerys Targaryen from *Game of Thrones*. The progression we’re following here is an arc across multiple perspectives. Turing (1950) was a huge *philosophical* step arguing (ironically) that it might be better to sidestep the thorny philosophical issues in asking “Can machines think/understand?” and instead consider what we should be asking for practical purposes. Weizenbaum’s ELIZA was a landmark in thinking about conversational technology – chatbots – where (perhaps again ironically) the really important part turned out not to be the technology, but rather new insights about the real-world *implications of* such technology. The Head article brings this question of implications quite viscerally into the present: the Setzer story, and an increasing number of others like it, really foreground Weizenbaum’s concerns about anthropomorphism and, more generally, the relationship between technological progress and human values. | Weizenbaum, J. (1976). *Computer power and human reason: From judgement to calculation*. W. H. Freeman and Company. Preface and Introduction. Berry, D. M. (2023). "The Limits of Computation: Joseph Weizenbaum and the ELIZA Chatbot." In *Joseph Weizenbaum: Heritage and Prospects*, edited by Ulrich Gehmann et al. Springer. 
https://sussex.figshare.com/articles/journal_contribution/The_Limits_of_Computation_Joseph_Weizenbaum_and_the_ELIZA_Chatbot/24512008?file=43053739 OR https://www.weizenbaum-library.de/server/api/core/bitstreams/627985d1-be60-4e9a-8230-f41aa34f2a73/content Head, K. R. (2025). "Minds in Crisis: How the AI Revolution is Impacting Mental Health." *Journal of Mental Health and Clinical Psychology*, 9(3), 34-44. https://www.mentalhealthjournal.org/articles/minds-in-crisis-how-the-ai-revolution-is-impacting-mental-health.pdf Optional additional material: Weizenbaum, J. (1966). ELIZA—A Computer Program for the Study of Natural Language Communication Between Man and Machine. *Communications of the ACM*, 9(1), 36-45. https://doi.org/10.1145/357980.357991 The original and widely cited ELIZA paper. Quinn, M. J. (2025). Teens and Screens: AI Companions Should Be Off-limits to Minors. *Ubiquity*, *2025*(September), 1-7. https://dl.acm.org/doi/epdf/10.1145/3760266. | |||||||||||||||||||||||
19 | 03/26/2026 | Thursday | Computational theory of mind and the physical symbol systems hypothesis | One of the most durable debates related to understanding has to do with the idea of computational understanding systems as manipulators of information that is not connected directly to the world. We'll consider two very important discussions. One is about the idea of understanding (and more generally intelligence) involving manipulation of symbolic information, articulated by Newell and Simon's Physical Symbol System Hypothesis. | Newell, Allen; Simon, H. A. (1976), "Computer Science as Empirical Inquiry: Symbols and Search", Communications of the ACM, 19 (3): 113–126, https://dl.acm.org/doi/10.1145/360018.360022 [designated sections to be specified] | |||||||||||||||||||||||
20 | 03/31/2026 | Tuesday | Evaluating LLM capabilities | A contrast to the behavioral approaches we’ve discussed, in assessing whether systems “understand”, is to evaluate specific capabilities that we believe are necessarily parts of what it means to understand. The two guest lectures this week, by CS PhD students Neha Srikanth (Tuesday) and Rupak Sarkar (Thursday), will discuss recent work conducting such evaluations, related to natural language inferences and to common ground in conversation. | Poliak, A., Naradowsky, J., Haldar, A., Rudinger, R., & Van Durme, B. (2018, June). Hypothesis only baselines in natural language inference. In *Proceedings of the seventh joint conference on lexical and computational semantics* (pp. 180-191). https://arxiv.org/pdf/1805.01042 Clark, H. H., & Schaefer, E. F. (1987). Collaborating on contributions to conversations. *Language and cognitive processes*, *2*(1), 19-41. https://web.stanford.edu/~clark/1980s/Clark,%20H.H.%20_%20Schaefer,%20E.F.%20_Collaborating%20on%20contributions%20to%20conversations_%201987.pdf Optional (more linguistics background): Stalnaker, R. (2002). Common Ground. *Linguistics and Philosophy*, 25(5/6), 701–721. [Sections 1-2, 3 optional] URL: https://semantics.uchicago.edu/kennedy/classes/f07/pragmatics/stalnaker02.pdf Optional (overly long list of relevant computational papers, mainly for your reference): Srikanth, N., & Rudinger, R. (2022). Partial-input baselines show that NLI models can ignore context, but they don't. In *Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies* (pp. 4753–4763). Association for Computational Linguistics. URL: https://aclanthology.org/2022.naacl-main.350.pdf Srikanth, N., & Rudinger, R. (2024). How Often Are Errors in Natural Language Reasoning Due to Paraphrastic Variability?. *Transactions of the Association for Computational Linguistics*, 12, 1143–1162. 
URL: https://aclanthology.org/2024.tacl-1.63.pdf Srikanth, N., Wanner, A., Pezzelle, S., & Rudinger, R. (2025). NLI under the Microscope: What Atomic Hypothesis Decomposition Reveals. In *Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics* (pp. 2573–2589). Association for Computational Linguistics. URL: https://aclanthology.org/2025.naacl-long.130.pdf Gardner, M., Artzi, Y., Basmova, V., Berant, J., Bogin, B., Chen, S., Dasigi, P., Dua, D., Elazar, Y., Gottumukkala, A., Gupta, N., Hajishirzi, H., Ilharco, G., Khashabi, D., Lin, K., Jiang, J., Liu, N., Min, S., Peng, P., … Zettlemoyer, L. (2020). Evaluating Models’ Local Decision Boundaries via Contrast Sets. In *Findings of the Association for Computational Linguistics: EMNLP 2020* (pp. 1307–1323). Association for Computational Linguistics. URL: https://aclanthology.org/2020.findings-emnlp.117.pdf Poliak, A., Naradowsky, J., Haldar, A., Rudinger, R., & Van Durme, B. (2018). Hypothesis Only Baselines in Natural Language Inference. In *Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (*SEM 2018)* (pp. 180–191). Association for Computational Linguistics. URL: https://aclanthology.org/S18-2023.pdf | |||||||||||||||||||||||
21 | 04/02/2026 | Thursday | Evaluating LLM capabilities, continued | |||||||||||||||||||||||||
22 | 04/07/2026 | Tuesday | Searle’s Chinese Room argument | Searle's widely debated Chinese Room argument is one of the most famous thought experiments in AI, challenging the idea of symbol manipulation (Newell and Simon) as a means to achieving artificial intelligence and challenging behavioral indistinguishability (Turing) as a basis for deciding it’s been achieved. | Searle, "Minds, Brains, and Programs", Behavioral and brain sciences 3.3 (1980): 417-424. (Main paper; reading commentaries is not required) https://web.archive.org/web/20180513202016id_/https://www.cambridge.org/core/services/aop-cambridge-core/content/view/S0140525X00005756 Optional: Cole, David, "The Chinese Room Argument", *The Stanford Encyclopedia of Philosophy* (Spring 2026 Edition), Edward N. Zalta & Uri Nodelman (eds.), URL = <https://plato.stanford.edu/archives/spr2026/entries/chinese-room/>. | |||||||||||||||||||||||
23 | 04/09/2026 | Thursday | Bender & Koller’s Octopus argument | A more recent and also widely debated paper by Bender and Koller also offers a thought experiment to argue that behavioral competence is not sufficient evidence of genuine understanding, now in the context of modern AI. But where Searle is making an in-principle argument about mere symbol-manipulation (with at most an internal semantics) being the wrong kind of computation to achieve intentionality (an external semantics), no matter what the inputs and outputs, Bender and Koller’s more practically oriented argument takes place in a modern, learning-focused context and centers on the nature of the training data: they’re arguing that training on text (“form”) alone can’t yield a system capable of recognizing communicative intent, which requires real-world context. | Bender, Emily M., and Alexander Koller. "Climbing towards NLU: On meaning, form, and understanding in the age of data." In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5185-5198. 2020. (Optional: the ACL 2020 video -- talk ~12min, +~40min Q&A) https://aclanthology.org/2020.acl-main.463/ Strongly recommended: Julian Michael (blog post, July 23, 2020), To Dissect an Octopus: Making Sense of the Form/Meaning Debate. https://julianmichael.org/blog/2020/07/23/to-dissect-an-octopus.html | |||||||||||||||||||||||
24 | 04/14/2026 | Tuesday | How transformers work | One of the most basic ideas when it comes to understanding is the fact that we communicate “who did what to whom”: the thoughts that we communicate are not just sets or lists of things, they involve *relationships* among things, even if we’re communicating these through a (basically) sequential channel, one word after another. This means that any language understanding device must somehow manage to deal with hierarchical structures, even if input and output take the form of sequences. We’ll lay the foundations for a discussion of what we know about hierarchy in language, starting with a discussion of how current AI architectures (based on “transformers”) work, which will then move to human syntactic knowledge, which will then move to how we assess transformers’ syntactic knowledge and what that tells us. | Grant Sanderson, Large Language Models explained briefly [~8min], 3Blue1Brown, https://www.3blue1brown.com/?v=mini-llm. Grant Sanderson, Transformers, the tech behind LLMs. [~30min], 3Blue1Brown, https://www.3blue1brown.com/lessons/gpt. Chapter 4, “How Language Works”. In Pinker, S. (2007). *The Language Instinct (1994/2007)*. New York, NY: Harper Perennial Modern Classics. Although the specifics of linguistic theory have changed a lot since 1994, Pinker’s *The Language Instinct* is the first, and in my opinion still the best, general-audience introduction to the modern scientific study of language. This gets key ideas in syntax across in a very readable way. OPTIONAL Grant Sanderson, Linear transformations and matrices [~11min], 3Blue1Brown, https://www.3blue1brown.com/?v=linear-transformations. This short video provides a really nice way of thinking about matrix-vector multiplication as transforming one representational space into another representational space, which I find very useful in getting at the intuitions behind what neural networks do. 
(You could watch the required two videos first, which are the most important, and then decide whether going back to this one would be valuable.) Grant Sanderson, Attention in transformers, step by step [~25min], https://www.3blue1brown.com/?v=attention. The main video (“Transformers, the tech behind LLMs”) is all you need to get the essence of what the attention mechanism in transformers is doing. This goes into more depth, if you want to get a better sense of *how* the attention mechanism works. | |||||||||||||||||||||||
25 | 04/16/2026 | Thursday | Hierarchy and human syntactic knowledge | Last class was about how transformers work. This session focuses on how *people* work, in terms of the fundamental property of hierarchy in human language. We’ll take a computational angle, asking, “what kind of computational device is needed in order to ‘get’ the relationship between essentially sequential input/output and underlying hierarchy in human linguistic representations?”. | OPTIONAL Sections 1-3 [~5 pages] of Linzen, Tal and Baroni, Marco, Syntactic Structure from Deep Learning (January 2021). Annual Review of Linguistics, Vol. 7, Issue 1, pp. 195-212, 2021, SSRN: or . This paper has a nice, non-technical review of core ideas in work looking at the extent to which deep learning systems are capturing key elements of syntactic knowledge, written for linguists. The first half gets at the most important points. | |||||||||||||||||||||||
26 | 04/21/2026 | Tuesday | Syntactic knowledge and processing in AI systems | Given that AI does a remarkably good job in tasks that involve human language, and given that hierarchical syntax is central to human language, what knowledge of syntax exists in these AI systems, and how is it related to human syntactic knowledge? We’ll talk about a number of ways that researchers have been tackling this question and some key things that have been found so far. The Linzen et al. (2016) paper is a classic in terms of one way of asking the question, with optional von der Malsburg and Padó (2026) being a very current update; we’ll also talk about other ways of asking the question (and I can provide readings if you’re interested). | Linzen, T., Dupoux, E., & Goldberg, Y. (2016). Assessing the ability of LSTMs to learn syntax-sensitive dependencies. *Transactions of the Association for Computational Linguistics*, *4*, 521-535. https://tallinzen.net/media/papers/linzen_dupoux_goldberg_2016_tacl.pdf | |||||||||||||||||||||||
27 | 04/23/2026 | Thursday | Classical approaches to parsing and ambiguity | Now that we have talked about the nature of human syntactic knowledge and the way that transformers work, we’re going to begin a multi-part arc looking at the way that knowledge (mainly but not exclusively focused on syntax) is *deployed* in sentence understanding. Highlighting relevant pieces of literature and central concepts as we go, we’ll discuss early processing models in the knowledge-based tradition, then the shift to “statistical NLP” and probabilistic grammar approaches, moving then to surprisal theory. Today we begin with early processing models: Frazier and Fodor’s parsing heuristics, Marcus’s deterministic parsing proposal and garden path sentences, and Resnik’s argument for left-corner parsing with composition as a psychologically plausible alternative to purely top-down and purely bottom-up strategies. We’ll also briefly discuss combinatory categorial grammar (CCG) as an alternative grammatical framework that inherently includes left-to-right processing and composition. | None for today, though if you’re interested in pointers to the papers we’re discussing let me know. | |||||||||||||||||||||||
28 | 04/28/2026 | Tuesday | Surprisal and language processing | Today we move from what humans and AI *know* about syntax to their *use* of syntax. There, surprisal has been a real workhorse. Although the core surprisal proposal comes from a 2001 paper by John Hale, the relevant sections of Levy (2008) are actually a more accessible way to get the core ideas. For those interested, the optional, very recent von der Malsburg and Padó article uses surprisal in a comprehensive and up-to-date analysis of the kinds of agreement-related effects that were used by Linzen, Dupoux, and Goldberg (2016) as a diagnostic of systems’ ability to learn syntactic dependencies. | Levy, R. (2008). Expectation-based syntactic comprehension. *Cognition*, *106*(3). https://www.mit.edu/~rplevy/papers/levy-2008-cognition.pdf. There is a lot in this paper – here’s a guide to what’s important: Abstract + Introduction, Section 2.1 (opening paragraph and the paragraph after Eq. 14 only; skip the proof), Sections 2.2–2.4, Section 3, first and last paragraphs only, Section 4.1, Section 5.1, Section 6, Section 9. OPTIONAL von der Malsburg, T., & Padó, S. (2026). Diverging Transformer Predictions for Human Sentence Processing: A Comprehensive Analysis of Agreement Attraction Effects. *arXiv preprint arXiv:2603.16574*. https://www.scilit.com/publications/280c8954e6987c89e95ba27792876437 | |||||||||||||||||||||||
29 | 04/30/2026 | Thursday | Is surprisal all you need? | Since Levy (2008), surprisal has been a central tool in studying human language processing. Because LLMs are remarkably good probability estimators, and because LLM-based surprisal has strong correlations with measures of human processing effort, some researchers have gone so far as to make an argument one might sum up as “surprisal is all you need”*, at least at the Marr computational-theory level. The idea is that, regardless of what might be happening at the algorithmic and implementation levels, well-estimated surprisal captures all of the relevant factors that go into characterizing human processing difficulty and therefore there is no need to seek a more finely articulated story that distinguishes among the different factors (e.g. syntax, semantics, etc.) that play into the computational-level theory – from the perspective of the *computational* theory, distinguishing among them doesn’t make a difference. If that’s true, then one could justifiably describe an LLM as a Marr computational-level *model of* human processing, not just a tool for studying it. Today we consider that argument, and we look at the Arehalli et al. (2022) paper as a potential challenge to that argument. (Optionally, for a more comprehensive and up to date picture of the “causal bottleneck” argument, see Staub 2025.) [*N.B. I’ve stolen the “surprisal is all you need” phrasing – which cleverly refers back to the famous “Attention is All you Need” transformers paper – from Sathvik Nair.] | Arehalli, S., Dillon, B. W., & Linzen, T. (2022, December). Syntactic surprisal from neural models predicts, but underestimates, human processing difficulty from syntactic ambiguities. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL) (pp. 301-313). https://aclanthology.org/2022.conll-1.20.pdf OPTIONAL Staub, Adrian. "Predictability in language comprehension: Prospects and problems for surprisal." 
Annual Review of Linguistics 11, no. 1 (2025): 17-34. https://www.annualreviews.org/content/journals/10.1146/annurev-linguistics-011724-121517 | |||||||||||||||||||||||
30 | 05/05/2026 | Tuesday | Predictive processing | The concept of prediction is fundamental in both current AI (e.g. transformers’ training is fundamentally about next-word prediction), and in current psycholinguistics (e.g. surprisal is fundamentally measuring how well or poorly a probabilistic language model has predicted the actual word you see next). And, the top-down and bottom-up strategies that we discussed in the context of syntactic processing are also fundamentally about prediction: a purely top-down strategy lets you predict a syntactic constituent without having seen any evidence for it yet (e.g. we conjecture the sentence is going to start with a noun phrase, an NP can start with a determiner like “the”, so let’s look to see if we’ve got a “the”); a purely bottom-up strategy says you can’t do any prediction, you need 100% of the evidence to conclude a constituent is there (if we’re going to use the rule NP->Det Adj N, we have to completely process all three pieces before we can conclude there’s an NP); and a left-corner strategy is a hybrid that uses a bit of bottom-up evidence as the basis for then making top-down predictions (if we have a Det, and there’s a rule NP->Det Adj N, we can hypothesize that we’re building an NP and predict that we’ll now see an Adj and an N). Despite the centrality of “prediction” in these discussions of language understanding, current AI architectures do not operate at runtime in a fashion where they make top-down predictions and verify them bottom-up: when processing language input they’re fundamentally a “feed forward” architecture. In contrast, there is a large and growing body of evidence that human cognition and perception involve a constant process of prediction, comparison with input/observation/evidence, and then updating their internal models based on that comparison. This has led to proposals of “predictive coding” architectures that implement that process. 
This week we’re going to look at the ideas in predictive coding, including both sentence processing and Feldman Barrett’s theory of constructed emotion, concluding with a classic piece of science fiction that brings together processing, emotion, and the idea of “alignment”. | Don’t worry that there are four readings listed for this week. The first two are very short, and the second two are fiction pieces – they won’t be hard to read, though it’s worth taking some time after you read them to think about the implications and how they connect with our discussions of AI as noted in the “short overview”. Lisa Feldman Barrett and Jolie Wormwood, When a Gun Is Not a Gun, The New York Times Sunday Review, April 17, 2015. https://clbb.mgh.harvard.edu/when-a-gun-is-not-a-gun/. Although we’re going to focus on sentence processing in class today, this very brief piece sets up the general idea of predictive processing and its implications with a compelling real-world example. Michael Rucker, Thought Leader Interviews: Interview with Lisa Feldman Barrett about Emotion and Affect, Dec 23, 2020. https://michaelrucker.com/thought-leader-interviews/lisa-feldman-barrett-emotion-affect/. Rather than aiming for one of Feldman Barrett’s academic pieces to lay out the ideas in greater detail, I think this interview does a nice job of getting the core ideas across in Questions 5a and 5b (although the earlier parts are interesting, too). Asimov, Isaac. "Runaround." *Astounding Science Fiction*, vol. 29, no. 1, March 1942, pp. 94–103. https://web.williams.edu/Mathematics/sjmiller/public_html/105Sp10/handouts/Runaround.html. This is the story where Asimov’s famous Three Laws of Robotics are stated in their full, canonical form. Asimov, Isaac. "Liar!" *Astounding Science Fiction*, vol. 27, no. 3, May 1941, pp. 43–55. https://archive.org/details/Astounding_v27n03_1941-05/page/n41/mode/2up. 
The PDF is also in the course readings folder and in that file I’ve left in some of the fun surrounding context that gives a sense of the era in which it was published. Like most of Asimov’s early science fiction, the story is a product of its time – who could possibly imagine a future in which men were not in charge of research or where people didn’t smoke cigars?! But, also like most of Asimov’s fiction, it contains deep insight into the human condition and it also anticipates (~85 years ago!) current AI issues involving things like over-trust (cf. the Eliza effect), alignment, and sycophancy. OPTIONAL Clark, Andy. "Whatever next? Predictive brains, situated agents, and the future of cognitive science." *Behavioral and brain sciences* 36, no. 3 (2013): 181-204. This is a widely cited classic paper that introduces the idea of predictive coding and (2013-era) evidence for it. It’s non-technical, though the prose can be kind of dense. Up through Section 1.5 is what matters most. Lisa Feldman Barrett, You aren't at the mercy of your emotions -- your brain creates them. TED talk, ~18min. https://www.ted.com/talks/lisa_feldman_barrett_you_aren_t_at_the_mercy_of_your_emotions_your_brain_creates_them | |||||||||||||||||||||||
31 | 05/07/2026 | Thursday | Emotion and understanding | Most of what we’ve talked about this semester has approached understanding in fairly dry, scientific terms. But as human beings we are not just rational creatures, we are emotional creatures, and we’ve also seen – e.g. in the Eliza effect – the extent to which people’s “understanding” is also tied up with their emotional responses, which, as we’ve all surely experienced, can sometimes lead us to different understandings than we would arrive at through pure reason. Lisa Feldman Barrett’s theory of constructed emotion is a nice bridge to this topic, because her neuroscience-grounded claims about how emotions work (which can be quite counter-intuitive) are solidly grounded in just the kind of predictive architecture we talked about last class. | OPTIONAL Barrett, L. F. (2017). The theory of constructed emotion: an active inference account of interoception and categorization. *Social cognitive and affective neuroscience*, *12*(1), 1-23. https://affective-science.org/wp-content/uploads/2024/04/barrett-tce-scan-2017.pdf | |||||||||||||||||||||||