Row | Topic | Notes | Link | Sponsor | Date | Votes | CC | MW | ME | YC | TK | SC | AL | MC | Dr. CAS | BD | MH | MB | TC | NJ | JY | DP | WC | Dr. EJ | PHL | AL | SA | LJ | DLK | TM | NER
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
2 | Nominated | So sad, but at least I get to sit next to Cory | ||||||||||||||||||||||||||||||||||||||||||
3 | Liu et al 2023 | Lost in the Middle: How Language Models Use Long Contexts | "We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts" | https://arxiv.org/pdf/2307.03172.pdf | Sara | 2 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
4 | Hu et al 2024 | Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale | https://arxiv.org/pdf/2403.08293.pdf | Christian | 0 | |||||||||||||||||||||||||||||||||||||||
5 | Papadimitriou and Jurafsky 2023 | Injecting structural hints: Using language models to study inductive biases in language learning | They pretrain transformers to develop different inductive biases (e.g. recursive structure, Zipfian distributions) and test the effects on downstream perplexity | https://aclanthology.org/2023.findings-emnlp.563.pdf | Christian | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
6 | Madusanka et al. 2023 | Not all quantifiers are equal: Probing transformer-based language models’ understanding of generalised quantifiers | Uses a new evaluation based on model checking in natural language | https://aclanthology.org/2023.emnlp-main.536.pdf | Christian | 0 | ||||||||||||||||||||||||||||||||||||||
7 | Timkey and Linzen 2023 | A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing | The first author is a nice guy; this work extends looking for memory-based effects in Transformers a la Ryu and Lewis (2021) and Oh and Schuler (2022) | https://arxiv.org/pdf/2310.16142.pdf | Christian | 0 | 1 | |||||||||||||||||||||||||||||||||||||
8 | Portelance et al. 2023 | Predicting Age of Acquisition for Children's Early Vocabulary in Five Languages Using Language Model Surprisal | Tests whether predictability in context (i.e. surprisal) helps with children's word learning, above and beyond frequency and concreteness | https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13334?campaign=woletoc | Christian | 1 | 0 | 1 | ||||||||||||||||||||||||||||||||||||
9 | Evanson et al. 2023 | Language acquisition: do children and language models follow similar learning stages? | Uses probing tasks from BIG-Bench etc to evaluate how syntactic/semantic abilities emerge over the course of training GPT-2 | https://arxiv.org/pdf/2306.03586.pdf | Christian | 0 | 0 | 0 | 0 | |||||||||||||||||||||||||||||||||||
10 | McCoy et al. arXiv 2023 | Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve | Title to be contrasted with "Sparks of AGI"; basically shows differential performance of GPT-* as a function of frequency | https://arxiv.org/pdf/2309.13638.pdf | Byung-Doh | 1 | 0 | 0 | 1 |
11 | Kauf et al. bioRxiv 2023 | Lexical semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network | Might explain why BiW hasn't been observed on fMRI | https://www.biorxiv.org/content/10.1101/2023.05.05.539646v1.full | Byung-Doh | 1 | 0 | 1 |
12 | Hosseini et al. Neurobiology of Language 2024 | Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training | https://direct.mit.edu/nol/article/doi/10.1162/nol_a_00137/119156/Artificial-Neural-Network-Language-Models-Predict | Byung-Doh | 0 | 0 | 0 | |||||||||||||||||||||||||||||||||||||
13 | Schaeffer et al. NeurIPS 2023 | Are Emergent Abilities of Large Language Models a Mirage? | https://openreview.net/pdf?id=ITw9edRDlD | Byung-Doh | 0 | |||||||||||||||||||||||||||||||||||||||
14 | von Oswald et al. ICML 2023 | Transformers Learn In-Context by Gradient Descent | Seems highly related to paper above (I might recommend this one instead of Dai et al. 2023) | https://arxiv.org/pdf/2212.07677.pdf | Byung-Doh | 2 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
15 | Jelassi et al. arXiv 2024 | Repeat After Me: Transformers are Better than State Space Models at Copying | Cool paper name; almost makes me think e.g. Mamba surprisal is worth comparing against e.g. GPT-2 surprisal | https://arxiv.org/abs/2402.01032 | Byung-Doh | 1 | 0 | 1 | 0 |
16 | Ezquerro et al. EACL 2024 | From Partial to Strictly Incremental Constituent Parsing | Pulling parses out of incremental LMs; the partial parses seemed extremely similar to those from left-corner parsers, but the first author didn't seem to know what left-corner parsers were when we asked | https://aclanthology.org/2024.eacl-short.21.pdf | Byung-Doh | 2 | 1 | 1 |
17 | Sakana AI arXiv 2024 | Evolutionary Optimization of Model Merging Recipes | Method for merging LLMs (didn't even know merging models was a thing) | https://arxiv.org/pdf/2403.13187.pdf | Byung-Doh | 0 | 0 | |||||||||||||||||||||||||||||||||||||
18 | Mahabadi et al. EACL 2024 | TESS: Text-to-Text Self-Conditioned Simplex Diffusion | Interested in how diffusion models might be applied to language modeling | https://aclanthology.org/2024.eacl-long.144.pdf | Byung-Doh | 2 | 0 | 1 | 1 | |||||||||||||||||||||||||||||||||||
19 | Isono Cognition 2024 | Category Locality Theory: A unified account of locality effects in sentence comprehension | Apparently better DLT with CCG (on Natural Stories) | https://www.sciencedirect.com/science/article/pii/S0010027724000520 | Byung-Doh | 0 | ||||||||||||||||||||||||||||||||||||||
20 | Google arXiv 2024 | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | LM based on Google's new Griffin architecture (https://arxiv.org/abs/2402.19427). Are animals the new muppets now? Actually, the Griffin paper might be a better read | https://arxiv.org/abs/2404.07839 | Byung-Doh | 0 |
21 | Pasquiou et al. (2023) | Information-restricted neural language models reveal different brain regions' sensitivity to semantics, syntax and context | correlate large language model encodings to human reading times and fMRI data, respectively, and find that smaller models provide an equal (or better) fit to the human data | https://arxiv.org/abs/2302.14389 | William | 0 | 1 |
22 | Eisape et al. (2022) | Probing for incremental parse states in autoregressive language models | probe LLM representations to arrive at incremental unlabeled dependency analyses | https://arxiv.org/abs/2211.09748 | William | 0 |
23 | Hoover et al 2022 | The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing | Some thoughts about this paper: 1. It has a good review of work in surprisal theory and the functional relationship between surprisal and reading times. 2. The main claims are that 1) the relationship between surprisal and RT is superlinear, and therefore 2) sampling algorithms are promising because their time complexity scales exponentially as a function of surprisal; no concrete implementation of 2) is provided, though. 3. The first claim is supported by a GAM analysis showing that the fitted curves are superlinear, especially for the larger PLMs. The authors seem to assume that larger PLMs are "better" and therefore provide stronger evidence about the relationship between surprisal and reading times; Oh and Schuler show that this assumption is incorrect. 4. Using GAMs instead of LMER is unlikely to change the conclusions of Oh and Schuler, since linearity vs. superlinearity makes the most divergent predictions at high-surprisal points, whereas the larger-gets-worse behavior of PLM surprisal is driven primarily by low-surprisal points. | https://files.ca-1.osf.io/v1/resources/qjnpv/providers/osfstorage/6351ab810ecb420e5e2eb105?format=pdf&action=download&direct&version=1 | Mike | 1 | 1 |
24 | Bhagavatula et al ACL 2023 | I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation | Use of self-imitation very similar to self-training for NLG | https://aclanthology.org/2023.acl-long.535/ | Mike | 3 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
25 | McCoy et al 2023 | How Much Do Language Models Copy From Their Training Data? Evaluating Linguistic Novelty in Text Generation Using RAVEN🐦⬛ | "we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models trained on English (an LSTM, a Transformer, Transformer-XL, and GPT-2)." | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00567/116616 | Yi-Chien | 2 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
26 | Dziri et al 2024 | Faith and fate: Limits of transformers on compositionality | "We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills." | https://proceedings.neurips.cc/paper_files/paper/2023/file/deb3c28192f979302c157cb653c15e90-Paper-Conference.pdf | Yi-Chien | 2 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
27 | Munkhdalai et al 2024 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. | https://arxiv.org/pdf/2404.07143.pdf | Yi-Chien | 0 | ||||||||||||||||||||||||||||||||||||||
28 | Total: | | 21 | 2 | 3 | 4 | 3 | 1 | 3 | 3 | 2 | | 3 | 1 |
29 | History | |
30 | Dai et al. ACL Findings 2023 | Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers | https://aclanthology.org/2023.findings-acl.247.pdf | Sara (Byung-Doh) | 4-18 | 0 | 0 | |||||||||||||||||||||||||||||||||||||
31 | Bietti et al 2023 | Birth of a Transformer: A Memory Viewpoint | analysis of how simple transformers do cued association | https://arxiv.org/pdf/2306.00802.pdf#page5 | Byung-Doh (William) | 4-11 (in Oxley 102) | 3 | 1 | 1 | 1 | 1 |
32 | Murty et al. 2023 | Pushdown Layers: Encoding Recursive Structure in Transformer Language Models | New kind of transformer self-attention layer that helps with syntactic generalization | https://aclanthology.org/2023.emnlp-main.195/ | Christian | 4/4 | 4 | 1 | 1 | 1 | 0 | 1 | ||||||||||||||||||||||||||||||||
33 | Li et al ACL 2023 | Contrastive Decoding: Open-ended Text Generation as Optimization | Clever decoding (from clever folks) using the difference between a smart and dumb model | https://aclanthology.org/2023.acl-long.687/ | Mike (Yi-Chien) | 3/28 | 4 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||
34 | Chen et al. arXiv 2023 | Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs | Apparently syntactic heads appear at around 1000 training steps | https://arxiv.org/abs/2309.07311 | Byung-Doh | 3/7 | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||
35 | Yamaki et al. ACL 2023 | Holographic CCG Parsing | Uses holographic embeddings, which allow for compositional operations in a continuous vector space | https://aclanthology.org/2023.acl-long.15.pdf | Christian | 2/29 | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
36 | Patel and Pavlick 2022 | Mapping Language Models to Grounded Conceptual Spaces | Predecessor to the Pavlick 2023 paper we read last semester that Mike shared with us (model learns relationships that aren't directly tied to the space) | https://openreview.net/pdf?id=gJcEM8sxHK | Sara | 2/22 | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||
37 | Gu and Dao arXiv 2023 | Mamba: Linear-Time Sequence Modeling with Selective State Spaces | New architecture | https://arxiv.org/pdf/2312.00752.pdf | Byung-Doh | 2/15 | 4 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||
38 | Mahowald et al. 2023 | Dissociating language and thought in large language models: a cognitive perspective | Reviews linguistic vs functional competence in humans; argues LLMs need more non-linguistic cognitive capacities. | https://arxiv.org/abs/2301.06627 | Yi-Chien (Mike) | 2/8 | 6 | 1 | 1 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||
39 | Futrell (2023) | An information-theoretic account of availability effects in language production | Model of language production (incremental selection at the word level) based in information theory, cognitive science, and neuroscience; the objective maximizes "communicative value" subject to an information-theoretic constraint | https://escholarship.org/uc/item/23q9k7pc | Sara | 2/1 | 3 | 0 | 1 | 1 | 0 | 1 |
40 | Piñango 2023 | Solving the elusiveness of word meanings: two arguments for a continuous meaning space for language | Model explains words that have multiple interdependent meanings ("smoke") or a large family of meanings ("have") | https://www.frontiersin.org/articles/10.3389/frai.2023.1025293/full | Christian | 1/25 | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||
41 | Deepmind people arXiv 2023 | Reinforced Self-Training (ReST) for Language Modeling | Apparently more efficient version of RLHF | https://arxiv.org/pdf/2308.08998.pdf | Byung-Doh | 2024-01-18 | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||
42 | Webb et al. Nature 2023 | Emergent analogical reasoning in large language models | Alex Petrov knows the authors; convinces him that LLMs are not just stochastic parrots | https://www.nature.com/articles/s41562-023-01659-w | Yi-Chien (Mike) | 2023-11-30 | 4 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||
43 | Lake and Baroni Nature 2023 | Human-like systematic generalization through a meta-learning neural network | https://www.nature.com/articles/s41586-023-06668-3 | Byung-Doh | 2023-11-16 | 0 | ||||||||||||||||||||||||||||||||||||||
44 | Pavlick 2023 | Semantic structure in deep learning | one of several recent papers looking at whether real-world semantic structure can be learned just from language data (earlier paper: https://www.annualreviews.org/doi/abs/10.1146/annurev-linguistics-031120-122924) | https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2022.0041 | Mike | 2023-11-02 | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||
45 | Wang et al. 2023 | Finding Structure in One Child's Linguistic Experience | https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13305 | Christian | 10/19 | 4 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
46 | Bailly et al. ACL 2023 | Syntax and Geometry of Information | "We study syntactic generalization from the perspective of the capacity to disentangle semantic and structural information" | https://aclanthology.org/2023.acl-long.590.pdf | Byung-Doh | 9/21 | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
47 | Li & Lu ACL 2023 | Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers | Tree reconstruction from MLMs | https://aclanthology.org/2023.acl-long.285.pdf | Christian | 9/14 | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
48 | UniLM people arXiv 2023 | Retentive Network: A Successor to Transformer for Large Language Models | New architecture! | https://arxiv.org/pdf/2307.08621.pdf | Byung-Doh | 9/7 | 2 | 1 | 1 | |||||||||||||||||||||||||||||||||||
49 | Piantadosi & Hill arXiv 2022 | Meaning without reference in large language models | https://arxiv.org/pdf/2208.02957.pdf | Mike/Christian | 8/31 | 3 | 1 | 1 | 1 |
50 | Hahn et al. 2022 | A resource-rational model of human processing of recursive linguistic structure | https://www.pnas.org/doi/10.1073/pnas.2122602119 | Christian (from Byung-Doh) | 4/20 | 0 | ||||||||||||||||||||||||||||||||||||||
51 | Piantadosi LingBuzz 2023 | Modern language models refute Chomsky's approach to language | Cited during Casillas talk; there's also a reply to this https://lingbuzz.net/lingbuzz/007190 | https://ling.auf.net/lingbuzz/007180 | Byung-Doh | 4/13 | 0 |
52 | Yedetore et al 2023 | How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech | Transformers and LSTMs trained on CHILDES don't pick up hierarchical structure | https://arxiv.org/abs/2301.11462 | Christian | 3/30 | 0 | |||||||||||||||||||||||||||||||||||||
53 | Meister and Cotterell 2021 | Language Model Evaluation Beyond Perplexity | | https://aclanthology.org/2021.acl-long.414.pdf | Christian | 1 | 1 | 0 |
54 | Sinclair et al. TACL 2022 | Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations | Priming language models (TACL) | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00504/113019/Structural-Persistence-in-Language-Models-Priming | Byung-Doh | 2/2 | 2 | 1 | 1 | |||||||||||||||||||||||||||||||||||
55 | Yang et al 2022 | Unsupervised Discontinuous Constituency Parsing with Mildly Context-Sensitive Grammars | Unsupervised parsing that can handle extraposition, wh-movement, etc | https://arxiv.org/pdf/2212.09140.pdf | Christian | 1/26 | 2 | 1 | 1 | |||||||||||||||||||||||||||||||||||
56 | Warstadt and Bowman 2022 | What Artificial Neural Networks Can Tell Us About Human Language Acquisition | https://arxiv.org/pdf/2208.07998.pdf | Christian | 11/17 | 5 | 1 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||
57 | Prange et al. NAACL 2022 | Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling | Conditioning on syntax/semantic subgraphs improves GPT-2 perplexity, probably makes surprisal less humanlike though | https://aclanthology.org/2022.naacl-main.325.pdf | Byung-Doh | 12/1 | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||
58 | Li and Liang, 2021 | Prefix-Tuning: Optimizing Continuous Prompts for Generation | alternative to fine-tuning | https://aclanthology.org/2021.acl-long.353.pdf | Ash | 11/3 | 3 | 1 | 0 | 1 | 1 | |||||||||||||||||||||||||||||||||
59 | Niu and Penn 2020 | Grammaticality and Language Modelling | point biserial correlation for comparing NN output to human judgments, and some other improvements / tests | https://aclanthology.org/2020.eval4nlp-1.11/ | Willy | 10/27 | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
60 | Dettmers et al. Neurips 2022 | LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Personally interested to learn more about "emergent outliers" rather than the quantization technique | https://arxiv.org/pdf/2208.07339.pdf | Byung-Doh | 3 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
61 | Tran et al 22 | Plex: Towards Reliability Using Pretrained Large Model Extensions | Google paper looking at reliability of LLMs including few-shot uncertainty; blog post: https://ai.googleblog.com/2022/07/towards-reliability-in-deep-learning.html | https://arxiv.org/pdf/2207.07411.pdf | Willy (Mike) | 3 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
62 | Srivastava et al 2022 | Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | BIG-bench (set of 204 LM evaluation tasks) | https://arxiv.org/abs/2206.04615 | Christian | 3 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
63 | Goldstein et al. NatNeurosci 2022 | Shared computational principles for language processing in humans and deep language models | GPT-2 embeddings X ECoG | https://www.nature.com/articles/s41593-022-01026-4 | Byung-Doh | 3 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
64 | Caucheteux et al 2021 | Decomposing lexical and compositional syntax and semantics with deep language models | https://arxiv.org/pdf/2103.01620.pdf | Christian | 3 | 0 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
65 | Schuster and Linzen 2022 | When a sentence does not introduce a discourse referent, transformer-based models still sometimes refer to it | https://arxiv.org/pdf/2205.03472.pdf | Willy | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
66 | Jiang et al 2021 | How can we know when Language Models know? On the calibration of Language Models for Question Answering | Looking at probability estimates of T5, BART, GPT2 on QA task | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00407/107277/How-Can-We-Know-When-Language-Models-Know-On-the | Willy | 4/21 | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
67 | Ryu and Lewis 2021 | Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention | Also appeared in CMCL 2021 (https://aclanthology.org/2021.cmcl-1.6/) | https://arxiv.org/abs/2104.12874 | Christian | 4/7 | 4 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||
68 | Xu et al. ACL 2021 | Syntax-Enhanced Pre-trained Model | https://aclanthology.org/2021.acl-long.420.pdf | Byung-Doh | 3/31 | 2 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
69 | Davis & van Schijndel 2020 | Discourse structure interacts with reference but not syntax in neural language models | https://arxiv.org/pdf/2010.04887.pdf | Willy | 3/10 | 4 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
70 | Stengel-Eskin et al 2021 | Joint Universal Syntactic and Semantic Parsing | Compares several model architectures for joint syntactic and semantic parsing on rich annotations from Universal Decompositional Semantics dataset | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00396/106796/Joint-Universal-Syntactic-and-Semantic-Parsing | Christian | 3/3 | 4 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||
71 | Mao et al 2021 | Grammar-Based Grounded Lexicon Learning | A method for learning lexical entries from grounded data like images and texts. Entries include syntactic types and "neuro-symbolic" semantic programs that combine lambda calculus expressions with neural network embeddings | https://proceedings.neurips.cc/paper/2021/file/4158f6d19559955bae372bb00f6204e4-Paper.pdf | Byung-Doh | 2/24 | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
72 | Elazar et al 2021 | Measuring and Improving Consistency in Pretrained Language Models | small paraphrase adversarial dataset with BERT based models | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00410/107384/Measuring-and-Improving-Consistency-in-Pretrained | Willy | 2/17 | 4 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||
73 | Yang & Piantadosi 2022 | One model for the learning of language | https://www.pnas.org/content/119/5/e2021865119 | Christian | 2/10 | 4 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
74 | Anthropic people 2021 | A Mathematical Framework for Transformer Circuits | GPT-2-ology (0-layer, 1-layer models) | https://transformer-circuits.pub/2021/framework/index.html | Byung-Doh | 2/3 | 0 | |||||||||||||||||||||||||||||||||||||
75 | Belinkov and Glass 2019 | Analysis methods in neural language processing: a survey | interested in exploring ways to test ling data with neural models, plus focuses on some perspectives not mentioned in other similar papers | https://doi.org/10.1162/tacl_a_00254 | Willy | 1/27 | 1 | 0 | ||||||||||||||||||||||||||||||||||||
76 | Guest and Martin 2021 | On logical inference over brains, behaviour, and artificial neural networks | Questions how much we can infer about the mind and brain from the behavior of neural network models ("if NN reproduces the pattern seen in brain activity, the brain must work like the NN") | https://psyarxiv.com/tbmcg/ | Christian | 1/20 | 0 | 0 | 0 | |||||||||||||||||||||||||||||||||||
77 | Li et al. ACL 2021 | How is BERT surprised? Layerwise detection of linguistic anomalies | https://aclanthology.org/2021.acl-long.325.pdf | Byung-Doh | 1 | 1 | 0 | |||||||||||||||||||||||||||||||||||||
78 | Stanojević, Steedman 2021 | Formal Basis of a Language Universal | https://direct.mit.edu/coli/article/47/1/9/97333/Formal-Basis-of-a-Language-Universal | Nanjiang | 3 | 1 | 1 | 1 |
79 | Stanojević et al 2021 | Modeling incremental language comprehension in the brain with Combinatory Categorial Grammar | https://aclanthology.org/2021.cmcl-1.3.pdf | Christian | 4 | 1 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
80 | Sanh et al 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization | to be discussed on 10/28 by popular demand | https://arxiv.org/pdf/2110.08207.pdf | Willy (from Mike) | |||||||||||||||||||||||||||||||||||||||
81 | Kuribayashi et al. ACL 2021 | Lower Perplexity is Not Always Human-Like | https://aclanthology.org/2021.acl-long.405.pdf | Byung-Doh | 0 | |||||||||||||||||||||||||||||||||||||||
82 | White and Cotterell 2021 | Examining the Inductive Bias of Neural Language Models with Artificial Languages | Nanjiang | 3 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||||
83 | Aghajanyan et al 2021 | Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning | https://aclanthology.org/2021.acl-long.568/ | Christian | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
84 | Linzen and Baroni 2021 | Syntactic Structure from Deep Learning | 9/23 | https://www.annualreviews.org/doi/abs/10.1146/annurev-linguistics-032020-051035?cookieSet=1 | Willy (Christian) | 3 | 1 | 1 | 1 | |||||||||||||||||||||||||||||||||||
85 | Shen et al. ACL 2021 | StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling | 9/16 | https://aclanthology.org/2021.acl-long.559.pdf | Byung-Doh | 4 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
86 | Press et al 2021 | Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | 9/9 | https://arxiv.org/abs/2108.12409 | Christian (Mike) | 4 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
87 | Lewis and Bastiaansen 2015 | A predictive coding framework for rapid neural dynamics during sentence-level language comprehension | https://www.sciencedirect.com/science/article/abs/pii/S0010945215000714 | Evan | 2 | 1 | 1 |
88 | Beres 2017 | Time is of the Essence: A Review of Electroencephalography (EEG) and Event-Related Brain Potentials (ERPs) in Language Research | overview of ERPs in linguistic research | https://core.ac.uk/download/pdf/206525297.pdf | Willy | 4 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
89 | Li et al. AACL 2020 | Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads | https://www.aclweb.org/anthology/2020.aacl-main.43.pdf | Byung-Doh | 3 | 1 | 1 | 1 | 0 | |||||||||||||||||||||||||||||||||||
90 | Brothers & Kuperberg 2021 | Word predictability effects are linear, not logarithmic: Implications for probabilistic models of sentence comprehension | https://www.sciencedirect.com/science/article/pii/S0749596X20300887?casa_token=eP1ih9VvgCYAAAAA:b8PPt-3KkCybH56c6jOFyVqnVrC1xI4j1BGiKLtexmPziQaJ0HPxPkSZx7kSus1OJ37u_iHwbA | Cory | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
91 | 3/4 - CUNY day: https://www.cuny2021.io | |||||||||||||||||||||||||||||||||||||||||||
92 | Wilcox et al 20 | On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior | https://arxiv.org/pdf/2006.01912.pdf | Christian | 3 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
93 | Caplan et al 2020 | Miller's Monkey Updated: Communicative Efficiency and the Statistics of Words in Natural Language | https://ling.auf.net/lingbuzz/004660/current.pdf?_s=6GvkvSUSdQZc_66K | Cory | 1 | 0 | 0 | 0 | 1 | |||||||||||||||||||||||||||||||||||
94 | Steinert-Threlkeld and Szymanik 2020 | Ease of Learning Explains Semantic Universals | https://semanticsarchive.net/Archive/zM5ZGIxM/EaseLearning.pdf | Nanjiang | 3 | 1 | 0 | 1 | 0 | 1 | ||||||||||||||||||||||||||||||||||
95 | Meister et al. EMNLP 2020 | If Beam Search is the Answer, What was the Question? | https://www.aclweb.org/anthology/2020.emnlp-main.170.pdf | Byung-Doh | 2 | 1 | 1 | 0 | ||||||||||||||||||||||||||||||||||||
96 | Lopopolo et al 20 | Distinguishing syntactic operations in the brain: Dependency and phrase-structure parsing | https://www.mitpressjournals.org/doi/abs/10.1162/nol_a_00029 | Willy (from Cory) | 2 | 0 | 1 | 1 | ||||||||||||||||||||||||||||||||||||
97 | Venhuizen et al 19 | Expectation-based Comprehension: Modeling the Interaction of World Knowledge and Linguistic Experience | https://www.tandfonline.com/doi/pdf/10.1080/0163853X.2018.1448677 | Cory | 4 | 0 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
98 | Li et al 2019 | Specializing Word Embeddings (for Parsing) by Information Bottleneck | https://www.aclweb.org/anthology/D19-1276.pdf | Nanjiang | 3 | 1 | 0 | 1 | 1 | 0 | ||||||||||||||||||||||||||||||||||
99 | Kodner & Gupta ACL 2020 | Overestimation of Syntactic Representation in Neural Language Models | https://www.aclweb.org/anthology/2020.acl-main.160.pdf | Byung-Doh | 5 | 1 | 1 | 1 | 1 | 1 | ||||||||||||||||||||||||||||||||||
100 | Kuperberg and Jaeger 2016 | What do we mean by prediction in language comprehension? | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4850025/pdf/nihms-754635.pdf | Evan | 2 | 1 | 1 |