1
topic · notes · link · sponsor · date · votes · per-member vote columns (member initials)
2
Nominated
So sad, but at least I get to sit next to Cory
3
Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. "We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts"
https://arxiv.org/pdf/2307.03172.pdf
Sara (votes: 2111)
4
Hu et al. (2024). Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale.
https://arxiv.org/pdf/2403.08293.pdf
Christian (votes: 0)
5
Papadimitriou and Jurafsky (2023). Injecting structural hints: Using language models to study inductive biases in language learning. They pretrain transformers to develop different inductive biases (e.g. recursive structure, Zipfian distributions) and test the effects on downstream perplexity.
https://aclanthology.org/2023.findings-emnlp.563.pdf
Christian (votes: 111)
6
Madusanka et al. (2023). Not all quantifiers are equal: Probing transformer-based language models' understanding of generalised quantifiers. Uses a new evaluation based on model checking in natural language.
https://aclanthology.org/2023.emnlp-main.536.pdf
Christian (votes: 0)
7
Timkey and Linzen (2023). A Language Model with Limited Memory Capacity Captures Interference in Human Sentence Processing. The first author is a nice guy; this work extends the search for memory-based effects in Transformers a la Ryu and Lewis (2021) and Oh and Schuler (2022).
https://arxiv.org/pdf/2310.16142.pdf
Christian (votes: 01)
8
Portelance et al. (2023). Predicting Age of Acquisition for Children's Early Vocabulary in Five Languages Using Language Model Surprisal. Tests whether predictability in context (i.e. surprisal) helps with children's word learning, above and beyond frequency and concreteness.
https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13334?campaign=woletoc
Christian (votes: 101)
9
Evanson et al. (2023). Language acquisition: do children and language models follow similar learning stages? Uses probing tasks from BIG-Bench etc. to evaluate how syntactic/semantic abilities emerge over the course of training GPT-2.
https://arxiv.org/pdf/2306.03586.pdf
Christian (votes: 0000)
10
McCoy et al. (arXiv 2023). Embers of Autoregression: Understanding Large Language Models Through the Problem They Are Trained to Solve. Title to be contrasted with "Sparks of AGI"; basically shows differential performance of GPT-* as a function of frequency.
https://arxiv.org/pdf/2309.13638.pdf
Byung-Doh (votes: 1001)
11
Kauf et al. (PsyArXiv 2023). Lexical semantic content, not syntactic structure, is the main contributor to ANN-brain similarity of fMRI responses in the language network. Might explain why BiW hasn't been observed on fMRI.
https://www.biorxiv.org/content/10.1101/2023.05.05.539646v1.full
Byung-Doh (votes: 101)
12
Hosseini et al. (Neurobiology of Language 2024). Artificial Neural Network Language Models Predict Human Brain Responses to Language Even After a Developmentally Realistic Amount of Training.
https://direct.mit.edu/nol/article/doi/10.1162/nol_a_00137/119156/Artificial-Neural-Network-Language-Models-Predict
Byung-Doh (votes: 000)
13
Schaeffer et al. (NeurIPS 2023). Are Emergent Abilities of Large Language Models a Mirage?
https://openreview.net/pdf?id=ITw9edRDlD
Byung-Doh (votes: 0)
14
von Oswald et al. (ICML 2023). Transformers Learn In-Context by Gradient Descent. Seems highly related to Dai et al. (2023); I might recommend this one instead.
https://arxiv.org/pdf/2212.07677.pdf
Byung-Doh (votes: 211)
15
Jelassi et al. (arXiv 2024). Repeat After Me: Transformers are Better than State Space Models at Copying. Cool paper name; almost makes me think e.g. Mamba surprisal is worth comparing against e.g. GPT-2 surprisal.
https://arxiv.org/abs/2402.01032
Byung-Doh (votes: 1010)
16
Ezquerro et al. (EACL 2024). From Partial to Strictly Incremental Constituent Parsing. Pulling parses out of incremental LMs; the partial parses seemed extremely similar to those from left-corner parsers, but the first author didn't seem to know what left-corner parsers were when we asked.
https://aclanthology.org/2024.eacl-short.21.pdf
Byung-Doh (votes: 211)
17
Sakana AI (arXiv 2024). Evolutionary Optimization of Model Merging Recipes. Method for merging LLMs (didn't even know merging models was a thing).
https://arxiv.org/pdf/2403.13187.pdf
Byung-Doh (votes: 00)
18
Mahabadi et al. (EACL 2024). TESS: Text-to-Text Self-Conditioned Simplex Diffusion. Interested in how diffusion models might be applied to language modeling.
https://aclanthology.org/2024.eacl-long.144.pdf
Byung-Doh (votes: 2011)
19
Isono (Cognition 2024). Category Locality Theory: A unified account of locality effects in sentence comprehension. Apparently a better DLT using CCG (evaluated on Natural Stories).
https://www.sciencedirect.com/science/article/pii/S0010027724000520
Byung-Doh (votes: 0)
20
Google (arXiv 2024). RecurrentGemma: Moving Past Transformers for Efficient Open Language Models. LM based on Google's new Griffin architecture (https://arxiv.org/abs/2402.19427). Are animals the new muppets now? Actually, the Griffin paper might be a better read.
https://arxiv.org/abs/2404.07839
Byung-Doh (votes: 0)
21
Pasquiou et al. (2023). Information-restricted neural language models reveal different brain regions' sensitivity to semantics, syntax and context. They correlate language model encodings with human reading times and fMRI data, respectively, and find that smaller models provide an equal (or better) fit to the human data.
https://arxiv.org/abs/2302.14389
William (votes: 01)
22
Eisape et al. (2022). Probing for incremental parse states in autoregressive language models. Probes LLM representations to arrive at incremental unlabeled dependency analyses.
https://arxiv.org/abs/2211.09748
William (votes: 0)
23
Hoover et al. (2022). The Plausibility of Sampling as an Algorithmic Theory of Sentence Processing. Some thoughts about this paper:
1. This paper has a good review of work in surprisal theory and the functional relationship between surprisal and reading times.
2. Their main claims are that 1) the relationship between surprisal and RT is superlinear and therefore 2) sampling algorithms are promising, as their time complexity scales exponentially as a function of surprisal. No concrete implementation of 2) is provided, though.
3. The first claim is supported by a GAM analysis which shows that the fitted curves are superlinear, especially for the larger PLMs. The authors seem to assume that larger PLMs are "better" and therefore provide stronger evidence with respect to the relationship between surprisal and reading times. Oh and Schuler show that this assumption is incorrect.
4. Using GAMs instead of LMER is unlikely to change the conclusions of Oh and Schuler, since linearity vs. superlinearity makes the most different predictions at high-surprisal points, but the larger-gets-worse behavior of PLM surprisal is primarily driven by low-surprisal points.
https://files.ca-1.osf.io/v1/resources/qjnpv/providers/osfstorage/6351ab810ecb420e5e2eb105?format=pdf&action=download&direct&version=1
Mike (votes: 11)
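The sampling argument in point 2 can be made concrete with a toy calculation (my own sketch under a simple geometric-sampling assumption, not the authors' implementation): if a comprehender produces candidate words by repeated sampling, the expected number of draws for a word with in-context probability p is 1/p, which equals 2^surprisal when surprisal is measured in bits, so predicted processing cost grows exponentially (hence superlinearly) in surprisal rather than linearly.

```python
import math

def surprisal_bits(p: float) -> float:
    """Surprisal in bits of a word with in-context probability p."""
    return -math.log2(p)

def expected_draws(p: float) -> float:
    """Expected number of samples until the word is drawn
    (geometric distribution with success probability p): 1/p."""
    return 1.0 / p

# Halving the probability adds one bit of surprisal but DOUBLES the
# expected number of draws: exponential, not linear, in surprisal.
for p in (0.5, 0.25, 0.125):
    print(f"p={p}: surprisal={surprisal_bits(p):.0f} bits, "
          f"expected draws={expected_draws(p):.0f}")
```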
24
Bhagavatula et al. (ACL 2023). I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation. The use of self-imitation is very similar to self-training for NLG.
https://aclanthology.org/2023.acl-long.535/
Mike (votes: 3111)
25
McCoy et al. (2023). How Much Do Language Models Copy From Their Training Data? Evaluating Linguistic Novelty in Text Generation Using RAVEN 🐦‍⬛. "we introduce RAVEN, a suite of analyses for assessing the novelty of generated text, focusing on sequential structure (n-grams) and syntactic structure. We apply these analyses to four neural language models trained on English (an LSTM, a Transformer, Transformer-XL, and GPT-2)."
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00567/116616
Yi-Chien (votes: 211)
26
Dziri et al. (2024). Faith and Fate: Limits of Transformers on Compositionality. "We formulate compositional tasks as computation graphs to systematically quantify the level of complexity, and break down reasoning steps into intermediate sub-procedures. Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching, without necessarily developing systematic problem-solving skills."
https://proceedings.neurips.cc/paper_files/paper/2023/file/deb3c28192f979302c157cb653c15e90-Paper-Conference.pdf
Yi-Chien (votes: 211)
27
Munkhdalai et al. (2024). Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.
https://arxiv.org/pdf/2404.07143.pdf
Yi-Chien (votes: 0)
History
30
Dai et al. (ACL Findings 2023). Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers.
https://aclanthology.org/2023.findings-acl.247.pdf
Sara (Byung-Doh), 4-18 (votes: 00)
31
Bietti et al. (2023). Birth of a Transformer: A Memory Viewpoint. Analysis of how simple transformers do cued association.
https://arxiv.org/pdf/2306.00802.pdf#page5
Byung-Doh (William), 4-11 (in Oxley 102) (votes: 31111)
32
Murty et al. (2023). Pushdown Layers: Encoding Recursive Structure in Transformer Language Models. New kind of transformer self-attention layer that helps with syntactic generalization.
https://aclanthology.org/2023.emnlp-main.195/
Christian, 4/4 (votes: 411101)
33
Li et al. (ACL 2023). Contrastive Decoding: Open-ended Text Generation as Optimization. Clever decoding (from clever folks) using the difference between a smart and a dumb model.
https://aclanthology.org/2023.acl-long.687/
Mike (Yi-Chien), 3/28 (votes: 41111)
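A rough sketch of the idea (toy distributions of my own, with a plausibility cutoff in the spirit of the paper's constraint; not the authors' code): candidate tokens are scored by the difference between the expert's and the amateur's log-probabilities, restricted to tokens the expert itself considers plausible, so generic high-frequency continuations the small model also loves get penalized.

```python
import math

# Toy next-token distributions standing in for a large "expert" LM and a
# small "amateur" LM (illustrative numbers, not from the paper).
expert = {"store": 0.4, "park": 0.3, "the": 0.2, "zxq": 0.1}
amateur = {"store": 0.2, "park": 0.1, "the": 0.6, "zxq": 0.1}

def contrastive_pick(expert, amateur, alpha=0.1):
    # Plausibility constraint: only consider tokens whose expert
    # probability is at least alpha times the expert's top probability.
    cutoff = alpha * max(expert.values())
    candidates = [t for t, p in expert.items() if p >= cutoff]
    # Contrastive score: expert log-prob minus amateur log-prob.
    return max(candidates,
               key=lambda t: math.log(expert[t]) - math.log(amateur[t]))

print(contrastive_pick(expert, amateur))  # -> park
```

Here "park" wins even though "store" has the higher expert probability, because the amateur finds "park" comparatively implausible, which is exactly the contrast the method exploits.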
34
Chen et al. (arXiv 2023). Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs. Apparently syntactic attention heads appear at around 1000 training steps.
https://arxiv.org/abs/2309.07311
Byung-Doh, 3/7 (votes: 511111)
35
Yamaki et al. (ACL 2023). Holographic CCG Parsing. Uses holographic embeddings, which allow for compositional operations in a continuous vector space.
https://aclanthology.org/2023.acl-long.15.pdf
Christian, 2/29 (votes: 3111)
36
Patel and Pavlick (2022). Mapping Language Models to Grounded Conceptual Spaces. Predecessor to the Pavlick 2023 paper we read last semester that Mike shared with us (the model learns relationships that aren't directly tied to the space).
https://openreview.net/pdf?id=gJcEM8sxHK
Sara, 2/22 (votes: 511111)
37
Gu and Dao (arXiv 2023). Mamba: Linear-Time Sequence Modeling with Selective State Spaces. New architecture.
https://arxiv.org/pdf/2312.00752.pdf
Byung-Doh, 2/15 (votes: 41111)
38
Mahowald et al. (2023). Dissociating language and thought in large language models: a cognitive perspective. Reviews linguistic vs. functional competence in humans; argues LLMs need more non-linguistic cognitive capacities.
https://arxiv.org/abs/2301.06627
Yi-Chien (Mike), 2/8 (votes: 6111111)
39
Futrell (2023). An information-theoretic account of availability effects in language production. Model of language production (incremental selection at the word level) based in information theory, cognitive science, and neuroscience. The objective maximizes "communicative value" subject to an information-theoretic constraint.
https://escholarship.org/uc/item/23q9k7pc
Sara, 2/1 (votes: 301101)
40
Piñango (2023). Solving the elusiveness of word meanings: two arguments for a continuous meaning space for language. The model explains words that have multiple interdependent meanings ("smoke") or a large family of meanings ("have").
https://www.frontiersin.org/articles/10.3389/frai.2023.1025293/full
Christian, 1/25 (votes: 511111)
41
DeepMind people (arXiv 2023). Reinforced Self-Training (ReST) for Language Modeling. Apparently a more efficient version of RLHF.
https://arxiv.org/pdf/2308.08998.pdf
Byung-Doh, 2024-01-18 (votes: 511111)
42
Webb et al. (Nature 2023). Emergent analogical reasoning in large language models. Alex Petrov knows the authors; it convinces him that LLMs are not just stochastic parrots.
https://www.nature.com/articles/s41562-023-01659-w
Yi-Chien (Mike), 2023-11-30 (votes: 41111)
43
Lake and Baroni (Nature 2023). Human-like systematic generalization through a meta-learning neural network.
https://www.nature.com/articles/s41586-023-06668-3
Byung-Doh, 2023-11-16 (votes: 0)
44
Pavlick (2023). Semantic structure in deep learning. One of several recent papers looking at whether real-world semantic structure can be learned just from language data (earlier paper: https://www.annualreviews.org/doi/abs/10.1146/annurev-linguistics-031120-122924).
https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2022.0041
Mike, 2023-11-02 (votes: 511111)
45
Wang et al. (2023). Finding Structure in One Child's Linguistic Experience.
https://onlinelibrary.wiley.com/doi/full/10.1111/cogs.13305
Christian, 10/19 (votes: 41111)
46
Bailly et al. (ACL 2023). Syntax and Geometry of Information. "We study syntactic generalization from the perspective of the capacity to disentangle semantic and structural information"
https://aclanthology.org/2023.acl-long.590.pdf
Byung-Doh, 9/21 (votes: 3111)
47
Li and Lu (ACL 2023). Contextual Distortion Reveals Constituency: Masked Language Models are Implicit Parsers. Tree reconstruction from MLMs.
https://aclanthology.org/2023.acl-long.285.pdf
Christian, 9/14 (votes: 3111)
48
UniLM people (arXiv 2023). Retentive Network: A Successor to Transformer for Large Language Models. New architecture!
https://arxiv.org/pdf/2307.08621.pdf
Byung-Doh, 9/7 (votes: 211)
49
Piantadosi and Hill (arXiv 2022). Meaning without reference in large language models.
https://arxiv.org/pdf/2208.02957.pdf
Mike/Christian, 8/31 (votes: 3111)
50
Hahn et al. (2022). A resource-rational model of human processing of recursive linguistic structure.
https://www.pnas.org/doi/10.1073/pnas.2122602119
Christian (from Byung-Doh), 4/20 (votes: 0)
51
Piantadosi (LingBuzz 2023). Modern language models refute Chomsky's approach to language. Cited during the Casillas talk; there's also a reply to this: https://lingbuzz.net/lingbuzz/007190
https://ling.auf.net/lingbuzz/007180
Byung-Doh, 4/13 (votes: 0)
52
Yedetore et al. (2023). How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech. Transformers and LSTMs trained on CHILDES don't pick up hierarchical structure.
https://arxiv.org/abs/2301.11462
Christian, 3/30 (votes: 0)
53
Meister and Cotterell (2021). Language Model Evaluation Beyond Perplexity.
https://aclanthology.org/2021.acl-long.414.pdf
Christian (votes: 110)
54
Sinclair et al. (TACL 2022). Structural Persistence in Language Models: Priming as a Window into Abstract Language Representations. Priming language models.
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00504/113019/Structural-Persistence-in-Language-Models-Priming
Byung-Doh, 2/22 (votes: 11)
55
Yang et al. (2022). Unsupervised Discontinuous Constituency Parsing with Mildly Context-Sensitive Grammars. Unsupervised parsing that can handle extraposition, wh-movement, etc.
https://arxiv.org/pdf/2212.09140.pdf
Christian, 1/26 (votes: 211)
56
Warstadt and Bowman (2022). What Artificial Neural Networks Can Tell Us About Human Language Acquisition.
https://arxiv.org/pdf/2208.07998.pdf
Christian, 11/17 (votes: 511111)
57
Prange et al. (NAACL 2022). Linguistic Frameworks Go Toe-to-Toe at Neuro-Symbolic Language Modeling. Conditioning on syntactic/semantic subgraphs improves GPT-2 perplexity, though it probably makes surprisal less humanlike.
https://aclanthology.org/2022.naacl-main.325.pdf
Byung-Doh, 12/1 (votes: 511111)
58
Li and Liang (2021). Prefix-Tuning: Optimizing Continuous Prompts for Generation. An alternative to fine-tuning.
https://aclanthology.org/2021.acl-long.353.pdf
Ash, 11/3 (votes: 31011)
59
Niu and Penn (2020). Grammaticality and Language Modelling. Point-biserial correlation for comparing NN output to human judgments, and some other improvements/tests.
https://aclanthology.org/2020.eval4nlp-1.11/
Willy, 10/27 (votes: 3111)
60
Dettmers et al. (NeurIPS 2022). LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale. Personally interested to learn more about the "emergent outliers" rather than the quantization technique.
https://arxiv.org/pdf/2208.07339.pdf
Byung-Doh (votes: 3111)
61
Tran et al. (2022). Plex: Towards Reliability Using Pretrained Large Model Extensions. Google paper looking at reliability of LLMs, including few-shot uncertainty; blog post: https://ai.googleblog.com/2022/07/towards-reliability-in-deep-learning.html
https://arxiv.org/pdf/2207.07411.pdf
Willy (Mike) (votes: 3111)
62
Srivastava et al. (2022). Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. BIG-bench (a set of 204 LM evaluation tasks).
https://arxiv.org/abs/2206.04615
Christian (votes: 3111)
63
Goldstein et al. (Nature Neuroscience 2022). Shared computational principles for language processing in humans and deep language models. GPT-2 embeddings x ECoG.
https://www.nature.com/articles/s41593-022-01026-4
Byung-Doh (votes: 3111)
64
Caucheteux et al. (2021). Decomposing lexical and compositional syntax and semantics with deep language models.
https://arxiv.org/pdf/2103.01620.pdf
Christian (votes: 30111)
65
Schuster and Linzen (2022). When a sentence does not introduce a discourse referent, Transformer-based models still sometimes refer to it.
https://arxiv.org/pdf/2205.03472.pdf
Willy (votes: 3111)
66
Jiang et al. (2021). How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering. Looks at the probability estimates of T5, BART, and GPT-2 on a QA task.
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00407/107277/How-Can-We-Know-When-Language-Models-Know-On-the
Willy, 4/21 (votes: 3111)
67
Ryu and Lewis (2021). Accounting for Agreement Phenomena in Sentence Comprehension with Transformer Language Models: Effects of Similarity-based Interference on Surprisal and Attention. Also appeared in CMCL 2021 (https://aclanthology.org/2021.cmcl-1.6/).
https://arxiv.org/abs/2104.12874
Christian, 4/7 (votes: 41111)
68
Xu et al. (ACL 2021). Syntax-Enhanced Pre-trained Model.
https://aclanthology.org/2021.acl-long.420.pdf
Byung-Doh, 3/31 (votes: 211)
69
Davis and van Schijndel (2020). Discourse structure interacts with reference but not syntax in neural language models.
https://arxiv.org/pdf/2010.04887.pdf
Willy, 3/10 (votes: 41111)
70
Stengel-Eskin et al. (2021). Joint Universal Syntactic and Semantic Parsing. Compares several model architectures for joint syntactic and semantic parsing on rich annotations from the Universal Decompositional Semantics dataset.
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00396/106796/Joint-Universal-Syntactic-and-Semantic-Parsing
Christian, 3/3 (votes: 41111)
71
Mao et al. (2021). Grammar-Based Grounded Lexicon Learning. A method for learning lexical entries from grounded data like images and text. Entries include syntactic types and "neuro-symbolic" semantic programs that combine lambda calculus expressions with neural network embeddings.
https://proceedings.neurips.cc/paper/2021/file/4158f6d19559955bae372bb00f6204e4-Paper.pdf
Byung-Doh, 2/24 (votes: 3111)
72
Elazar et al. (2021). Measuring and Improving Consistency in Pretrained Language Models. A small paraphrase adversarial dataset with BERT-based models.
https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00410/107384/Measuring-and-Improving-Consistency-in-Pretrained
Willy, 2/17 (votes: 41111)
73
Yang and Piantadosi (2022). One model for the learning of language.
https://www.pnas.org/content/119/5/e2021865119
Christian, 2/10 (votes: 41111)
74
Anthropic people (2021). A Mathematical Framework for Transformer Circuits. GPT-2-ology (0-layer and 1-layer models).
https://transformer-circuits.pub/2021/framework/index.html
Byung-Doh, 2/3 (votes: 0)
75
Belinkov and Glass (2019). Analysis Methods in Neural Language Processing: A Survey. Interested in exploring ways to test linguistic data with neural models; also focuses on some perspectives not mentioned in other similar papers.
https://doi.org/10.1162/tacl_a_00254
Willy, 1/27 (votes: 10)
76
Guest and Martin (2021). On logical inference over brains, behaviour, and artificial neural networks. Questions how much we can infer about the mind and brain from the behavior of neural network models ("if the NN reproduces the pattern seen in brain activity, the brain must work like the NN").
https://psyarxiv.com/tbmcg/
Christian, 1/20 (votes: 000)
77
Li et al. (ACL 2021). How is BERT surprised? Layerwise detection of linguistic anomalies.
https://aclanthology.org/2021.acl-long.325.pdf
Byung-Doh (votes: 110)
78
Stanojević and Steedman (2021). Formal Basis of a Language Universal.
https://direct.mit.edu/coli/article/47/1/9/97333/Formal-Basis-of-a-Language-Universal
Nanjiang (votes: 3111)
79
Stanojević et al. (2021). Modeling incremental language comprehension in the brain with Combinatory Categorial Grammar.
https://aclanthology.org/2021.cmcl-1.3.pdf
Christian (votes: 41111)
80
Sanh et al. (2021). Multitask Prompted Training Enables Zero-Shot Task Generalization. To be discussed on 10/28 by popular demand.
https://arxiv.org/pdf/2110.08207.pdf
Willy (from Mike)
81
Kuribayashi et al. (ACL 2021). Lower Perplexity is Not Always Human-Like.
https://aclanthology.org/2021.acl-long.405.pdf
Byung-Doh (votes: 0)
82
White and Cotterell (2021). Examining the Inductive Bias of Neural Language Models with Artificial Languages.
Nanjiang (votes: 3111)
83
Aghajanyan et al. (2021). Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning.
https://aclanthology.org/2021.acl-long.568/
Christian (votes: 511111)
84
Linzen and Baroni (2021). Syntactic Structure from Deep Learning.
https://www.annualreviews.org/doi/abs/10.1146/annurev-linguistics-032020-051035?cookieSet=1
Willy (Christian), 9/23 (votes: 3111)
85
Shen et al. (ACL 2021). StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling.
https://aclanthology.org/2021.acl-long.559.pdf
Byung-Doh, 9/16 (votes: 41111)
86
Press et al. (2021). Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation.
https://arxiv.org/abs/2108.12409
Christian (Mike), 9/9 (votes: 41111)
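The core of ALiBi is simple enough to sketch (a minimal reimplementation of the bias computation as I understand it from the paper, using its geometric slope scheme for the heads): instead of positional embeddings, each head adds a linear penalty proportional to the query-key distance to its causal attention scores, which is what lets the model extrapolate to longer inputs than it was trained on.

```python
def alibi_slopes(num_heads: int) -> list[float]:
    # Head-specific slopes from the paper: a geometric sequence
    # 2^(-8/n), 2^(-16/n), ..., 2^-8 for n heads.
    return [2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)]

def alibi_bias(seq_len: int, slope: float) -> list[list[float]]:
    # Bias added to causal attention scores: query position i attends to
    # key position j <= i with penalty slope * (j - i), so more distant
    # keys receive a larger (more negative) bias.
    return [[slope * (j - i) for j in range(i + 1)] for i in range(seq_len)]

# For 8 heads the first slope is 2^-1 = 0.5; the last row of a length-3
# bias matrix penalizes the two earlier positions linearly.
print(alibi_slopes(8)[0])     # -> 0.5
print(alibi_bias(3, 0.5)[2])  # -> [-1.0, -0.5, 0.0]
```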
87
Lewis and Bastiaansen (2015). A predictive coding framework for rapid neural dynamics during sentence-level language comprehension.
https://www.sciencedirect.com/science/article/abs/pii/S0010945215000714
Evan (votes: 211)
88
Beres (2017). Time is of the Essence: A Review of Electroencephalography (EEG) and Event-Related Brain Potentials (ERPs) in Language Research. Overview of ERPs in linguistic research.
https://core.ac.uk/download/pdf/206525297.pdf
Willy (votes: 41111)
89
Li et al. (AACL 2020). Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads.
https://www.aclweb.org/anthology/2020.aacl-main.43.pdf
Byung-Doh (votes: 31110)
90
Brothers and Kuperberg (2021). Word predictability effects are linear, not logarithmic: Implications for probabilistic models of sentence comprehension.
https://www.sciencedirect.com/science/article/pii/S0749596X20300887
Cory (votes: 3111)
91
3/4 - CUNY day: https://www.cuny2021.io
92
Wilcox et al. (2020). On the Predictive Power of Neural Language Models for Human Real-Time Comprehension Behavior.
https://arxiv.org/pdf/2006.01912.pdf
Christian (votes: 3111)
93
Caplan et al. (2020). Miller's Monkey Updated: Communicative Efficiency and the Statistics of Words in Natural Language.
https://ling.auf.net/lingbuzz/004660/current.pdf?_s=6GvkvSUSdQZc_66K
Cory (votes: 10001)
94
Steinert-Threlkeld and Szymanik (2020). Ease of Learning Explains Semantic Universals.
https://semanticsarchive.net/Archive/zM5ZGIxM/EaseLearning.pdf
Nanjiang (votes: 310101)
95
Meister et al. (EMNLP 2020). If Beam Search is the Answer, What was the Question?
https://www.aclweb.org/anthology/2020.emnlp-main.170.pdf
Byung-Doh (votes: 2110)
96
Lopopolo et al. (2020). Distinguishing syntactic operations in the brain: Dependency and phrase-structure parsing.
https://www.mitpressjournals.org/doi/abs/10.1162/nol_a_00029
Willy (from Cory) (votes: 2011)
97
Venhuizen et al. (2019). Expectation-based Comprehension: Modeling the Interaction of World Knowledge and Linguistic Experience.
https://www.tandfonline.com/doi/pdf/10.1080/0163853X.2018.1448677
Cory (votes: 401111)
98
Li et al. (2019). Specializing Word Embeddings (for Parsing) by Information Bottleneck.
https://www.aclweb.org/anthology/D19-1276.pdf
Nanjiang (votes: 310110)
99
Kodner and Gupta (ACL 2020). Overestimation of Syntactic Representation in Neural Language Models.
https://www.aclweb.org/anthology/2020.acl-main.160.pdf
Byung-Doh (votes: 511111)
100
Kuperberg and Jaeger (2016). What do we mean by prediction in language comprehension?
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4850025/pdf/nihms-754635.pdf
Evan (votes: 211)