| Index name | Formula name | Formula: variable (i) | Formula: coefficient | Variable derived from | References |
| Flesch-Reading-Ease | Flesch Reading Ease Formula | Constant | 206.835 | N/A | Flesch, R. (1948). A new readability yardstick. Journal of applied psychology, 32(3), 221. |
| | | Average number of words per sentence | - (i x 1.015) | spaCy | |
| | | Average number of syllables per word | - (i x 84.6) | custom Python code | |
| Flesch-Kincaid-Readability | Flesch Kincaid Grade Level Formula | Constant | -15.59 | N/A | Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. |
| | | Average number of words per sentence | i x 0.39 | spaCy | |
| | | Average number of syllables per word | i x 11.8 | custom Python code | |
| Automated-Reading-Index | Automated Readability Index | Constant | -21.43 | N/A | Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. |
| | | Average number of words per sentence | i x 0.5 | spaCy | |
| | | Average number of characters per word | i x 4.71 | custom Python code | |
| SMOG Readability Formula | SMOG Grading | Constant | 3 | N/A | McLaughlin, G. H. (1969). SMOG grading-a new readability formula. Journal of reading, 12(8), 639-646. |
| | | Square root of polysyllabic words per 30 sentences | i x 1 | spaCy | |
| New Dale-Chall Readability Formula | New Dale-Chall Readability Formula | Average number of words per sentence | i x 0.0496 | spaCy | Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Brookline Books. |
| | | Percentage of difficult words | i x 0.1579 | ReaderBench | |
| CAREC | Crowdsourced algorithm of reading comprehension | Constant | 1.811 | N/A | Crossley, S. A., Skalicky, S., & Dascalu, M. (2019). Moving beyond classic readability formulas: new methods and new models. Journal of Research in Reading, 42(3-4), 541-561. |
| | | Average age of acquisition (Kuperman) for all content words | i x 0.022 | TAALES (Kuperman_AoA_CW) | Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior research methods, 44(4), 978-990. |
| | | Average bigram range score (COCA) for all words | i x 0.746 | TAALES (COCA_Academic_Bigram_Range) | Davies, M. (2009). The Corpus of Contemporary American English (COCA): 400+ million words, 1990-present (2008). Available online at http://www. americancorpus. org. |
| | | Average trigram proportion score (BNC-written) for all words | - (i x 0.742) | TAALES (BNC_Written_Trigram_Proportion) | BNC Consortium. (2007). The British national corpus, version 3 (BNC XML Edition). Distributed by Oxford University Computing Services on behalf of the BNC Consortium, 5(65), 6. |
| | | Average imageability score (MRC) for all content words | - (i x 0.001) | TAALES (MRC_Imageability_CW) | Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497-505. |
| | | Average frequency score (Brown) for all words | i x 0.0000625 | TAALES (Brown_Freq_AW) | Brown, G. D. (1984). A frequency count of 190,000 words in the London-Lund Corpus of English Conversation. Behavior research methods, instruments, & computers, 16(6), 502-532. |
| | | Average type token ratio of lemma trigrams for all trigrams | - (i x 0.699) | TAACO (trigram_lemma_ttr) | Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavioral Research Methods. |
| | | Proportion of lemma types that occur in the next paragraph for all paragraphs | - (i x 0.111) | TAACO (adjacent_overlap_all_para) | |
| | | Number of temporal connectives divided by number of words in text | - (i x 2.067) | TAACO (all_temporal) | |
| | | Proportion of noun lemma types that occur in the next paragraph for all paragraphs | i x 0.035 | TAACO (adjacent_overlap_noun_sent_div_seg) | |
| | | Number of content word lemma types | i x 0.002 | TAACO (nlemma_content_types) | |
| | | Positive adjective scores derived from 4 different corpora | - (i x 0.08) | SEANCE (positive_adjectives_component) | Crossley, S. A., Kyle, K., & McNamara, D. S. (2017). Sentiment analysis and social cognition engine (SEANCE): An automatic tool for sentiment, social cognition, and social order analysis. Behavior Research Methods 49(3), pp. 803-821. doi:10.3758/s13428-016-0743-z. |
| | | Average standard deviation of word length for all words | i x 0.047 | ReaderBench (RB.WdLettStdDev) | Dascalu, M., Dessus, P., Trausan-Matu, Ş., Bianco, M., & Nardy, A. (2013, July). ReaderBench, an environment for analyzing text complexity and reading strategies. In International Conference on Artificial Intelligence in Education (pp. 379-388). Springer, Berlin, Heidelberg. |
| | | Average character entropy for all characters | - (i x 0.395) | ReaderBench (RB.CharEnt) | |
| CAREC_M | Crowdsourced algorithm of reading comprehension modified | Constant | 1.811 | N/A | Crossley, S. A., Skalicky, S., & Dascalu, M. (2019). Moving beyond classic readability formulas: new methods and new models. Journal of Research in Reading, 42(3-4), 541-561. |
| | | Average age of acquisition (Kuperman) for all content words | i x 0.022 | TAALES (Kuperman_AoA_CW) | Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior research methods, 44(4), 978-990. |
| | | Average bigram range score (COCA) for all words | i x 0.746 | TAALES (COCA_Academic_Bigram_Range) | Davies, M. (2009). The Corpus of Contemporary American English (COCA): 400+ million words, 1990-present (2008). Available online at http://www. americancorpus. org. |
| | | Average trigram proportion score (BNC-written) for all words | - (i x 0.742) | TAALES (BNC_Written_Trigram_Proportion) | BNC Consortium. (2007). The British national corpus, version 3 (BNC XML Edition). Distributed by Oxford University Computing Services on behalf of the BNC Consortium, 5(65), 6. |
| | | Average imageability score (MRC) for all content words | - (i x 0.001) | TAALES (MRC_Imageability_CW) | Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497-505. |
| | | Average frequency score (Brown) for all words | i x 0.0000625 | TAALES (Brown_Freq_AW) | Brown, G. D. (1984). A frequency count of 190,000 words in the London-Lund Corpus of English Conversation. Behavior research methods, instruments, & computers, 16(6), 502-532. |
| | | Average type token ratio of lemma trigrams for all trigrams | - (i x 0.699) | TAACO (trigram_lemma_ttr) | Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavioral Research Methods. |
| | | Proportion of lemma types that occur in the next paragraph for all paragraphs | - (i x 0.111) | TAACO (adjacent_overlap_all_para) | |
| | | Number of temporal connectives divided by number of words in text | - (i x 2.067) | TAACO (all_temporal) | |
| | | Proportion of noun lemma types that occur in the next paragraph for all paragraphs | i x 0.035 | TAACO (adjacent_overlap_noun_sent_div_seg) | |
| | | Number of content word lemma types divided by number of content words | i x 0.2 | TAACO (nlemma_content_types) | |
| | | Positive adjective scores derived from 4 different corpora | - (i x 0.08) | SEANCE (positive_adjectives_component) | Crossley, S. A., Kyle, K., & McNamara, D. S. (2017). Sentiment analysis and social cognition engine (SEANCE): An automatic tool for sentiment, social cognition, and social order analysis. Behavior Research Methods 49(3), pp. 803-821. doi:10.3758/s13428-016-0743-z. |
| | | Average standard deviation of word length for all words | i x 0.047 | ReaderBench (RB.WdLettStdDev) | Dascalu, M., Dessus, P., Trausan-Matu, Ş., Bianco, M., & Nardy, A. (2013, July). ReaderBench, an environment for analyzing text complexity and reading strategies. In International Conference on Artificial Intelligence in Education (pp. 379-388). Springer, Berlin, Heidelberg. |
| | | Average character entropy for all characters | - (i x 0.395) | ReaderBench (RB.CharEnt) | |
| CARES | Crowdsourced algorithm of reading speed | Constant | -0.862 | N/A | Crossley, S. A., Skalicky, S., & Dascalu, M. (2019). Moving beyond classic readability formulas: new methods and new models. Journal of Research in Reading, 42(3-4), 541-561. |
| | | Average word naming response time for all words | i x 0.003 | TAALES (WN_Mean_RT) | Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., ... & Treiman, R. (2007). The English lexicon project. Behavior research methods, 39(3), 445-459. |
| | | Average concreteness score (MRC) for all words | - (i x 0.001) | TAALES (MRC_Concreteness_AW) | Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497-505. |
| | | Average semantic distinctiveness scores for all words | - (i x 0.461) | TAALES (Sem_D) | Hoffman, P., Ralph, M. A. L., & Rogers, T. T. (2013). Semantic diversity: A measure of semantic ambiguity based on variability in the contextual usage of words. Behavior research methods, 45(3), 718-730. |
| | | Number of content word lemmas | i x 0.004 | TAACO (nlemma_content_words) | Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavioral Research Methods. |
| | | Number of function words | i x 0.002 | TAACO (nfunction_words) | |
| | | Complex nominals per T-unit | i x 0.011 | TAASSC (CN_T) | Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4):474-496. |
| | | Number of dependents per direct object | i x 0.023 | TAASSC (av_dobj_deps) | Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication (Doctoral Dissertation). |
| | | Average number of sentences per paragraph | - (i x 0.015) | ReaderBench (RB.BlStDevSen) | Dascalu, M., Dessus, P., Trausan-Matu, Ş., Bianco, M., & Nardy, A. (2013, July). ReaderBench, an environment for analyzing text complexity and reading strategies. In International Conference on Artificial Intelligence in Education (pp. 379-388). Springer, Berlin, Heidelberg. |
| | | Average number of characters per word | i x 0.062 | ReaderBench (RB.WdLettStdDev) | |
| CML2RI | Coh-Metrix L2 Readability Index (approximated) | Constant | -43.142 | N/A | Crossley, S. A., & McNamara, D. S. (2008). Assessing L2 reading texts at the intermediate level: An approximate replication of Crossley, Louwerse, McCarthy & McNamara (2007). Language Teaching, 41(3), 409-429. |
| | | Number of sentences in text | i x 0.642 | spaCy | |
| | | Average frequency score (SUBTLEXus) for all content words logged | i x 12.671 | TAALES (SUBTLEXus_Freq_CW_Log) | Brysbaert, M., & New, B. (2009). Subtlexus: American word frequencies. Http:/Subtlexus. Lexique. Org. |
| | | Proportion of noun and pronoun lemma types that occur in the next two sentences for all sentences | i x 29.619 | TAACO (adjacent_overlap_2_argument_sent) | Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavioral Research Methods. |
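
Each index in the table appears to be a linear formula: the constant plus, for every variable row, the coefficient times the variable value i, with entries written "- (i x c)" subtracted. The Python sketch below follows that reading for the four classic formulas that list a constant. The dictionary keys, the score helper, and the sample feature values are hypothetical names and numbers chosen for illustration; they are not part of the source table, and the feature values would in practice come from the tools named in the "Variable derived from" column.

```python
import math

# Coefficients transcribed from the table; a negative weight encodes a
# "- (i x c)" entry. Key names are hypothetical labels, not tool outputs.
FORMULAS = {
    "flesch_reading_ease": {
        "constant": 206.835,
        "words_per_sentence": -1.015,
        "syllables_per_word": -84.6,
    },
    "flesch_kincaid_grade": {
        "constant": -15.59,
        "words_per_sentence": 0.39,
        "syllables_per_word": 11.8,
    },
    "automated_readability_index": {
        "constant": -21.43,
        "words_per_sentence": 0.5,
        "characters_per_word": 4.71,
    },
    "smog": {
        "constant": 3.0,
        "sqrt_polysyllables_per_30_sentences": 1.0,
    },
}


def score(formula, features):
    """Return constant + sum(coefficient * feature value) for one formula."""
    weights = FORMULAS[formula]
    total = weights["constant"]
    for name, coefficient in weights.items():
        if name != "constant":
            total += coefficient * features[name]
    return total


# Made-up feature values for a single text (assumption, for illustration).
features = {
    "words_per_sentence": 15.0,
    "syllables_per_word": 1.5,
    "characters_per_word": 4.5,
    "sqrt_polysyllables_per_30_sentences": math.sqrt(25),
}

for name in FORMULAS:
    print(name, round(score(name, features), 3))
```

Storing the coefficients as data keeps the code in step with the table: the longer formulas (CAREC, CAREC_M, CARES, CML2RI) could be added as further dictionary entries without changing the scoring function.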