ARTE_index description sheet

	A	B	C	D	E	F
1	Index name	Formula Name	Formula		Variable derived from	References
2	Index name	Formula Name	Variable (i)	Coefficient	Variable derived from	References
3	Flesch-Reading-Ease	Flesch Reading Ease Formula	Constant	206.835	N/A	Flesch, R. (1948). A new readability yardstick. Journal of applied psychology, 32(3), 221.
4			Average number of words per sentence	- (i x 1.015)	spaCy
5			Average number of syllables per word	- (i x 84.6)	custom Python code
6	Flesch-Kincaid-Readability	Flesch Kincaid Grade Level Formula	Constant	-15.59	N/A	Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel.
7			Average number of words per sentence	i x 0.39	spaCy
8			Average number of syllables per word	i x 11.8	custom Python code
9	Automated-Reading-Index	Automated Readability Index	Constant	-21.43	N/A	Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel.
10			Average number of words per sentence	i x 0.5	spaCy
11			Average number of characters per word	i x 4.71	custom Python code
12	SMOG Readability Formula	SMOG Grading	Constant	3	N/A	McLaughlin, G. H. (1969). SMOG grading-a new readability formula. Journal of reading, 12(8), 639-646.
13	SMOG Readability Formula	SMOG Grading	Square root of the number of polysyllabic words per 30 sentences	i x 1	spaCy
14	New Dale-Chall Readability Formula	New Dale-Chall Readability Formula	Average number of words per sentence	i x 0.0496	spaCy	Chall, J. S., & Dale, E. (1995). Readability revisited: The new Dale-Chall readability formula. Brookline Books.
15	New Dale-Chall Readability Formula	New Dale-Chall Readability Formula	Percentage of difficult words	i x 0.1579	ReaderBench
16	CAREC	Crowdsourced algorithm of reading comprehension	Constant	1.811	N/A	Crossley, S. A., Skalicky, S., & Dascalu, M. (2019). Moving beyond classic readability formulas: new methods and new models. Journal of Research in Reading, 42(3-4), 541-561.
17			Average age of acquisition (Kuperman) for all content words	i x 0.022	TAALES (Kuperman_AoA_CW)	Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior research methods, 44(4), 978-990.
18			Average bigram range score (COCA) for all words	i x 0.746	TAALES (COCA_Academic_Bigram_Range)	Davies, M. (2009). The Corpus of Contemporary American English (COCA): 400+ million words, 1990-present (2008). Available online at http://www.americancorpus.org.
19			Average trigram proportion score (BNC-written) for all words	- (i x 0.742)	TAALES (BNC_Written_Trigram_Proportion)	BNC Consortium. (2007). The British national corpus, version 3 (BNC XML Edition). Distributed by Oxford University Computing Services on behalf of the BNC Consortium, 5(65), 6.
20			Average imageability score (MRC) for all content words	- (i x 0.001)	TAALES (MRC_Imageability_CW)	Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497-505.
21			Average frequency score (Brown) for all words	i x 0.0000625	TAALES (Brown_Freq_AW)	Brown, G. D. (1984). A frequency count of 190,000 words in the London-Lund Corpus of English Conversation. Behavior research methods, instruments, & computers, 16(6), 502-532.
22			Average type token ratio of lemma trigrams for all trigrams	- (i x 0.699)	TAACO (trigram_lemma_ttr)	Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavior Research Methods.
23			Proportion of lemma types that occur in the next paragraph for all paragraphs	- (i x 0.111)	TAACO (adjacent_overlap_all_para)
24			Number of temporal connectives divided by number of words in text	- (i x 2.067)	TAACO (all_temporal)
25			Proportion of noun lemma types that occur in the next paragraph for all paragraphs	i x 0.035	TAACO (adjacent_overlap_noun_sent_div_seg)
26			Number of content word lemma types	i x 0.002	TAACO (nlemma_content_types)
27			Positive adjective scores derived from 4 different corpora	- (i x 0.08)	SEANCE (positive_adjectives_component)	Crossley, S. A., Kyle, K., & McNamara, D. S. (2017). Sentiment analysis and social cognition engine (SEANCE): An automatic tool for sentiment, social cognition, and social order analysis. Behavior Research Methods 49(3), pp. 803-821. doi:10.3758/s13428-016-0743-z.
28			Average standard deviation of word length for all words	i x 0.047	ReaderBench (RB.WdLettStdDev)	Dascalu, M., Dessus, P., Trausan-Matu, Ş., Bianco, M., & Nardy, A. (2013, July). ReaderBench, an environment for analyzing text complexity and reading strategies. In International Conference on Artificial Intelligence in Education (pp. 379-388). Springer, Berlin, Heidelberg.
29			Average character entropy for all characters	- (i x 0.395)	ReaderBench (RB.CharEnt)
30	CAREC_M	Crowdsourced algorithm of reading comprehension modified	Constant	1.811	N/A	Crossley, S. A., Skalicky, S., & Dascalu, M. (2019). Moving beyond classic readability formulas: new methods and new models. Journal of Research in Reading, 42(3-4), 541-561.
31			Average age of acquisition (Kuperman) for all content words	i x 0.022	TAALES (Kuperman_AoA_CW)	Kuperman, V., Stadthagen-Gonzalez, H., & Brysbaert, M. (2012). Age-of-acquisition ratings for 30,000 English words. Behavior research methods, 44(4), 978-990.
32			Average bigram range score (COCA) for all words	i x 0.746	TAALES (COCA_Academic_Bigram_Range)	Davies, M. (2009). The Corpus of Contemporary American English (COCA): 400+ million words, 1990-present (2008). Available online at http://www.americancorpus.org.
33			Average trigram proportion score (BNC-written) for all words	- (i x 0.742)	TAALES (BNC_Written_Trigram_Proportion)	BNC Consortium. (2007). The British national corpus, version 3 (BNC XML Edition). Distributed by Oxford University Computing Services on behalf of the BNC Consortium, 5(65), 6.
34			Average imageability score (MRC) for all content words	- (i x 0.001)	TAALES (MRC_Imageability_CW)	Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497-505.
35			Average frequency score (Brown) for all words	i x 0.0000625	TAALES (Brown_Freq_AW)	Brown, G. D. (1984). A frequency count of 190,000 words in the London-Lund Corpus of English Conversation. Behavior research methods, instruments, & computers, 16(6), 502-532.
36			Average type token ratio of lemma trigrams for all trigrams	- (i x 0.699)	TAACO (trigram_lemma_ttr)	Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavior Research Methods.
37			Proportion of lemma types that occur in the next paragraph for all paragraphs	- (i x 0.111)	TAACO (adjacent_overlap_all_para)
38			Number of temporal connectives divided by number of words in text	- (i x 2.067)	TAACO (all_temporal)
39			Proportion of noun lemma types that occur in the next paragraph for all paragraphs	i x 0.035	TAACO (adjacent_overlap_noun_sent_div_seg)
40			Number of content word lemma types divided by number of content words	i x 0.2	TAACO (nlemma_content_types)
41			Positive adjective scores derived from 4 different corpora	- (i x 0.08)	SEANCE (positive_adjectives_component)	Crossley, S. A., Kyle, K., & McNamara, D. S. (2017). Sentiment analysis and social cognition engine (SEANCE): An automatic tool for sentiment, social cognition, and social order analysis. Behavior Research Methods 49(3), pp. 803-821. doi:10.3758/s13428-016-0743-z.
42			Average standard deviation of word length for all words	i x 0.047	ReaderBench (RB.WdLettStdDev)	Dascalu, M., Dessus, P., Trausan-Matu, Ş., Bianco, M., & Nardy, A. (2013, July). ReaderBench, an environment for analyzing text complexity and reading strategies. In International Conference on Artificial Intelligence in Education (pp. 379-388). Springer, Berlin, Heidelberg.
43			Average character entropy for all characters	- (i x 0.395)	ReaderBench (RB.CharEnt)
44	CARES	Crowdsourced algorithm of reading speed	Constant	-0.862	N/A	Crossley, S. A., Skalicky, S., & Dascalu, M. (2019). Moving beyond classic readability formulas: new methods and new models. Journal of Research in Reading, 42(3-4), 541-561.
45			Average word naming response time for all words	i x 0.003	TAALES (WN_Mean_RT)	Balota, D. A., Yap, M. J., Hutchison, K. A., Cortese, M. J., Kessler, B., Loftis, B., ... & Treiman, R. (2007). The English lexicon project. Behavior research methods, 39(3), 445-459.
46			Average concreteness score (MRC) for all words	- (i x 0.001)	TAALES (MRC_Concreteness_AW)	Coltheart, M. (1981). The MRC psycholinguistic database. The Quarterly Journal of Experimental Psychology Section A, 33(4), 497-505.
47			Average semantic distinctiveness scores for all words	- (i x 0.461)	TAALES (Sem_D)	Hoffman, P., Ralph, M. A. L., & Rogers, T. T. (2013). Semantic diversity: A measure of semantic ambiguity based on variability in the contextual usage of words. Behavior research methods, 45(3), 718-730.
48			Number of content word lemmas	i x 0.004	TAACO (nlemma_content_words)	Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavior Research Methods.
49			Number of function words	i x 0.002	TAACO (nfunction_words)
50			Complex nominals per T-unit	i x 0.011	TAASSC (CN_T)	Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4):474-496.
51			Number of dependents per direct object	i x 0.023	TAASSC (av_dobj_deps)	Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication (Doctoral Dissertation).
52			Average number of sentences per paragraph	- (i x 0.015)	ReaderBench (RB.BlStDevSen)	Dascalu, M., Dessus, P., Trausan-Matu, Ş., Bianco, M., & Nardy, A. (2013, July). ReaderBench, an environment for analyzing text complexity and reading strategies. In International Conference on Artificial Intelligence in Education (pp. 379-388). Springer, Berlin, Heidelberg.
53			Average number of characters per word	i x 0.062	ReaderBench (RB.WdLettStdDev)
54	CML2RI	Coh-Metrix L2 Readability Index (approximated)	Constant	-43.142	N/A	Crossley, S. A., & McNamara, D. S. (2008). Assessing L2 reading texts at the intermediate level: An approximate replication of Crossley, Louwerse, McCarthy & McNamara (2007). Language Teaching, 41(3), 409-429.
55			Number of sentences in text	i x 0.642	spaCy
56			Average frequency score (SUBTLEXus) for all content words logged	i x 12.671	TAALES (SUBTLEXus_Freq_CW_Log)	Brysbaert, M., & New, B. (2009). Subtlexus: American word frequencies. http://subtlexus.lexique.org.
57			Proportion of noun and pronoun lemma types that occur in the next two sentences for all sentences	i x 29.619	TAACO (adjacent_overlap_2_argument_sent)	Crossley, S. A., Kyle, K., & Dascalu, M. (in press). The Tool for the Automatic Analysis of Cohesion 2.0: Integrating Semantic Similarity and Text Overlap. Behavior Research Methods.
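
Worked example (illustrative sketch, not the ARTE implementation). The rows above read as linear formulas: each index appears to be its Constant plus the sum of the listed terms, where "i" is the value of the variable and an entry such as "- (i x 1.015)" means the term is subtracted. The Python below applies that reading to the three classic indices in rows 3-11. The names FORMULAS and score, the feature keys, and the input values are hypothetical and chosen only for this example; the feature values are assumed to have been computed elsewhere (the sheet lists spaCy and custom Python code as their sources).

```python
# Illustrative sketch only: FORMULAS, score and the feature keys are
# hypothetical names, not part of ARTE.

# index name -> (constant, {variable: signed coefficient}), copied from rows 3-11.
# A sheet entry "- (i x 1.015)" is stored as the negative coefficient -1.015.
FORMULAS = {
    "Flesch-Reading-Ease": (
        206.835,
        {"words_per_sentence": -1.015, "syllables_per_word": -84.6},
    ),
    "Flesch-Kincaid-Readability": (
        -15.59,
        {"words_per_sentence": 0.39, "syllables_per_word": 11.8},
    ),
    "Automated-Reading-Index": (
        -21.43,
        {"words_per_sentence": 0.5, "characters_per_word": 4.71},
    ),
}


def score(index_name, features):
    """Return constant + sum(coefficient * variable value) for one index."""
    constant, coefficients = FORMULAS[index_name]
    return constant + sum(
        coefficient * features[variable]
        for variable, coefficient in coefficients.items()
    )


# Made-up feature values for a single text (assumed to be computed elsewhere).
features = {
    "words_per_sentence": 10.0,
    "syllables_per_word": 1.5,
    "characters_per_word": 5.0,
}

for name in FORMULAS:
    print(name, round(score(name, features), 3))
```

Under the same assumption, the longer indices (CAREC, CAREC_M, CARES, CML2RI) would be expressed by adding one dictionary entry per row, keyed by the variable name given in column E.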