NCELP Residential 3
Phonics, Vocabulary and Grammar testing: Principles, Design, and Creation
Date updated: 02/03/20
Material licensed as CC BY-NC-SA 4.0
Rachel Hawkes
NCELP Phonics Assessments
Principles, design, creation
NCELP Residential 3
Robert Woore
Date updated: 02/03/20
Phonics assessment team effort!
Robert Woore
Rachel Hawkes
Nick Avery
Giulia Bovolenta
Inge Alferink
Natalie Finlayson
Emma Marsden
All the NCELP team!
Native speakers (checking & audio)
Outline
1. Principles of Phonics assessment
1.1 What are we trying to test?
1.2 How can we test it, validly and practically?
2. Question types
2.1 Task design
2.2 Item selection
3. Scoring
3.1 What do we mean by ‘accurate’ decoding?
3.2 How the scoring works
1. Principles of Phonics assessment
Objectives
Why are we testing this?
transcription reading aloud
Skill acquisition theory
Declarative knowledge 🡪 (proceduralisation) 🡪 Procedural knowledge 🡪 (automatisation) 🡪 Automatised knowledge
Practice with control, with awareness. Errors happen, speed is variable.
Practice requires less awareness. Less error, faster, less variability in speed. Declarative knowledge may be lost.
Ultimate goal! Fluent, accurate decoding
Accuracy and fluency
< Wagen >
/ v ɑː ɡ ə n /
Fluency and accuracy – both are important in L2 learning
BUT fluency presupposes accuracy. There is no point automatising incorrect knowledge!
🡪 Primacy of accuracy at this stage
Accuracy and fluency
< Wagen >
/ w a ɡ ə n /
<w> 🡪 / v /
<a> 🡪 / ɑ ː /
/ v ɑː ɡ ə n /
a. Print to sound
Scales image: by scott_kirkwood, openclipart.org.
Screenshot from Phonics Screening Check training video, DfE. www.youtube.com/watch?v=IPJ_ZEBh1Bk
a. Print to sound
Pseudowords:
🡪 A ‘pure’ measure of their knowledge of symbol-sound correspondences
However, we considered pseudowords problematic:
Standards and Testing Agency, 2018: 2018 key stage 1 phonics screening check: pupils’ materials
a. Print to sound
Familiar versus unfamiliar words
pain
grain
trois
chois
a. Print to sound
p < .001, d = 0.33
<pain> - /pæ̃/
<grain> - /ɡɹɛɪn/
<chat> - /ʃa/
<fat> - /fat/
a. Print to sound
Summary so far
b. Sound to print
b. Sound to print
ver, vers, verre, verres, vert, verts, vair
* vaire, * vaires, * verd, * verds, * vère, * vères, * vaîre, * vaîres …
2. Item selection
For each language:
SSC in French
SSC | French 1 | French 2 | French 3 |
SFC -x | toux | poux | houx |
SFC -d | jard | lard | dard |
SFC -t | prit | vit | mit |
SFC -s | bris | gis | spis |
a | nase | prase | jase |
i | brime | cime | trime |
eu | yeuse | beuse | Meuse |
e | reloge | remous | recel |
o | lot | plot | trot |
eau | veau | seau | sceau |
au | chaule | gaule | taule |
u | cossu | cornu | crépu |
ou | zou | clou | flou |
é | pré | dé | ré |
en | ment | sent | gent |
an | flan | cran | ban |
on | pond | tond | gond |
SFe | rame | crame | lame |
in | crin | clin | pin |
ain | tain | nain | zain |
è | sème | pèse | sève |
ê | vête | blême | bêche |
ai | daine | raine | gaine |
oi | aloi | coi | aboi |
ch | croche | hoche | loche |
ç | glaçon | maçon | pinçon |
qu | quinte | quine | quille |
j | jauge | jonc | joug |
-tion | dation | rection | brution |
-ien | salien | jovien | danien |
2. Item selection
Scales image: by scott_kirkwood, openclipart.org.
2. Item selection
✔
🗶
Sample items – reading aloud
Sample items – reading aloud
🡪 Time limit = compromise!
Sample items – reading aloud
15 items to read aloud
Sample items – dictation
🡪 Time limit = compromise!
Number of items?
Sample items – dictation
Pen-and-paper tests:
3. Scoring – reading aloud
A note on ‘foreign accent’. Two dimensions to accuracy in reading aloud:
This is our focus for the phonics assessment!
< v e a u >
/ vjʉ /, / viː əʊ / 🗶
/ vo / ✔
/ vəʊ / ✔
3. Scoring – reading aloud
1 mark / item = max. 15 marks in total
0 marks: Incorrect pronunciation based on English SSC.
1 mark: Correct knowledge of target SSC, but pronunciation with an English accent.
1 mark: Correct pronunciation of target SSC.
Notes:
Give marks for correct pronunciation of the target SSC (in bold) even if other parts of the word are mispronounced or not attempted.
Be lenient when scoring. If you think the student has decoded the symbols (graphemes) to the correct sounds (phonemes), you can allow for some degree of foreign accent in their pronunciation of the target sounds.
A foreign accent is hard to shift even for the most dedicated learner after years of practice, and people are perfectly intelligible with a foreign accent.
In our teaching and in our phonics assessment, we are targeting SSC knowledge (phonics) rather than native-like pronunciation (phonetics).
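The marking scheme above can be sketched in a few lines of Python. This is only an illustration: the judgement labels (`english_ssc`, `target_ssc_english_accent`, `target_ssc`) and the function name are hypothetical, and the teacher's judgement of each item is assumed to be recorded by hand.

```python
# Marks per teacher judgement, following the scheme above.
# The label names are hypothetical, chosen for this sketch only.
MARKS_BY_JUDGEMENT = {
    "english_ssc": 0,                # decoded using English SSC
    "target_ssc_english_accent": 1,  # target SSC known, English accent
    "target_ssc": 1,                 # target SSC pronounced correctly
}
MAX_ITEMS = 15  # 1 mark per item, 15 marks in total


def score_reading_aloud(judgements):
    """Sum the marks for one pupil's list of per-item judgements."""
    if len(judgements) > MAX_ITEMS:
        raise ValueError("the reading aloud task has at most 15 items")
    return sum(MARKS_BY_JUDGEMENT[j] for j in judgements)


# Three attempts at <veau>: / vjʉ /, / vo /, / vəʊ /
veau_attempts = ["english_ssc", "target_ssc", "target_ssc_english_accent"]
veau_marks = [MARKS_BY_JUDGEMENT[j] for j in veau_attempts]
```

Both `target_ssc` and `target_ssc_english_accent` earn the full mark, which reflects the focus on SSC knowledge rather than native-like pronunciation.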
3. Scoring
Dictation task – automatically scored by the software platform ☺
Reading aloud task – marked individually by teachers (☹?)
SOW-based vocabulary testing for NCELP Y7
Principles, design, creation
NCELP Residential 3
Natalie Finlayson / Emma Marsden
Date updated: 02/03/20
Outline
2.1 Breadth of knowledge testing
2.2 Depth of knowledge testing
1. Vocab testing team
Test leads: Natalie Finlayson & Emma Marsden
French
Test developer: Kirsten Somerville
Proofreader: Ivan Avaca
Spanish
Test developer: Ivan Avaca
Proofreaders: Nick Avery, Pep Mateos Gonzalez
German
Test developers: Inge Alferink, Natalie Finlayson
Proofreaders: Inge Alferink, Natalie Finlayson
With huge thanks to: Giulia Bovolenta, Victoria Hobson, Stephen Owen, Helen Thomas, Catherine Morris, Ciaran Morris, Jack Peacock, Laurence Anthony
2. Vocab testing principles
Objectives
To test breadth and depth of knowledge of vocabulary studied in Y7 Terms 1.1.1 - 2.1.1
Considerations
2.1 Breadth of knowledge
Objective 1: To find out how many words students know
Sample size
SOW coverage:
French: 43%* Spanish: 35% German: 35%
2.1 Breadth of knowledge
Objective 1: To find out how many words students know
Item pool
(French: 2:1:2; Spanish: 3:1:3; German: 2:1:3)
2.2 Depth of knowledge
Objective 2: To find out how well students know target words
Year 8+
tested separately
Year 8+
Nation 2013, p. 538
2.2 Depth of knowledge
How well do you know the words we have learned so far?
1. I have seen this word before.
2. I know what the word means.
3. I can read the word aloud.
4. I can spell the word correctly.
5. I can use the word in a sentence.
6. I know the gender of nouns.
2.2 Depth of knowledge
Recognition tests
Recall
Definitions adapted from Jones (2004)
3. Question type 1
Modality: Listening
Type of activity: Spoken meaning recognition
Knowledge tested: Meaning (definition)
Read 2000, Chapter 3
3. Question type 2
Modality: Listening
Type of activity: Spoken meaning recognition
Knowledge tested: Meaning (definition); Meaning (association)
Read 2000, Chapter 3
3. Question type 3
Modality: Reading
Type of activity: Written meaning recall
Knowledge tested: Meaning (definition)
Read 2000, p. 163
3. Question type 4
Modality: Reading
Type of activity: Written meaning recall
Knowledge tested: Meaning (definition); Use (collocation)
3. Question type 5
Modality: Writing
Type of activity: Written form recall
Knowledge tested: Form (written); Meaning (definition)
3. Question type 6
Modality: Writing
Type of activity: Written form recall
Knowledge tested: Form (written); Meaning (definition); Use (collocation)
Laufer & Nation 1995
3. Question type 7
Modality: Speaking
Type of activity: Spoken form recall
Knowledge tested: Form (spoken); Meaning (definition)
3. Question types
80 x test items
Listening: 20 x spoken meaning recognition
Reading: 20 x written meaning recall
Writing: 20 x written form recall
Speaking: 20 x spoken form recall
40 x receptive
40 x productive
Bias towards recall:
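The item counts above can be checked with a short tally. This is a minimal sketch: the `BLUEPRINT` structure and the `tally` function are hypothetical names, and grouping Listening and Reading as receptive (Writing and Speaking as productive) is an assumption consistent with the 40 / 40 split on the slide.

```python
from collections import Counter

# (modality, type of activity, number of items), as listed on the slide.
BLUEPRINT = [
    ("Listening", "spoken meaning recognition", 20),
    ("Reading", "written meaning recall", 20),
    ("Writing", "written form recall", 20),
    ("Speaking", "spoken form recall", 20),
]

# Assumption: these two modalities make up the "40 x receptive" items.
RECEPTIVE = {"Listening", "Reading"}


def tally(blueprint):
    """Count items overall, by receptive/productive, and by recall/recognition."""
    totals = Counter()
    for modality, activity, n_items in blueprint:
        totals["total"] += n_items
        totals["receptive" if modality in RECEPTIVE else "productive"] += n_items
        totals["recall" if activity.endswith("recall") else "recognition"] += n_items
    return totals


totals = tally(BLUEPRINT)
```

On these counts, 60 of the 80 items are recall items and 20 are recognition items, which is one way to read the bias towards recall.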
4. Scoring (binary)
4. Scoring (tolerance)
For now, after 18 weeks of lessons, we are semi-tolerant of article and accent errors, if lemma (word) is correct
For now, after 18 weeks of lessons, we are tolerant of accent errors and semi-tolerant of article errors, if lemma (word) is correct
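One way to separate lemma, accent and article errors in a written response is sketched below. It is not the scoring used on the test platform: the function names and the `ARTICLES` set are hypothetical, the article list is illustrative rather than complete, elided forms such as l' are not handled, and the slide does not say how "semi-tolerant" converts into marks, so the sketch only reports which parts match.

```python
import unicodedata

# Illustrative, incomplete set of articles (assumption for this sketch).
ARTICLES = {"le", "la", "les", "un", "une", "el", "los", "las", "una",
            "der", "die", "das", "ein", "eine"}


def strip_accents(text):
    """Remove combining marks, e.g. 'élève' becomes 'eleve'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_article(answer):
    """Split a lower-cased answer into (article, rest); article may be ''."""
    words = unicodedata.normalize("NFC", answer).strip().lower().split()
    if len(words) > 1 and words[0] in ARTICLES:
        return words[0], " ".join(words[1:])
    return "", " ".join(words)


def compare(response, target):
    """Report which parts of the response match the target."""
    r_article, r_lemma = split_article(response)
    t_article, t_lemma = split_article(target)
    return {
        "lemma_correct": strip_accents(r_lemma) == strip_accents(t_lemma),
        "accents_correct": r_lemma == t_lemma,
        "article_correct": r_article == t_article,
    }


result = compare("eleve", "un élève")
```

Here `result` flags the lemma as correct but the accents and the article as incorrect; a marker (or a later scoring rule) can then decide how much tolerance to apply.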
5. Summary
The NCELP Y7 vocabulary test …
References
Clariana, R. B., & Lee, D. (2001). The effects of recognition and recall study tasks with feedback in a computer-based vocabulary lesson. Educational Technology Research & Development 49 (3), pp. 23-36.
Glover, J. A. (1989). The "testing" phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81(3), 392-399.
Jones, L. (2004). Testing L2 vocabulary recognition and recall using pictorial and written test items. Language Learning & Technology 8 (3), pp. 122-143.
Laufer, B., & Nation, P. (1995). Vocabulary Size and Use: Lexical Richness in L2 Written Production. Applied Linguistics, 16, pp. 307-322.
McDaniel, M. A., & Mason, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition 11, pp. 371-385.
Nation, I. S. P. (2013). Learning vocabulary in another language. Cambridge: Cambridge University Press.
Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.
Stewart, J. (2014). Do Multiple-Choice Options Inflate Estimates of Vocabulary Size on the VST? Language Assessment Quarterly, 11(3), pp. 271–282. doi:10.1080/15434303.2014.922977
Stewart, J., & White, D. A. (2011). Estimating guessing effects on the vocabulary levels test for differing degrees of word knowledge. TESOL Quarterly, 45 (2), pp. 370–380. doi:10.5054/tq.2011.254523
Questions?
Thanks for listening!
Any questions?
natalie_eloise
SOW-based grammar testing for NCELP Y7
Principles, design, creation
NCELP Residential 3
Rowena Kasprowicz / Nicholas Avery / Stephen Owen
Last updated: 02/03/20
Outline
4. Test creation and example items
5. Scoring
1. Grammar testing team
Test lead: Rowena Kasprowicz
French
Test developers: Stephen Owen, Rowena Kasprowicz
Proofreader: Ivan Avaca
Spanish
Test developers: Nicholas Avery, Rowena Kasprowicz
Proofreader: Amanda Izquierdo
German
Test developers: Stephen Owen, Rowena Kasprowicz
Proofreaders: Inge Alferink, Natalie Finlayson
With huge thanks to: Giulia Bovolenta, Victoria Hobson, Chloé Motard, Geraldine Bengsch
2. Grammar testing principles
Objective
To test receptive and productive knowledge of grammar studied in Y7 Terms 1.1.1-2.1.6
Considerations
3. Coverage of grammar: GERMAN
Grammar feature | Reading | Listening | Writing | Speaking |
Present continuous formation: Two forms in English vs. one in TL | - | - | 4 items | 6 items |
Question formation: Subject-verb inversion; do-aux in English vs. TL | 4 items | 4 items | | |
Subject-verb agreement (weak): 1st / 2nd / 3rd singular | 3 items | 3 items | | |
Subject-verb agreement (irregular): haben / sein (1st, 2nd, 3rd sing, 1st pl); mögen (1st, 2nd, 3rd sing) | 3 items | 3 items | 4 items | 4 items |
Article agreement: Def / indef; gender; number; case (nom/acc) | 4 items | - | 4 items | |
Plural noun formation: -en; umlaut + -e; -e | - | - | 3 items | - |
Negation: nicht + verb; nicht + adjective | 4 items | - | - | - |
Subject and object pronoun agreement: Gender; number; case (nom/acc) | 4 items | - | 3 items | 3 items |
3. Coverage of grammar: FRENCH
Grammar feature | Reading | Listening | Writing | Speaking |
Present continuous formation: Two forms in English vs. one in TL | - | - | 8 items | 6 items |
Question formation: Intonation; do-aux in English vs. TL | - | - | | |
Subject-verb agreement (regular -ER): 1st / 2nd / 3rd singular; 1st / 2nd / 3rd plural | 10 items | - | | |
Subject-verb agreement (irregular): être (all persons); avoir (all persons); faire (all persons); aller (1st, 2nd, 3rd sing) | 8 items | - | 4 items | 4 items |
Article & adjective agreement: Def / indef; gender; number | 4 items | - | 4 items | |
Adjectival word order: Post-nominal | 4 items | - | - | |
Preposition “to” + article | - | 4 items | 4 items | 3 items |
“il y a” vs. “est” vs. “a” | 4 items | - | - | - |
3. Coverage of grammar: SPANISH
Grammar feature | Reading | Listening | Writing | Speaking |
Present continuous formation: Two forms in English vs. one in TL | - | - | 6 items | 6 items |
Question formation: Intonation; do-aux in English vs. TL | - | - | | |
Subject-verb agreement (regular -AR): 1st / 2nd / 3rd singular | 4 items | 4 items | | |
Subject-verb agreement (irregular): estar (1st, 2nd, 3rd sing); ser (1st, 2nd, 3rd sing / 3rd pl); tener (1st, 2nd, 3rd sing / 1st, 3rd pl); querer (1st, 2nd, 3rd sing); hacer (1st, 2nd, 3rd sing); dar (1st, 2nd, 3rd sing) | 6 items | 6 items | 4 items (all) | 4 items (tener, querer) |
Article & adjective agreement: Def / indef; gender; number | 4 items | - | 4 items | |
Adjectival word order: Post-nominal | 4 items | - | - | |
Negation: no + verb | 4 items | - | - | 3 items (-AR) |
“hay” vs. “tiene” | 4 items | - | - | - |
4. Test creation process
Each question tests a specific grammatical feature (or combination of features)
Size of the test
Question item pool
(e.g. equally likely to be tested on 1st person singular as 2nd person singular in subject-verb agreement questions, etc.)
Variation between languages in weighting of different modes / modalities, due to variation in nature of grammar features being tested in each language.
4. Test creation: example items
Testing written and aural receptive knowledge
Multiple choice options appear in a random order for each item
(Avoid position indicating correct answer)
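Randomising the option order for each item might look like the following. This is a minimal sketch, not the platform's implementation: the function name is hypothetical, and the German verb forms in the example are invented for illustration, not taken from the test.

```python
import random


def present_options(correct, distractors, rng):
    """Return the options in a random order plus the index of the correct one."""
    options = [correct] + list(distractors)
    rng.shuffle(options)  # in-place shuffle, so position carries no information
    return options, options.index(correct)


# Hypothetical item: choose the verb form that agrees with "ich".
rng = random.Random()  # pass random.Random(seed) for a reproducible order
options, correct_index = present_options("bin", ["bist", "ist"], rng)
```

Returning the index alongside the shuffled list means the multiple-choice item can still be scored automatically after shuffling.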
4. Test creation: example items
Testing written and aural receptive knowledge
Isolating recognition of gender and number
4. Test creation: example items
Isolating receptive knowledge of syntax
Remove punctuation
(Avoid . or ? indicating answer)
Including variety of ‘subjects’
(Avoid reliance on ‘du’ to indicate question)
4. Test creation: example items
Isolating productive knowledge of syntax
Article / noun / adjective appear in a random (vertical) order for each item.
(Ensures learner is paying attention to each element and its correct position)
4. Test creation: example items
Testing written productive knowledge
Testing understanding of question formation
(Check pupils understand that ‘do’ auxiliary is not needed)
Isolating grammatical knowledge by providing verb infinitive
(Avoid missing answers due to lack of lexical knowledge)
4. Test creation: example items
Testing written productive knowledge
Isolating grammatical knowledge by indicating gender (directly or via article).
(Ensure that pupils are not reliant on recalling gender of a specific lexical item)
4. Test creation: example items
Testing written productive knowledge
Testing syntax alongside subject-verb agreement.
Half of the pool included verbs in present continuous form
(Check pupils’ understanding that there is one present tense structure for simple and continuous meanings)
4. Test creation: example items
Testing oral productive knowledge
Testing subject-verb agreement and question formation.
Isolating grammatical knowledge by providing verb infinitive
(Avoid missing answers due to lack of lexical knowledge)
Note: glosses not provided for other elements of the sentence which are not being tested (e.g. object)
4. Test creation: example items
Testing oral productive knowledge
No gloss provided for irregular verbs
(Irregular verb forms taught as individual lexical items, rather than transforming from the infinitive)
Isolating knowledge of gender / number (& case) agreement for articles (& adjectives)
(Noun form provided and gender indicated)
5. Scoring
Items testing written and oral receptive knowledge = multiple choice
Correct option
Incorrect options
5. Scoring
Items testing written productive knowledge = open text response
Possible answers manually coded
e.g. article agreement (gender / number / case / definiteness)
subject-verb agreement (pronoun / verb ending)
🡪 Not possible to account for any other spelling errors (note: target words provided in glosses)
Pilot phase: data from first round of testing will help to identify common learner errors
5. Scoring
Examples of partial marking
0.5 mark for incorrect indefinite article (but gender correct)
0.5 mark for incorrect case (but gender correct)
0.5 mark for incorrect indefinite article (but gender correct)
0.5 mark for incorrect case and indefinite article (but gender correct)
5. Scoring
Examples of partial marking
0.5 mark deducted for missing accent (changes meaning)
0.5 mark deducted for missing accent (changes meaning)
5. Scoring
Examples of partial marking
0.5 mark for correct pronoun
0.5 mark for correct verb ending
0.5 deducted for incorrect verb ending
0.5 deducted for incorrect pronoun
5. Scoring
Examples of partial marking
1 mark for correct word order
1 mark for correct subject-verb agreement
0.5 deducted for incorrect verb ending
0.5 deducted for incorrect pronoun
1 deducted for incorrect word order
0.5 deducted for incorrect verb ending
0.5 deducted for incorrect pronoun
1 deducted for incorrect word order
0.5 deducted for incorrect verb ending
1 deducted for incorrect word order
0.5 deducted for incorrect pronoun
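The partial marking for subject-verb agreement and word order can be sketched as a sum of component marks. The function and parameter names are hypothetical, and the true/false judgement for each component is assumed to come from the manual coding of possible answers.

```python
def score_item(pronoun_ok, verb_ending_ok, word_order_ok=None):
    """Score one written item from its coded components.

    Subject-verb agreement is worth 1 mark: 0.5 for the pronoun and
    0.5 for the verb ending. Items that also test syntax add 1 mark
    for word order; pass word_order_ok=None when it is not tested.
    """
    marks = 0.0
    if pronoun_ok:
        marks += 0.5
    if verb_ending_ok:
        marks += 0.5
    if word_order_ok:
        marks += 1.0
    return marks


agreement_only = score_item(pronoun_ok=True, verb_ending_ok=False)
with_word_order = score_item(pronoun_ok=False, verb_ending_ok=True, word_order_ok=False)
```

Adding up the marks earned gives the same totals as deducting from the maximum: an incorrect verb ending costs 0.5, an incorrect pronoun 0.5, and incorrect word order 1.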