NCELP Residential 3
Phonics, Vocabulary and Grammar testing: Principles, Design, and Creation

Date updated: 02/03/20

Material licensed as CC BY-NC-SA 4.0

Rachel Hawkes

NCELP Phonics Assessments
Principles, design, creation

NCELP Residential 3

Robert Woore

Date updated: 02/03/20

Phonics assessment: a team effort!

Robert Woore

Rachel Hawkes

Nick Avery

Giulia Bovolenta

Inge Alferink

Natalie Finlayson

Emma Marsden

All the NCELP team!

Native speakers (checking & audio)

Outline

1. Principles of Phonics assessment

1.1 What are we trying to test?

1.2 How can we test it, validly and practically?

2. Question types

2.1 Task design

2.2 Item selection

3. Scoring

3.1 What do we mean by ‘accurate’ decoding?

3.2 How the scoring works

1. Principles of Phonics assessment

Objectives

To test knowledge of Symbol-Sound Correspondences (SSC) covered to date in Y7 (i.e. a syllabus-based achievement test)

Why are we testing this?

SSC knowledge is used

when reading aloud (including ‘inner voice’) – linked to reading comprehension, vocabulary acquisition (inter alia)
when listening (segmentation) and writing (spelling)

Testing is bidirectional: both sound 🡪 print (transcription) and print 🡪 sound (reading aloud)

Skill acquisition theory

Declarative knowledge 🡪 (proceduralisation) 🡪 Procedural knowledge 🡪 (automatisation) 🡪 Automatised knowledge

Early practice: with control, with awareness. Errors happen, speed is variable.

Later practice: requires less awareness. Less error, faster, less variability in speed. Declarative knowledge may be lost.

Ultimate goal: fluent, accurate decoding!

Accuracy and fluency

< Wagen >

/ v ɑː ɡ ə n /

Fluency and accuracy – both are important in L2 learning

Accuracy: allows correct spoken and written forms of words to be learnt and used effectively in communication
Fluency: reduces burden on working memory, thus allowing other ‘higher-level’ processes to be carried out more easily (e.g. inferencing when reading; conceptualizing ideas and formulating language when writing).

BUT fluency presupposes accuracy. There is no point automatizing incorrect knowledge!

🡪 Primacy of accuracy at this stage

Accuracy and fluency

L1 ‘interference’ and its impact on fluency / accuracy:

< Wagen >

Decoded with English (L1) SSC: / w a ɡ ə n /

Target SSC: <w> 🡪 / v /, <a> 🡪 / ɑː /

Decoded with German (L2) SSC: / v ɑː ɡ ə n /

More accurate L2 decoding associated with longer response times: Erler, 2003 (Year 7 learners of French in the UK); Li, 2019 (Chinese university students learning English)
Speeded tests may make pupils more likely to fall back on L1 knowledge
Main target of our phonics testing in Year 7 = accuracy
Ideally, untimed (‘unspeeded’) tests. But practical constraints 🡪 time limit

a. Print to sound

A number of different formats have been used to measure print-to-sound decoding

Rhyme judgment task (Erler & Macaro, 2012); Sound-Alike Task (SALT) (Woore et al., 2018 – but French only)

We believe the most valid method of assessing this is a ‘reading aloud task’ (RAT) (Woore 2006)

Cf. Year 1 ‘Phonics Screening Check’ (DfE)

But – trade-off between validity and ease of administration and scoring!

Scales image: by scott_kirkwood, openclipart.org.

Screenshot from Phonics Screening Check training video, DfE. www.youtube.com/watch?v=IPJ_ZEBh1Bk

a. Print to sound

Year 1 ‘Phonics Screening Check’ (DfE):
Real words and pseudowords
What’s the difference? Why include pseudowords?

Pseudowords:

By definition, unknown to the learners
Learners cannot draw on existing knowledge of the words’ pronunciations

🡪 A ‘pure’ measure of their knowledge of symbol-sound correspondences

However, we considered pseudowords problematic:

Ethically / pedagogically
In terms of validity (Papagno, Valentine & Baddeley, 1991)

Instead: unfamiliar real words
Low frequency
Not in SoW to date

Standards and Testing Agency, 2018: 2018 key stage 1 phonics screening check: pupils’ materials

a. Print to sound

Familiar versus unfamiliar words

Woore (2018) – compared Y10 MFL pupils’ pronunciation of familiar / unfamiliar French words
For each participant, a set of orthographically matched word pairs was created. On average, 14 word pairs per participant.
Each pair consisted of 1 word they knew and 1 word they didn’t know (as shown by a vocab test), sharing the same spelling body
Unit of analysis = spelling body (phonological rime)
…thus controlling for word-level variables

pain

grain

trois

chois

a. Print to sound

p < .001, d = 0.33

<pain> - /pæ̃/

<grain> - /ɡɹɛɪn/

<chat> - /ʃa/

<fat> - /fat/

a. Print to sound

Summary so far

Reading Aloud Test
Administered individually
Unfamiliar words to test SSC knowledge (to stop them using existing knowledge of whole word pronunciations)

b. Sound to print

Transcription task – can they write what they hear?
However… languages vary in “orthographic depth”:

b. Sound to print

So, in French particularly 🡪 multiple possible spellings for a given sound

For unfamiliar words: proliferation of ‘correct’ answers

ver, vers, verre, verres, vert, verts, vair

* vaire, * vaires, * verd, * verds, * vère, * vères, * vaîre, * vaîres …

So, we tweaked the response format to constrain possible responses

Blanks for individual letters.
E.g. “ v _ _ ”
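
The constrained response format can be sketched as follows. This is a minimal illustration, not the code of the NCELP test platform: the function name `blank_template` and the choice to reveal only the first letter are assumptions based on the “ v _ _ ” example.

```python
def blank_template(word: str, revealed: int = 1) -> str:
    """Show the first `revealed` letters and one blank per remaining letter."""
    if not 0 <= revealed <= len(word):
        raise ValueError("revealed must be between 0 and len(word)")
    shown = list(word[:revealed])
    blanks = ["_"] * (len(word) - revealed)
    return " ".join(shown + blanks)


# Each candidate spelling of the same sound gets a different template,
# so the number of blanks narrows down the acceptable answers.
for spelling in ["ver", "vers", "verre"]:
    print(spelling, "->", blank_template(spelling))
```

Because the number of blanks is fixed per item, a three-letter template rules out longer spellings such as "vers" or "verre" for the same sound.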

2. Item selection

For each language:

We listed all the target SSC covered in the SoW
For each SSC, we identified 3 low-frequency words (frequency ranking >5000 for French and Spanish, >4037 for German) and ensured they were not covered in the SoW
Therefore, pupils very unlikely to know them!
Where possible, we controlled for orthographic and phonological length – i.e. short monosyllabic words (French and German) or mostly 2 syllables (Spanish)
We tried to avoid words that look like English words (interlingual homographs), taboo words, etc.!
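
The frequency and SoW filters above could be sketched like this. It is an illustration under stated assumptions, not the procedure the team actually ran: the names `Candidate`, `eligible` and `pick_items` are hypothetical, the ranks in the example are made up, and the manual checks (length, interlingual homographs, taboo words) are not modelled.

```python
from dataclasses import dataclass

# Rank thresholds quoted above; a higher rank means a less frequent word.
MIN_RANK = {"French": 5000, "Spanish": 5000, "German": 4037}


@dataclass(frozen=True)
class Candidate:
    word: str
    ssc: str   # target symbol-sound correspondence, e.g. "eau"
    rank: int  # frequency ranking in a reference word list


def eligible(c, language, sow_words):
    """Keep low-frequency words that the scheme of work has not taught."""
    return c.rank > MIN_RANK[language] and c.word not in sow_words


def pick_items(candidates, language, sow_words, per_ssc=3):
    """Group eligible candidates by SSC, keeping at most `per_ssc` each."""
    chosen = {}
    for c in candidates:
        if eligible(c, language, sow_words):
            bucket = chosen.setdefault(c.ssc, [])
            if len(bucket) < per_ssc:
                bucket.append(c.word)
    return chosen


# Hypothetical ranks, for illustration only.
candidates = [
    Candidate("veau", "eau", 6200),
    Candidate("eau", "eau", 500),    # too frequent
    Candidate("beau", "eau", 9999),  # excluded: already in the SoW
    Candidate("seau", "eau", 9000),
]
print(pick_items(candidates, "French", sow_words={"beau"}))
```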

SSC in French

SSC	French 1	French 2	French 3
SFC -x	toux	poux	houx
SFC -d	jard	lard	dard
SFC -t	prit	vit	mit
SFC -s	bris	gis	spis
a	nase	prase	jase
i	brime	cime	trime
eu	yeuse	beuse	Meuse
e	reloge	remous	recel
o	lot	plot	trot
eau	veau	seau	sceau
au	chaule	gaule	taule
u	cossu	cornu	crépu
ou	zou	clou	flou
é	pré	dé	ré
en	ment	sent	gent

SSC	French 1	French 2	French 3
an	flan	cran	ban
on	pond	tond	gond
SFe	rame	crame	lame
in	crin	clin	pin
ain	tain	nain	zain
è	sème	pèse	sève
ê	vête	blême	bêche
ai	daine	raine	gaine
oi	aloi	coi	aboi
ch	croche	hoche	loche
ç	glaçon	maçon	pinçon
qu	quinte	quine	quille
j	jauge	jonc	joug
-tion	dation	rection	brution
-ien	salien	jovien	danien

2. Item selection

How many items in the test?

Completeness of coverage for individual pupils – diagnostic feedback
Testing burden and time demands – aim to fit tests within a single lesson!
We don’t want assessment to soak up too much teaching and learning time!

Scales image: by scott_kirkwood, openclipart.org.

2. Item selection

15 items per pupil
Online test: random selection of SSC for both the reading aloud and dictation components, then random selection of 1 of the 3 items for each selected SSC
Class as a whole should cover all the SSC, but individual pupils do not
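
The two-stage random selection can be sketched as below. This is a toy version under assumptions: `select_items` is a hypothetical name, the item bank holds only five rows from the French table, and the example draws 3 items rather than 15 to stay small.

```python
import random
from typing import Optional

# A few rows from the French item table; each SSC has three candidate words.
ITEM_BANK = {
    "eau": ["veau", "seau", "sceau"],
    "ou": ["zou", "clou", "flou"],
    "é": ["pré", "dé", "ré"],
    "an": ["flan", "cran", "ban"],
    "oi": ["aloi", "coi", "aboi"],
}


def select_items(bank, n_items, rng: Optional[random.Random] = None):
    """Pick `n_items` distinct SSC at random, then one word for each."""
    rng = rng or random.Random()
    sscs = rng.sample(sorted(bank), n_items)  # no SSC is repeated
    return [(ssc, rng.choice(bank[ssc])) for ssc in sscs]


# The real test uses 15 items per pupil; 3 keeps this toy example small.
for pupil in ["pupil A", "pupil B"]:
    print(pupil, select_items(ITEM_BANK, n_items=3))
```

Each pupil sees a different subset, so coverage of all SSC is only expected across the class as a whole.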

✔

Takes less time for individuals to complete
Prevents copying – all individuals have a different selection of items
Diagnostic feedback (to inform teaching) at whole class level

🗶

Incomplete diagnostic feedback for individuals

But – these are summative tests.
Diagnostic tests are already available – teacher- / self- / peer-assessment

Sample items – reading aloud

🡪 Time limit = compromise!

15 items to read aloud

Sample items – dictation

🡪 Time limit = compromise!

Number of items?

Pen-and-paper tests:

Pre-selection of 15 items

3. Scoring – reading aloud

A note on ‘foreign accent’. Two dimensions to accuracy in reading aloud:

Accuracy of SSC knowledge: how graphemes / symbols are mapped onto L2 phonemes / sounds. 🡪 This is our focus for the phonics assessment!
Accuracy of pronunciation of L2 phonemes / sounds.

< v e a u >

/ vjʉ /, / viː əʊ / 🗶
/ vo / ✔
/ vəʊ / ✔

A foreign accent is hard to shift even for the most dedicated learner (even if they want to)
You can be intelligible (and comprehensible) with a foreign accent.
We are testing SSC knowledge (phonics) rather than native-like pronunciation (phonetics).

3. Scoring – reading aloud

1 mark / item = max. 15 marks in total
0 marks: Incorrect pronunciation based on English SSC.
1 mark: Correct knowledge of target SSC, but pronunciation with an English accent.
1 mark: Correct pronunciation of target SSC.

Notes:

Give marks for correct pronunciation of the target SSC (shown in bold) even if other parts of the word are mispronounced / not attempted.
Be lenient when scoring. If you think the pupil has decoded the symbols (graphemes) to the correct sounds (phonemes), then you can allow for some degree of foreign accent in their pronunciation of the target sounds.
A foreign accent is hard to shift even for the most dedicated learner after years of practice, and people are perfectly intelligible with a foreign accent.
In our teaching and in our phonics assessments, we are targeting SSC knowledge (phonics) rather than native-like pronunciation (phonetics).

3. Scoring

Dictation task – automatically scored by the software platform ☺

Reading aloud task – marked individually by teachers (☹?)

Workload implications – need to make individual decisions about this.
One option would be to sit with individuals as they complete the speaking part of the test, and mark it in ‘real time’ as they go along.

SOW-based vocabulary testing for NCELP Y7

Principles, design, creation

NCELP Residential 3

Natalie Finlayson / Emma Marsden

Date updated: 02/03/20

Outline

1. Vocab testing team

2. Principles of vocabulary testing

2.1 Breadth of knowledge testing

2.2 Depth of knowledge testing

3. Question types

4. Scoring

5. Summary

1. Vocab testing team

Test leads: Natalie Finlayson & Emma Marsden

French

Test developer: Kirsten Somerville
Proofreader: Ivan Avaca

Spanish

Test developer: Ivan Avaca
Proofreaders: Nick Avery, Pep Mateos Gonzalez

German

Test developers: Inge Alferink, Natalie Finlayson
Proofreaders: Inge Alferink, Natalie Finlayson

With huge thanks to: Giulia Bovolenta, Victoria Hobson, Stephen Owen, Helen Thomas, Catherine Morris, Ciaran Morris, Jack Peacock, Laurence Anthony

2. Vocab testing principles

Objectives

To test breadth and depth of knowledge of vocabulary studied in Y7 Terms 1.1.1 - 2.1.1

Considerations

Syllabus-based achievement test

Productive vs receptive knowledge

Recall vs recognition

Oral and written modalities

2.1 Breadth of knowledge

Objective 1: To find out how many words students know

Sample size

Target time: 12 minutes

Pilot testing indicates students can comfortably answer one question in 9 seconds

12 minutes = 720 seconds; 720 / 9 = 80 total test items (split equally between listening, reading, writing and speaking: 20 each)
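
The sample-size arithmetic can be checked with a few lines. This only restates the figures on this slide; the variable names are illustrative.

```python
TARGET_MINUTES = 12
SECONDS_PER_ITEM = 9  # pilot estimate of time per question
MODALITIES = ["listening", "reading", "writing", "speaking"]

total_seconds = TARGET_MINUTES * 60
total_items, leftover_seconds = divmod(total_seconds, SECONDS_PER_ITEM)
items_per_modality = total_items // len(MODALITIES)

print(total_items, items_per_modality)  # 80 20
```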

SOW coverage:

French: 43%* Spanish: 35% German: 35%

2.1 Breadth of knowledge

Objective 1: To find out how many words students know

Item pool

To test overall knowledge of words in SOW – words distributed amongst students

Words are equally likely to be selected from randomised pools

All words in the SOW appear in the test once only

Part of speech ratios (noun : verb : other) broadly upheld within different question types

(French: 2:1:2; Spanish: 3:1:3; German: 2:1:3)

2.2 Depth of knowledge

Objective 2: To find out how well students know target words

[Figure: table from Nation 2013, p. 538, annotated with “Year 8+” (twice) and “tested separately”]

2.2 Depth of knowledge

How well do you know the words we have learned so far?

1. I have seen this word before.

2. I know what the word means.

3. I can read the word aloud.

4. I can spell the word correctly.

5. I can use the word in a sentence.

6. I know the gender of nouns.

2.2 Depth of knowledge

Recognition tests

multiple choice activities whereby learners select or guess the correct response from the alternatives given
such tests may strengthen any existing memory traces (McDaniel & Mason, 1985)

Recall

demands the production of responses from memory
more difficult than recognition because learners must search for the correct response within their mental representation of the newly experienced information (Clariana & Lee, 2001; Glover, 1989; McDaniel & Mason, 1985).

Definitions adapted from Jones (2004)

3. Question type 1

Modality: Listening

Type of activity: Spoken meaning recognition

Knowledge tested: Meaning (definition)

Read 2000, Chapter 3

3. Question type 2

Modality: Listening

Type of activity: Spoken meaning recognition

Knowledge tested: Meaning (definition); Meaning (association)

Read 2000, Chapter 3

3. Question type 3

Modality: Reading

Type of activity: Written meaning recall

Knowledge tested: Meaning (definition)

Read 2000, p. 163

3. Question type 4

Modality: Reading

Type of activity: Written meaning recall

Knowledge tested: Meaning (definition); Use (collocation)

3. Question type 5

Modality: Writing

Type of activity: Written form recall

Knowledge tested: Form (written); Meaning (definition)

3. Question type 6

Modality: Writing

Type of activity: Written form recall

Knowledge tested: Form (written); Meaning (definition); Use (collocation)

Laufer & Nation 1995

3. Question type 7

Modality: Speaking

Type of activity: Spoken form recall

Knowledge tested: Form (spoken); Meaning (definition)

3. Question types

80 x test items

Listening: 20 x spoken meaning recognition
Reading: 20 x written meaning recall
Writing: 20 x written form recall
Speaking: 20 x spoken form recall

40 x receptive (listening, reading)
40 x productive (writing, speaking)

Bias towards recall:

encourages active vocabulary building from the beginning

minimises use of the multiple-choice format: guessing increases average scores by 16.7% in a 6-choice format (Stewart & White, 2011) and by up to 25% in a 4-choice format (Stewart, 2014)
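
The two figures quoted above coincide with the chance rate for blind guessing, 1/k for k options. The sketch below only computes that baseline; it is not the method used in the cited studies, and `chance_rate` is an illustrative name.

```python
from fractions import Fraction


def chance_rate(n_options: int) -> Fraction:
    """Probability of picking the one correct option purely by guessing."""
    if n_options < 1:
        raise ValueError("a question needs at least one option")
    return Fraction(1, n_options)


for k in (6, 4):
    print(f"{k} options: {float(chance_rate(k)):.1%}")
```

A recall item with an open response has no fixed option set, which is why the guessing baseline does not apply to it in the same way.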

4. Scoring (binary)

4. Scoring (tolerance)

For now, after 18 weeks of lessons, we are semi-tolerant of article and accent errors, if the lemma (word) is correct.

For now, after 18 weeks of lessons, we are tolerant of accent errors and semi-tolerant of article errors, if the lemma (word) is correct.

5. Summary

The NCELP Y7 vocabulary test …

is a syllabus-based vocabulary test designed to measure vocabulary breadth and depth

provides a highly reliable snapshot of student achievement in a manageable timeframe

tests all words featured in the SOW by distributing them randomly amongst students

tests recognition and recall skills across four modalities

thoroughly tests different elements of word knowledge, in line with aspects taught

provides automated scoring of 6/7 question types

References

Clariana, R. B., & Lee, D. (2001). The effects of recognition and recall study tasks with feedback in a computer-based vocabulary lesson. Educational Technology Research & Development, 49(3), pp. 23-36.

Glover, J. A. (1989). The "testing" phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81(3), pp. 392-399.

Jones, L. (2004). Testing L2 vocabulary recognition and recall using pictorial and written test items. Language Learning & Technology, 8(3), pp. 122-143.

Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16, pp. 307-322.

McDaniel, M. A., & Mason, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, pp. 371-385.

Nation, I. S. P. (2013). Learning vocabulary in another language. Cambridge: Cambridge University Press.

Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.

Stewart, J. (2014). Do multiple-choice options inflate estimates of vocabulary size on the VST? Language Assessment Quarterly, 11(3), pp. 271-282. doi:10.1080/15434303.2014.922977

Stewart, J., & White, D. A. (2011). Estimating guessing effects on the vocabulary levels test for differing degrees of word knowledge. TESOL Quarterly, 45(2), pp. 370-380. doi:10.5054/tq.2011.254523

Questions?

Thanks for listening!

Any questions?

natalie_eloise

SOW-based grammar testing for NCELP Y7

Principles, design, creation

NCELP Residential 3

Rowena Kasprowicz / Nicholas Avery / Stephen Owen

Last updated: 02/03/20

Outline

1. Grammar testing team

2. Principles of grammar testing

3. Coverage of grammar (by language)

4. Test creation and example items

5. Scoring

1. Grammar testing team

Test lead: Rowena Kasprowicz

French

Test developers: Stephen Owen, Rowena Kasprowicz
Proofreader: Ivan Avaca

Spanish

Test developers: Nicholas Avery, Rowena Kasprowicz
Proofreader: Amanda Izquierdo

German

Test developers: Stephen Owen, Rowena Kasprowicz
Proofreaders: Inge Alferink, Natalie Finlayson

With huge thanks to: Giulia Bovolenta, Victoria Hobson, Chloé Motard, Geraldine Bengsch

2. Grammar testing principles

Objective

To test receptive and productive knowledge of grammar studied in Y7 Terms 1.1.1-2.1.6

Considerations

Isolating grammatical knowledge; disassociating from lexical knowledge

Ensuring coverage of the range of grammar features taught (including relevant morphology and syntax)

Receptive knowledge (recognising meaning as well as form)
Productive knowledge (accuracy of production)

Written and oral modalities

3. Coverage of grammar: GERMAN

Grammar feature	Reading	Listening	Writing	Speaking
Present continuous formation: two forms in English vs. one in TL	-	-	4 items	6 items
Question formation: subject-verb inversion; do-aux in English vs. TL	4 items	4 items
Subject-verb agreement (weak): 1st / 2nd / 3rd singular	3 items	3 items
Subject-verb agreement (irregular): haben / sein (1st, 2nd, 3rd sing, 1st pl); mögen (1st, 2nd, 3rd sing)	3 items	3 items	4 items	4 items
Article agreement: def / indef; gender; number; case (nom/acc)	4 items	-	4 items	4 items
Plural noun formation: -en; umlaut + -e; -e	-	-	3 items	-
Negation: nicht + verb; nicht + adjective	4 items	-	-	-
Subject and object pronoun agreement: gender; number; case (nom/acc)	4 items	-	3 items	3 items

3. Coverage of grammar: FRENCH

Grammar feature	Reading	Listening	Writing	Speaking
Present continuous formation: two forms in English vs. one in TL	-	-	8 items	6 items
Question formation: intonation; do-aux in English vs. TL	-	-
Subject-verb agreement (regular -ER): 1st / 2nd / 3rd singular; 1st / 2nd / 3rd plural	10 items	-
Subject-verb agreement (irregular): être (all persons); avoir (all persons); faire (all persons); aller (1st, 2nd, 3rd sing)	8 items	-	4 items	4 items
Article & adjective agreement: def / indef; gender; number	4 items	-	4 items
Adjectival word order: post-nominal	4 items	-	-
Preposition “to” + article	-	4 items	4 items	3 items
“il y a” vs. “est” vs. “a”	4 items	-	-	-

3. Coverage of grammar: SPANISH

Grammar feature	Reading	Listening	Writing	Speaking
Present continuous formation: two forms in English vs. one in TL	-	-	6 items	6 items
Question formation: intonation; do-aux in English vs. TL	-	-
Subject-verb agreement (regular -AR): 1st / 2nd / 3rd singular	4 items	4 items
Subject-verb agreement (irregular): estar (1st, 2nd, 3rd sing); ser (1st, 2nd, 3rd sing / 3rd pl); tener (1st, 2nd, 3rd sing / 1st, 3rd pl); querer (1st, 2nd, 3rd sing); hacer (1st, 2nd, 3rd sing); dar (1st, 2nd, 3rd sing)	6 items	6 items	4 items (all)	4 items (tener, querer)
Article & adjective agreement: def / indef; gender; number	4 items	-	4 items
Adjectival word order: post-nominal	4 items	-	-
Negation: no + verb	4 items	-	-	3 items (-AR)
“hay” vs. “tiene”	4 items	-	-	-

4. Test creation process

Each question tests a specific grammatical feature (or combination of features)

Size of the test

Target time: 15 minutes (R/L/W); 4 minutes (S)

50 test items (R/L/W); 13 items (S)

Question item pool

Items created using vocabulary from SoW (reviewed to ensure no clash with vocabulary test)

Each pool contains an equal number of instances of each structure taught

(e.g. equally likely to be tested on 1st person singular as 2nd person singular in subject-verb agreement questions, etc.)

Variation between languages in weighting of different modes / modalities, due to variation in nature of grammar features being tested in each language.

4. Test creation: example items

Testing written and aural receptive knowledge

Testing ability to recognise meaning as well as form

Multiple choice; here matching to English equivalent

Multiple choice options appear in a random order for each item

(Avoid position indicating correct answer)
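
Randomising the option order per item can be sketched as follows. This is an assumption-laden illustration rather than the platform's implementation: `shuffled_options` is a hypothetical name and the English options are invented for the example.

```python
import random
from typing import Optional


def shuffled_options(correct, distractors, rng: Optional[random.Random] = None):
    """Return the options in random order plus the index of the correct one.

    Assumes all options are distinct strings.
    """
    rng = rng or random.Random()
    options = [correct, *distractors]
    rng.shuffle(options)  # in-place shuffle, so position carries no information
    return options, options.index(correct)


# Hypothetical item: match a target-language sentence to its English equivalent.
options, answer_index = shuffled_options(
    "I am playing", ["you are playing", "she is playing"]
)
print(options, "correct:", answer_index)
```

Returning the index alongside the shuffled list lets automated scoring stay independent of where the correct option ends up.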

4. Test creation: example items

Testing written and aural receptive knowledge

Testing ability to recognise meaning as well as form

Multiple choice; here matching to TL alternative

Isolating recognition of gender and number

4. Test creation: example items

Isolating receptive knowledge of syntax

Recognising function (statement / question) indicated by word order

Remove punctuation

(Avoid . or ? indicating answer)

Including variety of ‘subjects’

(Avoid reliance on ‘du’ to indicate question)

4. Test creation: example items

Isolating productive knowledge of syntax

Drag-and-drop into correct order

Article / noun / adjective appear in a random (vertical) order for each item.

(Ensures learner is paying attention to each element and its correct position)

4. Test creation: example items

Testing written productive knowledge

Testing ability to accurately produce the structure
Open text box

Testing understanding of question formation

(Check pupils understand that ‘do’ auxiliary is not needed)

Isolating grammatical knowledge by providing verb infinitive

(Avoid missing answers due to lack of lexical knowledge)

4. Test creation: example items

Testing written productive knowledge

Testing ability to accurately produce the structure
Open text box

Isolating grammatical knowledge by indicating gender (directly or via article).

(Ensure that pupils are not reliant on recalling gender of a specific lexical item)

4. Test creation: example items

Testing written productive knowledge

Testing ability to accurately produce the structure
Open text box

Testing syntax alongside subject-verb agreement.

Half of the pool included verbs in present continuous form

(Check pupils’ understanding that there is one present tense structure for simple and continuous meanings)

4. Test creation: example items

Testing oral productive knowledge

Testing ability to accurately produce the structure(s)
Combining a number of structures within each item

Testing subject-verb agreement and question formation.

Isolating grammatical knowledge by providing verb infinitive

(Avoid missing answers due to lack of lexical knowledge)

Note: glosses not provided for other elements of the sentence which are not being tested (e.g. object)

4. Test creation: example items

Testing oral productive knowledge

Testing ability to accurately produce the structure(s)
Combining a number of structures within each item

No gloss provided for irregular verbs

(Irregular verb forms taught as individual lexical items, rather than transforming from the infinitive)

Isolating knowledge of gender / number (& case) agreement for articles (& adjectives)

(Noun form provided and gender indicated)

5. Scoring

Items testing written and oral receptive knowledge = multiple choice

One correct answer
Automated scoring

Correct option

Incorrect options

5. Scoring

Items testing written productive knowledge = open text response

Possible answers manually coded

1 mark per structure tested
Partial marks account for the different elements of a structure being tested

e.g. article agreement (gender / number / case / definiteness)

subject-verb agreement (pronoun / verb ending)

Accents dealt with systematically

Tolerant where absence does not alter the meaning (e.g. préparer vs. preparer) 🡪 no marks deducted
Semi-tolerant where absence does alter the meaning (e.g. à la vs. a la) 🡪 0.5 mark deducted

🡪 Not possible to account for any other spelling errors (note: target words provided in glosses)

Pilot phase: data from first round of testing will help to identify common learner errors
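
The accent rules above could be expressed roughly as follows. This is a sketch under assumptions, not the scoring code used for the tests (where possible answers were manually coded): `strip_accents` and `score_response` are hypothetical names, case is ignored, any diacritic difference (including a cedilla) counts as an accent error, and whether an accent changes the meaning is supplied by the item writer.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def score_response(response, target, accent_changes_meaning, max_mark=1.0):
    """Score an open-text answer with tolerant / semi-tolerant accent rules."""
    response = response.strip().lower()
    target = target.strip().lower()
    if response == target:
        return max_mark
    if strip_accents(response) == strip_accents(target):
        # Only the accents differ: deduct 0.5 if that alters the meaning.
        return max_mark - 0.5 if accent_changes_meaning else max_mark
    return 0.0  # other spelling errors are not given partial credit


print(score_response("preparer", "préparer", accent_changes_meaning=False))
print(score_response("a la", "à la", accent_changes_meaning=True))
```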

5. Scoring

Examples of partial marking

0.5 mark for incorrect indefinite article (but gender correct)

0.5 mark for incorrect case (but gender correct)

0.5 mark for incorrect case and indefinite article (but gender correct)

5. Scoring

Examples of partial marking

0.5 mark deducted for missing accent (changes meaning)

5. Scoring

Examples of partial marking

0.5 mark for correct pronoun

0.5 mark for correct verb ending

0.5 deducted for incorrect verb ending

0.5 deducted for incorrect pronoun

5. Scoring

Examples of partial marking

1 mark for correct word order (WO)

1 mark for correct subject-verb agreement

0.5 deducted for incorrect verb ending

0.5 deducted for incorrect pronoun

1 deducted for incorrect WO