1 of 76

NCELP Residential 3 – Phonics, Vocabulary and Grammar testing: Principles, Design, and Creation

Date updated: 02/03/20

Material licensed as CC BY-NC-SA 4.0

Rachel Hawkes

2 of 76

NCELP Phonics Assessments: Principles, design, creation

NCELP Residential 3

Robert Woore

Date updated: 02/03/20


3 of 76

Phonics assessment team effort!

Robert Woore

Rachel Hawkes

Nick Avery

Giulia Bovolenta

Inge Alferink

Natalie Finlayson

Emma Marsden

All the NCELP team!

Native speakers (checking & audio)


4 of 76

Outline

  1. Principles of Phonics assessment

1.1 What are we trying to test?

1.2 How can we test it, validly and practically?

2. Question types

2.1 Task design

2.2 Item selection

3. Scoring

3.1 What do we mean by ‘accurate’ decoding?

3.2 How the scoring works


5 of 76

1. Principles of Phonics assessment

Objectives

  • To test knowledge of Symbol-Sound Correspondences (SSC) covered to date in Y7 (i.e. a syllabus-based achievement test)

Why are we testing this?

  • SSC knowledge is used
    • when reading aloud (including ‘inner voice’) – linked to reading comprehension, vocabulary acquisition (inter alia)
    • when listening (segmentation) and writing (spelling)
  • Testing is bidirectional: both sound 🡪 print (transcription) and print 🡪 sound (reading aloud)


6 of 76

Skill acquisition theory

Declarative knowledge 🡪 (proceduralisation) 🡪 Procedural knowledge 🡪 (automatisation) 🡪 Automatised knowledge

  • Earlier stages: practice with control, with awareness. Errors happen, speed is variable.
  • Later stages: practice requires less awareness. Less error, faster, less variability in speed. Declarative knowledge may be lost.

Ultimate goal: fluent, accurate decoding


7 of 76

Accuracy and fluency

< Wagen >

/ v ɑː ɡ ə n /

Fluency and accuracy – both are important in L2 learning

  • Accuracy: allows correct spoken and written forms of words to be learnt and used effectively in communication
  • Fluency: reduces burden on working memory, thus allowing other ‘higher-level’ processes to be carried out more easily (e.g. inferencing when reading; conceptualizing ideas and formulating language when writing).

BUT fluency presupposes accuracy. There is no point automatizing incorrect knowledge!

🡪 Primacy of accuracy at this stage


8 of 76

Accuracy and fluency

  • L1 ‘interference’ and its impact on fluency / accuracy:

< Wagen >

/ w a ɡ ə n /

<w> 🡪 / v /

<a> 🡪 / ɑ ː /

/ v ɑː ɡ ə n /

  • More accurate L2 decoding associated with longer response times: Erler, 2003 (Year 7 learners of French in the UK); Li, 2019 (Chinese university students learning English)
  • Speeded tests may make pupils more likely to fall back on L1 knowledge
  • Main target of our phonics testing in Year 7 = accuracy
  • Ideally, untimed (‘unspeeded’) tests. But practical constraints 🡪 time limit


9 of 76

a. Print to sound

  • A number of different formats have been used to measure print-to-sound decoding
    • Rhyme judgment task (Erler & Macaro, 2012); Sound-Alike Task (SALT) (Woore et al., 2018 – but French only)
  • We believe the most valid method of assessing this is a ‘reading aloud task’ (RAT) (Woore 2006)
  • Cf. Year 1 ‘Phonics Screening Check’ (DfE)
  • But – trade-off between validity and ease of administration and scoring!

Scales image: by scott_kirkwood, openclipart.org.

Screenshot from Phonics Screening Check training video, DfE. www.youtube.com/watch?v=IPJ_ZEBh1Bk


10 of 76

a. Print to sound

  • Year 1 ‘Phonics Screening Check’ (DfE):
  • Real words and pseudowords
  • What’s the difference? Why include pseudowords?

Pseudowords:

  • By definition, unknown to the learners
  • Learners cannot draw on existing knowledge of the words’ pronunciations

🡪 A ‘pure’ measure of their knowledge of symbol-sound correspondences

However, we considered pseudowords problematic:

  • Ethically / pedagogically
  • In terms of validity (Papagno, Valentine & Baddeley, 1991)

Instead, we used unfamiliar real words:

  • Low frequency
  • Not in SoW to date

Standards and Testing Agency, 2018: 2018 key stage 1 phonics screening check: pupils’ materials


11 of 76

a. Print to sound

Familiar versus unfamiliar words

  • Woore (2018) – compared Y10 MFL pupils’ pronunciation of familiar / unfamiliar French words
  • For each participant, a set of orthographically matched word pairs was created. On average, 14 word pairs per participant.
  • Each pair consisted of 1 word they knew and 1 word they didn’t know (as shown by a vocab test), sharing the same spelling body
  • Unit of analysis = spelling body (phonological rime)
  • …thus controlling for word-level variables

Example pairs: pain / grain; trois / chois


12 of 76

a. Print to sound

p < .001, d = 0.33

<pain> - /pæ̃/

<grain> - /ɡɹɛɪn/

<chat> - /ʃa/

<fat> - /fat/


13 of 76

a. Print to sound

Summary so far

  • Reading Aloud Test
  • Administered individually
  • Unfamiliar words to test SSC knowledge (to stop them using existing knowledge of whole word pronunciations)


14 of 76

b. Sound to print

  • Transcription task – can they write what they hear?
  • However… languages vary in “orthographic depth”:


15 of 76

b. Sound to print

  • So, in French particularly 🡪 multiple possible spellings for a given sound
  • For unfamiliar words: proliferation of ‘correct’ answers

ver, vers, verre, verres, vert, verts, vair

* vaire, * vaires, * verd, * verds, * vère, * vères, * vaîre, * vaîres …

  • So, we tweaked the response format to constrain possible responses
    • Blanks for individual letters.
    • E.g. “ v _ _ ”
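
The constrained response format can be sketched as below. This is only an illustration: it assumes the first letter is shown and every remaining letter becomes one blank, and the function name `blank_template` is hypothetical rather than part of the NCELP test platform.

```python
def blank_template(word: str, shown: int = 1) -> str:
    """Show the first `shown` letters and one blank per remaining letter."""
    visible = list(word[:shown])
    blanks = ["_"] * (len(word) - shown)
    return " ".join(visible + blanks)


print(blank_template("ver"))    # v _ _
print(blank_template("verre"))  # v _ _ _ _
```

Fixing the number of blanks rules out most of the alternative spellings listed above, because only spellings of the right length fit the template.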


16 of 76

2. Item selection

For each language:

  • We listed all the target SSC covered in the SoW
  • For each SSC, we identified 3 low-frequency words (frequency ranking >5000 for French and Spanish, >4037 for German) and ensured they were not covered in the SoW
  • Therefore, pupils very unlikely to know them!
  • Where possible, we controlled for orthographic and phonological length – i.e. short monosyllabic words (French and German) or mostly 2 syllables (Spanish)
  • We tried to avoid words that look like English words (interlingual homographs), taboo words, etc.!
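
The frequency and SoW filters described above could be expressed roughly as follows. This is a sketch under stated assumptions: the function name, the data structures and the example frequency ranks are all hypothetical (the ranks are illustrative numbers, not real corpus data), and words missing from the frequency list are treated as rare.

```python
# Frequency-rank thresholds quoted on the slide; a higher rank means a rarer word.
RANK_THRESHOLD = {"French": 5000, "Spanish": 5000, "German": 4037}


def candidate_items(language, words_for_ssc, frequency_rank, sow_words, n_items=3):
    """Pick up to `n_items` low-frequency words for one SSC that are absent from the SoW."""
    threshold = RANK_THRESHOLD[language]
    eligible = [
        word
        for word in words_for_ssc
        # Assumption: a word with no recorded rank counts as low frequency.
        if frequency_rank.get(word, float("inf")) > threshold and word not in sow_words
    ]
    return eligible[:n_items]


# Illustrative ranks only; they are invented for this example.
ranks = {"deux": 41, "toux": 7200, "poux": 9100, "houx": 15000}
sow = {"deux"}
print(candidate_items("French", ["deux", "toux", "poux", "houx"], ranks, sow))
# ['toux', 'poux', 'houx']
```

The remaining checks on the slide (word length, interlingual homographs, taboo words) call for human judgement and are not modelled here.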


17 of 76

SSC in French

SSC | French 1 | French 2 | French 3
SFC -x | toux | poux | houx
SFC -d | jard | lard | dard
SFC -t | prit | vit | mit
SFC -s | bris | gis | spis
a | nase | prase | jase
i | brime | cime | trime
eu | yeuse | beuse | Meuse
e | reloge | remous | recel
o | lot | plot | trot
eau | veau | seau | sceau
au | chaule | gaule | taule
u | cossu | cornu | crépu
ou | zou | clou | flou
é | pré
en | ment | sent | gent
an | flan | cran | ban
on | pond | tond | gond
SFe | rame | crame | lame
in | crin | clin | pin
ain | tain | nain | zain
è | sème | pèse | sève
ê | vête | blême | bêche
ai | daine | raine | gaine
oi | aloi | coi | aboi
ch | croche | hoche | loche
ç | glaçon | maçon | pinçon
qu | quinte | quine | quille
j | jauge | jonc | joug
-tion | dation | rection | brution
-ien | salien | jovien | danien


18 of 76

2. Item selection

  • How many items in the test?
  • Completeness of coverage for individual pupils – diagnostic feedback
  • Testing burden and time demands – aim to fit tests within a single lesson!
  • We don’t want assessment to soak up too much teaching and learning time!

Scales image: by scott_kirkwood, openclipart.org.


19 of 76

2. Item selection

  • 15 items per pupil
  • Online test: random selection of SSC for both the reading aloud and dictation components, then random selection of 1 of the 3 items for each SSC
  • Class as a whole should cover all the SSC, but individual pupils do not

Advantages:

  • Takes less time for individuals to complete
  • Prevents copying – all individuals have a different selection of items
  • Diagnostic feedback (to inform teaching) at whole class level

Disadvantage:

  • Incomplete diagnostic feedback for individuals
  • But – these are summative tests.
  • Diagnostic tests are already available – teacher- / self- / peer-assessment
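
The two-stage random draw described on this slide might look like the sketch below. It is not the platform's actual code: the function name `draw_items` is hypothetical, and drawing the reading aloud and dictation sets independently of each other is an assumption.

```python
import random


def draw_items(item_bank, n_ssc=15, rng=None):
    """Randomly pick `n_ssc` SSC, then one of the candidate words for each."""
    rng = rng or random.Random()
    chosen_ssc = rng.sample(sorted(item_bank), n_ssc)
    return {ssc: rng.choice(item_bank[ssc]) for ssc in chosen_ssc}


# A four-SSC excerpt of the French item bank, so this example draws 3 SSC rather than 15.
bank = {
    "eau": ["veau", "seau", "sceau"],
    "ou": ["zou", "clou", "flou"],
    "an": ["flan", "cran", "ban"],
    "oi": ["aloi", "coi", "aboi"],
}
reading_aloud = draw_items(bank, n_ssc=3)
dictation = draw_items(bank, n_ssc=3)  # assumption: drawn independently of the reading-aloud set
print(reading_aloud)
```

Because each pupil receives a different draw, a single pupil sees only some SSC, while the class as a whole is likely to cover them all.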


20 of 76

Sample items – reading aloud


21 of 76

Sample items – reading aloud

🡪 Time limit = compromise!


22 of 76

Sample items – reading aloud

15 items to read aloud


23 of 76

Sample items – dictation

🡪 Time limit = compromise!

Number of items?


24 of 76

Sample items – dictation


25 of 76

Pen-and-paper tests:

  • Pre-selection of 15 items


26 of 76

3. Scoring – reading aloud

A note on ‘foreign accent’. Two dimensions to accuracy in reading aloud:

  1. Accuracy of SSC knowledge: how graphemes / symbols are mapped onto L2 phonemes / sounds
  2. Accuracy of pronunciation of L2 phonemes / sounds.

Dimension 1 (accuracy of SSC knowledge) is our focus for the phonics assessment!

< v e a u >

🗶 / vjʉ /, / viː əʊ / (based on English SSC)

Accepted: / vo /, / vəʊ / (correct SSC knowledge, with or without an English accent)

  • A foreign accent is hard to shift even for the most dedicated learner (even if they want to)
  • You can be intelligible (and comprehensible) with a foreign accent.
  • We are testing SSC knowledge (phonics) rather than native-like pronunciation (phonetics).


27 of 76

3. Scoring – reading aloud

1 mark / item = max. 15 marks in total

0 marks: Incorrect pronunciation based on English SSC.

1 mark: Correct knowledge of target SSC, but pronunciation with an English accent.

1 mark: Correct pronunciation of target SSC

Notes:

Give marks for correct pronunciation of target SSC in bold even if other parts of the word are mispronounced / not attempted.

Be lenient when scoring. If you think the students have decoded the symbols (graphemes) to the correct sounds (phonemes), then you can allow for some degree of foreign accent in pupils’ pronunciation of the target sounds. A foreign accent is hard to shift even for the most dedicated learner after years of practice, and people are perfectly intelligible with a foreign accent. In our teaching and in our phonics assessment, we are targeting SSC knowledge (phonics) rather than native-like pronunciation (phonetics).
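
The marking scheme above can be summarised as a small lookup. This is a sketch only: the category labels and the function name are hypothetical, and the teacher still makes the judgement for each item.

```python
# Marks per teacher judgement, following the rubric on this slide.
MARKS = {
    "english_ssc": 0,          # incorrect pronunciation based on English SSC
    "correct_with_accent": 1,  # correct target SSC, pronounced with an English accent
    "correct": 1,              # correct pronunciation of the target SSC
}

MAX_ITEMS = 15


def score_reading_aloud(judgements):
    """Total marks for one pupil: 1 mark per item, maximum 15."""
    if len(judgements) > MAX_ITEMS:
        raise ValueError("a pupil reads at most 15 items")
    return sum(MARKS[judgement] for judgement in judgements)


print(score_reading_aloud(["correct", "english_ssc", "correct_with_accent"]))  # 2
```

An accented but correctly decoded response earns the same mark as a native-like one, which reflects the focus on SSC knowledge rather than phonetics.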


28 of 76

3. Scoring

Dictation task – automatically scored by the software platform

Reading aloud task – marked individually by teachers (☹?)

  • Workload implications – need to make individual decisions about this.
  • One option would be to sit with individuals as they complete the speaking part of the test, and mark it in ‘real time’ as they go along.


29 of 76

SOW-based vocabulary testing for NCELP Y7

Principles, design, creation

NCELP Residential 3

Natalie Finlayson / Emma Marsden

Date updated: 02/03/20


30 of 76

Outline

1. Vocab testing team

2. Principles of vocabulary testing

2.1 Breadth of knowledge testing

2.2 Depth of knowledge testing

3. Question types

4. Scoring

5. Summary


31 of 76

1. Vocab testing team

Test leads: Natalie Finlayson & Emma Marsden

French

Test developer

Kirsten Somerville

Proofreader

Ivan Avaca

Spanish

Test developer

Ivan Avaca

Proofreaders

Nick Avery

Pep Mateos Gonzalez

German

Test developers

Inge Alferink

Natalie Finlayson

Proofreaders

Inge Alferink

Natalie Finlayson

With huge thanks to: Giulia Bovolenta, Victoria Hobson, Stephen Owen, Helen Thomas, Catherine Morris, Ciaran Morris, Jack Peacock, Laurence Anthony


32 of 76

2. Vocab testing principles

Objectives

To test breadth and depth of knowledge of vocabulary studied in Y7 Terms 1.1.1 - 2.1.1

Considerations

  • Syllabus-based achievement test

  • Productive vs receptive knowledge

  • Recall vs recognition

  • Oral and written modalities


33 of 76

2.1 Breadth of knowledge

Objective 1: To find out how many words students know

Sample size

  • Target time: 12 minutes

  • Pilot testing indicates students can comfortably answer one question in 9 seconds

  • 720 / 9 = 80 total test items (split equally between L, R, W & S)
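
The sample-size arithmetic can be checked in a few lines. The variable names are illustrative; the figures are those given on the slide.

```python
target_minutes = 12   # target test time
seconds_per_item = 9  # pilot estimate of time per question
modalities = ["Listening", "Reading", "Writing", "Speaking"]

total_items = target_minutes * 60 // seconds_per_item  # 720 // 9 = 80
items_per_modality = total_items // len(modalities)    # 80 // 4 = 20

print(total_items, items_per_modality)  # 80 20
```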

SOW coverage:

French: 43%* Spanish: 35% German: 35%


34 of 76

2.1 Breadth of knowledge

Objective 1: To find out how many words students know

Item pool

  • To test overall knowledge of words in SOW – words distributed amongst students

  • Words are equally likely to be selected from randomised pools

  • All words in the SOW appear in the test once only

  • Part of speech ratios (noun : verb : other) broadly upheld within different question types

(French: 2:1:2; Spanish: 3:1:3; German: 2:1:3)


35 of 76

2.2 Depth of knowledge

Objective 2: To find out how well students know target words

Year 8+

tested separately

Year 8+

Nation 2013, p. 538


36 of 76

2.2 Depth of knowledge

How well do you know the words we have learned so far?

1. I have seen this word before.

2. I know what the word means.

3. I can read the word aloud.

4. I can spell the word correctly.

5. I can use the word in a sentence.

6. I know the gender of nouns.


37 of 76

2.2 Depth of knowledge

Recognition tests

  • multiple choice activities whereby learners select or guess the correct response from the alternatives given
  • such tests may strengthen any existing memory traces (McDaniel & Mason, 1985)

Recall

  • demands the production of responses from memory
  • more difficult than recognition because learners must search for the correct response within their mental representation of the newly experienced information (Clariana & Lee, 2001; Glover, 1989; McDaniel & Mason, 1985).

Definitions adapted from Jones (2004)


38 of 76

3. Question type 1

Modality

Listening

Type of activity

Spoken meaning recognition

Knowledge tested

Meaning (definition)

Read 2000, Chapter 3


39 of 76

3. Question type 2

Modality

Listening

Type of activity

Spoken meaning recognition

Knowledge tested

Meaning (definition)

Meaning (association)

Read 2000, Chapter 3


40 of 76

3. Question type 3

Modality

Reading

Type of activity

Written meaning recall

Knowledge tested

Meaning (definition)

Read 2000, p. 163


41 of 76

3. Question type 4

Modality

Reading

Type of activity

Written meaning recall

Knowledge tested

Meaning (definition)

Use (collocation)


42 of 76

3. Question type 5

Modality

Writing

Type of activity

Written form recall

Knowledge tested

Form (written)

Meaning (definition)


43 of 76

3. Question type 6

Modality

Writing

Type of activity

Written form recall

Knowledge tested

Form (written)

Meaning (definition)

Use (collocation)

Laufer & Nation 1995


44 of 76

3. Question type 7

Modality

Speaking

Type of activity

Spoken form recall

Knowledge tested

Form (spoken)

Meaning (definition)

Material licensed as CC BY-NC-SA 4.0

Rachel Hawkes

45 of 76

3. Question types


46 of 76

3. Question types

80 x test items

Listening: 20 x spoken meaning recognition

Reading: 20 x written meaning recall

Writing: 20 x written form recall

Speaking: 20 x spoken form recall

40 x receptive

40 x productive

Bias towards recall:

  • encourages active vocabulary building from the beginning

  • minimises use of multi-choice format - average score increase of 16.7% due to guessing in a 6-choice format (Stewart & White, 2011) and up to 25% in 4-choice (Stewart 2014)
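
The two percentages quoted above are numerically the same as the chance of picking the right option blindly, 1 in 6 and 1 in 4. The sketch below shows only that arithmetic; it does not reproduce the analyses in the cited studies, and the function name is hypothetical.

```python
def chance_rate(n_options: int) -> float:
    """Percentage of unknown items answered correctly by blind guessing."""
    return 100 / n_options


for k in (6, 4):
    print(f"{k} options: {chance_rate(k):.1f}%")
# 6 options: 16.7%
# 4 options: 25.0%
```

Recall formats have no options to choose between, so this source of score inflation does not arise in them.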


47 of 76

4. Scoring (binary)


48 of 76

4. Scoring (tolerance)

For now, after 18 weeks of lessons, we are semi-tolerant of article and accent errors, if the lemma (word) is correct.

For now, after 18 weeks of lessons, we are tolerant of accent errors and semi-tolerant of article errors, if the lemma (word) is correct.


49 of 76

4. Scoring (tolerance)


50 of 76

4. Scoring (tolerance)


51 of 76

5. Summary

The NCELP Y7 vocabulary test …

  • is a syllabus-based vocabulary test designed to measure vocabulary breadth and depth

  • provides a highly reliable snapshot of student achievement in a manageable timeframe

  • tests all words featured in the SOW by distributing them randomly amongst students

  • tests recognition and recall skills across four modalities

  • thoroughly tests different elements of word knowledge, in line with aspects taught

  • provides automated scoring of 6/7 question types


52 of 76

References

Clariana, R. B., & Lee, D. (2001). The effects of recognition and recall study tasks with feedback in a computer-based vocabulary lesson. Educational Technology Research & Development, 49(3), pp. 23-36.

Glover, J. A. (1989). The "testing" phenomenon: Not gone but nearly forgotten. Journal of Educational Psychology, 81(3), pp. 392-399.

Jones, L. (2004). Testing L2 vocabulary recognition and recall using pictorial and written test items. Language Learning & Technology, 8(3), pp. 122-143.

Laufer, B., & Nation, P. (1995). Vocabulary size and use: Lexical richness in L2 written production. Applied Linguistics, 16, pp. 307-322.

McDaniel, M. A., & Mason, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, pp. 371-385.

Nation, I. S. P. (2013). Learning vocabulary in another language. Cambridge: Cambridge University Press.

Read, J. (2000). Assessing vocabulary. Cambridge: Cambridge University Press.

Stewart, J. (2014). Do multiple-choice options inflate estimates of vocabulary size on the VST? Language Assessment Quarterly, 11(3), pp. 271–282. doi:10.1080/15434303.2014.922977

Stewart, J., & White, D. A. (2011). Estimating guessing effects on the vocabulary levels test for differing degrees of word knowledge. TESOL Quarterly, 45(2), pp. 370–380. doi:10.5054/tq.2011.254523


53 of 76

Questions?

Thanks for listening!

Any questions?

natalie_eloise


54 of 76

SOW-based grammar testing for NCELP Y7

Principles, design, creation

NCELP Residential 3

Rowena Kasprowicz / Nicholas Avery / Stephen Owen

Last updated: 02/03/20

55 of 76

Outline

1. Grammar testing team

2. Principles of grammar testing

3. Coverage of grammar (by language)

4. Test creation and example items

5. Scoring

56 of 76

1. Grammar testing team

Test lead: Rowena Kasprowicz

French

Test developers

Stephen Owen

Rowena Kasprowicz

Proofreader

Ivan Avaca

Spanish

Test developers

Nicholas Avery

Rowena Kasprowicz

Proofreader

Amanda Izquierdo

German

Test developers

Stephen Owen

Rowena Kasprowicz

Proofreaders

Inge Alferink

Natalie Finlayson

With huge thanks to: Giulia Bovolenta, Victoria Hobson, Chloé Motard, Geraldine Bengsch

57 of 76

2. Grammar testing principles

Objective

To test receptive and productive knowledge of grammar studied in Y7 Terms 1.1.1-2.1.6

Considerations

  • Isolating grammatical knowledge; disassociating from lexical knowledge

  • Ensuring coverage of the range of grammar features taught (including relevant morphology and syntax)

  • Receptive knowledge (recognising meaning as well as form)
  • Productive knowledge (accuracy of production)

  • Written and oral modalities

58 of 76

3. Coverage of grammar: GERMAN

Grammar feature | Reading | Listening | Writing | Speaking
Present continuous formation: Two forms in English vs. one in TL | - | - | 4 items | 6 items
Question formation: Subject-verb inversion; do-aux in English vs. TL | 4 items | 4 items
Subject-verb agreement (weak): 1st / 2nd / 3rd singular | 3 items | 3 items
Subject-verb agreement (irregular): haben / sein (1st, 2nd, 3rd sing, 1st pl); mögen (1st, 2nd, 3rd sing) | 3 items | 3 items | 4 items | 4 items
Article agreement: Def / indef; gender; number; case (nom/acc) | 4 items | - | 4 items
Plural noun formation: -en; umlaut + -e; -e | - | - | 3 items | -
Negation: nicht + verb; nicht + adjective | 4 items | - | - | -
Subject and object pronoun agreement: Gender; number; case (nom/acc) | 4 items | - | 3 items | 3 items

59 of 76

3. Coverage of grammar: FRENCH

Grammar feature | Reading | Listening | Writing | Speaking
Present continuous formation: Two forms in English vs. one in TL | - | - | 8 items | 6 items
Question formation: Intonation; do-aux in English vs. TL | - | -
Subject-verb agreement (regular -ER): 1st / 2nd / 3rd singular; 1st / 2nd / 3rd plural | 10 items | -
Subject-verb agreement (irregular): être (all persons); avoir (all persons); faire (all persons); aller (1st, 2nd, 3rd sing) | 8 items | - | 4 items | 4 items
Article & adjective agreement: Def / indef; gender; number | 4 items | - | 4 items
Adjectival word order: Post-nominal | 4 items | - | -
Preposition “to” + article | - | 4 items | 4 items | 3 items
“il y a” vs. “est” vs. “a” | 4 items | - | - | -

60 of 76

3. Coverage of grammar: SPANISH

Grammar feature | Reading | Listening | Writing | Speaking
Present continuous formation: Two forms in English vs. one in TL | - | - | 6 items | 6 items
Question formation: Intonation; do-aux in English vs. TL | - | -
Subject-verb agreement (regular -AR): 1st / 2nd / 3rd singular | 4 items | 4 items
Subject-verb agreement (irregular): estar (1st, 2nd, 3rd sing); ser (1st, 2nd, 3rd sing / 3rd pl); tener (1st, 2nd, 3rd sing / 1st, 3rd pl); querer (1st, 2nd, 3rd sing); hacer (1st, 2nd, 3rd sing); dar (1st, 2nd, 3rd sing) | 6 items | 6 items | 4 items (all) | 4 items (tener, querer)
Article & adjective agreement: Def / indef; gender; number | 4 items | - | 4 items
Adjectival word order: Post-nominal | 4 items | - | -
Negation: no + verb | 4 items | - | - | 3 items (-AR)
“hay” vs. “tiene” | 4 items | - | - | -

61 of 76

4. Test creation process

Each question tests a specific grammatical feature (or combination of features)

Size of the test

  • Target time: 15 minutes (R/L/W); 4 minutes (S)

  • 50 test items (R/L/W); 13 items (S)

Question item pool

  • Items created using vocabulary from SoW (reviewed to ensure no clash with vocabulary test)

  • Each pool contains an equal number of instances of each structure taught

(e.g. equally likely to be tested on 1st person singular as 2nd person singular in subject-verb agreement questions, etc.)

Variation between languages in weighting of different modes / modalities, due to variation in nature of grammar features being tested in each language.

62 of 76

4. Test creation: example items

Testing written and aural receptive knowledge

  • Testing ability to recognise meaning as well as form

  • Multiple choice; here matching to English equivalent

Multiple choice options appear in a random order for each item

(Avoid position indicating correct answer)
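
Randomising option order per item could be done along these lines. This is a minimal sketch, not the platform's implementation: the function name is hypothetical and the English options are invented for illustration.

```python
import random


def shuffled_options(correct, distractors, rng=None):
    """Return the options in random order plus the index of the correct one."""
    rng = rng or random.Random()
    options = [correct, *distractors]
    rng.shuffle(options)
    return options, options.index(correct)


# Illustrative options only; not taken from the actual test.
options, answer_index = shuffled_options("I am", ["you are", "she is"])
print(options, answer_index)
```

Recording the index of the correct option at shuffle time lets automated scoring stay independent of where that option happens to appear.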

63 of 76

4. Test creation: example items

Testing written and aural receptive knowledge

  • Testing ability to recognise meaning as well as form

  • Multiple choice; here matching to TL alternative

Isolating recognition of gender and number

64 of 76

4. Test creation: example items

Isolating receptive knowledge of syntax

  • Recognising function (statement / question) indicated by word order

Remove punctuation

(Avoid . or ? indicating answer)

Including variety of ‘subjects’

(Avoid reliance on ‘du’ to indicate question)

65 of 76

4. Test creation: example items

Isolating productive knowledge of syntax

  • Drag-and-drop into correct order

Article / noun / adjective appear in a random (vertical) order for each item.

(Ensures learner is paying attention to each element and its correct position)

66 of 76

4. Test creation: example items

Testing written productive knowledge

  • Testing ability to accurately produce the structure
  • Open text box

Testing understanding of question formation

(Check pupils understand that ‘do’ auxiliary is not needed)

Isolating grammatical knowledge by providing verb infinitive

(Avoid missing answers due to lack of lexical knowledge)

67 of 76

4. Test creation: example items

Testing written productive knowledge

  • Testing ability to accurately produce the structure
  • Open text box

Isolating grammatical knowledge by indicating gender (directly or via article).

(Ensure that pupils are not reliant on recalling gender of a specific lexical item)

68 of 76

4. Test creation: example items

Testing written productive knowledge

  • Testing ability to accurately produce the structure
  • Open text box

Testing syntax alongside subject-verb agreement.

Half of the pool included verbs in present continuous form

(Check pupils’ understanding that there is one present tense structure for simple and continuous meanings)

69 of 76

4. Test creation: example items

Testing oral productive knowledge

  • Testing ability to accurately produce the structure(s)
  • Combining a number of structures within each item

Testing subject-verb agreement and question formation.

Isolating grammatical knowledge by providing verb infinitive

(Avoid missing answers due to lack of lexical knowledge)

Note: glosses not provided for other elements of the sentence which are not being tested (e.g. object)

70 of 76

4. Test creation: example items

Testing oral productive knowledge

  • Testing ability to accurately produce the structure(s)
  • Combining a number of structures within each item

No gloss provided for irregular verbs

(Irregular verb forms taught as individual lexical items, rather than transforming from the infinitive)

Isolating knowledge of gender / number (& case) agreement for articles (& adjectives)

(Noun form provided and gender indicated)

71 of 76

5. Scoring

Items testing written and oral receptive knowledge = multiple choice

  • One correct answer
  • Automated scoring

Correct option

Incorrect options

72 of 76

5. Scoring

Items testing written productive knowledge = open text response

Possible answers manually coded

  • 1 mark per structure tested
  • Partial marks account for the different elements of a structure being tested

e.g. article agreement (gender / number / case / definiteness)

subject-verb agreement (pronoun / verb ending)

  • Accents dealt with systematically
    • Tolerated, where absence does not alter the meaning (e.g. préparer vs. preparer) 🡪 no marks deducted
    • Semi-tolerant, where absence does alter the meaning (e.g. à la vs. a la) 🡪 0.5 mark deducted

🡪 Not possible to account for any other spelling errors (note: target words provided in glosses)

Pilot phase: data from first round of testing will help to identify common learner errors
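
The accent rules above can be sketched with Python's standard `unicodedata` module. The function names and the `meaning_changing` set are hypothetical: which missing accents alter meaning has to be decided by the test developers, and the actual scoring relies on manually coded answers rather than this logic.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks, e.g. 'préparer' -> 'preparer'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_deduction(response, target, meaning_changing=frozenset()):
    """Marks deducted for accent errors: 0.0 or 0.5.

    Returns None when the response differs from the target in more than accents,
    since other spelling errors are outside this sketch.
    """
    if response == target:
        return 0.0
    if strip_accents(response) != strip_accents(target):
        return None
    # Semi-tolerant where the missing accent changes the meaning, tolerant otherwise.
    return 0.5 if target in meaning_changing else 0.0


print(accent_deduction("preparer", "préparer"))                     # 0.0
print(accent_deduction("a la", "à la", meaning_changing={"à la"}))  # 0.5
```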

73 of 76

5. Scoring

0.5 mark for incorrect indefinite article (but gender correct)

0.5 mark for incorrect case

(but gender correct)

0.5 mark for incorrect indefinite article (but gender correct)

0.5 mark for incorrect case and indefinite article (but gender correct)

Examples of partial marking


74 of 76

5. Scoring

Examples of partial marking


0.5 mark deducted for missing accent (changes meaning)

0.5 mark deducted for missing accent (changes meaning)

75 of 76

5. Scoring

Examples of partial marking


0.5 mark for correct pronoun

0.5 mark for correct verb ending

0.5 deducted for incorrect verb ending

0.5 deducted for incorrect pronoun

76 of 76

5. Scoring

Examples of partial marking


1 mark for correct word order

1 mark for correct subject-verb agreement

0.5 deducted for incorrect verb ending

0.5 deducted for incorrect pronoun

1 deducted for incorrect WO

0.5 deducted for incorrect verb ending

0.5 deducted for incorrect pronoun

1 deducted for incorrect WO

0.5 deducted for incorrect verb ending

1 deducted for incorrect WO

0.5 deducted for incorrect pronoun
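
The partial marking shown in these examples can be summarised as follows. This is a sketch assuming the weights on the slides (0.5 for the pronoun, 0.5 for the verb ending, 1 for word order where it is tested); the function name and signature are hypothetical.

```python
def score_item(pronoun_ok, verb_ending_ok, word_order_ok=None):
    """Marks for one written item testing subject-verb agreement.

    Pass `word_order_ok` only for items that also test word order (WO).
    """
    marks = 0.5 * pronoun_ok + 0.5 * verb_ending_ok
    if word_order_ok is not None:
        marks += 1.0 * word_order_ok
    return marks


print(score_item(True, False))        # 0.5: incorrect verb ending
print(score_item(True, True, True))   # 2.0: full marks on a word-order item
print(score_item(False, True, False)) # 0.5: incorrect pronoun and incorrect WO
```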