1 of 3

Building trust with a concordancer (a.k.a. baby steps in corpus linguistics)

Dr Jade Smith

Durban University of Technology

English & Communication programme

2 of 3

Scope of the research

  • Smith, J., & Adendorff, R. (2014). For the people: Defining communities of readership through an appraisal comparison of letters to two South African newspapers. Discourse, Context & Media, 3, 1-13. DOI: https://doi.org/10.1016/j.dcm.2013.10.003

  • Aim: find the values/priorities of readers of two newspapers: Daily Sun and The Times. Are the readers so different?
  • What is important to the readers?
  • What are the characteristics of their community of readership?

Data: 20 letters to the editor (10 from each newspaper – 2012)

  • I had a qualitative framework to evaluate their feelings on certain subjects, but how could I objectively identify their ‘favourite’ topics of conversation?

3 of 3

Using a concordancer for quantitative confirmation

  • Frequency lists generate the most common words in the data corpus
  • But one writer might skew the data by mentioning a word many times.
  • To find out which words are most salient in a corpus, its word list can be compared with the word list of another corpus. This gives a list of keywords: words which appear in the Daily Sun corpus significantly more frequently than in The Times corpus, and vice versa.
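The skew problem mentioned above can be illustrated with a minimal Python sketch. It is not the concordancer's own code: the function names, the simple regex tokenizer, and the two toy letters are all invented for illustration. Alongside the raw frequency of each word, it counts how many letters the word appears in, so a word repeated by a single writer stands out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple regex, assumed here)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(letters):
    """Return two Counters: raw word counts, and the number of letters each word occurs in."""
    raw = Counter()
    spread = Counter()
    for letter in letters:
        tokens = tokenize(letter)
        raw.update(tokens)          # every occurrence counts
        spread.update(set(tokens))  # each word counts once per letter
    return raw, spread


# Invented toy letters, not taken from the study's data.
letters = [
    "Crime crime crime, the police must stop crime.",
    "The police need more support.",
]
raw, spread = frequency_list(letters)

# Print words from most to least frequent, with the number of letters they occur in.
for word, count in raw.most_common(3):
    print(word, count, spread[word])
```

In this toy data "crime" tops the raw frequency list but comes from only one letter, whereas "police" is less frequent but is used by both writers.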

  • To generate a list of keywords, a concordancer runs a log-likelihood (LL) test on each word, which produces an LL score.
  • To be at least 95% confident (p < 0.05) that the difference in a word's frequency between the two corpora is not due to chance, the LL score of the word should be at least 3.84. I set the LL “cut-off” score at 3.9, ensuring that any words with a score higher than this would be significant.
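As a rough illustration of what the LL test computes, here is a minimal Python sketch using the widely used two-corpus log-likelihood formula (observed versus expected frequencies). The talk does not say which exact formulation the concordancer applies, so this is an assumption; the function names and the example counts are invented, and only the 3.9 cut-off comes from the slides.

```python
import math


def log_likelihood(freq_a, freq_b, size_a, size_b):
    """Log-likelihood score for one word, given its frequency in corpora A and B
    and the total number of tokens in each corpus."""
    total = freq_a + freq_b
    # Expected frequencies if the word were spread evenly across both corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:  # a zero count contributes nothing to the sum
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll


LL_CUTOFF = 3.9  # the cut-off used in the talk


def is_keyword(freq_a, freq_b, size_a, size_b, cutoff=LL_CUTOFF):
    """True if the word is significantly more frequent in corpus A than in corpus B."""
    more_frequent_in_a = freq_a / size_a > freq_b / size_b
    return more_frequent_in_a and log_likelihood(freq_a, freq_b, size_a, size_b) > cutoff


# Invented counts: a word used 10 times in one 1,000-word corpus and twice in another.
print(is_keyword(10, 2, 1000, 1000))
```

Because the LL score itself does not say which corpus favours the word, the sketch also compares relative frequencies before calling a word a keyword of corpus A.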