The British National Corpus 2014 - Tweet Collection

Would you like to contribute to the British National Corpus - a very large research repository of modern British English?

The British National Corpus 2014 is a major project led by Lancaster University to create a 100 million word corpus (a large collection of ‘real life’ language) of modern-day British English. This corpus is used by researchers to understand more about how language works and how it is evolving. Educators, dictionary compilers and the interested public will also be able to access it to find usage examples of modern British English in different genres.

To collect tweets for the Written BNC2014 we rely on the generosity of the British public to give us access to their published tweets to incorporate them into the corpus. We are asking the public to provide us with their Twitter archives containing tweets sent between 2014 and 2018. Tweets should be unedited, but you are welcome to anonymise them.

You can submit these tweets as a .csv file, which is automatically generated by Twitter. Your contribution to this world-leading language resource will be fully credited in the corpus documentation.

How to submit your tweets to us:
1. Log into your Twitter account and visit the “Settings and Privacy” page.

2. Click on “Request your archive”. You will receive an email containing a link where you can download an archive of all your tweets since you set up your account.

3. When the download is complete, open the file entitled "tweets.csv". If you wish to anonymise the data, you are responsible for this process. You can edit the spreadsheet by highlighting all the text > clicking the dropdown box on the toolbar which reads "General" > selecting "Text". We understand that this may be time-consuming, especially if you have a lot of tweets, so we are able to automatically remove all Twitter handles, URLs, and retweets of other users on your behalf. However, we would strongly encourage you to read through your tweets yourself to ensure you are happy for everything you submit to be included within the corpus. Bear in mind that tweets often contain names that our automatic anonymisation tool will not pick up. For example, in the following tweet, we would automatically remove the Twitter handle (@KellieBee123), but you would be responsible for replacing the name (Kellie) with [anon], if you wish.

Original tweet:
"@KellieBee123 Best of luck with it Kellie! It's not that bad!"

With your anonymisation:
"@KellieBee123 Best of luck with it [anon]! It's not that bad!"

With our anonymisation, as it will appear in the corpus:
"[anon] Best of luck with it [anon]! It's not that bad!"

4. Upload the .csv file below.
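
For contributors who would like to preview the effect of automatic anonymisation before uploading, the clean-up described in step 3 could be approximated with a short Python sketch like the one below. This is only an illustration, not the Lancaster team's actual tool. It assumes that tweets.csv has a column named `text` and, for retweets, a non-empty `retweeted_status_id` column; both column names, and the function names `anonymise_text`, `is_retweet` and `anonymise_csv`, are assumptions made for this example. Handles are replaced with [anon] as in the example above; dropping URLs entirely is likewise an assumption.

```python
import csv
import io
import re

# Assumption: a handle is "@" followed by letters, digits or underscores,
# and is not preceded by a word character (so e-mail addresses are left alone).
HANDLE_RE = re.compile(r"(?<!\w)@\w+")
URL_RE = re.compile(r"https?://\S+")


def anonymise_text(text):
    """Drop URLs, replace Twitter handles with [anon], tidy spacing."""
    text = URL_RE.sub("", text)            # URLs first: they may contain "@"
    text = HANDLE_RE.sub("[anon]", text)
    return re.sub(r"\s{2,}", " ", text).strip()


def is_retweet(row):
    """Treat a row as a retweet if it carries a retweet ID or starts with 'RT @'.

    The 'retweeted_status_id' column name is an assumption about the archive.
    """
    if row.get("retweeted_status_id"):
        return True
    return (row.get("text") or "").lstrip().startswith("RT @")


def anonymise_csv(src, dst):
    """Copy rows from src to dst, skipping retweets and cleaning the text.

    src and dst are open text files (or file-like objects).
    Returns the number of tweets kept.
    """
    reader = csv.DictReader(src)
    if "text" not in (reader.fieldnames or []):
        raise ValueError("expected a 'text' column in the archive")
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        if is_retweet(row):
            continue
        row["text"] = anonymise_text(row["text"] or "")
        writer.writerow(row)
        kept += 1
    return kept
```

To try it on a real archive, open tweets.csv for reading and a new file for writing (both with `newline=""` and `encoding="utf-8"`) and pass the two file objects to `anonymise_csv`. Names inside the tweet text, such as "Kellie" in the example above, are not detected by a sketch like this and would still need to be replaced with [anon] by hand.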


Thank you very much for your contribution.

The Lancaster team.
email: m.gillings@lancaster.ac.uk

What is your name? (Please fill out this field if you wish to be credited in the corpus documentation).
What is your gender? *
Are you a: *
Please submit your Twitter archive containing tweets you have published between 2014 and 2018. Please anonymise any names and/or personal information - you can do this by replacing names/information with [anon]. We are able to automatically remove all Twitter handles, URLs, and retweets of other users on your behalf. However, we would strongly encourage you to sift through your tweets to anonymise other personal information and ensure you are happy for everything to be included within the corpus. *
I give consent for my tweets to be included in the Written BNC2014. *