Corpora

  • British National Corpus (BYU-BNC)
  • British National Corpus 2014. A resource for research into, and teaching of, contemporary English.
  • Clarino Repository. A Norwegian infrastructure project that aims to make existing and future language resources easily accessible to researchers and to bring eScience to humanities disciplines.
  • Corpus of Political Speeches. An online archive of speeches by politicians around the world. The corpus has a web-based concordancer that supports searching the (untagged) texts.
  • Dialogue corpora: Coconut corpus, Dialog diversity ‘corpus’, Speech act annotated dialogues corpus, SRI American Express travel agent dialogue corpus, Switchboard corpus, TRAINS spoken dialogue corpus.
  • EF – Cambridge Open Language Database: Currently contains over 83 million words from 1 million assignments written by 174,000 learners across a wide range of proficiency levels (CEFR levels A1–C2). The corpus is annotated for learner errors, parts of speech, and grammatical relations.
  • EuroCoat: The European Corpus of Academic Talk. Twenty-seven Spanish undergraduate students from different universities and academic disciplines were video-recorded in conversation with their lecturers. The resulting 5 hours and 47 minutes of conversation were transcribed and form what is, according to its compilers, the first corpus of office-hours consultations conducted in English as an academic lingua franca.
  • Norwegian-English Student Translation Corpus (NEST): Translations from Norwegian into English, produced by students of English at Norwegian universities and colleges.