The Corpus of Late Modern English Texts

Please note: The Corpus of Late Modern English Texts (CLMET) has been superseded by the Corpus of Late Modern English Texts, version 3.0 (CLMET3.0). The CLMET is kept available on this site for archiving purposes.


The CLMET, compiled by Hendrik De Smet, is a principled collection of texts drawn from Project Gutenberg and the Oxford Text Archive. In total, the CLMET contains some ten million words of running text, divided over three 70-year sub-periods.

In compiling the corpus, the following principles have been applied:

Given its unavoidable sociolinguistic bias, the CLMET is not fit for any form of fine-grained variationist research. It is best suited to investigations of qualitative change in the history of the English language, including grammaticalisation and other types of lexico-grammatical change (e.g. Breban 2004; De Smet 2004; De Smet & Cuyckens 2004a, 2004b; Vanden Eynde 2004). Given its size, the corpus may complement smaller historical corpora (or other resources such as the Oxford English Dictionary) by providing empirical data on relatively rare linguistic phenomena. A more detailed description and discussion of the CLMET (with a full list of sources) can be found in De Smet (2005), available online on the ICAME Journal website.

The following table summarises the corpus make-up.

Sub-period   Number of authors   Number of texts   Number of words
1710-1780    15                  24                2,096,405
1780-1850    29                  39                3,739,657
1850-1920    28                  52                3,982,264
TOTAL        72                  115               9,818,326

To download the corpus, you can obtain a free password and user-id by contacting Hendrik De Smet. If you already have a password and user-id, simply click here to download or access.

----------

References:

Breban, Tine. 2004. The grammaticalization of the English adjectives of comparison. A diachronic study. Paper presented at ICAME25 (25th Conference of the International Computer Archive of Modern and Medieval English), Verona (Italy), 19-23 May 2004.

De Smet, Hendrik. 2004. The development of for...to-infinitives. Paper presented at ICEHL13 (13th International Conference on English Historical Linguistics), Vienna (Austria), 23-28 August 2004.

De Smet, Hendrik. 2005. A corpus of Late Modern English. ICAME Journal.

De Smet, Hendrik and Hubert Cuyckens. 2004a. Pragmatic strengthening and the meaning of complement constructions. The case of like and love with the infinitive. Preprint. University of Leuven: Department of Linguistics.

De Smet, Hendrik and Hubert Cuyckens. 2004b. A diachronic perspective on the variation between gerunds and infinitives as verbal complements. Paper presented at ICAME25 (25th Conference of the International Computer Archive of Modern and Medieval English), Verona (Italy), 19-23 May 2004.

Vanden Eynde, Martine. 2004. Edge-noun expressions as markers of imminence. A case of grammaticalization. Unpublished MA Thesis. University of Gent: Department of Linguistics.