I currently work outside academia and hold an honorary position at KU Leuven.

  • 2013-...: Honorary Research Fellow, University of Leuven, Belgium
  • 2012-2014: Postdoctoral researcher, Humboldt Universität zu Berlin, Germany (DFG project position)
  • 2008-2012: Doctor in Linguistics, University of Leuven, Belgium
  • 2011: Visiting scholar, University of Freiburg, Germany (FWO-funded)
  • 2007-2008: Master of Linguistics (Computational linguistics and variational linguistics), University of Leuven, Belgium
  • 2006-2007: Erasmus Exchange semester at Freie Universität Berlin and Humboldt Universität Berlin, Germany
  • 2004-2007: Bachelor of Arts (Dutch and English), University of Leuven, Belgium
  • 1985: Born in Belgium

Full academic CV as pdf.


Publications and pre-print pdfs are also available via the KU Leuven Lirias repository.

  • In preparation (comments welcome)
    • Ruette, T., and Grieve, J. (In preparation) Regional variation in source domains for Dutch invectives in Twitter. [pdf, data, corpus]
  • Proceedings and other academic outlets
    • Mikolajczak, S., Ruette, T., Tsiporkova, E., Angelova, M. and Boeva, V. (2015) A semantic reasoning engine for lifestyle profiling in support of personalised coaching. Proceedings of the fourth international conference on global health challenges
    • Dagnely, P., Ruette, T., Tourwé, T. and Tsiporkova, E. (2015) Predicting hourly energy consumption. Can you beat an autoregressive model? In: Proceedings of BENELEARN 2015
    • Dagnely, P., Ruette, T. and Verhelst, C. (2015) A semantic model of events for integrating photovoltaic monitoring data. In: Proceedings of INDIN 2015
    • Daems, J., Speelman, D. and Ruette, T. (2014). Register analysis in blogs: Correlation between professional sector and functional dimensions. In: Oben, B. (ed.) Leuven Working Papers in Linguistics 2(1): 1–27. [pdf]
    • Ruette, T. and Speelman, D. (2012) Applying Individual Differences Scaling to the measurement of lexical convergence between Netherlandic and Belgian Dutch. In: Dister, A., Longree, D. and Purnelle, G. (eds.) Proceedings of the 11th International Conference on Textual Data Statistical Analysis: 883-895. [pdf]
    • Ruette, T., Speelman, D. and Geeraerts, D. (2010) Tell me who you talk to, and I'll tell you how you talk: Comparing the language use of two interaction based clusters of people in a single Usenet newsgroup. Proceedings of the conference on Cognitive Sociolinguistics: 315-339. [pdf]
  • Popular press
    • Dit zijn dé peetvaders van de Belgische startup mafia (with Omar Mohout), Bloovi, De Standaard, March 2016.
    • The profile of Artemis co-summit 2015 participants (with Nicolás Gonzalez-Deleito) ARTEMIS newsletter, April 2015.
    • A Battle, not the War. A view on The Linguistic War from fifty years ago. Babel, May 2014 [pdf]
    • De appel valt niet ver van de boom: afstanden meten tussen taalvariëteiten. Over taal, January/February 2013 [pdf]
    • Vlaanderen ligt qua woordkeuze dichter bij Nederland dan dorpsstraat bij Wetstraat. Campuskrant, October 2012 [pdf]
  • Extended writing
    • PhD Dissertation: Aggregating Lexical Variation, towards large-scale lexical lectometry. [pdf]
    • Blog: Corpus Linguistic Methods: An Introduction. [link]
  • 2015
    • A semantic reasoning engine for lifestyle profiling in support of personalised coaching Global Health 2015, Nice, France, July 2015
    • ICT for eHealth Panel session at Global Health 2015, Nice, France, July 2015
    • A semantic model of events for integrating photovoltaic monitoring data INDIN 2015, Cambridge, UK, July 2015
    • Swearing in Dutch with diseases ICLC, Newcastle, UK, July 2015
    • Predicting hourly energy consumption. Can you beat an autoregressive model? BENELEARN 2015, Delft, The Netherlands, June 2015
  • 2014
    • Referenzkorpus Altdeutsch and Laudatio. And everything in between. Laudatio workshop on digital humanities, Berlin, Germany, October 2014
    • Compiling and annotating historical corpora. Computational Linguistics AG, DGfS Conference, Marburg, Germany, March 2014
    • Cognitive Sociolinguistics and Twitter. Why do the Dutch swear with diseases? Web as a corpus for Theoretical Linguistics AG, DGfS Conference, Marburg, Germany, March 2014
    • Korpus II: Korpusauswertung. 5. Methodenworkshop HU Berlin, Berlin, Germany, February 2014
    • Why do the Dutch swear with diseases? Corpus linguistic Colloquium HU Berlin, Berlin, Germany, February 2014
  • 2013
    • Corpus-based historical linguistics. Convenor of the international workshop on corpus-based historical linguistics, Berlin, Germany, September 2013
    • Korpus II: Korpusauswertung. 4. Methodenworkshop HU Berlin, Berlin, Germany, February 2013
  • 2012
    • A quantitative approach to find patterns in variables and varieties: lexical variation in Belgian and Netherlandic Dutch. International Conference on Textual Data Statistical Analysis, Liège, Belgium, June 2012
    • North American dialect regions and their vowels. Leuven Statistics Days, Leuven, Belgium, June 2012
    • A bottom-up approach to multilectal variation in the lexicon of written Standard English. International Computer Archive of Modern and Medieval English (ICAME) 33, Leuven, Belgium, May 2012
    • Aggregating Lexical Variation: towards large-scale lexical lectometry. Maxipresentation (pre-defense) in the doctoral school of humanities, Leuven, Belgium, May 2012
    • A quantitative approach to find patterns in variables and varieties: lexical variation in Belgian and Netherlandic Dutch. Linguistic Evidence, Tübingen, Germany, February 2012
  • 2011
    • Disease inspired expletives in Dutch due to entrenched Calvinism. Corpus-based evidence from Twitter. Workshop on quantitative methods in geolinguistics, Freiburg, Germany, December 2011
    • With Kris Heylen. Degrees of semantic control in measuring lexical distances. Workshop on comparing approaches to measuring linguistic differences, Gothenburg, Sweden, September 2011
    • Attacking two issues in lexicon-based sociolectometric studies: biased variable sampling and individual pattern loss. ICLAVE Conference, Freiburg, Germany, June 2011
    • With Jack Grieve. Socio-functional variation in spoken Dutch. Stylistics across disciplines conference, Leiden, The Netherlands, June 2011
    • Aggregating and interpreting lexical alternation variables. Benefits of Weighted Multidimensional Scaling for lectal categorization. Quantitative Investigations in Theoretical Linguistics 4, Berlin, Germany, March 2011
    • With Dirk Geeraerts. Scaling it up. Generating a large set of lexical variables for a comprehensive study of language varieties in Dutch. Cross-linguistic and language-internal variation in text and speech: focus on the joint analysis of multiple characteristics, Freiburg, Germany, 2 February 2011
  • 2010
    • Toward a multivariate account of Belgian Dutch and Netherlandic Dutch. Plurilang 2010, Braga, Portugal, 15 September 2010
    • Sorting out variables and their variation. Sociolinguistic Symposium 18, Southampton, United Kingdom, 1 September 2010
    • Multivariate and Aggregated. LOT Summer School, Nijmegen, The Netherlands, 14 June 2010
    • COSIC Seminar on privacy issues for demographic profiling with linguistic means. Weekly seminars at COSIC, Leuven, Belgium, 29 April 2010
    • What’s going on? Groups of interacting people in Usenet. Laud 34 Symposium, Landau, Germany, 12 March 2010
    • Convergentie en Divergentie in het Nederlands. Panel discussion, Freie Universität Berlin/Niederlandistik, Berlin, Germany, 5 January 2010
  • 2009
    • Large-scale and multidimensional. A socio-situational approach to language variation. PhD Day, Leuven, Belgium, 4 September 2009
    • “De Standaard” en “Het Belang van Limburg”. Elke krant zijn eigen taalvariëteit? BKL Taaldag, Brussels, Belgium, 16 May 2009
    • Automatic Text Categorization: Adding Syntactic Knowledge. Computational Linguistics in the Netherlands, Groningen, The Netherlands, 22 January 2009
  • 2008
    • Automatische Tekstcategorizatie. Hoe kan de taalkunde helpen? Studentencongres, Berlin, Germany, 22 May 2008
  • Funding, grants and organisations
    • 2014-2016: Collaborator on several funded projects (H2020-ECSEL, IWT-SBO, Innoviris-Doctiris, VLAIO-KMO-Innovatieproject, ...)
    • 2013: HU Berlin Philosophical Faculty II competition grant to organize a workshop on Corpus-based Historical Linguistics.
    • 2012: Belgian American Educational Foundation scholarship for one year of postdoctoral research at the University of California, Santa Barbara (declined in order to start working at HU Berlin)
    • 2011: Co-organizer of the Workshop on quantitative methods in geolinguistics, held in the framework of the Hermann Paul School of Linguistics at the University of Freiburg
    • 2011: FWO (Research Foundation Flanders) two-month travel grant for a research stay at the Graduiertenkolleg DFG Frequenzeffekte in der Sprache, University of Freiburg
    • 2011: Co-organizer of the LOT Summer School
    • 2009: Co-organizer of the PhD Day
    • 2008: DAAD scholarship for a one-year M.Sc. in Computational Linguistics in Potsdam (declined in order to start a PhD at KU Leuven)
    • 2006: ERASMUS scholarship for a one-semester exchange at FU Berlin.
  • Teaching
    • 2015-2016: Master course Data Innovation
    • February 2014: 5. Methodenworkshop - Korpus II
    • Winter semester 2013/2014: System, Struktur, Variation - seminar in the Master Germanistik programme, Module 4 (HU Berlin)
    • 26 February 2013: 4. Methodenworkshop - Korpus II
    • 2008-2013: occasional replacement lecturer for Prof. Dr. Dirk Geeraerts (KU Leuven) and Prof. Dr. Anke Lüdeling (HU Berlin)
  • Peer-review
    • Deliverables in several EU projects (Artemis, H2020-ECSEL)
    • Corpora (Edinburgh University Press)
    • Fernand Braudel Fellowship
    • Functions of Language (Benjamins)
    • Trends in Linguistics: Studies and Monographs (de Gruyter)
    • Language Resources and Evaluation (Springer)
    • Linguae et Litterae series (de Gruyter)
    • Psychometrika (Springer)
    • Rodopi
  • Program committees
    • 2013: fifth installment of the QITL conferences, organized by Dirk Speelman, Dirk Geeraerts, Kris Heylen (University of Leuven), Gert De Sutter (University College Ghent), and Timothy Colleman (Ghent University).
    • 2012: Cognitive Aspects of the Lexicon, workshop at the 24th International Conference on Computational Linguistics (COLING), organized by Michael Zock (LIF-CNRS, Marseille, France) and Reinhard Rapp (University of Mainz, Germany).
Data and analysis
  • Datasets
    • Semi-automatically detected lexical variation in the Brown corpora, INDSCAL R code [download]
    • Old High German endings of dative plural in a-stems, INDSCAL R code [download]
    • Invectives in Dutch Twitter messages for about 200 locations in Flanders and the Netherlands [download]
    • Real(ly) good|bad in the Corpus of Historical American English, introduction R code [download]
    • Abstracta or concreta after article-alike determiners in Old High German [download]
    • Phonologically reduced pronominal subjects in inversion across clothing stores in Berlin [download]
    • Variable frequencies of Geeraerts et al. 1999 [download]
  • Corpora
    • Moroccorp: chat logs from two Moroccan-oriented but Dutch-language IRC channels (with Freek van de Velde)
      • Cite: Ruette, T. and van de Velde, F. (2013) Moroccorp: tien miljoen woorden uit twee Marokkaans-Nederlandse chatkanalen. Lexikos 23: 456-475 [pdf]
      • Download Moroccorp as plain text.
      • A blog post on how to search the corpus with a simple Python script.
    • Dutch Regional Twitter Corpus
      • A Twitter corpus of Dutch in XML format, with reported regional information about the Twitter users.
    • Deutsch Diachron Digital - Altdeutsch
  • Programming
    • R package for lectometric analyses (in preparation) [git]
    • Python script for making Twitter corpora [git]
    • Python script for making phpBB corpora [git]
    • Python script for searching moroccorp [git]
    • Python script for making RSS-based corpora (in preparation) [git]

Tom Ruette
Honorary Research Fellow

Quantitative Lexicology and Variational Linguistics
University of Leuven, Belgium
Blijde Inkomststraat 21, P.O. Box 3308
B-3000, Leuven, Belgium

E-mail: tom (dot) ruette (at) kuleuven (dot) be
Phone: +32 474 400 112

Links: Twitter, Wordpress