General description
This module introduces participants to aspects and applications of predictive modeling that go somewhat beyond what is typically covered in courses on statistical methods in linguistics. Sessions 1 to 3 deal with regression modeling. Session 1 shows how to extract more useful information from regression models’ summaries and coefficients via multiple post-hoc tests, a priori orthogonal contrasts, and general linear hypothesis tests. Session 2 introduces ways to deal with non-linearities in linear models and discusses multinomial models. Session 3 introduces Poisson regression for count data and touches on cross-validation for all kinds of models. Sessions 4 and 5 focus mostly on tree-based approaches, namely classification (and regression) trees and random forests, the latter with an emphasis on their interpretation. Time permitting, the course will end with a brief theoretical overview of why mixed-effects or multi-level modeling is so important for both experimental and corpus-linguistic research.
Target audience
Empirical researchers in linguistics at any career level (from advanced undergraduates to senior researchers) who want to go beyond the basics of predictive modeling.
Course prerequisites
- A good understanding of regression modeling in R
- The ability to load data from .csv files into R and work with them in an IDE of your choice (I will use RStudio, but Positron or Microsoft Visual Studio Code are fine as well)
- Up-to-date versions of R (i.e. at least version 4.5), its packages, and the IDE must be installed prior to the course
Course materials
Slides and knitted HTML reports will be provided.
Recommended but optional handbook: Gries, Stefan Th. (2021). Statistics for Linguistics with R. 3rd rev. & ext. ed. Boston & Berlin: De Gruyter.
Teacher bio
Stefan Th. Gries is Distinguished Professor of Linguistics in the Department of Linguistics at the University of California, Santa Barbara (UCSB) and Chair of English Linguistics (Corpus Linguistics with a focus on quantitative methods, 25%) in the Department of English at the Justus-Liebig-Universität Giessen. He earned his M.A. and Ph.D. degrees at the University of Hamburg, Germany, in 1998 and 2000, respectively, and his Habilitation/Venia Legendi at the University of Marburg in 2024. Since 1999, he has published >200 items, has given >350 talks (>160 invited), has taught ≈100 invited bootcamps and workshops, serves in editorial functions for >15 journals and book series, and has reviewed >600 papers, grant proposals, and personnel cases. According to Google Scholar (15 Jan 2026), his work has been cited ≈26,500 times, with an h-index of 79.
Schedule
- Monday 13/07/2026, 9:00-10:30 & 11:00-12:30
- Tuesday 14/07/2026, 9:00-10:30 & 11:00-12:30
- Wednesday 15/07/2026, 9:00-10:30 & 11:00-12:30
- Thursday 16/07/2026, 9:00-10:30 & 11:00-12:30
- Friday 17/07/2026, 9:00-10:30 & 11:00-12:30
In addition to these contact hours, participants should expect to spend about 1-2 hours per day on self-study, going over the HTML reports that summarize the previous session and may provide additional background information.