References

Ackoff, R. L. (1989). From data to wisdom. Journal of Applied Systems Analysis, 16, 3–9.
Ädel, A. (2020). Corpus compilation. In M. Paquot & S. Th. Gries (Eds.), A practical handbook of corpus linguistics (pp. 3–24). Switzerland: Springer.
Almeida, T. A., Gómez Hidalgo, J. M., & Yamakami, A. (2011). Contributions to the study of SMS spam filtering: New collection and results. In Proceedings of the 2011 ACM symposium on document engineering (DOCENG’11) (p. 4). Mountain View, CA.
Baayen, R. H. (2004). Statistics in psycholinguistics: A critique of some current gold standards. Mental Lexicon Working Papers, 1, 1–47.
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge University Press.
Baayen, R. H. (2011). Corpus linguistics and naive discriminative learning. Revista Brasileira de Linguística Aplicada, 11, 295–328.
Baayen, R. H., & Shafaei-Bajestan, E. (2019). languageR: Analyzing linguistic data: A practical introduction to statistics. Retrieved from https://CRAN.R-project.org/package=languageR
Bao, W., Lianju, N., & Yue, K. (2019). Integration of unsupervised and supervised machine learning algorithms for credit risk assessment. Expert Systems with Applications, 128, 301–315.
Benoit, K. (2020). Quanteda.corpora: A collection of corpora for quanteda. Retrieved from http://github.com/quanteda/quanteda.corpora
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., … Lowe, W. (2022). Quanteda: Quantitative analysis of textual data. Retrieved from https://quanteda.io
Broman, K. W., & Woo, K. H. (2018). Data organization in spreadsheets. The American Statistician, 72, 2–10.
Brown, K. (2005). Encyclopedia of language and linguistics (Vol. 1). Elsevier.
Buckheit, J. B., & Donoho, D. L. (1995). Wavelab and reproducible research. In Wavelets and statistics (pp. 55–81). Springer.
Bychkovska, T., & Lee, J. J. (2017). At the same time: Lexical bundles in L1 and L2 university student argumentative writing. Journal of English for Academic Purposes, 30, 38–52.
Carmi, E., Yates, S. J., Lockley, E., & Pawluczuk, A. (2020). Data citizenship: Rethinking data literacy in the age of disinformation, misinformation, and malinformation. Internet Policy Review, 9.
Chambers, J. M. (2020). S, R, and data science. Proceedings of the ACM on Programming Languages, 4, 1–17.
Chan, S. (2014). Routledge encyclopedia of translation technology. Routledge.
Conway, L. G., Gornick, L. J., Burfeind, C., Mandella, P., Kuenzli, A., Houck, S. C., & Fullerton, D. T. (2012). Does complex or simple rhetoric win elections? An integrative complexity analysis of U.S. presidential campaigns. Political Psychology, 33, 599–618.
Cross, N. (2006). Design as a discipline. Designerly Ways of Knowing, 95–103.
Data never sleeps 7.0 infographic. (2019). https://www.domo.com/learn/infographic/data-never-sleeps-7.
Deshors, S. C., & Gries, S. Th. (2016). Profiling verb complementation constructions across New Englishes. International Journal of Corpus Linguistics, 21, 192–218.
Desjardins, J. (2019, April). How much data is generated each day? Visual Capitalist.
Donoho, D. (2017). 50 years of data science. Journal of Computational and Graphical Statistics, 26, 745–766.
Dubnjakovic, A., & Tomlin, P. (2010). A practical guide to electronic resources in the humanities. Elsevier.
Eisenstein, J., O’Connor, B., Smith, N. A., & Xing, E. P. (2012). Mapping the geographical diffusion of new words. Computation and Language, 1–13.
Gandrud, C. (2015). Reproducible research with R and R Studio (2nd ed.). CRC Press.
Gentleman, R., & Temple Lang, D. (2007). Statistical analyses and reproducible research. Journal of Computational and Graphical Statistics, 16, 1–23.
Gilquin, G., & Gries, S. Th. (2009). Corpora and experimental methods: A state-of-the-art review. Corpus Linguistics and Linguistic Theory, 5, 1–26.
Gomez-Uribe, C. A., & Hunt, N. (2015). The Netflix recommender system: Algorithms, business value, and innovation. ACM Transactions on Management Information Systems (TMIS), 6, 1–19.
Gries, S. Th. (2013). Statistics for linguistics with R: A practical introduction (2nd rev. ed.).
Grieve, J., Nini, A., & Guo, D. (2018). Mapping lexical innovation on American social media. Journal of English Linguistics, 46, 293–319.
Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLOS Biology, 13, e1002106.
How to make a data dictionary. (2021). OSF Guides. https://help.osf.io/hc/en-us/articles/360019739054-How-to-Make-a-Data-Dictionary.
Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 168–177).
Ignatow, G., & Mihalcea, R. (2017). An introduction to text mining: Research design, data collection, and analysis. Sage Publications.
Jaeger, T. F., & Snider, N. (2007). Implicit learning and syntactic persistence: Surprisal and cumulativity. University of Rochester Working Papers in the Language Sciences, 3.
Jurafsky, D., & Martin, J. H. (2020). Speech and language processing.
Kearney, M. W. (2020). Rtweet: Collecting Twitter data. Retrieved from https://CRAN.R-project.org/package=rtweet
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196–217.
Kloumann, I., Danforth, C., Harris, K., & Bliss, C. (2012). Positivity of the English language. PLoS ONE.
Kowsari, K., Meimandi, K. J., Heidarysafa, M., Mendu, S., Barnes, L. E., & Brown, D. E. (2019). Text classification algorithms: A survey. Information, 10, 150.
Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence: Brown University Press.
Lantz, B. (2013). Machine learning with R. Birmingham: Packt Publishing.
Lewis, M. (2004). Moneyball: The art of winning an unfair game. W. W. Norton & Company.
Lozano, C. (2009). CEDEL2: Corpus escrito del español L2. Applied Linguistics Now: Understanding Language and Mind/La Lingüística Aplicada Hoy: Comprendiendo el Lenguaje y la Mente. Almería: Universidad de Almería, 197–212.
Marwick, B., Boettiger, C., & Mullen, L. (2018). Packaging data analytical work reproducibly using R (and friends). The American Statistician, 72, 80–88.
Millikan, R. A. (1923). The electron and the light-quant from the experimental point of view. Nobel prize acceptance speech.
Mosteller, F., & Wallace, D. L. (1963). Inference in an authorship problem. Journal of the American Statistical Association, 58, 275–309.
Olohan, M. (2008). Leave it out! Using a comparable corpus to investigate aspects of explicitation in translation. Cadernos de Tradução, 153–169.
Paquot, M., & Gries, S. Th. (Eds.). (2020). A practical handbook of corpus linguistics. Switzerland: Springer.
Roediger, H. L., & McDermott, K. B. (2000). Distortions of memory. The Oxford handbook of memory, 149–162.
Saxena, S., & Gyanchandani, M. (2020). Machine learning methods for computer-aided breast cancer diagnosis using histopathology: A narrative review. Journal of Medical Imaging and Radiation Sciences, 51, 182–193.
Talarico, J. M., & Rubin, D. C. (2003). Confidence, not consistency, characterizes flashbulb memories. Psychological Science, 14, 455–461.
Voigt, R., Camp, N. P., Prabhakaran, V., Hamilton, W. L., Hetey, R. C., Griffiths, C. M., … Eberhardt, J. L. (2017). Language from police body camera footage shows racial disparities in officer respect. Proceedings of the National Academy of Sciences, 114, 6521–6526.
Welbers, K., & van Atteveldt, W. (2022). Rsyntax: Extract semantic relations from text by querying and reshaping syntax. Retrieved from https://CRAN.R-project.org/package=rsyntax
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59. doi:10.18637/jss.v059.i10
Wickham, H. (2021). Rvest: Easily harvest (scrape) web pages. Retrieved from https://CRAN.R-project.org/package=rvest
Wulff, S., Stefanowitsch, A., & Gries, S. Th. (2007). Brutal Brits and persuasive Americans. Aspects of Meaning.