Predicting Psychological Outcomes of Digital Life: A Machine Learning Analysis on European Data

Supervisors: Sebastien Chappuis, Michael Papinutto, Simon Ruffieux, Chantal Martin Sölch, Denis Lalanne

Contact person: Simon Ruffieux

Student: Diogo Rocha Moreira

Project status: Finished

Year: 2024

This project investigates the use of machine learning (ML) methods to predict outcomes of digital life behaviour from a large number of questionnaires. The work sits at the intersection of psychology and computer science, using the latest advances in machine learning to support the analysis and investigation of data collected through multiple questionnaires.

Recent decades have seen the emergence and growth of digital technology use, along with rising concerns about its potential impact on psychological distress and time pressure. While the increasing amount of data allows a more extensive analysis of such relationships, few researchers have attempted to use regression-based machine learning pipelines to analyze larger datasets. The present thesis investigates the use of machine learning algorithms to predict psychological distress (DASS) and chronic time pressure (CTPI) from psychometric and socio-demographic variables. The approach focuses on predictive accuracy and interpretability through a regression-based pipeline. Five regression algorithms (Random Forest, Extra Trees, XGBoost, LightGBM, and Linear Regression) were evaluated through extensive hyperparameter optimization and statistical testing. Results showed that boosting algorithms (XGBoost for DASS and LightGBM for CTPI) significantly outperformed the Linear Regression baseline. SHAP values revealed that, for DASS, problematic internet use was a strong positive predictor of distress. For CTPI, quality of digital experience had low positive predictive value, which went against initial suppositions. A cross-country analysis revealed substantial variation in the importance of predictors, which highlights potential cultural influences on the effects of digitalization. Overall, the study highlights the value of complex machine learning algorithms for larger psychometric datasets. However, careful tuning of the models is required to avoid overfitting, and the interpretation of SHAP values depends greatly on the generalization capability of the model.
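To make the interpretability step more concrete, the sketch below computes exact Shapley values, the quantity that SHAP values approximate, for a single prediction. It is an illustration only and not the thesis pipeline: it does not use the SHAP library, the three-feature linear `toy_model` and its weights are invented, the feature names are hypothetical placeholders, and a single reference row stands in for the background data.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x.

    Features outside a coalition are replaced by the value in `background`
    (a single reference row; a simplifying assumption for this sketch).
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else background[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else background[j]
                             for j in range(n)]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical feature names, for illustration only
feature_names = ["problematic_internet_use", "age", "digital_experience_quality"]


def toy_model(f):
    # Invented linear model standing in for a trained regressor
    return 3.0 * f[0] - 1.0 * f[1] + 0.5 * f[2]


x = [2.0, 1.0, 4.0]           # instance to explain
background = [1.0, 1.0, 2.0]  # reference row

phi = shapley_values(toy_model, x, background)
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.2f}")
```

The contributions sum to the difference between the prediction for `x` and the prediction for the reference row. This is why such attributions describe the fitted model rather than the underlying phenomenon: if the model generalizes poorly, so do the explanations.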

Keywords: Machine Learning, Questionnaire Analysis, Item Prediction, Psychology.

Document: report.pdf