Artificial Intelligence (AI) is rapidly advancing and transforming the way we work, communicate, and live, and its use is expanding across multiple fields. In mental health, AI has great potential to address unique challenges, such as the lack of objective markers for psychological assessment and the reliance on self-reports and clinical observations, which are prone to bias. As efforts continue to improve clinical evaluation, AI provides a pathway to uncover quantifiable links between behavior and psychological functioning.
Among the behavioral signals studied in the literature, speech stands out as a particularly rich and non-invasive source of information. Speech production is highly sensitive to psychological changes, capturing both conscious and unconscious manifestations of mental states. These changes are observable across two primary dimensions: the acoustic properties of speech, i.e., how something is said; and its linguistic content, i.e., what is said. Consequently, the application of speech and text processing techniques enables the identification of meaningful patterns associated with mental health conditions.
This thesis investigates the use of AI and speech analysis to assess three constructs that span different dimensions of human psychology: attachment style, an internal stable state that shapes how individuals behave in their close relationships across the lifespan; emotions, which are internal transitory states that fluctuate in response to context; and depression, a mental health disorder characterized by persistent disturbances in mood, cognition, and behavior. By examining speech as a window into each of these dimensions, this thesis adopts a data-driven approach to psychological assessment.
Dedicated datasets were carefully designed and collected to study each of the psychological constructs. For attachment, a remote system was used to collect speech responses to open-ended questions. For emotions, the novel EMOVOME dataset was created and publicly released, featuring spontaneous voice messages from real-life conversations. For depression, the DEPTALK dataset was collected using an innovative system of virtual humans designed to engage in casual, open-ended conversation. Leveraging these datasets, AI models were developed using both acoustic and linguistic features. The modeling approaches ranged from traditional machine learning with hand-crafted features to cutting-edge deep learning and foundation models, reflecting the evolving landscape of AI. These models were evaluated using quantitative and qualitative analyses to ensure performance and interpretability. Furthermore, fairness was evaluated by examining gender bias in AI models.
Overall, this thesis highlights the potential of speech as an objective marker and underscores the value of AI-based methodologies in advancing psychological assessment. By addressing a range of psychological dimensions (stable and transitory states, and clinical disorders), this work contributes to the development of objective and scalable tools that can assist clinicians in the detection, understanding, and monitoring of mental health conditions.