Research Article: Re-examining the robustness of voice features in predicting depression: Compared with baseline of confounders

Date Published: June 20, 2019

Publisher: Public Library of Science

Author(s): Wei Pan, Jonathan Flint, Liat Shenhav, Tianli Liu, Mingming Liu, Bin Hu, Tingshao Zhu, Zezhi Li.


A large proportion of patients with depression disorder do not receive an effective diagnosis, which makes it necessary to find a more objective assessment to facilitate more rapid and accurate diagnosis of depression. Speech data are easy to acquire clinically, and their association with depression has been studied, but the actual predictive effect of voice features has not been examined. Thus, we do not have a general understanding of the extent to which voice features contribute to the identification of depression. In this study, we investigated the significance of the association between voice features and depression using binary logistic regression, and re-examined the actual classification effect of voice features on depression through classification modeling. Nearly 1000 Chinese females participated in this study. Several different datasets were included as test sets. We found that 4 voice features (PC1, PC6, PC17, PC24; P<0.05, corrected) made a significant contribution to depression, and that the contribution effect of the voice features alone reached 35.65% (Nagelkerke's R2). In classification modeling, the voice-based model had consistently higher prediction accuracy (F-measure) than the baseline model of demographic data when tested on different datasets, even across different emotional contexts. The F-measure of voice features alone reached 81%, consistent with existing data. These results demonstrate that voice features are effective in predicting depression and indicate that more sophisticated models based on voice features can be built to help in clinical diagnosis.
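To make the Nagelkerke's R2 figure quoted in the abstract easier to interpret, the following is a minimal sketch of how that statistic is computed from the log-likelihoods of a fitted logistic regression and an intercept-only (null) model. The function names, labels, and predicted probabilities are hypothetical illustrations, not the study's data or code.

```python
import math


def bernoulli_log_likelihood(labels, probs):
    """Log-likelihood of binary labels under predicted probabilities."""
    return sum(math.log(p if y == 1 else 1.0 - p) for y, p in zip(labels, probs))


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's R2: Cox-Snell R2 rescaled by its maximum attainable value."""
    cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))
    max_cox_snell = 1.0 - math.exp((2.0 / n) * ll_null)
    return cox_snell / max_cox_snell


# Hypothetical toy data: 1 = depressed, 0 = control (not from the study).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical predicted probabilities from some fitted logistic model.
probs = [0.9, 0.8, 0.6, 0.3, 0.2, 0.4, 0.7, 0.1]

# The null model predicts the base rate of the positive class for everyone.
base_rate = sum(labels) / len(labels)
ll_null = bernoulli_log_likelihood(labels, [base_rate] * len(labels))
ll_model = bernoulli_log_likelihood(labels, probs)

r2 = nagelkerke_r2(ll_null, ll_model, len(labels))
print(f"Nagelkerke's R2 on the toy data: {r2:.4f}")
```

The statistic is 0 when the fitted model is no better than the null model and 1 when it predicts every label with certainty, which is why it can be read as a proportion-like measure of model fit.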

Partial Text

Depression disorder is the most common psychiatric disorder (lifetime prevalence reaching 16.2%) [1–7] and is now the leading cause of disability [8]. Yet despite its prevalence, many cases of depression remain unrecognized [9], depriving many people of the possibility of receiving effective treatment. Fewer than half of those eligible receive treatment, and in many countries the figure is less than 10% [10]. One of the main obstacles to treatment provision is the difficulty of recognizing and diagnosing depression. Diagnosis currently requires an interview by a clinician, often lasting half an hour or more, a method that rarely exceeds an inter-rater reliability of 0.7 (kappa coefficient) [11]; in one large field study reliability was estimated to be as low as 0.25 [12]. According to a meta-analysis of 41 studies [13], the accuracy of general practitioners in recognizing depression is 47.3%. Methods to identify cases of depression that can be deployed at an appropriate scale are needed.

In this study, the relationships among voice features, demographic variables, and depression were systematically examined. Our findings addressed the primary goals of the study: (1) voice features make a significant contribution to the prediction of depression; (2) in the regression model, the fit of the model based on voice data alone is significantly better than the fit of the model based on demographic data alone; (3) depression classification models built on voice features are effective and show generalization ability; (4) depression classification models built on voice features are robust across different emotional contexts.

Taken together, our findings suggest that voice features play an important role in predicting depression. In addition, demographic variables should be given due weight in future research. Our results contribute to our understanding of the actual effect of voice features on depression. This research provides a foundation for exploring more robust classification models and for identifying the related voice features needed to build them, so that the clinical application value of voice features can be exploited to the fullest.



