Coimbra (Sé Nova), Portugal
Schizophrenia is a complex disease with severely disabling symptoms. No gene has been consistently identified as a leading cause of disease onset, and there is no consensus on the etiology or diagnosis of the disease. Sweden is a paradigmatic case: relatively high misdiagnosis rates (19%) have been reported there.
A large-scale case-control dataset based on the Swedish population was reduced to its most representative variants, and the distinction between cases and controls was further scrutinized through gene-annotation-based machine learning (ML) models.
The intra-group differences within cases and controls were accentuated by training the model on the entire dataset. The cases and controls with a higher likelihood of being misclassified, and hence more likely to have been misdiagnosed, were excluded from subsequent analysis. The model was then conventionally trained on the reduced dataset and the performances were compared.
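The filtering procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the study's implementation: the data are synthetic, a nearest-centroid classifier stands in for the gene-annotation-based model, samples are dropped when they are actually misclassified (a simplification of "higher likelihood of being misclassified"), and every function name and the label-flip rate are hypothetical choices made for the example.

```python
import random

def centroid(rows):
    """Mean feature vector of a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(X, y):
    """Stand-in model (assumption): one centroid per class (0 = control, 1 = case)."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in (0, 1)}

def predict(model, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    dist = {c: sum((a - b) ** 2 for a, b in zip(x, m)) for c, m in model.items()}
    return min(dist, key=dist.get)

def accuracy(model, X, y):
    """Fraction of samples whose predicted class matches the recorded label."""
    return sum(predict(model, x) == lab for x, lab in zip(X, y)) / len(y)

def filter_samples(X, y):
    """Train on the entire dataset and keep only samples it classifies correctly."""
    model = fit(X, y)
    keep = [i for i in range(len(y)) if predict(model, X[i]) == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]

def holdout_accuracy(X, y, rng, test_fraction=0.3):
    """Conventional training: fit on a random split, score on the held-out rest."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    model = fit([X[i] for i in train], [y[i] for i in train])
    return accuracy(model, [X[i] for i in test], [y[i] for i in test])

# Synthetic cohort (assumption): 5 features, labels flipped at an illustrative rate.
rng = random.Random(0)
FLIP_RATE = 0.19
X, y = [], []
for i in range(400):
    true_label = i % 2
    mean = 1.0 if true_label else -1.0
    X.append([rng.gauss(mean, 1.0) for _ in range(5)])
    y.append(1 - true_label if rng.random() < FLIP_RATE else true_label)

X_kept, y_kept = filter_samples(X, y)
acc_before = holdout_accuracy(X, y, rng)
acc_after = holdout_accuracy(X_kept, y_kept, rng)
print(f"kept {len(y_kept)} of {len(y)} samples")
print(f"hold-out accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

Because the filtering step sees every label, accuracy on the reduced dataset measures how separable the retained samples are and is not an unbiased estimate of performance on new individuals.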
The results indicate that the reported prevalence and misdiagnosis rates for schizophrenia may carry over to case-control cohorts, thereby reducing the performance of any association studies based on such datasets. After the sample filtering procedure, a simple machine learning model reached a performance more consistent with the schizophrenia heritability estimates in the literature.
Sample selection on large-scale datasets sequenced for association studies may enable the adaptation of ML approaches and strategies to complex disease research.