Enhanced variable selection for distributional regression

Annika Strömer; Leonie Weinhold; Christian Staerk; Stefanie Titze; Nadja Klein; Andreas Mayr

Annika Strömer ^[1] ; Leonie Weinhold ^[1] ; Christian Staerk ^[1] ; Stefanie Titze ^[3] ; Nadja Klein ; Andreas Mayr ^[2]
1. [1] University of Bonn, Bonn, Germany
2. [2] Humboldt University of Berlin, Berlin, Germany
3. [3] FAU Erlangen-Nuremberg
Source: Proceedings of the 35th International Workshop on Statistical Modelling: July 20-24, 2020, Bilbao, Basque Country, Spain / Itziar Irigoien Garbizu (ed.), Dae-Jin Lee (ed.), Joaquín Martínez Minaya (ed.), María Xosé Rodríguez Álvarez (ed.), 2020, ISBN 978-84-1319-267-3, pp. 233-237
Language: English
Abstract
- We present an approach for enhanced variable selection for distributional regression via component-wise boosting. Boosting is an alternative method for fitting regression models and is applicable for high-dimensional data problems.
  
  Furthermore, the algorithm leads to data-driven variable selection. In practice, however, the algorithm still tends to select too many variables in some situations, including false positives. This occurs particularly for low-dimensional data (p < n), in which case we observe a slow overfitting behavior. Due to the slow overfitting, the stopping iteration gets larger and more variables get included in the model. Many of the false positives are incorporated with a small coefficient and therefore have a small impact, but lead to a larger model with difficult interpretation.
  
  We try to overcome this issue by giving the algorithm the chance to de-select those variables. We consider the impact on variable selection and prediction and additionally compare the new approach to the One Standard Error Rule.
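
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain squared-error loss for the mean only (not a full distributional model), and the de-selection rule shown here (drop every variable whose share of the total risk reduction falls below a threshold `tau`, then refit on the remaining variables) is an assumption made for illustration, since the abstract does not state the rule. The function names, `tau`, the step length `nu`, and the simulated data are all hypothetical.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1, active=None):
    """Component-wise L2 boosting with simple linear base-learners.

    Assumes the columns of X are centered. Returns the intercept, the
    coefficient vector, and the risk reduction attributed to each variable.
    """
    n, p = X.shape
    active = np.arange(p) if active is None else np.asarray(active)
    intercept = y.mean()
    coef = np.zeros(p)
    risk_reduction = np.zeros(p)
    resid = y - intercept
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        # Least-squares fit of every active column to the current residuals.
        beta = X[:, active].T @ resid / col_ss[active]
        # Pick the column whose full fit would reduce the RSS the most.
        k = int(np.argmax(beta ** 2 * col_ss[active]))
        j = active[k]
        step = nu * beta[k]  # only a small fraction of the fit is added
        risk_before = resid @ resid
        resid = resid - step * X[:, j]
        risk_reduction[j] += risk_before - resid @ resid
        coef[j] += step
    return intercept, coef, risk_reduction


def boost_with_deselection(X, y, n_iter=200, nu=0.1, tau=0.01):
    """Boost, drop variables with a small share of risk reduction, refit.

    The threshold rule is an illustrative assumption, not taken from the
    abstract.
    """
    _, _, rr = componentwise_l2_boost(X, y, n_iter, nu)
    share = rr / rr.sum()
    keep = np.flatnonzero(share >= tau)
    intercept, coef, _ = componentwise_l2_boost(X, y, n_iter, nu, active=keep)
    return intercept, coef, keep


# Simulated low-dimensional data (p < n): only the first two variables matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
X = X - X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

_, coef_initial, _ = componentwise_l2_boost(X, y)
_, coef, keep = boost_with_deselection(X, y)
print("selected before de-selection:", np.flatnonzero(coef_initial))
print("kept after de-selection:", keep)
```

In a setting like this, the initial fit typically picks up several noise variables with very small coefficients, which is the behavior the abstract describes. The refit after de-selection keeps only variables that contributed noticeably to the risk reduction.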