Variable selection after screening: with or without data splitting?

Authors: Xiaoyi Zhu; Yuhong Yang
Keywords: Model selection; Sparse regression; Variable screening; Prediction
Journal: Computational Statistics
Publication year: 2015
Publication date: March 2015
Volume: 30
Issue: 1
Pages: 191-203
Full-text size: 165 KB
References:
1. Breheny, P, Huang, J (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann Appl Stat 5: pp. 232-253
2. Bühlmann, P, Mandozzi, J (2014) High-dimensional variable screening and bias in subsequent inference, with an empirical comparison. Comput Stat 29: pp. 407-430
3. Chen, L, Yang, Y (2010) Combining statistical procedures. In: Cai, T, Shen, X (eds) High-dimensional data analysis. Frontiers of Statistics. World Scientific Publishing, Singapore
4. Clarke, B (2003) Comparing Bayes and non-Bayes model averaging when model approximation error cannot be ignored. J Mach Learn Res 4: pp. 683-712
5. Fan, J, Li, R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96: pp. 1348-1360
6. Fan, J, Lv, J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B 70: pp. 849-911
7. Hoeting, J, Madigan, D, Raftery, A, Volinsky, C (1999) Bayesian model averaging: a tutorial (with discussion). Stat Sci 14: pp. 382-417
8. Huang, J, Ma, S, Zhang, C (2008) Adaptive Lasso for sparse high-dimensional regression models. Stat Sin 18: pp. 1603-1618
9. Leng, C, Wang, H (2008) Discussion on "Sure independence screening for ultrahigh dimensional feature space". J R Stat Soc Ser B 70: pp. 849-911
10. Meinshausen, N, Meier, L, Bühlmann, P (2009) p-values for high-dimensional regression. J Am Stat Assoc 104: pp. 1671-1681
11. Scheetz, TE, Kim, K-YA, Swiderski, RE, Philip, AR, Braun, TA, Knudtson, KL, Dorrance, AM, DiBona, GF, Huang, J, Casavant, TL, Sheffield, VC, Stone, EM (2006) Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc Natl Acad Sci 103: pp. 14429-14434
12. Tibshirani, R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58: pp. 267-288
13. Wasserman, L, Roeder, K (2009) High-dimensional variable selection. Ann Stat 37: pp. 2178-2201
14. Yang, Y (2005) Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92: pp. 937-950
15. Zhang, C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38: pp. 894-942
16. Zhang, W, Xia, Y (2008) Discussion on "Sure independence screening for ultrahigh dimensional feature space". J R Stat Soc Ser B 70: pp. 849-911
Journal category: Mathematics and Statistics
Journal subjects: Mathematics; Statistics; Probability and Statistics in Computer Science; Probability Theory and Stochastic Processes; Economic Theory
Publisher: Physica Verlag, An Imprint of Springer-Verlag GmbH
ISSN: 1613-9658

Abstract

High-dimensional data sets are now frequently encountered in many scientific fields. In order to select a sparse set of predictors that have predictive power and/or provide insight into which predictors really influence the response, a preliminary variable screening is typically done, often informally. Fan and Lv (J R Stat Soc Ser B 70:849-911, 2008) proposed sure independence screening (SIS) to reduce the dimension of the set of predictors from ultra-high to a moderate scale below the sample size. One may then apply a familiar variable selection technique. While this approach has become popular, the screening bias issue has been largely ignored. The screening bias may lead to the final selection of a number of predictors that have little or no value for prediction or explanation. In this paper we set out to examine this screening bias both theoretically and numerically, and to compare the approach with an alternative that utilizes data splitting. The simulation results and real bioinformatics examples show that data splitting can significantly reduce the screening bias for variable selection and improve the prediction accuracy as well.
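
The contrast the abstract draws can be illustrated with a minimal sketch. It is not the authors' code. It assumes screening by absolute marginal correlation (the ranking idea behind SIS), a simple half-and-half split, and hypothetical simulated data; the sizes, the cutoff d = 20, and the function names are all illustrative choices. It screens on one half of the data, then compares how strongly the retained noise predictors appear related to the response on the screening half versus the held-out half.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def abs_corr(X, y, rows, j):
    """|correlation| between predictor j and the response, using only `rows`."""
    return abs(pearson([X[i][j] for i in rows], [y[i] for i in rows]))


def screen(X, y, rows, d):
    """Keep the d predictors with the largest |marginal correlation| on `rows`."""
    p = len(X[0])
    ranked = sorted(range(p), key=lambda j: abs_corr(X, y, rows, j), reverse=True)
    return ranked[:d]


# Hypothetical simulated data: 100 observations, 500 predictors, 3 true signals.
rng = random.Random(0)
n, p, signal = 100, 500, (0, 1, 2)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(2.0 * row[j] for j in signal) + rng.gauss(0, 1) for row in X]

# Data splitting: screen on the first half, hold out the second half.
screen_rows = list(range(n // 2))
fresh_rows = list(range(n // 2, n))
kept = screen(X, y, screen_rows, d=20)

# Noise predictors that survived screening were chosen because they looked
# correlated with y on the screening half; the held-out half gives them an
# independent second look.
noise_kept = [j for j in kept if j not in signal]
seen = sum(abs_corr(X, y, screen_rows, j) for j in noise_kept) / len(noise_kept)
fresh = sum(abs_corr(X, y, fresh_rows, j) for j in noise_kept) / len(noise_kept)
print(f"mean |corr| of retained noise predictors: "
      f"screening half {seen:.2f}, held-out half {fresh:.2f}")
```

In a full analysis, the comparison step would be replaced by a variable selection method (for example Lasso, SCAD, or MCP) fitted on the held-out rows and restricted to the predictors in `kept`. Running the selector on the same rows used for screening corresponds to the approach without data splitting.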
