Clustering of variables for enhanced interpretability of predictive models.
References
Algamal, Z. and M. Lee (2019). A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. Adv. Data Anal. Classif. 13, 753-771.
Bondell, H. D. and B. J. Reich (2008). Simultaneous regression shrinkage, variable selection and clustering of predictors with OSCAR. Biometrics 64, 115-123.
Breiman, L. (2001). Random forests. Mach. Learn. 45, 5-32.
Bühlmann, P. and T. Hothorn (2007). Boosting algorithms: Regularization, prediction and model fitting. Stat. Sci. 22, 477-505.
Bühlmann, P., P. Rütimann, S. van de Geer, and C.-H. Zhang (2013). Correlated variables in regression: clustering and sparse estimation. J. Stat. Plan. Infer. 143, 1835-1858.
Celeux, G., C. Maugis-Rabusseau, and M. Sedki (2019). Variable selection in model-based clustering and discriminant analysis with a regularization approach. Adv. Data Anal. Classif. 13, 259-278.
Chakraborty, S. and A. C. Lozano (2019). A graph Laplacian prior for Bayesian variable selection and grouping. Comput. Stat. Data An. 136, 72-91.
Chen, M. and E. Vigneau (2016). Supervised clustering of variables. Adv. Data Anal. Classif. 10, 85-101.
Chipman, H. A. and H. Gu (2005). Interpretable dimension reduction. J. Appl. Stat. 32, 969-987.
Chun, H. and S. Keles (2010). Sparse partial least squares for simultaneous dimension reduction and variable selection. J. Roy. Stat. Soc. B 72, 3-25.
Cox, T. F. and D. S. Arnold (2018). Simple components. J. Appl. Stat. 45, 83-99.
Curtis, S. M. and S. K. Ghosh (2011). A Bayesian approach to multicollinearity and the simultaneous selection and clustering of predictors in linear regression. J. Stat. Theory Pract. 5, 715-735.
Efron, B. and T. Hastie (2016). Computer age statistical inference: algorithms, evidence and data science. New York: Cambridge University Press.
Enki, D. G., N. T. Trendafilov, and I. T. Jolliffe (2013). A clustering approach to interpretable principal components. J. Appl. Stat. 40, 583-599.
Figueiredo, A. and P. Gomes (2015). Clustering of variables based on Watson distribution on hypersphere: A comparison of algorithms. Comm. Stat. - Simul Comput. 44, 2622-2635.
Friedman, J., T. Hastie, and R. Tibshirani (2010). A note on the group lasso and a sparse group lasso. Technical report, Statistics Department, Stanford University.
Hastie, T., R. Tibshirani, D. Botstein, and P. Brown (2001). Supervised harvesting of expression trees. Genom. Biol. 2, 1-12.
Hastie, T., R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Second ed.). Springer Series in Statistics. New York: Springer.
Hofner, B., A. Mayr, N. Robinzonov, and M. Schmid (2014). Model-based boosting in R: A hands-on tutorial using the R package mboost. Comput. Stat. 29, 3-35.
Jolliffe, I., N. Trendafilov, and M. Uddin (2003). A modified principal component technique based on the lasso. J. Comput. Graph. Stat. 12, 531-547.
Karlis, D., G. Saporta, and A. Spinakis (2003). A simple rule for the selection of principal components. Comm. Stat. - Theor. M. 32, 643-666.
Park, M. Y., T. Hastie, and R. Tibshirani (2007). Averaged gene expressions for regression. Biostat. 8, 212-227.
Rinke, P., S. Moitrier, E. Humpfer, S. Keller, M. Moertter, M. Godejohann, G. Hoffmann, H. Schaefer, and M. Spraul (2007). An 1H NMR technique for high throughput screening in quality and authenticity control of fruit juice and fruit juice raw materials - SGF-profiling. Fruit Process. 1, 10-18.
SDBSWeb (2020). Spectral Database for Organic Compounds. https://sdbs.db.aist.go.jp (National Institute of Advanced Industrial Science and Technology).
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. B 58, 267-288.
Tibshirani, R., M. Saunders, S. Rosset, J. Zhu, and K. Knight (2005). Sparsity and smoothness via the fused lasso. J. Roy. Stat. Soc. B 67, 91-108.
Vigneau, E. and M. Chen (2016). Dimensionality reduction by clustering of variables while setting aside atypical variables. Electron. J. Appl. Stat. An. 9, 134-153.
Vigneau, E., M. Chen, and E. M. Qannari (2015). ClustVarLV: An R package for the clustering of variables around latent variables. R J. 7, 134-148.
Vigneau, E. and E. Qannari (2003). Clustering of variables around latent components. Comm. Stat. - Simul Comput. 32, 1131-1150.
Vigneau, E. and F. Thomas (2012). Model calibration and feature selection for orange juice authentication by 1H NMR spectroscopy. Chemometr. Intell. Lab. Sys. 117, 22-30.
Yengo, L., J. Jacques, C. Biernacki, and M. Canouil (2016). Variable clustering in high-dimensional linear regression: the R package clere. R J. 8, 92-106.
Yuan, M. and Y. Lin (2007). Model selection and estimation in regression with grouped variables. J. Roy. Stat. Soc. B 68, 49-67.
Zeng, B., X. M. Wen, and L. Zhu (2017). A link-free sparse group variable selection method for single-index model. J. Appl. Stat. 44, 2388-2400.
Zou, H. and T. Hastie (2005). Regularization and variable selection via the elastic net. J. Roy. Stat. Soc. B 67, 301-320.
DOI: https://doi.org/10.31449/inf.v45i4.3283
This work is licensed under a Creative Commons Attribution 3.0 License.