An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects

Sergio Arciniegas-Alarcón; Marisol García-Peña; Wojtek Janusz Krzanowski; Carlos Tadeu dos Santos Dias

Biometrical Letters (2014)

  • Volume: 51, Issue: 2, pages 75-88
  • ISSN: 1896-3811

Abstract

A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
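
As a rough illustration of the evaluation design described in the abstract, the following Python sketch deletes values at random from a complete matrix, imputes them, and scores the result. It is not the authors' algorithm: `svd_impute` is a generic iterative low-rank SVD fill-in, without the regression step, the cross-validated weights or multiple imputation, and the normalisation used in `nrmse` (standard deviation of the deleted true values) is an assumption. All function names and the simulated data are hypothetical.

```python
import numpy as np


def svd_impute(X, rank=2, n_iter=200, tol=1e-8):
    """Fill NaNs in X by iterating a rank-`rank` SVD approximation.

    Generic scheme for illustration only; not the method of the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed entries.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Only the missing cells are updated; observed cells stay fixed.
        new = np.where(missing, approx, X)
        converged = np.max(np.abs(new - filled)) < tol
        filled = new
        if converged:
            break
    return filled


def nrmse(imputed, true, missing):
    """RMSE over the deleted cells divided by the standard deviation
    of their true values (assumed normalisation)."""
    diff = imputed[missing] - true[missing]
    return np.sqrt(np.mean(diff ** 2)) / np.std(true[missing])


# Hypothetical complete data: 10 genotypes x 6 environments,
# built from additive genotype and environment effects.
rng = np.random.default_rng(0)
true = rng.normal(size=(10, 1)) + rng.normal(size=(1, 6))

# Delete 10% of the cells at random, then impute and score.
missing = np.zeros(true.shape, dtype=bool)
missing.flat[rng.choice(true.size, size=6, replace=False)] = True
X = np.where(missing, np.nan, true)

imputed = svd_impute(X, rank=2)
score = nrmse(imputed, true, missing)
print(f"NRMSE over deleted cells: {score:.4f}")
```

Repeating the deletion step at several missing rates and averaging the score over many random deletions would mimic the kind of simulation study the abstract describes.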

How to cite

Sergio Arciniegas-Alarcón, et al. "An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects." Biometrical Letters 51.2 (2014): 75-88. <http://eudml.org/doc/268748>.

@article{SergioArciniegas2014,
abstract = {A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.},
author = {Sergio Arciniegas-Alarcón and Marisol García-Peña and Wojtek Janusz Krzanowski and Carlos Tadeu dos Santos Dias},
journal = {Biometrical Letters},
keywords = {cross-validation; singular value decomposition; imputation; genotype-by-environment interaction; weights; missing values},
language = {eng},
number = {2},
pages = {75-88},
title = {An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects},
url = {http://eudml.org/doc/268748},
volume = {51},
year = {2014},
}

TY - JOUR
AU - Sergio Arciniegas-Alarcón
AU - Marisol García-Peña
AU - Wojtek Janusz Krzanowski
AU - Carlos Tadeu dos Santos Dias
TI - An alternative methodology for imputing missing data in trials with genotype-by-environment interaction: some new aspects
JO - Biometrical Letters
PY - 2014
VL - 51
IS - 2
SP - 75
EP - 88
AB - A common problem in multi-environment trials arises when some genotype-by-environment combinations are missing. In Arciniegas-Alarcón et al. (2010) we outlined a method of data imputation to estimate the missing values, the computational algorithm for which was a mixture of regression and lower-rank approximation of a matrix based on its singular value decomposition (SVD). In the present paper we provide two extensions to this methodology, by including weights chosen by cross-validation and allowing multiple as well as simple imputation. The three methods are assessed and compared in a simulation study, using a complete set of real data in which values are deleted randomly at different rates. The quality of the imputations is evaluated using three measures: the Procrustes statistic, the squared correlation between matrices and the normalised root mean squared error between these estimates and the true observed values. None of the methods makes any distributional or structural assumptions, and all of them can be used for any pattern or mechanism of the missing values.
LA - eng
KW - cross-validation; singular value decomposition; imputation; genotype-by-environment interaction; weights; missing values
UR - http://eudml.org/doc/268748
ER -

References

  1. Arciniegas-Alarcón S., García-Peña M., Dias C.T.S. (2011): Data imputation in trials with genotype×environment interaction. Interciencia 36(6): 444-449. 
  2. Arciniegas-Alarcón S., García-Peña M., Dias C.T.S., Krzanowski W.J. (2010): An alternative methodology for imputing missing data in trials with genotype-by-environment interaction. Biometrical Letters 47(1): 1-14.
  3. Bergamo G.C., Dias C.T.S., Krzanowski W.J. (2008): Distribution-free multiple imputation in an interaction matrix through singular value decomposition. Scientia Agricola 65(4): 422-427.
  4. Calinski T., Czajka S., Kaczmarek Z., Krajewski P., Pilarczyk W. (2009): Analyzing the Genotype-by-Environment Interactions Under a Randomization-Derived Mixed Model. Journal of Agricultural, Biological and Environmental Statistics 14(2): 224-241. Zbl1306.62254
  5. Ching W., Li L., Tsing N., Tai C., Ng T. (2010): A weighted local least squares imputation method for missing value estimation in microarray gene expression data. International Journal of Data Mining and Bioinformatics 4(3): 331-347. 
  6. Denis J.B., Baril C.P. (1992): Sophisticated models with numerous missing values: the multiplicative interaction model as an example. Biuletyn Oceny Odmian 24-25: 33-45. 
  7. Di Ciaccio A. (2011): Bootstrap and nonparametric predictors to impute missing data. In: B. Fichet et al. (eds.), Classification and Multivariate Analysis for Complex Data Structures, Studies in Classification, Data Analysis, and Knowledge Organization. Springer-Verlag Berlin Heidelberg. 
  8. Dias C.T.S., Krzanowski W.J. (2003): Model selection and cross validation in additive main effect and multiplicative interaction models. Crop Science 43: 865-873.
  9. Gabriel K.R. (2002): Le biplot - outil d’exploration de données multidimensionnelles. Journal de la Société Française de Statistique 143(3-4): 5-55.
  10. García-Peña M., Dias C.T.S. (2009): Analysis of bivariate additive models with multiplicative interaction (AMMI). Biometric Brazilian Journal 27(4): 586-602.
  11. Gauch H.G. (2013): A simple protocol for AMMI analysis of yield trials. Crop Science 53: 1860-1869.
  12. Gauch H.G., Zobel R.W. (1990): Imputing missing yield trial data. Theoretical and Applied Genetics 79: 753-761. 
  13. Josse J., Pagès J., Husson F. (2011): Multiple imputation in PCA. Advances in data analysis and classification 5(3): 231-246. Zbl1274.62409
  14. Josse J., Husson F. (2012): Handling missing values in exploratory multivariate data analysis methods. Journal de la Société Française de Statistique 153(2): 79-99. Zbl1316.62006
  15. Krzanowski W.J. (1988): Missing value imputation in multivariate data using the singular value decomposition of a matrix. Biometrical Letters XXV(1-2): 31-39. 
  16. Krzanowski W.J. (2000): Principles of multivariate analysis: A user’s perspective. Oxford University Press, Oxford. Zbl0678.62001
  17. Kroonenberg P.M. (2008): Applied multiway data analysis. John Wiley & Sons. Zbl1160.62002
  18. Kumar A., Verulkar S.B., Mandal N.P., Variar M., Shukla V.D., Dwivedi J.L., Singh B.N., Singh O.N., Swain P., Mall A.K., Robin S., Chandrababu R., Jain A., Haefele S.M., Piepho H.P., Raman A. (2012): High-yielding, drought-tolerant, stable rice genotypes for the shallow rainfed lowland drought-prone ecosystem. Field Crops Research 133: 37-47.
  19. Little R., Rubin D. (2002): Statistical analysis with missing data. 2nd ed. John Wiley & Sons, New York, NY. Zbl1011.62004
  20. Paderewski J., Rodrigues P.C. (2014): The usefulness of EM-AMMI to study the influence of missing data pattern and application to Polish post-registration winter wheat data. Australian Journal of Crop Science 8: 640-645. 
  21. Piepho H.P. (1995): Methods for estimating missing genotype-location combinations in multilocation trials - an empirical comparison. Informatik Biometrie und Epidemiologie in Medizin und Biologie 26: 335-349. 
  22. Piepho H.P., Möhring J. (2006): Selection in cultivar trials - Is it ignorable? Crop Science 46: 192-201.
  23. R Development Core Team (2013): R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
  24. Rodrigues P., Pereira D.G.S., Mexia J.T. (2011): A comparison between joint regression analysis and the additive main and multiplicative interaction model: the robustness with increasing amounts of missing data. Scientia Agricola 68(6): 679-686.
  25. Rubin D.B. (1978): Multiple imputation in sample surveys: a phenomenological Bayesian approach to nonresponse. In: Survey Research Methods Section Of The American Statistical Association. Proceedings: 20-34.
  26. Sabaghnia N., Karimizadeh R., Mohammadi M. (2012): Model selection in additive main effect and multiplicative interaction model in durum wheat. Genetika 44(2): 325-339.
  27. Schafer J.L., Graham J.W. (2002): Missing data: our view of the state of the art. Psychological Methods 7(2): 147-177.
  28. van Buuren S. (2012): Flexible imputation of missing data. CRC press. Zbl1256.62005
  29. Wright K. (2012): agridat: Agricultural datasets. R package version 1.4. http://CRAN.R-project.org/package=agridat
  30. Yan W., Pageau D., Frégeau-Reid J., Durand J. (2011): Assessing the representativeness and repeatability of test locations for genotype evaluation. Crop Science 51: 1603-1610.
  31. Yan W. (2013): Biplot analysis of incomplete two-way data. Crop Science 53(1): 48-57.
