A Comparative Analysis of Predictive Learning Algorithms on High-Dimensional Microarray Cancer Data

Bill, Jo; Fokoue, Ernest

Serdica Journal of Computing (2014)

Volume: 8, Issue: 2, pages 137-168
ISSN: 1312-6555

Abstract

This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high-dimensional, low-sample-size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored using seven datasets where κ < 1. These techniques require special attention to tuning, which necessitated investigating several extensions of cross-validation to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
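
As a rough illustration of the tuning problem the abstract describes, the sketch below selects a regularization strength by K-fold cross-validation on simulated data with κ < 1 (taking κ to be n/p). It is not the authors' method: the classifier is a simplified soft-thresholded nearest-centroid rule chosen only because it fits in a few lines, and the data, the grid of thresholds, and all function names (make_data, fit_centroids, cv_error, and so on) are hypothetical.

```python
import random


def make_data(n, p, informative, shift, rng):
    """Simulate n samples with p features; only the first `informative`
    features have a class-dependent mean (hypothetical toy data)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(shift if (label == 1 and j < informative) else 0.0, 1.0)
               for j in range(p)]
        X.append(row)
        y.append(label)
    return X, y


def soft_threshold(v, delta):
    """Shrink v toward zero by delta, setting small values exactly to zero."""
    if v > delta:
        return v - delta
    if v < -delta:
        return v + delta
    return 0.0


def fit_centroids(X, y, delta):
    """Class centroids whose offsets from the overall mean are soft-thresholded."""
    p = len(X[0])
    overall = [sum(row[j] for row in X) / len(X) for j in range(p)]
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        mean = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        centroids[label] = [overall[j] + soft_threshold(mean[j] - overall[j], delta)
                            for j in range(p)]
    return centroids


def predict(centroids, row):
    """Assign the class whose shrunken centroid is nearest in squared distance."""
    return min(centroids,
               key=lambda k: sum((a - b) ** 2 for a, b in zip(row, centroids[k])))


def make_folds(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_error(X, y, delta, folds):
    """Misclassification rate over held-out folds for one threshold value."""
    errors = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train], delta)
        errors += sum(predict(model, X[i]) != y[i] for i in fold)
    return errors / len(X)


rng = random.Random(0)
n, p = 40, 500                      # far fewer observations than features
X, y = make_data(n, p, informative=10, shift=1.5, rng=rng)
kappa = n / p                       # assumed definition of the ratio
folds = make_folds(n, 5, rng)       # same folds reused for every threshold
grid = [0.0, 0.2, 0.4, 0.6, 0.8]    # candidate shrinkage thresholds
cv = {delta: cv_error(X, y, delta, folds) for delta in grid}
best = min(grid, key=lambda d: cv[d])
print(f"kappa = {kappa:.2f}, selected threshold = {best}, CV error = {cv[best]:.3f}")
```

Reusing the same folds for every candidate threshold keeps the comparison across the grid paired, so differences in cross-validated error reflect the threshold rather than a different random split.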

How to cite

MLA
BibTeX
RIS

Bill, Jo, and Fokoue, Ernest. "A Comparative Analysis of Predictive Learning Algorithms on High-Dimensional Microarray Cancer Data." Serdica Journal of Computing 8.2 (2014): 137-168. <http://eudml.org/doc/269897>.

@article{Bill2014,
abstract = {This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high-dimensional, low-sample-size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored using seven datasets where κ < 1. These techniques require special attention to tuning, which necessitated investigating several extensions of cross-validation to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.},
author = {Bill, Jo and Fokoue, Ernest},
journal = {Serdica Journal of Computing},
keywords = {HDLSS; Machine Learning Algorithm; Pattern Recognition; Classification; Prediction; Regularization; Discriminant Analysis; Support Vector Machine; Kernels; Cross Validation; Microarray Cancer Data},
language = {eng},
number = {2},
pages = {137-168},
publisher = {Institute of Mathematics and Informatics Bulgarian Academy of Sciences},
title = {A Comparative Analysis of Predictive Learning Algorithms on High-Dimensional Microarray Cancer Data},
url = {http://eudml.org/doc/269897},
volume = {8},
year = {2014},
}

TY - JOUR
AU - Bill, Jo
AU - Fokoue, Ernest
TI - A Comparative Analysis of Predictive Learning Algorithms on High-Dimensional Microarray Cancer Data
JO - Serdica Journal of Computing
PY - 2014
PB - Institute of Mathematics and Informatics Bulgarian Academy of Sciences
VL - 8
IS - 2
SP - 137
EP - 168
AB - This research evaluates pattern recognition techniques on a subclass of big data where the dimensionality of the input space (p) is much larger than the number of observations (n). Specifically, we evaluate massive gene expression microarray cancer data where the ratio κ = n/p is less than one. We explore the statistical and computational challenges inherent in these high-dimensional, low-sample-size (HDLSS) problems and present statistical machine learning methods used to tackle and circumvent these difficulties. Regularization and kernel algorithms were explored using seven datasets where κ < 1. These techniques require special attention to tuning, which necessitated investigating several extensions of cross-validation to support better predictive performance. While no single algorithm was universally the best predictor, the regularization technique produced lower test errors in five of the seven datasets studied.
LA - eng
KW - HDLSS; Machine Learning Algorithm; Pattern Recognition; Classification; Prediction; Regularization; Discriminant Analysis; Support Vector Machine; Kernels; Cross Validation; Microarray Cancer Data
UR - http://eudml.org/doc/269897
ER -
