A Bimodality Test in High Dimensions

Palejev, Dean

Serdica Journal of Computing (2012)

  • Volume: 6, Issue: 4, pages 437-450
  • ISSN: 1312-6555

Abstract

We present a test for identifying clusters in high-dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating the validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, when no such clusters exist. Furthermore, in the case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
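The abstract describes the test only in outline. As a hedged illustration of the general idea (a k-means based clustering test calibrated against a spherical normal null), the Python sketch below runs 2-means, summarises the split by the ratio of within-cluster to total sum of squares, and compares that index with values simulated from spherical normal data of the same size and dimension. This is a generic Monte Carlo sketch, not the paper's test statistic, its conservative test, or its asymptotic critical values; the index, the number of restarts and simulations, and every function name (`two_means`, `cluster_index`, `monte_carlo_pvalue`, `bimodal_sample`) are assumptions made for illustration.

```python
import math
import random


def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(pts) for col in zip(*pts)]


def within_ss(points, labels):
    """Within-cluster sum of squares for a 0/1 labelling."""
    total = 0.0
    for k in (0, 1):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            c = centroid(members)
            total += sum(sqdist(p, c) for p in members)
    return total


def two_means(points, rng, restarts=5, iters=100):
    """Lloyd's algorithm with k = 2; keeps the restart with the smallest within-SS."""
    best_labels, best_w = None, math.inf
    for _ in range(restarts):
        # start from two distinct data points as the initial centres
        centres = [list(p) for p in rng.sample(points, 2)]
        labels = None
        for _ in range(iters):
            new = [0 if sqdist(p, centres[0]) <= sqdist(p, centres[1]) else 1
                   for p in points]
            if new == labels or len(set(new)) < 2:
                break  # converged, or the split collapsed to one cluster
            labels = new
            centres = [centroid([p for p, lab in zip(points, labels) if lab == k])
                       for k in (0, 1)]
        if labels is not None:
            w = within_ss(points, labels)
            if w < best_w:
                best_labels, best_w = labels, w
    return best_labels


def cluster_index(points, rng):
    """Within-SS of the best 2-means split over total SS (smaller = more clustered)."""
    labels = two_means(points, rng)
    c = centroid(points)
    total = sum(sqdist(p, c) for p in points)
    return within_ss(points, labels) / total


def spherical_normal(n, d, rng):
    """n draws from N(0, I_d); the index ignores location and common scale."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


def monte_carlo_pvalue(points, n_sim=99, seed=0):
    """Observed index and Monte Carlo p-value against a spherical normal null."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    observed = cluster_index(points, rng)
    null = [cluster_index(spherical_normal(n, d, rng), rng) for _ in range(n_sim)]
    # share of null indices at least as small (as clustered) as the observed one
    p_value = (1 + sum(r <= observed for r in null)) / (n_sim + 1)
    return observed, p_value


def bimodal_sample(n, d, delta, rng):
    """Hypothetical example data: two groups whose means differ by 2*delta on axis 0."""
    pts = spherical_normal(n, d, rng)
    for i, p in enumerate(pts):
        p[0] += delta if i < n // 2 else -delta
    return pts


if __name__ == "__main__":
    idx, p = monte_carlo_pvalue(bimodal_sample(40, 5, 4.0, random.Random(1)), n_sim=19)
    print(f"index = {idx:.3f}, p = {p:.3f}")
```

Because the null values are simulated at the same number of points and the same dimension as the data, the comparison does not rely on how separated the k-means clusters merely look. The abstract reports that projections onto the k-means split show increasingly well-separated clusters as the dimension grows even when the data are spherical normal, so calibrating against same-size, same-dimension null samples is one way to avoid reading that apparent separation as evidence of bimodality. The cost is one 2-means fit per simulated sample.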

How to cite

Palejev, Dean. "A Bimodality Test in High Dimensions." Serdica Journal of Computing 6.4 (2012): 437-450. <http://eudml.org/doc/250976>.

@article{Palejev2012,
abstract = {We present a test for identifying clusters in high dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, when no such clusters exist. Furthermore, in a case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.},
author = {Palejev, Dean},
journal = {Serdica Journal of Computing},
keywords = {Clustering; Bimodality; Multidimensional Space; Asymptotic Test},
language = {eng},
number = {4},
pages = {437-450},
publisher = {Institute of Mathematics and Informatics Bulgarian Academy of Sciences},
title = {A Bimodality Test in High Dimensions},
url = {http://eudml.org/doc/250976},
volume = {6},
year = {2012},
}

TY - JOUR
AU - Palejev, Dean
TI - A Bimodality Test in High Dimensions
JO - Serdica Journal of Computing
PY - 2012
PB - Institute of Mathematics and Informatics Bulgarian Academy of Sciences
VL - 6
IS - 4
SP - 437
EP - 450
AB - We present a test for identifying clusters in high dimensional data based on the k-means algorithm when the null hypothesis is spherical normal. We show that projection techniques used for evaluating validity of clusters may be misleading for such data. In particular, we demonstrate that increasingly well-separated clusters are identified as the dimensionality increases, when no such clusters exist. Furthermore, in a case of true bimodality, increasing the dimensionality makes identifying the correct clusters more difficult. In addition to the original conservative test, we propose a practical test with the same asymptotic behavior that performs well for a moderate number of points and moderate dimensionality. ACM Computing Classification System (1998): I.5.3.
LA - eng
KW - Clustering; Bimodality; Multidimensional Space; Asymptotic Test
UR - http://eudml.org/doc/250976
ER -
