TY - GEN
T1 - Approximate trace of grid-based clusters over high dimensional data streams
AU - Park, Nam Hun
AU - Lee, Won Suk
PY - 2007
Y1 - 2007
N2 - Clustering a large data set of high dimensionality has always been a serious challenge in the field of data mining. A good clustering method should scale flexibly with both the number of dimensions and the size of a data set. We have proposed a grid-based clustering method, called a hybrid-partition method, for an on-line data stream. However, as the dimensionality of a data stream increases, the time and space complexity of this method increases rapidly. In this paper, a sibling list is proposed to find the clusters of a multi-dimensional data space based on the one-dimensional clusters of each dimension. Although the identified multi-dimensional clusters may be less accurate, this one-dimensional approach scales better with the number of dimensions because it requires much less memory than the multi-dimensional approach does. Therefore, the confined space of main memory can be utilized more effectively by the one-dimensional approach.
AB - Clustering a large data set of high dimensionality has always been a serious challenge in the field of data mining. A good clustering method should scale flexibly with both the number of dimensions and the size of a data set. We have proposed a grid-based clustering method, called a hybrid-partition method, for an on-line data stream. However, as the dimensionality of a data stream increases, the time and space complexity of this method increases rapidly. In this paper, a sibling list is proposed to find the clusters of a multi-dimensional data space based on the one-dimensional clusters of each dimension. Although the identified multi-dimensional clusters may be less accurate, this one-dimensional approach scales better with the number of dimensions because it requires much less memory than the multi-dimensional approach does. Therefore, the confined space of main memory can be utilized more effectively by the one-dimensional approach.
UR - http://www.scopus.com/inward/record.url?scp=38049150749&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=38049150749&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-71701-0_82
DO - 10.1007/978-3-540-71701-0_82
M3 - Conference contribution
AN - SCOPUS:38049150749
SN - 9783540717003
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 753
EP - 760
BT - Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings
PB - Springer Verlag
T2 - 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007
Y2 - 22 May 2007 through 25 May 2007
ER -