Approximate trace of grid-based clusters over high dimensional data streams

Nam Hun Park, Won Suk Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Clustering a large, high-dimensional data set has always been a serious challenge in data mining. A good clustering method should scale flexibly with both the number of dimensions and the size of the data set. We have previously proposed a grid-based clustering method for on-line data streams, called the hybrid-partition method. However, as the dimensionality of a data stream increases, the time and space complexity of this method increase rapidly. In this paper, a sibling list is proposed to find the clusters of a multi-dimensional data space based on the one-dimensional clusters of each dimension. Although the identified multi-dimensional clusters may be less accurate, this one-dimensional approach scales better with the number of dimensions, because it requires much less memory than the multi-dimensional approach does. The confined space of main memory can therefore be utilized more effectively by the one-dimensional approach.
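The abstract does not describe the sibling list itself, so the following is only a minimal sketch of the general idea of deriving approximate multi-dimensional clusters from one-dimensional ones, not the authors' method. It processes a small batch of points rather than a stream, and the names `dense_intervals` and `approximate_clusters`, the fixed `cell_width`, and the `min_count` density threshold are all assumptions made for illustration.

```python
from collections import Counter


def dense_intervals(values, cell_width, min_count):
    """Find one-dimensional clusters along a single dimension.

    Values are binned into grid cells of width `cell_width`; cells holding
    at least `min_count` values are dense, and runs of adjacent dense cells
    are merged into (low_cell, high_cell) intervals.
    """
    counts = Counter(int(v // cell_width) for v in values)
    dense = sorted(c for c, n in counts.items() if n >= min_count)
    intervals = []
    for c in dense:
        if intervals and c == intervals[-1][1] + 1:
            intervals[-1] = (intervals[-1][0], c)  # extend the current run
        else:
            intervals.append((c, c))  # start a new run
    return intervals


def approximate_clusters(points, cell_width, min_count):
    """Combine per-dimension clusters into approximate multi-dimensional ones.

    Each point is mapped to the tuple of one-dimensional interval indices it
    falls into. Points outside a dense interval in any dimension are dropped.
    Combinations supported by at least `min_count` points are reported.
    (Hypothetical illustration only; not the paper's sibling list.)
    """
    dims = len(points[0])
    per_dim = [
        dense_intervals([p[d] for p in points], cell_width, min_count)
        for d in range(dims)
    ]
    support = Counter()
    for p in points:
        key = []
        for d in range(dims):
            cell = int(p[d] // cell_width)
            idx = next(
                (i for i, (lo, hi) in enumerate(per_dim[d]) if lo <= cell <= hi),
                None,
            )
            if idx is None:
                break  # sparse in this dimension, so not in any cluster
            key.append(idx)
        else:
            support[tuple(key)] += 1
    return {k: n for k, n in support.items() if n >= min_count}


points = [
    (1.0, 1.2), (1.1, 1.1), (1.3, 1.4),   # group near (1, 1)
    (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),   # group near (5, 5)
    (1.2, 9.0),                           # noise: sparse in dimension 1
]
clusters = approximate_clusters(points, cell_width=1.0, min_count=3)
```

With k cells per dimension and d dimensions, the per-dimension counters in this sketch hold at most d × k entries, whereas a full multi-dimensional grid can have up to k^d cells. That gap is the memory saving the abstract refers to, and the loss of accuracy comes from judging density one dimension at a time.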

Original language: English
Title of host publication: Advances in Knowledge Discovery and Data Mining - 11th Pacific-Asia Conference, PAKDD 2007, Proceedings
Publisher: Springer Verlag
Number of pages: 8
ISBN (Print): 9783540717003
Publication status: Published - 2007
Event: 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007 - Nanjing, China
Duration: 2007 May 22 → 2007 May 25

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4426 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Other: 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2007

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

