Memory efficient subspace clustering for online data streams

Nam Hun Park, Won Suk Lee

Research output: Chapter in Book/Report/Conference proceedingConference contribution

1 Citation (Scopus)

Abstract

Subspace clustering over an online multi-dimensional data stream requires to examine all the subsets of its dimensions, so that a huge amount of memory space may be required. To trace the ongoing changes of cluster patterns over an online data stream by a confined memory space, this paper proposes a grid-based subspace clustering algorithm that can utilize the confined memory space effectively. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimension data space are firstly monitored by a list of grid-cells called a sibling list. Once a grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found at the kth level of the sibling tree. To utilize the confined space of main memory effectively, only the upper-part of a sibling tree is expanded at all times and the subtrees in the lower part are expanded in turns by various scheduling policies such as round-robin and priority-based. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The performance of the proposed method is comparatively analyzed by a number of experiments to identify its various characteristics.

Original languageEnglish
Title of host publicationProceedings of IDEAS'08
Subtitle of host publicationInternational Database Engineering and Applications Symposium
Pages199-208
Number of pages10
DOIs
Publication statusPublished - 2008
EventInternational Database Engineering and Applications Symposium, IDEAS'08 - Coimbra, Portugal
Duration: 2008 Sept 102008 Sept 12

Publication series

NameACM International Conference Proceeding Series
Volume299

Other

OtherInternational Database Engineering and Applications Symposium, IDEAS'08
Country/TerritoryPortugal
CityCoimbra
Period08/9/1008/9/12

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Fingerprint

Dive into the research topics of 'Memory efficient subspace clustering for online data streams'. Together they form a unique fingerprint.

Cite this