Blog classification using K-means

Jun Lee Ki, Lee Myungjin, Kim Woqju

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

1 Citation (Scopus)

Abstract

With the recent exponential growth of blogs, a vast amount of important data has appeared on them. However, the dynamic, autonomous, and personal nature of blogs makes blog pages quite different from general web pages in many respects. As a result, blogs raise many problems that general search engines cannot handle properly. The problem on which this study focuses is that blog pages are inherently poorly organized and heavily duplicated, so blog search engines can only return poorly organized and duplicated results. To solve this problem, we propose a blog classification method using K-means and present an approach, based on this method, for reorganizing blog search results. First, we review the current status and performance of blogs and blog search engines. Second, we adopt the K-means algorithm as the base algorithm and devise a blog title classification method that reorganizes the blog titles returned by a search engine. Finally, we implement a prototype system of our algorithm, evaluate its effectiveness, and present conclusions and directions for future work. We expect this algorithm to improve the usability of current blog search engines.
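The abstract names K-means as the base algorithm for grouping the blog titles returned by a search engine but gives no implementation details. The following is a minimal Python sketch of that idea under stated assumptions: each title is represented as an L2-normalised bag-of-words vector, K is supplied by the caller, and centroids are seeded by deterministic farthest-first selection. None of these choices is taken from the paper, and the helper names (`tokenize`, `vectorize`, `init_centroids`, `kmeans`, `group_titles`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(title):
    # Lowercase and keep alphanumeric runs; no stemming or stop-word removal (assumption).
    return re.findall(r"[a-z0-9]+", title.lower())


def vectorize(titles):
    # Bag-of-words term counts over a shared vocabulary, L2-normalised per title.
    docs = [Counter(tokenize(t)) for t in titles]
    vocab = sorted(set().union(*docs))
    vectors = []
    for doc in docs:
        vec = [float(doc[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(vectors, k):
    # Hypothetical deterministic farthest-first seeding; the paper's seeding is not stated.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(vectors, k, max_iter=100):
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of titles")
    centroids = init_centroids(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each title goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def group_titles(titles, k):
    # Reorganise a flat result list into k groups, ordered by first appearance.
    groups = {}
    for title, label in zip(titles, kmeans(vectorize(titles), k)):
        groups.setdefault(label, []).append(title)
    return list(groups.values())


if __name__ == "__main__":
    # Made-up, interleaved result titles for illustration only.
    titles = [
        "Pasta recipe for dinner",
        "Phone battery review",
        "Easy pasta recipe",
        "Camera phone review",
        "Quick dinner recipe",
        "Laptop battery review",
    ]
    for group in group_titles(titles, k=2):
        print(group)
    # ['Pasta recipe for dinner', 'Easy pasta recipe', 'Quick dinner recipe']
    # ['Phone battery review', 'Camera phone review', 'Laptop battery review']
```

Grouping by first appearance keeps the search engine's original ranking visible inside each cluster. Plain K-means results depend on both the seeding and the value of K, so a working system would also need a rule for choosing K, which this sketch leaves to the caller.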

Original language: English
Title of host publication: ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings
Pages: 61-67
Number of pages: 7
Publication status: Published - 2009
Event: ICEIS 2009 - 11th International Conference on Enterprise Information Systems - Milan, Italy
Duration: 2009 May 6 - 2009 May 10

Publication series

Name: ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings
Volume: SAIC

Other

Other: ICEIS 2009 - 11th International Conference on Enterprise Information Systems
Country/Territory: Italy
City: Milan
Period: 09/5/6 - 09/5/10

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Information Systems and Management
