TY - GEN
T1 - Blog classification using K-means
AU - Ki, Jun Lee
AU - Myungjin, Lee
AU - Wooju, Kim
PY - 2009
Y1 - 2009
N2 - With the recent exponential growth of blogs, a vast amount of important data has appeared on them. However, the dynamic, autonomous, and personal nature of blogs makes blog pages differ from general web pages in many respects. As a result, blogs raise many problems that general search engines cannot handle properly. The problem this study focuses on is that blog pages are inherently poorly organized and heavily duplicated, so blog search engines inevitably return poorly organized and duplicated results. To solve this problem, we propose a blog classification method using K-means and present an approach, based on this method, for reorganizing blog search results. First, we review the current status of blogs and blog search engines and their performance. Second, we adopt K-means as the base algorithm and devise a blog title classification method that reorganizes the blog titles returned by a search engine. Finally, we implement a prototype system of our algorithm, evaluate its effectiveness, and present conclusions and directions for future work. We expect this algorithm to improve the usability of current blog search engines.
UR - http://www.scopus.com/inward/record.url?scp=74549192024&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=74549192024&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:74549192024
SN - 9789898111845
T3 - ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings
SP - 61
EP - 67
BT - ICEIS 2009 - 11th International Conference on Enterprise Information Systems, Proceedings
T2 - ICEIS 2009 - 11th International Conference on Enterprise Information Systems
Y2 - 6 May 2009 through 10 May 2009
ER -