TY - GEN
T1 - Visual analysis of conflicting opinions
AU - Chen, Chaomei
AU - Ibekwe-SanJuan, Fidelia
AU - SanJuan, Eric
AU - Weaver, Chris
PY - 2006
Y1 - 2006
N2 - Understanding the nature and dynamics of conflicting opinions is a profound and challenging issue. In this paper we address several aspects of the issue through a study of more than 3,000 Amazon customer reviews of the controversial bestseller The Da Vinci Code, including 1,738 positive and 918 negative reviews. The study is motivated by critical questions such as: What are the differences between positive and negative reviews? What is the origin of a particular opinion? How do these opinions change over time? To what extent can differentiating features be identified from unstructured text? How accurately can these features predict the category of a review? We first analyze terminology variations in these reviews in terms of syntactic, semantic, and statistical associations identified by TermWatch and use term variation patterns to depict underlying topics. We then select the most predictive terms based on log likelihood tests and demonstrate that this small set of terms classifies over 70% of the conflicting reviews correctly. This feature selection process reduces the dimensionality of the feature space from more than 20,000 dimensions to a couple of hundred. We utilize automatically generated decision trees to facilitate the understanding of conflicting opinions in terms of these highly predictive terms. This study also uses a number of visualization and modeling tools to identify not only what positive and negative reviews have in common, but also how they differ and evolve over time.
AB - Understanding the nature and dynamics of conflicting opinions is a profound and challenging issue. In this paper we address several aspects of the issue through a study of more than 3,000 Amazon customer reviews of the controversial bestseller The Da Vinci Code, including 1,738 positive and 918 negative reviews. The study is motivated by critical questions such as: What are the differences between positive and negative reviews? What is the origin of a particular opinion? How do these opinions change over time? To what extent can differentiating features be identified from unstructured text? How accurately can these features predict the category of a review? We first analyze terminology variations in these reviews in terms of syntactic, semantic, and statistical associations identified by TermWatch and use term variation patterns to depict underlying topics. We then select the most predictive terms based on log likelihood tests and demonstrate that this small set of terms classifies over 70% of the conflicting reviews correctly. This feature selection process reduces the dimensionality of the feature space from more than 20,000 dimensions to a couple of hundred. We utilize automatically generated decision trees to facilitate the understanding of conflicting opinions in terms of these highly predictive terms. This study also uses a number of visualization and modeling tools to identify not only what positive and negative reviews have in common, but also how they differ and evolve over time.
UR - http://www.scopus.com/inward/record.url?scp=36349011694&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=36349011694&partnerID=8YFLogxK
U2 - 10.1109/VAST.2006.261431
DO - 10.1109/VAST.2006.261431
M3 - Conference contribution
AN - SCOPUS:36349011694
SN - 1424405912
SN - 9781424405916
T3 - IEEE Symposium on Visual Analytics Science and Technology 2006, VAST 2006 - Proceedings
SP - 59
EP - 66
BT - IEEE Symposium on Visual Analytics Science and Technology 2006, VAST 2006 - Proceedings
T2 - IEEE Symposium on Visual Analytics Science and Technology 2006, VAST 2006
Y2 - 31 October 2006 through 2 November 2006
ER -