   
TITLE : PROBABILISTIC MATCH SIMILARITY MEASURE FOR DOCUMENT CLUSTERING
AUTHORS : Selvi K, Suresh R.M
DOI : http://dx.doi.org/10.18000/ijisac.50156  
ABSTRACT :

Machine learning can capture intrinsic characteristics of natural language such as synonymy and polysemy. Investigations indicate that a similarity measure is fundamental to a variety of tasks, such as clustering and classification. Much work has been done by researchers on document clustering using semantic properties. In this paper, we develop a probabilistic match similarity measure that naturally extends the recently proposed Web-based kernel function and is trained and tested to cluster documents effectively. We consider two approaches to learning (similarity metric and preference ordering), and both achieve higher precision scores than all the other similarity measures considered. The method works well for Web tasks such as query/keyword matching and search query suggestion, which rely heavily on the quality of similarity measures between short text segments. We show that the learned measures are efficient at a wide range of scales and achieve better results than existing similarity measures.

Keywords: Text Mining, Similarity Measure, Machine Learning, Natural Language Processing, Document Clustering

 
 
Copyrights ©Sathyabama Institute of Science and Technology (Deemed to be University).