Journal-Archives-Information Sciences & Computing


TITLE :	PROBABILISTIC MATCH SIMILARITY MEASURE FOR DOCUMENT CLUSTERING
AUTHORS :	Selvi K, Suresh R.M
DOI :	http://dx.doi.org/10.18000/ijisac.50156
ABSTRACT :	Machine learning captures the intrinsic characteristics of natural language, such as synonymy and polysemy. Investigations indicate that the similarity measure is fundamental to a variety of tasks such as clustering and classification. Much work has been done by researchers on document clustering with the use of semantic properties. In this paper, we develop a probabilistic match similarity measure that naturally extends the recently proposed Web-based kernel function and is trained and tested to cluster documents effectively. We consider two approaches to learning (similarity metric and preference ordering), and both achieve higher precision scores than all other similarity measures. This method works well for Web tasks such as query/keyword matching and search query suggestion, which rely heavily on the quality of similarity measures between short text segments. We show that the learned measures are efficient at a wide range of scales and achieve better results than existing similarity measures.
KEYWORDS :	Text Mining, Similarity Measure, Machine Learning, Natural Language Processing, Document Clustering