Journal-Archives-Information Sciences & Computing


TITLE :	EXTRACTING CONTENT IN REGIONAL WEB DOCUMENTS WITH TEXT VARIATIONS
AUTHORS :	Kolla Bhanu Prakash, M.A. Dorai Rangaswamy
DOI :	http://dx.doi.org/10.18000/ijisac.50144
ABSTRACT :	The growth of the World Wide Web has led to a dramatic increase in accessible information. Today, people use the Web for a large variety of activities, including travel planning, comparison shopping, entertainment, and research. However, the tools available for collecting, organizing, and sharing Web content have not kept pace with this rapid growth; people continue to use bookmarks, email, and printers to manage Web content. The use of mobile phones has transformed the culture of communication, with even villagers using sophisticated computer-related words like SMS and MBBS. The major complexity arises when Web documents in regional languages are displayed: understanding the content of the document, and later communicating it through oral or text means, becomes difficult. This is the area the current paper addresses. In the process, a generic concept-based mining model is proposed for how knowledge is created in the mind of an illiterate user. The paper first presents how letters and words, which form the basis of text-based communication, can be used for content extraction. The objective is to achieve a concept-based term analysis at the sentence and document levels, rather than a single-term analysis over the document set only.
KEYWORDS :	Media Mining; Features; Multilingual; Web Communication; Statistical Interpretation; Content Extraction