Comparison of Jaccard Similarity, Cosine Similarity, and Their Combination for Data Clustering With the Shared Nearest Neighbor Method
Abstract
Text mining is the automated extraction of new information from different text sources. Clustering is a grouping technique that is widely used in data mining. The aim of this study is to find the similarity measure that gives the best similarity values. Three measures are compared: Jaccard similarity, cosine similarity, and a combination of the two. Combining the two measures was expected to increase the similarity value between two titles. The documents used are limited to the titles of practical work reports in the Department of Informatics Engineering, University of Ahmad Dahlan. All titles were preprocessed beforehand. The documents were clustered with the Shared Nearest Neighbor (SNN) method. The results show that cosine similarity gives the best proximity (similarity) values compared with Jaccard similarity and the combination of both.
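The measures named in the abstract can be illustrated with a minimal sketch. It assumes each title has already been preprocessed into a list of tokens and that cosine similarity is computed on raw term-frequency vectors. The abstract does not state how the two measures were combined, so the simple average in combined() is a hypothetical choice for illustration only. Likewise, snn_similarity() uses the plain shared-neighbor count; the function names and the parameter k are assumptions, not the study's implementation.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two token lists: |A & B| / |A | B|."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine similarity of the raw term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined(a, b):
    """Hypothetical combination: the arithmetic mean of both measures.

    Assumption: the source does not specify the combination rule.
    """
    return (jaccard(a, b) + cosine(a, b)) / 2


def snn_similarity(sim, k):
    """Shared nearest neighbor counts from a square similarity matrix.

    For each document, take its k most similar other documents; the SNN
    similarity of two documents is the number of neighbors they share.
    """
    n = len(sim)
    neighbors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: -sim[i][j])  # most similar first
        neighbors.append(set(others[:k]))
    return [
        [len(neighbors[i] & neighbors[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
```

For two three-word titles that share two words, such as ["academic", "information", "system"] and ["sales", "information", "system"], jaccard() gives 2/4 = 0.5 and cosine() gives 2/3, so the averaged value falls between the two.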