Classification and similarity detection of Indonesian scientific journal articles
Keywords:
Classification, Cosine similarity, GARUDA, Naïve Bayes, Similarity
Abstract
Advances in technology have accelerated the search for scientific articles and journals relevant to a research topic. One national aggregator service for finding such references is Garba Rujukan Digital (GARUDA), developed by the Ministry of Education, Culture, Research, and Technology (Kemendikbudristek) of the Republic of Indonesia. In this study, the naïve Bayes method classifies articles into several categories based on their titles and abstracts. The system achieves an F1-score of 98%, which indicates high classification performance, and the classification process takes less than 60 minutes. Article similarity is detected with the cosine similarity method applied to the concatenated title and abstract; a reported score of 0.071 indicates low similarity, whereas a score close to 1 indicates high similarity. The system searches for similar scientific articles based on title and abstract, ranks them so that the articles with the highest similarity scores appear first, and generates article categories. The results show that the proposed method significantly improves the classification and search processes in GARUDA and provides accurate and efficient similarity detection.
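The two techniques named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy articles, the category labels, the raw term-frequency weighting, the add-one smoothing, and the helper names (`tokenize`, `NaiveBayes`, `rank_similar`) are all assumptions made for illustration. Only the general idea, naïve Bayes over the concatenated title and abstract plus cosine similarity for ranking, comes from the abstract.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed variant)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(tokenize(doc))
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, doc):
        # Ignore tokens never seen during training.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus: (title, abstract, category).
ARTICLES = [
    ("Deep learning for image recognition",
     "We train a neural network on image data.", "computer science"),
    ("Rice yield under drought stress",
     "Field trials measure rice crop yield and soil moisture.", "agriculture"),
    ("Neural network text classification",
     "A neural model classifies text documents.", "computer science"),
    ("Soil nutrients and maize growth",
     "Soil fertilizer effects on maize crop growth.", "agriculture"),
]


def rank_similar(query, articles):
    """Return (score, title) pairs sorted from most to least similar."""
    q = Counter(tokenize(query))
    scored = [
        (cosine_similarity(q, Counter(tokenize(title + " " + abstract))), title)
        for title, abstract, _ in articles
    ]
    return sorted(scored, reverse=True)


# Train on the concatenated title and abstract of each article.
model = NaiveBayes().fit(
    [title + " " + abstract for title, abstract, _ in ARTICLES],
    [category for _, _, category in ARTICLES],
)

if __name__ == "__main__":
    print(model.predict("neural network for text data"))
    for score, title in rank_similar("neural network text", ARTICLES):
        print(f"{score:.3f}  {title}")
```

Concatenating title and abstract before vectorizing mirrors the description in the abstract. A production system would more likely use TF-IDF weighting and a larger labeled corpus, which this sketch omits for brevity.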
License
Copyright (c) 2025 Institute of Advanced Engineering and Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.