The observed preprocessing strategies for automatic text summarization
DOI: https://doi.org/10.11591/csit.v4i2.pp119-126

Abstract
The explosion of digital information makes it challenging for humans to keep up with the pace at which new content is created. Automatic text summarization can analyze a written document and extract its meaningful information. To answer the question of how much preprocessing affects the quality of the summaries produced by automatic text summarization, this study proposes 16 experimental settings in which a model built on IndoBERT is applied. We focus explicitly on preprocessing strategies, testing different combinations of four techniques: data cleaning, stopword removal, stemming, and case folding. The results are then measured with the recall-oriented understudy for gisting evaluation (ROUGE). According to the findings, the best performance is achieved by combining data cleaning and case folding, with scores of 0.78, 0.60, and 0.68 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
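The four preprocessing techniques named in the abstract, and the ROUGE-1 score used to evaluate them, can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the stopword list and the affix-stripping rule are simplified stand-ins (Indonesian work would typically use a full resource such as a Sastrawi stemmer), and the ROUGE-1 function computes a plain unigram-overlap F1.

```python
import re
from collections import Counter

# Tiny illustrative Indonesian stopword list (not the list used in the study).
STOPWORDS = {"dan", "yang", "di", "ke", "dari"}

def case_folding(text):
    # Case folding: normalize all characters to lowercase.
    return text.lower()

def data_cleaning(text):
    # Data cleaning: strip punctuation/digits and collapse whitespace.
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    # Crude suffix stripping for illustration only.
    for suffix in ("nya", "kan", "an", "i"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text, clean=True, fold=True, stop=False, stem=False):
    # Each flag toggles one technique, mirroring the idea of testing
    # different combinations of preprocessing steps.
    if fold:
        text = case_folding(text)
    if clean:
        text = data_cleaning(text)
    tokens = text.split()
    if stop:
        tokens = remove_stopwords(tokens)
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return tokens

def rouge1_f1(candidate, reference):
    # Unigram-overlap F1 between candidate and reference token lists.
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)
```

For example, `preprocess("Berita, dan Teks 123")` with the default data-cleaning and case-folding combination yields `["berita", "dan", "teks"]`, and an identical candidate and reference give a ROUGE-1 F1 of 1.0.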
Copyright (c) 2023 Institute of Advanced Engineering and Science

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
