Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining

Bibliographic Details
Main Author: Dian Sa’adillah Maylawati
Format: Thesis
Language: English
Published: 2023
Subjects: Q Science (General); QA Mathematics
Online Access: http://eprints.utem.edu.my/id/eprint/27713/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=123591
_version_ 1855750116108926976
author Dian Sa’adillah Maylawati
author_facet Dian Sa’adillah Maylawati
author_sort Dian Sa’adillah Maylawati
description Readability remains a major challenge in text summarization research. Previous studies identify one key concern: minimizing the gap between the generated summary and the reader’s understanding. Producing a readable summary requires preserving the meaning of the source text. However, every language has its own grammar and structural characteristics, and Indonesian is no exception: it requires specific treatment to capture the meaning of a text. This study hypothesizes that readability can be achieved with a text representation that preserves the meaning of the documents well. The study therefore aims: (1) to improve Indonesian text summaries by enhancing the Sequence of Words (SoW) text representation using Sequential Pattern Mining (SPM) with the PrefixSpan algorithm, since SPM has proven effective for Indonesian text classification and clustering; (2) to combine SPM with Deep Learning (DeepSPM) for Indonesian text summarization, since Deep Learning achieves superior accuracy when trained on large amounts of data; and (3) to evaluate the readability of Indonesian text summaries under several evaluation scenarios. Most text summarization research relies mainly on co-selection-based analysis to evaluate summaries, which appears insufficient for assessing readability. This study therefore adds content-based analysis and human readability evaluation. First, to validate the performance of SPM, the study combines SPM with the Sentence Scoring method as a feature-based approach and with the Bellman-Ford algorithm as a graph-based approach. Second, the proposed SPM approach is combined with a Deep Belief Network (DBN), an unsupervised Deep Learning method; the combination is called DeepSPM.
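As an illustration of the sequential pattern mining step named above, the following is a minimal sketch of PrefixSpan over tokenized sentences. It is not the thesis implementation: the function name `prefixspan`, the `min_support` parameter, and the toy Indonesian sentences are assumptions made for this example, and each sentence is treated as a plain sequence of single words.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return (pattern, support) pairs for every frequent word sequence.

    `sequences` is a list of token lists; `min_support` is the minimum
    number of sequences that must contain a pattern (gaps are allowed).
    """
    results = []

    def mine(prefix, projected):
        # `projected` holds (sequence index, start offset) pairs: the
        # suffixes that remain after matching `prefix`.
        counts = defaultdict(int)
        for seq_idx, start in projected:
            # Count each word at most once per sequence.
            for word in set(sequences[seq_idx][start:]):
                counts[word] += 1

        for word, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + [word]
            results.append((tuple(pattern), support))

            # Project on the first occurrence of `word` in each suffix.
            next_projected = []
            for seq_idx, start in projected:
                try:
                    pos = sequences[seq_idx].index(word, start)
                except ValueError:
                    continue
                next_projected.append((seq_idx, pos + 1))
            mine(pattern, next_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy corpus, one token list per sentence.
sentences = [
    ["saya", "suka", "nasi"],
    ["saya", "makan", "nasi"],
    ["dia", "suka", "nasi"],
]
for pattern, support in prefixspan(sentences, min_support=2):
    print(" ".join(pattern), support)
```

The frequent patterns returned here keep word order, which is the property that distinguishes a sequence-based representation from a bag of words.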
Then, the performance of the proposed methods in producing Indonesian text summaries is evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as co-selection-based analysis; the Dwiyanto Djoko Pranowo metrics, the Gunning Fog Index (GFI), and the Flesch-Kincaid Grade Level (FKGL) as content-based analysis; and human readability evaluation. The experimental findings on the IndoSum dataset show that SPM can enhance the quality of the summaries. DeepSPM outperforms DBN, with F-measure scores of 46.21% for ROUGE-1, 36.94% for ROUGE-2, and 41.01% for ROUGE-L. Furthermore, the readability evaluation using Dwiyanto’s metrics, GFI, and FKGL shows that the DeepSPM summaries are readable at a moderate level, consistent with the human evaluation conducted by two Indonesian language experts.
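For readers unfamiliar with the ROUGE-N F-measure reported above, the sketch below shows how such a score can be computed from n-gram overlap. It is a simplified illustration, not the official ROUGE toolkit or the evaluation setup of the thesis: the function names `ngrams` and `rouge_n` are assumptions, input is expected to be pre-tokenized, and no stemming or stop-word handling is applied.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f_measure) for ROUGE-N.

    `candidate` is the system summary and `reference` the gold summary,
    both given as token lists.
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both summaries.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example summaries, tokenized on whitespace.
system = "a b c d".split()
gold = "a b e".split()
print(rouge_n(system, gold, n=1))
print(rouge_n(system, gold, n=2))
```

ROUGE-1 and ROUGE-2 correspond to `n=1` and `n=2`; ROUGE-L uses the longest common subsequence instead of fixed-length n-grams and is not covered by this sketch.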
format Thesis
id utem-27713
institution Universiti Teknikal Malaysia Melaka
language English
publishDate 2023
record_format EPrints
record_pdf Restricted
spelling utem-27713 2024-11-04T11:48:54Z http://eprints.utem.edu.my/id/eprint/27713/ Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining Dian Sa’adillah Maylawati Q Science (General) QA Mathematics 2023 Thesis NonPeerReviewed text en http://eprints.utem.edu.my/id/eprint/27713/1/Enhancement%20of%20text%20representation%20for%20Indonesian%20document%20summarization%20with%20deep%20sequential%20pattern%20mining.pdf text en http://eprints.utem.edu.my/id/eprint/27713/2/Enhancement%20of%20text%20representation%20for%20Indonesian%20document%20summarization%20with%20deep%20sequential%20pattern%20mining.pdf Dian Sa’adillah Maylawati (2023) Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining. Doctoral thesis, Universiti Teknikal Malaysia Melaka. https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=123591
spellingShingle Q Science (General)
QA Mathematics
Dian Sa’adillah Maylawati
Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining
thesis_level PhD
title Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining
title_full Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining
title_fullStr Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining
title_full_unstemmed Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining
title_short Enhancement of text representation for Indonesian document summarization with deep sequential pattern mining
title_sort enhancement of text representation for indonesian document summarization with deep sequential pattern mining
topic Q Science (General)
QA Mathematics
url http://eprints.utem.edu.my/id/eprint/27713/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=123591
work_keys_str_mv AT diansaadillahmaylawati enhancementoftextrepresentationforindonesiandocumentsummarizationwithdeepsequentialpatternmining