Optimization of extractive Automatic Text Summarization using Decomposition-based Multi-objective Differential Evolution and parallelization

The central challenge in Automatic Text Summarization (ATS) is efficiently generating machine-generated text summaries through optimization algorithms, a critical component for systems dealing with textual information processing. The current approach encounters a significant hurdle due to the lo...


Bibliographic details
Main author:	Hazmi Wahab, Muhammad Hafizul
Format:	Thesis
Language:	English
Published:	2024
Subjects:	Optimization algorithms (Computer science); Parallel processing (Computer science); Natural language processing (Computer science)
Online access:	http://psasir.upm.edu.my/id/eprint/120029/1/120029.pdf

author	Hazmi Wahab, Muhammad Hafizul
description	The central challenge in Automatic Text Summarization (ATS) is generating summaries efficiently with optimization algorithms, a critical component of systems that process textual information. The current approach suffers from long execution times, especially when complex optimization techniques are combined with a computationally expensive ATS repair operator that repairs multiple candidate solutions. Although it yields strong Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores for the generated summaries, it is inefficient, mainly because of the substantial optimization time consumed by the ATS repair operator scheme. To address this, a novel solution called Decomposition-based Multi-objective Differential Evolution (MODE/D) is proposed. It builds on Differential Evolution for Multi-objective Optimization (DEMO) and the weighted sum (WS) method, coupled with a new ATS repair operator scheme. MODE/D is validated through experiments on Document Understanding Conferences (DUC) datasets, with the results evaluated using ROUGE metrics. The outcomes are twofold: a reduction in serial execution time and an improvement over existing techniques in the literature, as evidenced by higher ROUGE-1, ROUGE-2, and ROUGE-L scores. A multi-core variant of MODE/D explores an alternative computational environment; it demonstrates stability and achieves high efficiency when static loop scheduling is employed. In this environment, parallel multi-core MODE/D attains a speedup of 2 over the serial version of MODE/D, with the highest efficiency, 86.35%, reached when 6 CPU cores are employed.
When the input size is tripled, parallel multi-core MODE/D achieves a speedup of 7.9 with 98.98% efficiency under static scheduling. This speedup comes with a slight degradation in ROUGE-2. Nevertheless, the result underscores the robustness and scalability of the proposed approach: it harnesses the computational power of multiple cores while keeping summary quality metrics stable, yielding 31 words per second (WPS), a 233.13% increase over its serial counterpart for topic d061j in DUC2002. In addition, two GPU variants of GMODE/D, variant I and variant II, are implemented, each with unified and non-unified memory architectures. Variant I performs sentence scoring at the outset of the accelerator region, whereas variant II performs it within the accelerator region. GMODE/D variant I with unified memory achieves a speedup of 18.17 over the serial variant when a vector size of 256 is used with an NVIDIA Tesla V100 as the accelerator device, raising WPS to 215.517. Despite a slight reduction in ROUGE scores, it exhibits the most stable CV values among the serial, multi-core, and many-core variants. Together, these advances bring optimization-based ATS approaches closer to real-time applications that may involve thousands of documents, and they demonstrate the versatility and efficiency of the proposed MODE/D algorithm across multi-core and many-core computing architectures.
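The description reports speedup, efficiency, and words-per-second (WPS) figures without stating how they are computed. The sketch below assumes the conventional definitions (speedup = serial time / parallel time, efficiency = speedup / number of cores, WPS = summary words / execution time); the thesis may define them differently. All function names and the timings in the example are hypothetical and are not taken from the thesis.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Conventional speedup: S = T_serial / T_parallel (assumed definition)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_serial / t_parallel


def efficiency(s: float, workers: int) -> float:
    """Conventional parallel efficiency: E = S / p, returned as a fraction."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return s / workers


def words_per_second(summary_words: int, seconds: float) -> float:
    """Throughput as generated summary words per second of execution time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return summary_words / seconds


def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    t_serial, t_parallel, cores = 120.0, 20.0, 8
    s = speedup(t_serial, t_parallel)
    e = efficiency(s, cores)
    print(f"speedup={s:.2f}, efficiency={e:.2%}")  # speedup=6.00, efficiency=75.00%
```

Under these definitions, an efficiency close to 100% means the speedup is close to the number of cores used.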
format	Thesis
id	oai:psasir.upm.edu.my:120029
institution	Universiti Putra Malaysia
language	English
publishDate	2024
record_format	eprints
spelling	oai:psasir.upm.edu.my:120029 2025-10-09T08:27:29Z http://psasir.upm.edu.my/id/eprint/120029/ 2024-09 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/120029/1/120029.pdf Hazmi Wahab, Muhammad Hafizul (2024) Optimization of extractive Automatic Text Summarization using Decomposition-based Multi-objective Differential Evolution and parallelization. Doctoral thesis, Universiti Putra Malaysia. http://ethesis.upm.edu.my/id/eprint/18497
title	Optimization of extractive Automatic Text Summarization using Decomposition-based Multi-objective Differential Evolution and parallelization
topic	Optimization algorithms (Computer science) Parallel processing (Computer science) Natural language processing (Computer science)
url	http://psasir.upm.edu.my/id/eprint/120029/1/120029.pdf
url-record	http://psasir.upm.edu.my/id/eprint/120029/ http://ethesis.upm.edu.my/id/eprint/18497

