Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu
Sentiment analysis in Bahasa Melayu leverages Natural Language Processing (NLP) to interpret opinions and emotional tone expressed in Malay texts. This research investigates the application of transformer-based deep learning models, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment classification into positive, negative, and neutral categories.
| Main Author: | Zulkalnain, Mohd Asyraf |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2025 |
| Subjects: | Q Science; QA Mathematics |
| Online Access: | http://eprints.utem.edu.my/id/eprint/29320/ |
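The record describes fine-tuning pretrained models with 10-fold cross-validation to improve generalization. A minimal plain-Python sketch of building stratified folds, where each fold keeps roughly the same class ratios as the whole dataset; the label counts below are hypothetical illustrations, not taken from the Malaya dataset:

```python
import random

# Hypothetical three-class labels standing in for the Malaya
# dataset's positive / negative / neutral annotations.
labels = ["positive"] * 50 + ["negative"] * 30 + ["neutral"] * 20
random.seed(0)
random.shuffle(labels)

def stratified_kfold(labels, k=10):
    """Yield (train_idx, val_idx) pairs with class ratios
    roughly preserved in every fold."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        # Deal each class's indices round-robin across the folds,
        # so every fold gets a proportional share of that class.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

splits = list(stratified_kfold(labels, k=10))
```

Each of the 10 models is then fine-tuned on `train` and scored on `val`, and the metrics are averaged across folds; production code would typically use `sklearn.model_selection.StratifiedKFold` instead of a hand-rolled split.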
| Abstract | Abstract here |
|---|---|
| _version_ | 1855619841578237952 |
| author | Zulkalnain, Mohd Asyraf |
| author_facet | Zulkalnain, Mohd Asyraf |
| author_sort | Zulkalnain, Mohd Asyraf |
| description | Sentiment analysis in Bahasa Melayu leverages Natural Language Processing (NLP) to interpret opinions and emotional tone expressed in Malay texts. This research investigates the application of transformer-based deep learning models, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment classification into positive, negative, and neutral categories. The study addresses challenges in Bahasa Melayu sentiment analysis, including limited annotated resources, linguistic nuances, and common mixed-language usage on platforms like social media. To train and evaluate the models, a large-scale Malay dataset (Malaya dataset) was used. Pretrained models from HuggingFace were fine-tuned using 10-fold cross-validation to improve generalization. Optimization methods such as data augmentation were also implemented. The evaluation considered not only accuracy but also precision, recall, F1 score, and computational efficiency. Among the models, BERT-CNN achieved the best performance, with 96.30% accuracy and consistently high scores across all sentiment classes. BERT also performed well, especially for neutral sentiment, reaching 89.5% accuracy, but showed slightly lower recall in the positive class. DistilBERT offered competitive performance (88.96% accuracy) while being faster and more lightweight, making it suitable for deployment in resource-limited environments. BERT-multilingual showed balanced results with a peak accuracy of 89.84%, and ALBERT, despite having fewer parameters, reached 88.76% accuracy but underperformed in positive sentiment recall. The results demonstrate that transformer-based models outperform traditional machine learning and lexicon-based approaches, particularly in handling informal, mixed-language Malay text. The proposed models can support real-world applications such as analyzing consumer sentiment, public opinion, or social response to policies. This study contributes to advancing sentiment analysis for low-resource languages by offering comparative insights and effective model configurations, setting a solid foundation for further research and practical deployment. |
| format | Thesis |
| id | utem-29320 |
| institution | Universiti Teknikal Malaysia Melaka |
| language | English |
| publishDate | 2025 |
| record_format | EPrints |
| record_pdf | Restricted |
| spelling | utem-293202025-12-26T07:59:02Z http://eprints.utem.edu.my/id/eprint/29320/ Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu Zulkalnain, Mohd Asyraf Q Science QA Mathematics Sentiment analysis in Bahasa Melayu leverages Natural Language Processing (NLP) to interpret opinions and emotional tone expressed in Malay texts. This research investigates the application of transformer-based deep learning models, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment classification into positive, negative, and neutral categories. The study addresses challenges in Bahasa Melayu sentiment analysis, including limited annotated resources, linguistic nuances, and common mixed-language usage on platforms like social media. To train and evaluate the models, a large-scale Malay dataset (Malaya dataset) was used. Pretrained models from HuggingFace were fine-tuned using 10-fold cross-validation to improve generalization. Optimization methods such as data augmentation were also implemented. The evaluation considered not only accuracy but also precision, recall, F1 score, and computational efficiency. Among the models, BERT-CNN achieved the best performance, with 96.30% accuracy and consistently high scores across all sentiment classes. BERT also performed well, especially for neutral sentiment, reaching 89.5% accuracy, but showed slightly lower recall in the positive class. DistilBERT offered competitive performance (88.96% accuracy) while being faster and more lightweight, making it suitable for deployment in resource-limited environments. BERT-multilingual showed balanced results with a peak accuracy of 89.84%, and ALBERT, despite having fewer parameters, reached 88.76% accuracy but underperformed in positive sentiment recall. The results demonstrate that transformer-based models outperform traditional machine learning and lexicon-based approaches, particularly in handling informal, mixed-language Malay text. The proposed models can support real-world applications such as analyzing consumer sentiment, public opinion, or social response to policies. This study contributes to advancing sentiment analysis for low-resource languages by offering comparative insights and effective model configurations, setting a solid foundation for further research and practical deployment. 2025 Thesis NonPeerReviewed text en http://eprints.utem.edu.my/id/eprint/29320/1/Transformer-based%20sentiment%20analysis%20classification%20in%20natural%20language%20processing%20for%20Bahasa%20Melayu%20%2824%20pages%29.pdf text en http://eprints.utem.edu.my/id/eprint/29320/2/Transformer-based%20sentiment%20analysis%20classification%20in%20natural%20language%20processing%20for%20Bahasa%20Melayu.pdf Zulkalnain, Mohd Asyraf (2025) Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu. Masters thesis, Universiti Teknikal Malaysia Melaka. |
| spellingShingle | Q Science QA Mathematics Zulkalnain, Mohd Asyraf Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| thesis_level | Master |
| title | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_full | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_fullStr | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_full_unstemmed | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_short | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_sort | transformer based sentiment analysis classification in natural language processing for bahasa melayu |
| topic | Q Science QA Mathematics |
| url | http://eprints.utem.edu.my/id/eprint/29320/ |
| work_keys_str_mv | AT zulkalnainmohdasyraf transformerbasedsentimentanalysisclassificationinnaturallanguageprocessingforbahasamelayu |

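The evaluation described in the abstract considers accuracy alongside per-class precision, recall, and F1 score. A minimal plain-Python sketch of those computations for the three sentiment classes; the gold labels and predictions below are illustrative values, not actual model output from the thesis:

```python
def per_class_metrics(gold, pred, cls):
    """Precision, recall, and F1 for one class, treating it as the
    positive label and the other classes as negative."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold labels and model predictions (illustration only).
gold = ["positive", "negative", "neutral", "positive",
        "neutral", "negative", "positive", "neutral"]
pred = ["positive", "negative", "positive", "positive",
        "neutral", "negative", "negative", "neutral"]

classes = ["positive", "negative", "neutral"]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
macro_f1 = sum(per_class_metrics(gold, pred, c)[2] for c in classes) / len(classes)
```

Reporting per-class recall is what surfaces findings like the abstract's note that ALBERT underperformed specifically on positive sentiment recall, which overall accuracy alone would hide; `sklearn.metrics.precision_recall_fscore_support` computes the same quantities.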