Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu

Bibliographic Details
Main Author: Zulkalnain, Mohd Asyraf
Format: Thesis
Language: English
Published: 2025
Subjects: Q Science; QA Mathematics
Online Access: http://eprints.utem.edu.my/id/eprint/29320/
Abstract: Sentiment analysis in Bahasa Melayu applies Natural Language Processing (NLP) to interpret the opinions and emotional tone expressed in Malay text. This research investigates transformer-based deep learning models, namely Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for classifying sentiment as positive, negative, or neutral. The study addresses challenges specific to Bahasa Melayu sentiment analysis, including limited annotated resources, linguistic nuances, and the frequent mixing of languages on platforms such as social media. To train and evaluate the models, a large-scale Malay dataset (the Malaya dataset) was used. Pretrained models from HuggingFace were fine-tuned with 10-fold cross-validation to improve generalization, and optimization methods such as data augmentation were also applied. The evaluation considered not only accuracy but also precision, recall, F1 score, and computational efficiency. Among the models, BERT-CNN performed best, with 96.30% accuracy and consistently high scores across all sentiment classes. BERT also performed well, especially on neutral sentiment, reaching 89.5% accuracy, but showed slightly lower recall for the positive class. DistilBERT offered competitive performance (88.96% accuracy) while being faster and more lightweight, which makes it suitable for deployment in resource-limited environments. BERT-multilingual gave balanced results with a peak accuracy of 89.84%, and ALBERT, despite having fewer parameters, reached 88.76% accuracy but underperformed on positive-class recall. The results demonstrate that transformer-based models outperform traditional machine learning and lexicon-based approaches, particularly on informal, mixed-language Malay text. The proposed models can support real-world applications such as analyzing consumer sentiment, public opinion, or social response to policies.
This study contributes to advancing sentiment analysis for low-resource languages by offering comparative insights and effective model configurations, laying a foundation for further research and practical deployment.
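The abstract reports 10-fold cross-validation and per-class precision, recall, and F1 over three sentiment labels. The following is a minimal, dependency-free Python sketch of such an evaluation loop, assuming plain (unstratified) k-fold splits and one-vs-rest per-class metrics. The names `LABELS`, `train_and_predict`, and `majority_baseline` are hypothetical: the baseline is only a stand-in for fine-tuning a HuggingFace model and is not the thesis code.

```python
import random
from collections import Counter

# Assumed label set, taken from the three classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for plain, unstratified k-fold CV."""
    if n < k:
        raise ValueError("need at least k examples so every fold is non-empty")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the folds reproducible
    folds = [idx[i::k] for i in range(k)]  # round-robin split into k folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, labels=LABELS):
    """One-vs-rest precision, recall and F1 for each sentiment class."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def majority_baseline(train_texts, train_labels, test_texts):
    """Hypothetical stand-in for fine-tuning: predict the most common label."""
    top = Counter(train_labels).most_common(1)[0][0]
    return [top] * len(test_texts)


def cross_validate(texts, labels, train_and_predict, k=10, seed=0):
    """Mean test accuracy over k folds for any train_and_predict callable."""
    fold_acc = []
    for train, test in k_fold_indices(len(texts), k, seed):
        preds = train_and_predict(
            [texts[i] for i in train],
            [labels[i] for i in train],
            [texts[i] for i in test],
        )
        fold_acc.append(accuracy([labels[i] for i in test], preds))
    return sum(fold_acc) / len(fold_acc)
```

In practice `train_and_predict` would wrap fine-tuning and inference for one model (BERT, DistilBERT, and so on). This record does not state whether the thesis stratified its folds, applied augmentation inside each training split, or averaged per-class scores across folds, so those choices are left open here.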
Institution: Universiti Teknikal Malaysia Melaka
Record format: EPrints
Full-text access: Restricted
Record timestamp: 2025-12-26T07:59:02Z
Peer review: Not peer reviewed
Full text (24 pages, PDF): http://eprints.utem.edu.my/id/eprint/29320/1/Transformer-based%20sentiment%20analysis%20classification%20in%20natural%20language%20processing%20for%20Bahasa%20Melayu%20%2824%20pages%29.pdf
Full text (PDF): http://eprints.utem.edu.my/id/eprint/29320/2/Transformer-based%20sentiment%20analysis%20classification%20in%20natural%20language%20processing%20for%20Bahasa%20Melayu.pdf
Citation: Zulkalnain, Mohd Asyraf (2025) Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu. Masters thesis, Universiti Teknikal Malaysia Melaka.
Thesis level: Master