Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu
Sentiment analysis in Bahasa Melayu leverages Natural Language Processing (NLP) to interpret opinions and emotional tone expressed in Malay texts. This research investigates the application of transformer-based deep learning models, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment classification into positive, negative, and neutral categories.
| Main Author: | Zulkalnain, Mohd Asyraf |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2025 |
| Subjects: | Q Science; QA Mathematics |
| Online Access: | http://eprints.utem.edu.my/id/eprint/29320/ |
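The record describes fine-tuning pretrained models with 10-fold cross-validation to improve generalization. A minimal plain-Python sketch of building stratified folds, where each fold keeps roughly the same class ratios as the whole dataset; the label counts below are hypothetical illustrations, not taken from the Malaya dataset:

```python
import random

# Hypothetical three-class labels standing in for the Malaya
# dataset's positive / negative / neutral annotations.
labels = ["positive"] * 50 + ["negative"] * 30 + ["neutral"] * 20
random.seed(0)
random.shuffle(labels)

def stratified_kfold(labels, k=10):
    """Yield (train_idx, val_idx) pairs with class ratios
    roughly preserved in every fold."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        # Deal each class's indices round-robin across the folds,
        # so every fold gets a proportional share of that class.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, val

splits = list(stratified_kfold(labels, k=10))
```

Each of the 10 models is then fine-tuned on `train` and scored on `val`, and the metrics are averaged across folds; production code would typically use `sklearn.model_selection.StratifiedKFold` instead of a hand-rolled split.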
| Abstract | Abstract here |
|---|---|
| _version_ | 1855619841578237952 |
| author | Zulkalnain, Mohd Asyraf |
| author_facet | Zulkalnain, Mohd Asyraf |
| author_sort | Zulkalnain, Mohd Asyraf |
| description | Sentiment analysis in Bahasa Melayu leverages Natural Language Processing (NLP) to interpret opinions and emotional tone expressed in Malay texts. This research investigates the application of transformer-based deep learning models, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment classification into positive, negative, and neutral categories. The study addresses challenges in Bahasa Melayu sentiment analysis, including limited annotated resources, linguistic nuances, and common mixed-language usage on platforms like social media. To train and evaluate the models, a large-scale Malay dataset (Malaya dataset) was used. Pretrained models from HuggingFace were fine-tuned using 10-fold cross-validation to improve generalization. Optimization methods such as data augmentation were also implemented. The evaluation considered not only accuracy but also precision, recall, F1 score, and computational efficiency. Among the models, BERT-CNN achieved the best performance, with 96.30% accuracy and consistently high scores across all sentiment classes. BERT also performed well, especially for neutral sentiment, reaching 89.5% accuracy, but showed slightly lower recall in the positive class. DistilBERT offered competitive performance (88.96% accuracy) while being faster and more lightweight, making it suitable for deployment in resource-limited environments. BERT-multilingual showed balanced results with a peak accuracy of 89.84%, and ALBERT, despite having fewer parameters, reached 88.76% accuracy but underperformed in positive sentiment recall. The results demonstrate that transformer-based models outperform traditional machine learning and lexicon-based approaches, particularly in handling informal, mixed-language Malay text. The proposed models can support real-world applications such as analyzing consumer sentiment, public opinion, or social response to policies. This study contributes to advancing sentiment analysis for low-resource languages by offering comparative insights and effective model configurations, setting a solid foundation for further research and practical deployment. |
| format | Thesis |
| id | utem-29320 |
| institution | Universiti Teknikal Malaysia Melaka |
| language | English |
| publishDate | 2025 |
| record_format | EPrints |
| record_pdf | Restricted |
| spelling | utem-293202025-12-26T07:59:02Z http://eprints.utem.edu.my/id/eprint/29320/ Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu Zulkalnain, Mohd Asyraf Q Science QA Mathematics Sentiment analysis in Bahasa Melayu leverages Natural Language Processing (NLP) to interpret opinions and emotional tone expressed in Malay texts. This research investigates the application of transformer-based deep learning models, Bidirectional Encoder Representations from Transformers (BERT), DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment classification into positive, negative, and neutral categories. The study addresses challenges in Bahasa Melayu sentiment analysis, including limited annotated resources, linguistic nuances, and common mixed-language usage on platforms like social media. To train and evaluate the models, a large-scale Malay dataset (Malaya dataset) was used. Pretrained models from HuggingFace were fine-tuned using 10-fold cross-validation to improve generalization. Optimization methods such as data augmentation were also implemented. The evaluation considered not only accuracy but also precision, recall, F1 score, and computational efficiency. Among the models, BERT-CNN achieved the best performance, with 96.30% accuracy and consistently high scores across all sentiment classes. BERT also performed well, especially for neutral sentiment, reaching 89.5% accuracy, but showed slightly lower recall in the positive class. DistilBERT offered competitive performance (88.96% accuracy) while being faster and more lightweight, making it suitable for deployment in resource-limited environments. BERT-multilingual showed balanced results with a peak accuracy of 89.84%, and ALBERT, despite having fewer parameters, reached 88.76% accuracy but underperformed in positive sentiment recall. The results demonstrate that transformer-based models outperform traditional machine learning and lexicon-based approaches, particularly in handling informal, mixed-language Malay text. The proposed models can support real-world applications such as analyzing consumer sentiment, public opinion, or social response to policies. This study contributes to advancing sentiment analysis for low-resource languages by offering comparative insights and effective model configurations, setting a solid foundation for further research and practical deployment. 2025 Thesis NonPeerReviewed text en http://eprints.utem.edu.my/id/eprint/29320/1/Transformer-based%20sentiment%20analysis%20classification%20in%20natural%20language%20processing%20for%20Bahasa%20Melayu%20%2824%20pages%29.pdf text en http://eprints.utem.edu.my/id/eprint/29320/2/Transformer-based%20sentiment%20analysis%20classification%20in%20natural%20language%20processing%20for%20Bahasa%20Melayu.pdf Zulkalnain, Mohd Asyraf (2025) Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu. Masters thesis, Universiti Teknikal Malaysia Melaka. |
| spellingShingle | Q Science QA Mathematics Zulkalnain, Mohd Asyraf Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| thesis_level | Master |
| title | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_full | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_fullStr | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_full_unstemmed | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_short | Transformer-based sentiment analysis classification in natural language processing for Bahasa Melayu |
| title_sort | transformer based sentiment analysis classification in natural language processing for bahasa melayu |
| topic | Q Science QA Mathematics |
| url | http://eprints.utem.edu.my/id/eprint/29320/ |
| work_keys_str_mv | AT zulkalnainmohdasyraf transformerbasedsentimentanalysisclassificationinnaturallanguageprocessingforbahasamelayu |

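The evaluation described in the abstract considers accuracy alongside per-class precision, recall, and F1 score. A minimal plain-Python sketch of those computations for the three sentiment classes; the gold labels and predictions below are illustrative values, not actual model output from the thesis:

```python
def per_class_metrics(gold, pred, cls):
    """Precision, recall, and F1 for one class, treating it as the
    positive label and the other classes as negative."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold labels and model predictions (illustration only).
gold = ["positive", "negative", "neutral", "positive",
        "neutral", "negative", "positive", "neutral"]
pred = ["positive", "negative", "positive", "positive",
        "neutral", "negative", "negative", "neutral"]

classes = ["positive", "negative", "neutral"]
accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
macro_f1 = sum(per_class_metrics(gold, pred, c)[2] for c in classes) / len(classes)
```

Reporting per-class recall is what surfaces findings like the abstract's note that ALBERT underperformed specifically on positive sentiment recall, which overall accuracy alone would hide; `sklearn.metrics.precision_recall_fscore_support` computes the same quantities.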