Integrated approach for improving cross-project software defect prediction performance
| Main Author: | Bala, Yahaya Zakariyau |
|---|---|
| Format: | Thesis |
| Language: | English |
| Published: | 2024 |
| Subjects: | Software engineering; Computer software--Testing; Machine learning |
| Online Access: | http://psasir.upm.edu.my/id/eprint/120163/1/120163.pdf |

| Field | Value |
|---|---|
| author | Bala, Yahaya Zakariyau |
| description | This research addresses three critical challenges in cross-project defect prediction (CPDP): distribution differences between projects, redundant features, and model overfitting. These issues often degrade prediction accuracy and robustness across domains. To tackle them, this study proposes a holistic approach named Transformation, Feature Selection, and Multi-learning (TFSM). The research has three objectives: first, to propose transformation, feature selection, and multi-learning techniques that, respectively, mitigate distribution differences between datasets, identify and eliminate redundant features, and combat model overfitting; second, to integrate these techniques into TFSM and implement it; and third, to evaluate each technique and the integrated approach. The methodology involves formulating, implementing, and evaluating each technique individually and then as the integrated TFSM approach. Experiments are conducted on open-source software projects drawn from an open-source repository, with the F1-score as the primary evaluation metric. The results show significant improvements in predictive performance. The transformation techniques reduce distribution differences, improving the model's ability to generalize across diverse datasets. The feature selection methods mitigate the negative impact of redundant features, streamlining the learning process and improving model interpretability. The multi-learning approach reduces model overfitting by aggregating the outputs of diverse models. When integrated into TFSM, these techniques together yield a marked improvement in CPDP performance. TFSM combines the strengths of the individual techniques, producing a synergistic effect that improves predictive accuracy. By addressing the multiple challenges inherent in CPDP, the approach provides a more reliable and effective solution for defect prediction in software projects. This work contributes to ongoing efforts in the software engineering community to develop more accurate and reliable defect prediction models, ultimately supporting the development of higher-quality software. Future work will focus on refining these techniques further and on exploring their applicability to a broader range of software projects and repositories. |
| format | Thesis |
| id | oai:psasir.upm.edu.my:120163 |
| institution | Universiti Putra Malaysia |
| language | English |
| publishDate | 2024 |
| record_format | eprints |
| citation | Bala, Yahaya Zakariyau (2024). Integrated approach for improving cross-project software defect prediction performance. Doctoral thesis, Universiti Putra Malaysia, 2024-04. Not peer reviewed. |
| title | Integrated approach for improving cross-project software defect prediction performance |
| topic | Software engineering; Computer software--Testing; Machine learning |
| url | http://psasir.upm.edu.my/id/eprint/120163/1/120163.pdf |
| url-record | http://psasir.upm.edu.my/id/eprint/120163/ http://ethesis.upm.edu.my/id/eprint/18501 |
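The record describes TFSM only at the level of its three stages (transformation, feature selection, multi-learning) and its F1-score evaluation; it does not name the concrete techniques. The following is a minimal sketch, not the thesis's method, of how such a cross-project pipeline could fit together. It assumes hypothetical stand-ins: per-project z-score normalization as the transformation, a greedy Pearson-correlation filter (threshold 0.9) as redundant-feature removal, and a majority vote over three simple learners (nearest centroid, 1-nearest-neighbour, decision stump) as multi-learning. All function names and parameters are illustrative.

```python
from statistics import mean, pstdev

# Hypothetical stand-ins: the TFSM abstract does not name its concrete techniques.

def zscore_per_project(rows):
    """Transformation: standardize each feature with the project's own mean/std."""
    stats = [(mean(c), pstdev(c) or 1.0) for c in zip(*rows)]  # constant column -> std 1
    return [[(v - m) / s for v, (m, s) in zip(r, stats)] for r in rows]

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den if den else 0.0

def drop_redundant(rows, threshold=0.9):
    """Feature selection: keep a column only if it is not highly correlated
    with any column already kept. Returns the kept column indices."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept

def select(rows, idx):
    return [[r[j] for j in idx] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(xs, ys):
    cents = {c: [mean(col) for col in zip(*[x for x, y in zip(xs, ys) if y == c])]
             for c in set(ys)}
    return lambda x: min(cents, key=lambda c: sq_dist(x, cents[c]))

def one_nn(xs, ys):
    return lambda x: ys[min(range(len(xs)), key=lambda i: sq_dist(x, xs[i]))]

def stump(xs, ys):
    """Single-feature threshold rule with the best training accuracy."""
    best = (-1, 0, 0.0, 1)  # (accuracy, feature, threshold, polarity)
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            for pol in (1, -1):
                acc = sum((1 if pol * (x[j] - t) >= 0 else 0) == y for x, y in zip(xs, ys))
                if acc > best[0]:
                    best = (acc, j, t, pol)
    _, j, t, pol = best
    return lambda x: 1 if pol * (x[j] - t) >= 0 else 0

def majority_vote(models):
    """Multi-learning: predict 1 (defective) when most models say so."""
    return lambda x: 1 if 2 * sum(m(x) for m in models) > len(models) else 0

def f1_score(y_true, y_pred):
    """F1 for the defective class (label 1): 2PR / (P + R)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def cross_project_f1(src_x, src_y, tgt_x, tgt_y, threshold=0.9):
    """Train on a source project, predict on a target project, report F1."""
    src, tgt = zscore_per_project(src_x), zscore_per_project(tgt_x)
    kept = drop_redundant(src, threshold)  # selection decided on source data only
    src, tgt = select(src, kept), select(tgt, kept)
    ensemble = majority_vote([f(src, src_y) for f in (nearest_centroid, one_nn, stump)])
    return f1_score(tgt_y, [ensemble(x) for x in tgt])

if __name__ == "__main__":
    # Toy data: target metrics are the source metrics on a 10x larger scale,
    # and column 1 duplicates column 0 (a redundant feature).
    src_x = [[1, 2, 0.3], [2, 4, 0.1], [8, 16, 0.2], [9, 18, 0.4]]
    tgt_x = [[10 * v for v in row] for row in src_x]
    print(cross_project_f1(src_x, [0, 0, 1, 1], tgt_x, [0, 0, 1, 1]))
```

Normalizing each project with its own statistics is one simple way to remove the scale difference between source and target projects, and an odd number of voters avoids ties in the majority vote. A real CPDP study would substitute stronger transformations, selectors, and learners.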