Integrated approach for improving cross-project software defect prediction performance

Bibliographic Details
Main Author: Bala, Yahaya Zakariyau
Format: Thesis
Language: English
Published: 2024
Subjects: Software engineering; Computer software - Testing; Machine learning
Online Access: http://psasir.upm.edu.my/id/eprint/120163/1/120163.pdf
author Bala, Yahaya Zakariyau
description This research addresses three critical challenges in cross-project defect prediction (CPDP): distribution differences, redundant features, and model overfitting. These issues often degrade prediction accuracy and robustness across domains. To tackle them, this study proposes a holistic approach named Transformation, Feature Selection, and Multi-learning (TFSM). The research has three objectives: first, to propose transformation, feature selection, and multi-learning techniques that, respectively, mitigate distribution differences between datasets, identify and eliminate redundant features, and combat model overfitting; second, to integrate these techniques into TFSM and implement it; and third, to evaluate each technique and the integrated approach. The methodology involves formulating, implementing, and evaluating each technique individually and as part of the integrated TFSM approach. Experimental evaluations are conducted on open-source software projects drawn from an open-source repository, with the F1-score serving as the primary evaluation metric. The experimental results demonstrate significant improvements in predictive performance. The transformation techniques effectively reduce distribution differences, enhancing the model's ability to generalize across diverse datasets. The feature selection methods mitigate the negative impact of redundant features, streamlining the learning process and improving model interpretability. The multi-learning approach reduces model overfitting by aggregating the outputs of diverse models. When integrated into TFSM, these techniques collectively demonstrate a marked improvement in CPDP performance: the approach leverages the strengths of each individual technique, producing a synergistic effect that enhances predictive accuracy. TFSM thus addresses the multifaceted challenges inherent in CPDP and provides a more reliable and effective solution for defect prediction in software projects. This work contributes to ongoing efforts in the software engineering community to develop more accurate and reliable defect prediction models, ultimately aiding the development of higher-quality software. Future work will focus on further refining these techniques and exploring their applicability to a broader range of software projects and repositories.
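The three stages named in the abstract, and the F1-score used to evaluate them, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not say which transformation, selection criterion, or learners TFSM uses, so per-project z-score standardization, a Pearson-correlation filter, and majority voting are stand-ins chosen only for illustration. All function names and the 0.95 threshold are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Transformation stand-in: standardize each feature column using the
    project's own mean and standard deviation (assumed, not from the thesis)."""
    columns = list(zip(*rows))
    # Constant columns have zero deviation; fall back to 1.0 to avoid dividing by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(value - mu) / sigma for value, (mu, sigma) in zip(row, stats)]
            for row in rows]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def select_features(rows, threshold=0.95):
    """Feature-selection stand-in: keep a column only if it is not highly
    correlated with a column that was already kept. Returns kept indices."""
    columns = list(zip(*rows))
    kept = []
    for index, column in enumerate(columns):
        if all(abs(pearson(column, columns[k])) < threshold for k in kept):
            kept.append(index)
    return kept


def majority_vote(predictions):
    """Multi-learning stand-in: combine 0/1 predictions from several models.
    An instance is labeled defective (1) only on a strict majority."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*predictions)]


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for the defective class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy source-project metrics: the second column is exactly twice the first.
    source = [[1.0, 2.0, 10.0], [2.0, 4.0, 8.0], [3.0, 6.0, 12.0], [4.0, 8.0, 9.0]]
    kept = select_features(zscore_columns(source))
    print("kept feature indices:", kept)

    # Hypothetical predictions from three models on four target-project modules.
    combined = majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]])
    print("F1:", f1_score([1, 0, 1, 0], combined))
```

In the toy data the second metric duplicates the first, so the correlation filter drops it, and the voted labels are scored against the true labels with F1. A real CPDP experiment would fit the learners on source-project data and predict on a different target project.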
format Thesis
id oai:psasir.upm.edu.my:120163
institution Universiti Putra Malaysia
language English
publishDate 2024
record_format eprints
spelling oai:psasir.upm.edu.my:120163 2025-10-09T08:38:33Z http://psasir.upm.edu.my/id/eprint/120163/ Integrated approach for improving cross-project software defect prediction performance Bala, Yahaya Zakariyau 2024-04 Thesis NonPeerReviewed text en http://psasir.upm.edu.my/id/eprint/120163/1/120163.pdf Bala, Yahaya Zakariyau (2024) Integrated approach for improving cross-project software defect prediction performance. Doctoral thesis, Universiti Putra Malaysia. http://ethesis.upm.edu.my/id/eprint/18501 Software engineering Computer software - Testing Machine learning
title Integrated approach for improving cross-project software defect prediction performance
topic Software engineering
Computer software - Testing
Machine learning
url http://psasir.upm.edu.my/id/eprint/120163/1/120163.pdf
url-record http://psasir.upm.edu.my/id/eprint/120163/
http://ethesis.upm.edu.my/id/eprint/18501