A noun-based feature location approach supported by time aware term-weighting technique for facilitating software maintenance / Sima Zamani

Feature location is one of the frequent software maintenance activities that aims to identify a source code location pertinent to a software feature. Most of the proposed feature location approaches are based, at least in part, on text analysis to determine the similarity of a new feature with the s...

Bibliographic Details
Main Author: Sima , Zamani
Format: Thesis
Published: 2016
Subjects:
author Sima , Zamani
description Feature location is a frequent software maintenance activity that aims to identify the source code location pertinent to a software feature. Most proposed feature location approaches rely, at least in part, on text analysis to determine the similarity between a new feature and the source code data. However, the text analysis methods used in feature location originate from the natural language context. Unlike the typical context in which these methods are applied, text documents in software repositories, such as source code files, have a corresponding set of metadata, including timestamps, developer identifiers, and commit comments. Furthermore, the repositories record the history of changes to the source code, which leads to a larger dataset size. Because of these differences between software repositories and natural language, text analysis does not reach its full potential for accurately locating software features. Accordingly, the goal of this thesis is to improve feature location by addressing the specific characteristics of the repositories' text data, i.e. the association of the data with metadata and the larger dataset size, within the text analysis process. In this thesis, a new feature location approach is proposed that considers the time and developer metadata and uses only the nouns. The proposed approach analyzes and weights the data according to the time at which the data was recorded and the developer who recorded it in the repository. In this approach, first, a time- and developer-based corpus is created from the nouns extracted from the repository's data. Then, the nouns are weighted using two term-weighting techniques: a time-aware term-weighting technique and a developer-based time-aware term-weighting technique. Next, the weights calculated for each noun are combined to obtain the noun's total weight. Finally, the source code files are ranked by the sum of the total weights of the nouns that appear in both the given software feature and the source code file. The empirical evaluation of the proposed approach on a set of open-source projects indicates remarkable improvements over the baseline feature location approaches that use VSM (Vector Space Model) and SUM (Smoothed Unigram Model). The proposed approach improves on the accuracy, effectiveness and performance of the baseline approaches by as much as 62%, 43% and 30%, respectively. Within this approach, the time-based analysis and weighting of the data improves on the baseline approaches by up to 38%, 35% and 19%, respectively, whereas the developer-based analysis and weighting of the data improves on them by up to 55%, 39% and 29%, respectively. Furthermore, using only nouns, instead of all types of terms, improves accuracy, effectiveness and performance by as much as 26%, 49% and 23%, respectively, and reduces the dataset size by up to 60%. Statistical analysis of the experimental results demonstrates that the improvement is significant in all aspects. In general, considering time metadata and developer metadata when analyzing and weighting the data, along with using only the nouns, significantly improves feature location.
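The four steps in the description (build a noun corpus, weight nouns by time and by developer, combine the weights, rank files by summed weights of shared nouns) can be illustrated with a minimal Python sketch. The abstract does not give the weighting formulas, so everything numeric here is an assumption made for illustration: the exponential decay in `time_weight`, the `half_life_days` and `alpha` parameters, and the rule that each developer contributes only their most recent use of a noun. Noun extraction is assumed to have been done upstream, and the file names and developers are hypothetical. This is not the thesis's actual technique.

```python
from collections import defaultdict
from datetime import date, timedelta


def time_weight(recorded_on, today, half_life_days=180.0):
    # ASSUMPTION: exponential decay; an occurrence loses half its weight
    # every `half_life_days`. The thesis's real formula is not in the abstract.
    age_days = (today - recorded_on).days
    return 0.5 ** (age_days / half_life_days)


def noun_weights(occurrences, today, alpha=0.5):
    """Combine a time-aware and a developer-based time-aware weight per noun.

    occurrences: iterable of (noun, developer, recorded_on) tuples.
    alpha: ASSUMED mixing factor between the two weights.
    """
    time_w = defaultdict(float)
    freshest = {}  # (noun, developer) -> weight of that developer's latest use
    for noun, developer, recorded_on in occurrences:
        w = time_weight(recorded_on, today)
        time_w[noun] += w
        key = (noun, developer)
        freshest[key] = max(freshest.get(key, 0.0), w)

    # ASSUMPTION: the developer-based weight sums, over developers,
    # the weight of each developer's most recent use of the noun.
    dev_w = defaultdict(float)
    for (noun, _developer), w in freshest.items():
        dev_w[noun] += w

    return {n: alpha * time_w[n] + (1.0 - alpha) * dev_w[n] for n in time_w}


def rank_files(feature_nouns, file_nouns, weights):
    # Score each file by summing the total weights of the nouns that appear
    # in both the feature description and the file, then sort descending.
    scores = {
        path: sum(weights.get(n, 0.0) for n in nouns & feature_nouns)
        for path, nouns in file_nouns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data.
today = date(2016, 12, 1)
occurrences = [
    ("parser", "alice", today),
    ("parser", "bob", today),
    ("cache", "alice", today - timedelta(days=180)),
]
file_nouns = {
    "Cache.java": {"cache"},
    "Parser.java": {"parser", "token"},
}
weights = noun_weights(occurrences, today)
ranking = rank_files({"parser", "cache"}, file_nouns, weights)
print(ranking)
```

In this toy data, the recently and widely used noun `parser` outweighs the older noun `cache`, so `Parser.java` ranks first. Restricting the corpus to nouns, as the description states, shrinks the vocabulary that `noun_weights` has to process.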
format Thesis
id oai:studentsrepo.um.edu.my:10764
institution Universiti Malaya
publishDate 2016
record_format eprints
spelling oai:studentsrepo.um.edu.my:10764 2020-01-18T02:15:43Z A noun-based feature location approach supported by time aware term-weighting technique for facilitating software maintenance / Sima Zamani Sima , Zamani QA75 Electronic computers. Computer science QA76 Computer software 2016-12 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/10764/2/Sima.pdf application/pdf http://studentsrepo.um.edu.my/10764/1/Sima_Zamani_%E2%80%93_Thesis.pdf Sima , Zamani (2016) A noun-based feature location approach supported by time aware term-weighting technique for facilitating software maintenance / Sima Zamani. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/10764/
title A noun-based feature location approach supported by time aware term-weighting technique for facilitating software maintenance / Sima Zamani
topic QA75 Electronic computers. Computer science
QA76 Computer software
url-record http://studentsrepo.um.edu.my/10764/