A noun-based feature location approach supported by time aware term-weighting technique for facilitating software maintenance / Sima Zamani

Feature location is one of the frequent software maintenance activities that aims to identify a source code location pertinent to a software feature. Most of the proposed feature location approaches are based, at least in part, on text analysis to determine the similarity of a new feature with the s...


Bibliographic Details
Main Author:	Sima Zamani
Format:	Thesis
Published:	2016
Subjects:	QA75 Electronic computers. Computer science; QA76 Computer software

author	Sima Zamani
description	Feature location is a frequent software maintenance activity that aims to identify the source code location pertinent to a software feature. Most proposed feature location approaches are based, at least in part, on text analysis to determine the similarity of a new feature to the source code data. However, the text analysis methods used in feature location originate from the natural language context. Unlike the typical context in which these methods are applied, text documents in software repositories, such as source code files, have a corresponding set of metadata, including timestamps, developer identifiers, and commit comments. Furthermore, the repositories record the history of changes to the source code, which leads to a larger dataset size. Because of these differences between software repositories and natural language contexts, text analysis does not reach its full potential for accurately locating software features. Accordingly, the goal of this thesis is to improve feature location by addressing the specific characteristics of the repositories’ text data, i.e. the association of the data with metadata and the larger dataset size, within the text analysis process. In this thesis, a new feature location approach is proposed that considers the time and developer metadata and uses only nouns. The proposed approach analyzes and weights the data according to the time at which the data was recorded and the developer who recorded it in the repository. In this approach, first, a time- and developer-based corpus is created from the nouns extracted from the repository’s data. Then, the nouns are weighted using two term-weighting techniques: a time-aware term-weighting technique and a developer-based time-aware term-weighting technique. Next, the calculated weights for each noun are combined to obtain the noun’s total weight. Finally, the source code files are ranked by the sum of the total weights of the nouns that appear in both the given software feature and the source code files. The empirical evaluation of the proposed approach on a set of open-source projects indicates remarkable improvements over the baseline feature location approaches that use VSM (Vector Space Model) and SUM (Smoothed Unigram Model). The proposed approach improves on the accuracy, effectiveness and performance of the baseline approaches by as much as 62%, 43% and 30%, respectively. Within this approach, the time-based analysis and weighting of the data improves on the baseline approaches by up to 38%, 35% and 19%, respectively, whereas the developer-based analysis and weighting improves on them by up to 55%, 39% and 29%, respectively. Furthermore, using only nouns, instead of all types of terms, improves accuracy, effectiveness and performance by as much as 26%, 49% and 23%, respectively, and reduces the dataset size by up to 60%. Statistical analysis of the experimental results demonstrates that the improvement is significant in all aspects. In general, considering time and developer metadata in analyzing and weighting the data, along with using only nouns, significantly improves feature location.
format	Thesis
id	oai:studentsrepo.um.edu.my:10764
institution	Universiti Malaya
publishDate	2016
record_format	eprints
spelling	oai:studentsrepo.um.edu.my:10764 2020-01-18T02:15:43Z 2016-12 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/10764/2/Sima.pdf application/pdf http://studentsrepo.um.edu.my/10764/1/Sima_Zamani_%E2%80%93_Thesis.pdf Sima Zamani (2016) A noun-based feature location approach supported by time aware term-weighting technique for facilitating software maintenance / Sima Zamani. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/10764/
title	A noun-based feature location approach supported by time aware term-weighting technique for facilitating software maintenance / Sima Zamani
topic	QA75 Electronic computers. Computer science QA76 Computer software
url-record	http://studentsrepo.um.edu.my/10764/
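The ranking steps described in the abstract (weight each noun by time and by developer, combine the weights, then rank files by the summed weights of the nouns they share with the feature description) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's actual formulas, which the abstract does not give. The exponential half-life decay, the per-developer maximum, the linear mix controlled by `alpha`, and all names (`Record`, `noun_weights`, `rank_files`) are assumptions made for this example. Nouns are assumed to have been extracted beforehand, for example with a part-of-speech tagger.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One repository entry (hypothetical structure, e.g. one commit touching one file)."""
    path: str        # source code file the nouns were extracted from
    developer: str   # developer identifier taken from the metadata
    age_days: float  # time elapsed since the entry was recorded
    nouns: tuple     # nouns already extracted from the entry's text


def noun_weights(records, half_life_days=180.0, alpha=0.5):
    """Return {(path, noun): total weight}.

    Assumed scheme (not from the thesis):
    - time-aware weight: sum of an exponential decay over all occurrences
    - developer-based time-aware weight: the largest decayed sum
      contributed by any single developer
    - total weight: alpha * time weight + (1 - alpha) * developer weight
    """
    time_w = defaultdict(float)
    dev_w = defaultdict(lambda: defaultdict(float))
    for r in records:
        decay = 0.5 ** (r.age_days / half_life_days)  # recent entries count more
        for noun in r.nouns:
            time_w[(r.path, noun)] += decay
            dev_w[(r.path, noun)][r.developer] += decay

    total = {}
    for key, tw in time_w.items():
        dw = max(dev_w[key].values())
        total[key] = alpha * tw + (1 - alpha) * dw
    return total


def rank_files(feature_nouns, weights):
    """Rank files by the summed weights of nouns shared with the feature description."""
    wanted = set(feature_nouns)
    scores = defaultdict(float)
    for (path, noun), w in weights.items():
        if noun in wanted:
            scores[path] += w
    # Highest score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `rank_files(["parser", "token"], noun_weights(records))` returns `(path, score)` pairs with the best candidate file first. Restricting the corpus to nouns keeps the weight table small, which is in line with the dataset size reduction the abstract reports.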
