Small Dataset Learning In Prediction Model Using Box-Whisker Data Transformation

Data mining covers several tasks, including classification, clustering, prediction and summarization. Among them, prediction is widely applied in real applications in manufacturing, medicine and business, mainly for developing prediction models. However, to build a robust p...


Bibliographic Details
Main Author: Lateh, Masitah bdul
Format: Thesis
Language: English
Published: 2020
Subjects: QA Mathematics; QA76 Computer software
Online Access: http://eprints.utem.edu.my/id/eprint/25379/1/Small%20Dataset%20Learning%20In%20Prediction%20Model%20Using%20Box-Whisker%20Data%20Transformation.pdf
http://eprints.utem.edu.my/id/eprint/25379/2/Small%20Dataset%20Learning%20In%20Prediction%20Model%20Using%20Box-Whisker%20Data%20Transformation.pdf
description Data mining covers several tasks, including classification, clustering, prediction and summarization. Among them, prediction is widely applied in real applications in manufacturing, medicine and business, mainly for developing prediction models. To build a robust prediction model, however, the training set should contain many samples; learning from a small sample may yield an imprecise model. Enlarging the sample to ensure sufficient learning is sometimes difficult or expensive, so the information gained from a small sample is deficient. The main reason a small sample makes it hard to extract valuable information is that information gaps exist. In a complete dataset these gaps would be filled with observations, but those observations are not available. As a result, most learning tools struggle with the prediction task, because a small sample does not provide sufficient information for learning and leads to incorrect results. Previous studies have improved learning accuracy and predictive capability by adding artificial data to the system through an artificial data generation approach. This study therefore proposes a hybrid algorithm for generating artificial samples that adopts the Small Johnson Data Transformation and the Box-Whisker Plot, both introduced in previous studies. The proposed algorithm, named Box-Whisker Data Transformation, considers all samples contained in an MLCC dataset when generating artificial samples. The study also investigates the effectiveness of applying the artificial data generation approach to a prediction model. Initially, the quantiles of the raw samples are determined using the Box-Whisker Plot technique.
Subsequently, the Small Johnson Data Transformation is employed to transform the raw samples to a Normal distribution. Next, samples are generated from the Normal distribution. To test the effectiveness of the proposed algorithm, the real and generated samples are added to the training phase to build a prediction model using the M5 Model Tree. The results show that sample quantiles from a reasonable range are generated. In addition, using all available samples in the dataset as training samples retains the properties of the original pattern behaviors. Moreover, the learning performance of the prediction model improves: as the number of artificial samples increases, the average mean absolute percentage error (AvgMAPE) of the M5 Model Tree decreases. This reveals that training size affects the accuracy of prediction models when the sample size is small.
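The steps named in the description (box-whisker quantiles, sampling from a Normal distribution, and scoring with a mean absolute percentage error) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm. The function names, the 1.5 × IQR whisker rule, the "inclusive" quantile method and the rejection step are assumptions made for illustration, and plain standardisation (z-scores) is used as a stand-in for the Small Johnson Data Transformation, whose exact form is not given in this record. The M5 Model Tree is not shown.

```python
import random
import statistics


def box_whisker_bounds(values, k=1.5):
    """Return (q1, median, q3, lower_whisker, upper_whisker) for a sample.

    Assumption: whiskers sit k * IQR beyond the quartiles (Tukey's rule).
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1, med, q3, q1 - k * iqr, q3 + k * iqr


def generate_artificial(values, n_new, seed=0):
    """Draw n_new artificial values that stay inside the whisker range.

    Stand-in for the transformation step: the data are only standardised
    (mean and standard deviation), NOT Johnson-transformed.
    """
    _, _, _, low, high = box_whisker_bounds(values)
    if high <= low:
        raise ValueError("sample has zero interquartile range")
    rng = random.Random(seed)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    out = []
    while len(out) < n_new:
        z = rng.gauss(0.0, 1.0)   # sample in the standard normal space
        x = mean + std * z        # map back to the original scale
        if low <= x <= high:      # reject values outside the whiskers
            out.append(x)
    return out


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * statistics.fmean(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    )
```

In a setup like the one described, the generated values would be appended to the real training samples before fitting the prediction model, and `mape` would be averaged over repeated runs to obtain a figure comparable to AvgMAPE.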
id oai:eprints.utem.edu.my:25379
institution Universiti Teknikal Malaysia Melaka
spelling oai:eprints.utem.edu.my:25379 2021-10-27T16:13:54Z Lateh, Masitah bdul (2020) Small Dataset Learning In Prediction Model Using Box-Whisker Data Transformation. Masters thesis, Universiti Teknikal Malaysia Melaka. NonPeerReviewed.
topic QA Mathematics
QA76 Computer software
url-record http://eprints.utem.edu.my/id/eprint/25379/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=119720