Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data

Bibliographic details
Main author: Baba, Ishaq Abdullahi
Format: Thesis
Language: English
Published: 2022
Subjects: Algorithms; Robust control
Online access: http://psasir.upm.edu.my/id/eprint/104718/1/ISHAQ%20ABDULLAHI%20BABA%20-%20IR.pdf
author Baba, Ishaq Abdullahi
description The reweighted fast, consistent and high breakdown (RFCH) estimator is a multivariate procedure for estimating a robust location vector and scatter matrix. It is incorporated into the robust Mahalanobis distance to detect high leverage points in a dataset, and it has shown excellent performance compared with its competitors. However, it cannot be applied when the sample size is smaller than the number of predictor variables. To address this problem, several robust procedures for high dimensional data based on the RFCH algorithm are developed. A modified reweighted fast consistent and high breakdown (MRFCH) estimator for high dimensional data is developed: when computing the robust Mahalanobis distance within the RFCH algorithm, it uses only the diagonal elements of the scatter matrix rather than the full matrix. The proposed method inherits the robustness properties of the original RFCH estimator. Simulation results and artificial data examples show that MRFCH is more efficient and faster than the minimum regularized covariance determinant (MRCD) and orthogonalized Gnanadesikan-Kettenring (OGK) estimators.
Outlier detection and classification are critical issues that affect prediction accuracy if not handled correctly. The Mahalanobis distance (MD) is one of the most popular multivariate tools for detecting outlying observations. However, the traditional MD, computed from the classical mean and covariance matrix, is itself distorted by outliers and rarely identifies all of them, which gives rise to masking (outliers going undetected) and swamping (clean observations wrongly flagged). Therefore, the robust location and covariance estimates from MRFCH are used in place of the classical estimators. The proposed algorithm has been applied to detect outliers in high dimensional data. Results from the simulation study and real data sets indicate that the proposed method has high detection power and a lower misclassification error than the MRCD and minimum diagonal product (MDP) methods.
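The diagonal-scatter idea behind MRFCH can be illustrated with a short Python sketch that uses only the standard library. It is not the thesis's MRFCH algorithm, and every function name in it is hypothetical. It assumes a single start at the coordinatewise median with unit variances (RFCH itself combines the DGK and median-ball attractors), five concentration steps that keep half of the data, consistency scaling so that the median squared distance equals the chi-square median, two hard-rejection reweighting steps at the 0.975 chi-square quantile, and the Wilson-Hilferty approximation in place of an exact chi-square quantile. The squared distance is D_i^2 = sum_j (x_ij - m_j)^2 / s_jj, where m is the location estimate and s_jj are the diagonal scatter elements.

```python
from random import Random
from statistics import NormalDist, fmean, median


def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation to a chi-square quantile (stdlib stand-in)."""
    z = NormalDist().inv_cdf(prob)
    k = 2.0 / (9.0 * df)
    return df * (1.0 - k + z * k ** 0.5) ** 3


def diag_fit(rows):
    """Location and diagonal scatter: per-variable mean and variance of the rows."""
    cols = list(zip(*rows))
    loc = [fmean(c) for c in cols]
    var = [max(fmean([(v - m) ** 2 for v in c]), 1e-12) for c, m in zip(cols, loc)]
    return loc, var


def diag_dist2(x, loc, var):
    """Squared distance using only the diagonal of the scatter matrix (no inversion)."""
    return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, loc, var))


def rescale(X, loc, var):
    """Consistency step: scale variances so the median distance matches chi2_p(0.5)."""
    d2 = [diag_dist2(x, loc, var) for x in X]
    c = median(d2) / chi2_quantile(0.5, len(loc))
    return [v * c for v in var]


def diag_rfch_sketch(X, n_steps=5, alpha=0.975):
    """Hypothetical diagonal-scatter analogue of RFCH; returns (loc, var, d2, flagged)."""
    n, p = len(X), len(X[0])
    h = (n + 1) // 2                              # keep about half the data per step
    loc = [median(c) for c in zip(*X)]            # assumed start: coordinatewise median
    var = [1.0] * p                               # with unit variances
    for _ in range(n_steps):                      # concentration steps
        d2 = [diag_dist2(x, loc, var) for x in X]
        keep = sorted(range(n), key=d2.__getitem__)[:h]
        loc, var = diag_fit([X[i] for i in keep])
    var = rescale(X, loc, var)
    cut = chi2_quantile(alpha, p)
    for _ in range(2):                            # two hard-rejection reweighting steps
        inliers = [x for x in X if diag_dist2(x, loc, var) <= cut]
        loc, var = diag_fit(inliers)
        var = rescale(X, loc, var)
    d2 = [diag_dist2(x, loc, var) for x in X]
    flagged = [i for i, d in enumerate(d2) if d > cut]
    return loc, var, d2, flagged


# Demo: more variables than observations, with three planted outliers.
rng = Random(0)
n, p = 60, 100
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
for i in (0, 1, 2):
    X[i] = [v + 5.0 for v in X[i]]
loc, var, d2, flagged = diag_rfch_sketch(X)
print("flagged rows:", flagged)
```

Because only p variances are estimated, no p x p matrix is ever inverted, which is why a distance of this form stays computable when the sample size is smaller than the number of variables. In the demo, rows 0 to 2 are shifted by 5 in every coordinate and should appear among the flagged rows.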
Classical correlation estimators, which rely on the sample means of the dependent and independent variables, are known to be sensitive to outliers. Therefore, a robust weighted correlation coefficient that reduces the effect of outliers is proposed, with weights based on the robust distances computed from MRFCH, RD (MRFCH). The performance of the proposed method is illustrated through a simulation study and on glass vessel data with 1920 variables, cardiomyopathy microarray data with 6319 variables, and octane data with 226 variables. The results show that the robust weighted correlation based on RD (MRFCH) is more powerful and efficient than the existing methods, irrespective of dimension, sample size, and contamination level.
Correlation-based sure screening methods are popular tools for selecting the important variables of the true model in sparse, high dimensional analysis. In practice, however, high leverage points may lead to misleading variable selection. Therefore, a robust sure independence screening procedure for high dimensional data, based on the MRFCH weighted correlation, is developed. Results from the simulation study and real data sets indicate that the proposed MRFCHCS+LAD-SCAD estimator performs best among the methods compared in this study.
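The robust weighted correlation and the screening step can be sketched in the same spirit. This is an illustrative Python sketch, not the thesis's MRFCHCS procedure, and every name in it is hypothetical. Assumptions: a coordinatewise median/MAD diagonal distance stands in for RD (MRFCH); each observation gets weight 1 inside the 0.975 chi-square cutoff and 0 outside (the thesis's exact weight function is not stated in this record); predictors are ranked by the absolute weighted Pearson correlation with the response, and the top d = [n / log n] are kept, a common default in sure independence screening. The LAD-SCAD fitting stage that follows screening is not shown.

```python
import math
from random import Random
from statistics import NormalDist, median


def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation to a chi-square quantile (stdlib stand-in)."""
    z = NormalDist().inv_cdf(prob)
    k = 2.0 / (9.0 * df)
    return df * (1.0 - k + z * k ** 0.5) ** 3


def robust_dist2(X):
    """Stand-in robust distances: coordinatewise median/MAD, diagonal scatter only."""
    cols = list(zip(*X))
    med = [median(c) for c in cols]
    mad = [max(1.4826 * median([abs(v - m) for v in c]), 1e-12) for c, m in zip(cols, med)]
    d2 = [sum(((v - m) / s) ** 2 for v, m, s in zip(row, med, mad)) for row in X]
    c = median(d2) / chi2_quantile(0.5, len(med))   # consistency scaling
    return [d / c for d in d2]


def weighted_corr(x, y, w):
    """Weighted Pearson correlation; observations with weight 0 drop out entirely."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)


def robust_screen(X, y, d=None, alpha=0.975):
    """Keep the d predictors with the largest |weighted correlation| with y."""
    n, p = len(X), len(X[0])
    if d is None:
        d = int(n / math.log(n))                  # common SIS choice [n / log n]
    cut = chi2_quantile(alpha, p)
    w = [1.0 if di <= cut else 0.0 for di in robust_dist2(X)]  # hard rejection (assumed)
    scores = [abs(weighted_corr([row[j] for row in X], y, w)) for j in range(p)]
    return sorted(range(p), key=lambda j: -scores[j])[:d], w


# Demo: y depends on x0 and x1; five high leverage rows tie x5 to y.
rng = Random(1)
n, p = 50, 200
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] + 3.0 * row[1] + rng.gauss(0.0, 1.0) for row in X]
for i in range(5):
    X[i][5], y[i] = 20.0, 40.0
selected, w = robust_screen(X, y)
print("selected:", selected)
```

Hard rejection is used here because a weight that only partially shrinks high leverage rows can still let a few extreme points dominate a correlation; a smoother weight function is an equally reasonable design choice.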
format Thesis
id oai:psasir.upm.edu.my:104718
institution Universiti Putra Malaysia
language English
publishDate 2022
record_format eprints
citation Baba, Ishaq Abdullahi (2022) Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data. Doctoral thesis, Universiti Putra Malaysia. Date: 2022-01. Not peer reviewed. Record datestamp: 2023-10-05T06:36:21Z.
title Robust diagnostics and variable selection procedure based on modified reweighted fast consistent and high breakdown estimator for high dimensional data
topic Algorithms
Robust control
url http://psasir.upm.edu.my/id/eprint/104718/1/ISHAQ%20ABDULLAHI%20BABA%20-%20IR.pdf
url-record http://psasir.upm.edu.my/id/eprint/104718/