Optimizing lossless compression by normalized data length in Huffman Algorithm


Bibliographic Details
Main Author: Tonny, Hidayat
Format: Thesis (Doctoral, PhD)
Institution: Universiti Teknikal Malaysia Melaka
Language: English
Published: 2022
Subjects: Q Science (General); QA Mathematics
Online Access: http://eprints.utem.edu.my/id/eprint/26986/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122177
Full Text (PDF, restricted access): http://eprints.utem.edu.my/id/eprint/26986/1/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
Citation: Tonny, Hidayat (2022) Optimizing lossless compression by normalized data length in Huffman Algorithm. Doctoral thesis, Universiti Teknikal Malaysia Melaka.

Abstract
Due to the growing need for storage space, the demand for efficient compression schemes has become increasingly important. One goal of lossless data compression is to archive raw audio data so that a file can be restored to its original form when it is reused. Generally, raw audio is stored as 16-bit samples (65,536 possible values). The Huffman algorithm, whose variants can be grouped into Static, Dynamic, and Adaptive extensions, is currently still very effective at compressing 8-bit data; however, its performance cannot be guaranteed on data with many variables and probabilities. Based on the literature review, compression performance for archived files is measured with the Compression Ratio (CR) and Compression Time (CT) indicators. These two indicators are used to calculate and analyse the reduction in file size and the ability of the file to be reconstructed to its original form without compromising its quality. This research produces a new scheme called Quaternary Arity (4-ary) Modification Quadtree (MQ), or 4-ary/MQ, based on entropy coding, with roots in other Huffman variants such as Binary/Static, Quadtree, Octatree, and Hexatree. The 4-ary/MQ method employs the characteristics of the Quadtree structure and extends the Dynamic Huffman coding mechanism (the FGK rule) for node arrangement, while adopting the Adaptive Huffman method's use of additional variable data. The novelty of the scheme is the addition of extra variables that maintain the branch root so that the tree always remains consistent with four branches. A descriptive analysis of 4-ary/MQ was performed on several audio datasets (Music, Mono Music, Stereo Music, Ripping CD, Speech, Noise, Sound Effects, and Instruments) to compare it against the Huffman scheme variants. A comparative analysis with several lossless compression applications showed that its CR is significantly more optimal than those of PKZIP, WinZip, 7-Zip, and Monkey's Audio. The 4-ary/MQ compression was found to benefit data stored on local storage media as well as hosting and bandwidth optimization. The new algorithm also performs well, producing an optimal CR with a fast CT on most of the 16-bit WAV audio datasets, and yields a more optimal CR than the various Huffman-based lossless applications. It is also expected that the new scheme may work well on data above 16 bits in future research.
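
The thesis itself is not reproduced in this record, so the sketch below illustrates only the general techniques the abstract names, not the author's 4-ary/MQ implementation: a quaternary (4-ary) Huffman code in which zero-count dummy leaves keep every merge, including the one at the root, at exactly four branches, followed by a Compression Ratio calculation. The function name `kary_huffman_codes`, the toy symbol counts, and the CR convention (original size divided by compressed size) are illustrative assumptions.

```python
import heapq
import itertools


def kary_huffman_codes(freqs, k=4):
    """Build k-ary Huffman codes (code digits 0..k-1) for a {symbol: count} map.

    Zero-count dummy leaves are padded in so that every merge step, including
    the final one at the root, combines exactly k nodes -- the same role the
    abstract describes for the extra variables that keep the 4-ary/MQ root
    consistent with four branches.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares code dicts
    nodes = [(count, next(tie), {sym: ""}) for sym, count in freqs.items()]
    # A full k-ary tree over n leaves exists only when (n - 1) % (k - 1) == 0.
    while (len(nodes) - 1) % (k - 1) != 0:
        nodes.append((0, next(tie), {}))  # dummy leaf carrying no symbol
    heapq.heapify(nodes)
    while len(nodes) > 1:
        total, merged = 0, {}
        for digit in range(k):  # merge the k least frequent nodes
            count, _, codes = heapq.heappop(nodes)
            total += count
            for sym, code in codes.items():
                merged[sym] = str(digit) + code  # prepend this level's digit
        heapq.heappush(nodes, (total, next(tie), merged))
    return nodes[0][2]


# Toy source: symbol counts for a tiny artificial 8-bit-style alphabet.
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = kary_huffman_codes(freqs, k=4)

# Each quaternary digit packs into 2 bits in a binary file.
orig_bits = 8 * sum(freqs.values())  # fixed-length 8-bit encoding
comp_bits = sum(2 * len(codes[sym]) * count for sym, count in freqs.items())
print(codes)
print(f"CR = {orig_bits / comp_bits:.2f}")  # CR as original/compressed size
```

The padding step is the standard way to make a full k-ary Huffman tree well defined: without it, the final merge could have fewer than k children and the root would not stay at four branches, which is the consistency property the abstract highlights as the scheme's novelty.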