Optimizing lossless compression by normalized data length in Huffman Algorithm


Bibliographic Details
Main Author: Tonny, Hidayat
Format: Thesis (Doctoral, PhD)
Institution: Universiti Teknikal Malaysia Melaka
Language: English
Published: 2022
Subjects: Q Science (General); QA Mathematics
Online Access: http://eprints.utem.edu.my/id/eprint/26986/
https://plh.utem.edu.my/cgi-bin/koha/opac-detail.pl?biblionumber=122177
Full Text (PDF, restricted access): http://eprints.utem.edu.my/id/eprint/26986/1/Optimizing%20lossless%20compression%20by%20normalized%20data%20length%20in%20Huffman%20Algorithm.pdf
Citation: Tonny, Hidayat (2022) Optimizing lossless compression by normalized data length in Huffman Algorithm. Doctoral thesis, Universiti Teknikal Malaysia Melaka.

Abstract
Due to the growing need for storage space, the demand for efficient compression schemes has become increasingly important. One goal of lossless data compression is to archive raw audio data so that a file can be restored to its original form when it is reused. Generally, raw audio is stored as 16-bit samples (65,536 possible values). The Huffman algorithm, whose variants can be grouped into Static, Dynamic, and Adaptive extensions, is currently still very effective at compressing 8-bit data; however, its performance cannot be guaranteed on data with many variables and probabilities. Based on the literature review, compression performance for archived files is measured with the Compression Ratio (CR) and Compression Time (CT) indicators. These two indicators are used to calculate and analyse the reduction in file size and the ability of the file to be reconstructed to its original form without compromising its quality. This research produces a new scheme called Quaternary Arity (4-ary) Modification Quadtree (MQ), or 4-ary/MQ, based on entropy coding, with roots in other Huffman variants such as Binary/Static, Quadtree, Octatree, and Hexatree. The 4-ary/MQ method employs the characteristics of the Quadtree structure and extends the Dynamic Huffman coding mechanism (the FGK rule) for node arrangement, while adopting the Adaptive Huffman method's use of additional variable data. The novelty of the scheme is the addition of extra variables that maintain the branch root so that the tree always remains consistent with four branches. A descriptive analysis of 4-ary/MQ was performed on several audio datasets (Music, Mono Music, Stereo Music, Ripping CD, Speech, Noise, Sound Effects, and Instruments) to compare it against the Huffman scheme variants. A comparative analysis with several lossless compression applications showed that its CR is significantly more optimal than those of PKZIP, WinZip, 7-Zip, and Monkey's Audio. The 4-ary/MQ compression was found to benefit data stored on local storage media as well as hosting and bandwidth optimization. The new algorithm also performs well, producing an optimal CR with a fast CT on most of the 16-bit WAV audio datasets, and yields a more optimal CR than the various Huffman-based lossless applications. It is also expected that the new scheme may work well on data above 16 bits in future research.
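
The thesis itself is not reproduced in this record, so the sketch below illustrates only the general techniques the abstract names, not the author's 4-ary/MQ implementation: a quaternary (4-ary) Huffman code in which zero-count dummy leaves keep every merge, including the one at the root, at exactly four branches, followed by a Compression Ratio calculation. The function name `kary_huffman_codes`, the toy symbol counts, and the CR convention (original size divided by compressed size) are illustrative assumptions.

```python
import heapq
import itertools


def kary_huffman_codes(freqs, k=4):
    """Build k-ary Huffman codes (code digits 0..k-1) for a {symbol: count} map.

    Zero-count dummy leaves are padded in so that every merge step, including
    the final one at the root, combines exactly k nodes -- the same role the
    abstract describes for the extra variables that keep the 4-ary/MQ root
    consistent with four branches.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares code dicts
    nodes = [(count, next(tie), {sym: ""}) for sym, count in freqs.items()]
    # A full k-ary tree over n leaves exists only when (n - 1) % (k - 1) == 0.
    while (len(nodes) - 1) % (k - 1) != 0:
        nodes.append((0, next(tie), {}))  # dummy leaf carrying no symbol
    heapq.heapify(nodes)
    while len(nodes) > 1:
        total, merged = 0, {}
        for digit in range(k):  # merge the k least frequent nodes
            count, _, codes = heapq.heappop(nodes)
            total += count
            for sym, code in codes.items():
                merged[sym] = str(digit) + code  # prepend this level's digit
        heapq.heappush(nodes, (total, next(tie), merged))
    return nodes[0][2]


# Toy source: symbol counts for a tiny artificial 8-bit-style alphabet.
freqs = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = kary_huffman_codes(freqs, k=4)

# Each quaternary digit packs into 2 bits in a binary file.
orig_bits = 8 * sum(freqs.values())  # fixed-length 8-bit encoding
comp_bits = sum(2 * len(codes[sym]) * count for sym, count in freqs.items())
print(codes)
print(f"CR = {orig_bits / comp_bits:.2f}")  # CR as original/compressed size
```

The padding step is the standard way to make a full k-ary Huffman tree well defined: without it, the final merge could have fewer than k children and the root would not stay at four branches, which is the consistency property the abstract highlights as the scheme's novelty.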