Sunday, March 31, 2019
Types Of Data Compression Computer Science Essay
Data compression has come of age in the last 20 years. Both the quantity and the quality of the body of literature in this field provide ample proof of this. There are many known methods for data compression. They are based on different ideas, are suitable for different types of data, and produce different results, but they are all based on the same principle: they compress data by removing redundancy from the original data in the source file. This report discusses the different types of data compression, the advantages of data compression and the procedures of data compression.

2.0 DATA COMPRESSION

Data compression is essential in this age because of the amount of data that is transferred within a given network. It makes the transfer of data relatively easy [1]. This section explains and compares lossy and lossless compression techniques.

2.1 LOSSLESS DATA COMPRESSION

Lossless data compression makes use of data compression algorithms that allow the exact original data to be reconstructed from the compressed data. This can be contrasted with lossy data compression, which does not allow the exact original data to be reconstructed from the compressed data. Lossless data compression is used in many applications [2].

Lossless compression is used when it is vital that the original and the decompressed data be identical, or when no assumption can be made about whether a certain deviation is uncritical.

Most lossless compression programs implement two kinds of algorithms: one which generates a statistical model for the input data, and another which maps the input data to bit strings using this model in such a way that probable (e.g. frequently encountered) data will produce shorter output than improbable data. Often, only the former algorithm is named, while the second is implied (through common use, standardisation, etc.)
or unspecified [3].

2.2 LOSSY DATA COMPRESSION

A lossy data compression technique is one where compressing data and then decompressing it retrieves data that may well be different from the original, but is close enough to be useful in some way.

There are two basic lossy compression schemes:

First is lossy transform codecs, where samples of picture or sound are taken, chopped into small segments, transformed into a new basis space, and quantized. The resulting quantized values are then entropy coded [4].

Second is lossy predictive codecs, where previous and/or subsequent decoded data is used to predict the current sound sample or image frame.

In some systems the two methods are combined, with transform codecs being used to compress the error signals generated by the predictive stage.

The advantage of lossy methods over lossless methods is that in some cases a lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application [4].

Lossless compression schemes are reversible so that the original data can be reconstructed, while lossy schemes accept some loss of data in order to achieve higher compression.

In practice, lossy data compression will also come to a point where compressing again does not work, although an extremely lossy algorithm, which for example always removes the last byte of a file, will always compress a file up to the point where it is empty [5].

2.3 LOSSLESS vs. LOSSY DATA COMPRESSION

Lossless and lossy data compression are two methods used to compress data. Each technique has its own uses.
A comparison between the two techniques can be summarised as follows [4-5]:

The lossless technique keeps the source as it is during compression, while the lossy technique is expected to change the original source, although the result stays very close to the original.
The lossless technique is a reversible process, which means that the original data can be reconstructed. The lossy technique is irreversible because some data is lost during compression.
The lossless technique produces a larger compressed file than the lossy technique.
The lossy technique is mostly used for images and sound.

3.0 DATA COMPRESSION TECHNIQUES

Data compression is known as storing data in a way which requires less space than usual. Generally, it is the saving of space by the reduction in data size [6]. This section explains the Huffman coding and Lempel-Ziv-Welch (LZW) compression techniques.

3.1 HUFFMAN CODING

Huffman coding is an entropy encoding method used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the variable-length code table has been derived in a particular way based on the estimated probability of occurrence for each possible value of the source symbol. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes" [4].

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a prefix-free code; that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common source symbols using shorter strings of bits than are used for less common source symbols [5].

The technique works by creating a binary tree of nodes.
These can be stored in a regular array, the size of which depends on the number of symbols, n. A node can be either a leaf node or an internal node. Initially, all nodes are leaf nodes, which contain the symbol itself, the weight (frequency of appearance) of the symbol and, optionally, a link to a parent node, which makes it easy to read the code (in reverse) starting from a leaf node. Internal nodes contain a weight, links to two child nodes and the optional link to a parent node.

The process starts with the leaf nodes containing the probabilities of the symbols they represent. A new node whose children are the 2 nodes with the smallest probability is then created, such that the new node's probability is equal to the sum of the children's probabilities. With the 2 nodes combined into one node (and thus no longer considered), and with the new node now being considered, the procedure is repeated until only one node remains: the root of the Huffman tree [4].

The simplest construction algorithm uses a priority queue in which the node with the lowest probability is given the highest priority [5]:

1. Create a leaf node for each symbol and add it to the priority queue.
2. While there is more than one node in the queue:
   Remove the two nodes of highest priority (lowest probability) from the queue.
   Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
   Add the new node to the queue.
3. The remaining node is the root node and the tree is complete [7].

Figure (1).

3.2 LEMPEL-ZIV-WELCH (LZW) COMPRESSION

Lempel-Ziv-Welch (LZW) is a data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as a development of the LZ78 algorithm published by Lempel and Ziv in 1978.
The algorithm is designed to be fast to implement but is not usually optimal because it performs only limited analysis of the data.

LZW can also be called a substitutional or dictionary-based encoding algorithm. The algorithm builds a data dictionary (also called a translation table or string table) of data occurring in an uncompressed data stream. Patterns of data (substrings) are identified in the data stream and are matched to entries in the dictionary. If the substring is not present in the dictionary, a code phrase is created based on the data content of the substring, and it is stored in the dictionary. The phrase is then written to the compressed output stream [8].

When a reoccurrence of a substring is found in the data, the phrase of the substring already stored in the dictionary is written to the output. Because the phrase value has a physical size that is smaller than the substring it represents, data compression is achieved.

Decoding LZW data is the reverse of encoding. The decompressor reads the code from the stream and adds the code to the data dictionary if it is not already there. The code is then translated into the string it represents and is written to the uncompressed output stream [8].

LZW goes beyond most dictionary-based compressors because it is not necessary to keep the dictionary in order to decode the LZW data stream: the decompressor rebuilds it while reading the codes. This can save quite a bit of space when storing the LZW-encoded data [9].

TIFF, among other file formats, applies the same method to graphic files. In TIFF, the pixel data is packed into bytes before being presented to LZW, so an LZW source byte might be a pixel value, part of a pixel value, or several pixel values, depending on the image's bit depth and number of colour channels.

GIF requires each LZW input symbol to be a pixel value. Because GIF allows 1- to 8-bit deep images, there are between 2 and 256 LZW input symbols in GIF, and the LZW dictionary is initialized accordingly.
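The dictionary mechanics described in this section can be sketched in Python. This is a minimal illustration under stated assumptions, not the GIF or TIFF codec: the function names `lzw_encode` and `lzw_decode` and the `bit_depth` parameter are hypothetical, symbols are plain integers, codes are returned as a list of integers rather than packed into bits, and the clear and end-of-information codes that real GIF streams use are omitted. The initial dictionary holds one entry per possible symbol, so a 1-bit image starts with 2 entries and an 8-bit image with 256.

```python
def lzw_encode(symbols, bit_depth=8):
    """Encode integer symbols (each assumed < 2**bit_depth) as a list of LZW codes."""
    # Initial dictionary: one single-symbol entry per possible input value.
    table = {(s,): s for s in range(2 ** bit_depth)}
    next_code = len(table)
    current = ()          # longest match found so far
    codes = []
    for s in symbols:
        candidate = current + (s,)
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit code for the known prefix
            table[candidate] = next_code     # learn the new substring
            next_code += 1
            current = (s,)
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes, bit_depth=8):
    """Rebuild the symbol sequence; the dictionary is reconstructed on the fly."""
    table = {s: (s,) for s in range(2 ** bit_depth)}
    next_code = len(table)
    if not codes:
        return []
    previous = table[codes[0]]
    output = list(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the entry that is about to be created.
            entry = previous + (previous[0],)
        else:
            raise ValueError("invalid LZW code")
        output.extend(entry)
        table[next_code] = previous + (entry[0],)
        next_code += 1
        previous = entry
    return output


pixels = [0, 1, 0, 1, 0, 1, 0, 1]            # a 1-bit "image" row
codes = lzw_encode(pixels, bit_depth=1)
restored = lzw_decode(codes, bit_depth=1)
```

In this run the eight input symbols are represented by five codes, and decoding needs only the code list and the bit depth, which is why the dictionary itself never has to be stored.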
For GIF, it is not important how the pixels might have been packed in storage; LZW will deal with them as a sequence of symbols [9].

The TIFF approach does not work very well for odd-size pixels, because packing the pixels into bytes creates byte sequences that do not match the original pixel sequences, and any patterns in the pixels are obscured. If pixel boundaries and byte boundaries agree (e.g., two 4-bit pixels per byte, or one 16-bit pixel every two bytes), then TIFF's method works well [10].

The GIF approach works better for odd-size bit depths, but it is difficult to extend it to more than eight bits per pixel because the LZW dictionary must become very large to achieve useful compression on large input alphabets.

If variable-width codes are implemented, the encoder and decoder must be careful to change the width at the same points in the encoded data, or they will disagree about where the boundaries between individual codes fall in the stream [11].

4.0 CONCLUSION

In conclusion, because one cannot hope to compress everything, all compression algorithms must assume that there is some bias in the input messages so that some inputs are more likely than others, i.e. that there will always be some unbalanced probability distribution over the possible messages. Most compression algorithms base this bias on the structure of the messages, i.e. an assumption that repeated characters are more likely than random characters, or that large white patches occur in typical images. Compression is therefore all about probability.
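As a small illustration of how probability drives compression, the priority-queue construction from section 3.1 can be sketched in Python. This is a sketch under assumptions rather than a definitive implementation: the function name `huffman_codes` is hypothetical, raw character counts stand in for probabilities, a counter is used only to break ties between equal weights, and a single-symbol input is given the code "0" by convention.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Return a dict mapping each symbol in `text` to its Huffman bit string."""
    freq = Counter(text)
    if not freq:
        return {}
    tiebreak = count()  # unique sequence number so nodes are never compared
    # Step 1: one leaf per symbol. A node is (symbol, left, right);
    # leaves have no children, internal nodes have no symbol.
    heap = [(w, next(tiebreak), (sym, None, None)) for sym, w in freq.items()]
    heapq.heapify(heap)
    # Step 2: repeatedly merge the two lowest-weight nodes.
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (None, n1, n2)))
    # Step 3: the remaining node is the root.
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        symbol, left, right = node
        if left is None:                     # leaf node
            codes[symbol] = prefix or "0"    # lone symbol still needs one bit
        else:
            walk(left, prefix + "0")
            walk(right, prefix + "1")

    walk(root, "")
    return codes


table = huffman_codes("aaaabbc")
encoded = "".join(table[ch] for ch in "aaaabbc")
```

For the input "aaaabbc", the most frequent symbol `a` receives a 1-bit code while `b` and `c` receive 2-bit codes, so the seven characters are encoded in 10 bits in total.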