Tool versatility: the LZW compression algorithm
Regular computer users of the last three decades will almost certainly be familiar with the zip file: you have a computer file that is too large to email, or too large for some other purpose, so you "zip" the file and send the zip instead. The zip file is smaller than the original because it is a version of the original in which the data has been compressed.
In computers, this type of data compression first became broadly available through the Unix operating system, beginning around 1986. At that stage, the zipping -- the compressing -- was accomplished with the LZW compression algorithm. This algorithm employs a dynamic dictionary model to achieve lossless compression, and its ability to compress takes advantage of how extremely often sequences (patterns) are repeated in data.
LZW starts with a fixed base dictionary in which each entry has an index ID number. In the base dictionary, each entry is a 1-character sequence from which we can expect all sequences thereafter encountered in the input to be built. LZW expands that initial dictionary by reading the input in order and adding the new sequences it finds to the dictionary. The compression stems from the fact that multi-character sequences will be found, and they will be represented in the output by just their dictionary ID. Consequently, the greater the sequence repetition in the source data, the greater the compression of the output will be.
Where the versatility comes in
In the most rudimentary sense, LZW is just a tool, an algorithm, in the realm of information theory. So we shouldn't be surprised that it has uses other than, say, providing the compression for my gif file. One very different use stems from the fact that the length of LZW's output is, by proxy, a measure of the input's structural complexity -- in particular, the degree to which it lacks repeating patterns.
Having a measurement basis for complexity can be essential in many realms of science, and, though LZW measures only a particular type of complexity, that measure is well matched to some research and practical-application use cases.
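To make the idea concrete, here is a minimal sketch (my own illustration, not code from any published study): counting the codes LZW emits gives a simple complexity score, and a highly patterned signal scores far lower than a pattern-free one of the same length.

```python
import random

def lzw_phrase_count(s: str) -> int:
    """Number of codes LZW would emit for s: a proxy for structural complexity."""
    known = {chr(i) for i in range(256)}  # base dictionary of 1-char sequences
    count, current = 0, ""
    for ch in s:
        if current + ch in known:
            current += ch              # extend the longest known sequence
        else:
            count += 1                 # one output code for the match so far
            known.add(current + ch)    # learn the new sequence
            current = ch
    return count + (1 if current else 0)

random.seed(0)
periodic = "01" * 500                                       # highly patterned
noisy = "".join(random.choice("01") for _ in range(1000))   # pattern-free

# The patterned signal needs far fewer codes than the noisy one,
# so it scores as far less complex.
```

The proxy works because every repetition lets LZW reuse a dictionary entry it has already learned, while pattern-free data forces it to emit a code for nearly every short fragment.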
One example is the perturbation complexity index (PCI) developed by Marcello Massimini and his research team. LZW has been used to compute the complexity measurement in PCI, in an approach they came to informally call "zap and zip". PCI is intended as a level-of-consciousness measurement for human subjects. A transcranial magnetic stimulation (TMS) pulse, the so-called "zap", is applied to the subject's brain, and this produces an echo within the brain that is measurable via EEG. Using the LZW algorithm to measure that echo, the "zip", has been shown to be a diagnostically viable indication of consciousness level. This indicator is critical in some special cases where the typically used behavioral assessments of consciousness may be unreliable or not even an option.