Data Compression/compressed file systems

file compression vs compressed file systems edit

FIXME: Is there a better place in this book to discuss compressed file systems and flash memory?

Many data compression algorithms are whole-file oriented. They assume that the entire original file is available up-front, and people will want to decompress the entire thing from beginning to end.

Compressed file systems (and especially flash-memory file systems) break that assumption. The ZFS file system uses the LZJB compression algorithm. Many internet routers and internet firewalls use a compressed flash-memory file system, often the cramfs file system. (See Reverse Engineering/File Formats#Compression, Encryption & Scrambling for an example). Some "solid-state hard drives" (SSDs) internally use compression algorithms to reduce the amount of flash "used up" by files -- this doesn't actually give users any more storage space, but the larger amount of empty storage space can be used internally by the SSD in ways that extend the lifetime of the drive.

These systems require relatively rapid random-access to data stored in the file system -- decompressing everything from the beginning would be too slow.

One approach is to use an index that, given some file name we want to look up (or a "logical block number"), it tells where exactly on the disk (or in the bulk flash) to start decompressing, combined with a streaming compression algorithm -- so the decompressor can jump into the middle of the compressed data and start decompressing from there.

A much more difficult requirement of these systems is to allow files to be edited. (This is so difficult that some compressed file systems, such as cramfs, such as python executable zipfiles[1], don't allow it -- they must be created all-at-once from a directory of uncompressed files. After a cramfs system is created, it can only be mounted as read-only).

The earliest applications of compressed file systems were designed to allow people to store more data on the relatively tiny, expensive storage systems available at the time.

As we mentioned in the Data Compression/Evaluating Compression Effectiveness section, Some people use data compression techniques even when they have so much RAM and disk space that there's no real need to make files smaller. For example, people designing video games often want to load a lot of data quickly from storage when loading the next level. If we use some simple data compression techniques to make the number of bytes on disk a little smaller, we can spend a little of the time we save on decompression and still come out ahead.[2]

Several people have built what is effectively a virtual file system that can read and write into a standard ".tgz" or ".zip" file: TrueZIP,[3] Java Zip File System Provider,[4] etc.

FIXME: go into a little more detail on why unmodified file-oriented compression algorithms won't work for compressed file systems, and what techniques are used to either (a) modify those algorithms so they are suitable, or (b) other algorithms that are useful for compressed flash file systems, or (c) combinations of both.

FIXME: say a few words about w: SquashFS.

FIXME: say a few words about w: initramfs/w: initrd.

virtual memory compression edit

FIXME: say a few words about zswap and zRAM and similar ideas related to virtual memory compression.

... using lz4 (a variant of lzo) ...

  1. Radomir Dopieralski. "Your Python Application as a Single File"
  2. Malte Skarupke. "Quickly Loading Things From Disk".
  3. TrueZIP
  4. Java Zip File System Provider