Last modified on 9 February 2014, at 19:22

Guide to Unix/Commands/File Compression

gzip

gzip compresses files. Each input file is compressed into its own output file; gzip does not bundle several files into one archive. The compressed file consists of a GNU zip header and deflated data.

If given a file as an argument, gzip compresses the file, adds a ".gz" suffix, and deletes the original file. With no arguments, gzip compresses the standard input and writes the compressed file to standard output.

Some useful options are:

-c  Write compressed file to stdout. Do not delete original file.
-d  Act like gunzip.
-1  Performance: Use fast compression (somewhat bigger result)
-9  Performance: Use best compression (somewhat slower)

Examples:

Compress the file named README. Creates README.gz and deletes README.

$ gzip README

Compress the file called README. The standard output (which is the compressed file) is redirected by the shell to gzips/README.gz. Keeps README.

$ gzip -c README > gzips/README.gz

Use gzip without arguments to compress README.

$ < README gzip > gzips/README.gz
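
To get a feel for the trade-off between -1 and -9, a small sketch along the following lines may help. The directory gzip-demo and the generated sample file are hypothetical names chosen for illustration, and the sizes printed will depend on the input and on the gzip version.

```shell
# Sketch: compare fast (-1) and best (-9) compression on the same input.
# gzip-demo/ and sample.txt are made-up names for this illustration.
rm -rf gzip-demo
mkdir gzip-demo

# Build a compressible sample file: 2000 numbered lines of text.
i=1
while [ "$i" -le 2000 ]; do
    echo "line $i: the quick brown fox jumps over the lazy dog"
    i=$((i + 1))
done > gzip-demo/sample.txt

# -c writes to standard output, so sample.txt is kept.
gzip -1 -c gzip-demo/sample.txt > gzip-demo/fast.gz
gzip -9 -c gzip-demo/sample.txt > gzip-demo/best.gz

# Print the sizes in bytes; best.gz is normally no larger than fast.gz.
wc -c gzip-demo/sample.txt gzip-demo/fast.gz gzip-demo/best.gz
```

Because both runs use -c, the original file is left in place and can be compressed again with other levels for comparison.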

gunzip

gunzip uncompresses a file that was compressed with "gzip" or "compress". It handles both the GNU zip format of gzip and the older Unix compress format. It picks out the files to process by their extension (".gz", ".Z" and several others) and checks the magic number at the start of each file to identify the format.

Some useful options are:

-c  Write uncompressed data to stdout. Do not delete original file.

Undo the effect of gzip README by replacing the compressed version of the file with the original, uncompressed version. Creates README and deletes README.gz.

$ gunzip README.gz

Write the uncompressed contents of README.gz to standard output. Pipe it into a pager for easy reading of a compressed file.

$ gunzip -c README.gz | more

Another way to do that is:

$ gunzip < README.gz | more

Some people name files package.tgz as short for package.tar.gz.

zcat

zcat is the same as uncompress -c, though on many systems it is actually the same as "gzcat" and gunzip -c.

gzcat

gzcat is the same as gunzip -c, which is gzip -dc.
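
The equivalence between gunzip -c and gzip -dc can be checked with a small sketch like the following. The directory zcat-demo and its file names are hypothetical. zcat and gzcat are left out on purpose, because which of them is installed, and which format it expects, varies between systems.

```shell
# Sketch: gunzip -c and gzip -dc should write identical bytes to stdout.
# zcat-demo/ and notes.txt are made-up names for this illustration.
rm -rf zcat-demo
mkdir zcat-demo
printf 'first line\nsecond line\n' > zcat-demo/notes.txt

# Compress via standard input and output, leaving notes.txt in place.
gzip < zcat-demo/notes.txt > zcat-demo/notes.txt.gz

gunzip -c zcat-demo/notes.txt.gz > zcat-demo/via-gunzip.txt
gzip -dc zcat-demo/notes.txt.gz > zcat-demo/via-gzip.txt

# cmp is silent and returns 0 when the two files are byte-identical.
cmp zcat-demo/via-gunzip.txt zcat-demo/via-gzip.txt && echo "same output"
cmp zcat-demo/notes.txt zcat-demo/via-gzip.txt && echo "matches the original"
```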

tar

tar creates archives; by itself it does not compress them.

An archive contains one or more files or directories. (If archiving multiple files, it might be better to put them in one directory, so extracting will put the files into their own directory.)

Modes:

-c  create an archive (files to archive, archive from files)
-x  extract an archive (archive to files, files from archive)

Options:

-f FILE  name of archive - must specify unless using tape drive for archive
-v       be verbose, list all files being archived/extracted
-z       create/extract archive with gzip/gunzip
-j       create/extract archive with bzip2/bunzip2
-J       create/extract archive with XZ


Examples:

Compress (gzip) and package (tar) the directory myfiles to create myfiles.tar.gz:

$ tar -czvf myfiles.tar.gz myfiles

Uncompress (gunzip) and unpack the compressed package, extracting the contents of myfiles.tar.gz:

$ tar -xzvf myfiles.tar.gz

There are two different conventions concerning gzipped tarballs. One often encounters .tar.gz. The other popular choice is .tgz. Slackware packages use the latter convention.
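
Before unpacking a tarball it is often useful to look inside it first. The sketch below assumes a tar that supports the common -t (list) and -C (change directory) options, which are not described above; the directory tar-demo and everything in it are hypothetical names.

```shell
# Sketch: create a gzipped tarball, list it, and unpack it elsewhere.
# tar-demo/, its sample files and restored/ are made-up names.
rm -rf tar-demo
mkdir -p tar-demo/myfiles/docs tar-demo/restored
echo "hello" > tar-demo/myfiles/a.txt
echo "world" > tar-demo/myfiles/docs/b.txt

# -C DIR makes tar change into DIR first, so member names start at myfiles/.
tar -czf tar-demo/myfiles.tar.gz -C tar-demo myfiles

# -t lists the members without extracting anything.
tar -tzf tar-demo/myfiles.tar.gz

# Extract into tar-demo/restored instead of the current directory.
tar -xzf tar-demo/myfiles.tar.gz -C tar-demo/restored
```

Because the archive was built from a single top-level directory, extraction creates restored/myfiles rather than scattering files directly into restored.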

If you have access to a tape device or other backup medium, then you can use it instead of an archive file. If the material to be archived exceeds the capacity of the backup medium, GNU tar can span several volumes when given the -M (--multi-volume) option, prompting the user to insert a new tape or diskette.

Use the following command to back up the myfiles directory to floppies:

$ tar -cvf /dev/fd0 myfiles

Restore that backup with:

$ tar -xvf /dev/fd0

You can also specify standard input or output with -f - instead of an archive file or device. This makes it possible to copy between directories by piping two "tar" commands together. For example, suppose we have two directories, from-stuff and to-stuff:

$ ls -F
from-stuff/
to-stuff/

As described in Running Linux, one can mirror everything from from-stuff to to-stuff by running the following from inside from-stuff:

$ cd from-stuff
$ tar cf - . | (cd ../to-stuff; tar xvf -)

Reference: Welsh, Matt, Matthias Kalle Dalheimer and Lar Kaufman (1999), Running Linux. Third edition, O'Reilly and Associates.
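
A self-contained variant of the pipe, which sets up its own sample directories, might look like the sketch below. The pipe-demo directory and the file names in it are hypothetical. Unlike the command quoted from the book, each tar runs in its own subshell and uses && so that a failed cd stops that side of the pipe.

```shell
# Sketch: copy a directory tree by piping one tar into another.
# pipe-demo/ and the sample files are made up for illustration.
rm -rf pipe-demo
mkdir -p pipe-demo/from-stuff/sub pipe-demo/to-stuff
echo "one" > pipe-demo/from-stuff/one.txt
echo "two" > pipe-demo/from-stuff/sub/two.txt

# Each side runs in its own subshell, so the calling shell's
# working directory is left unchanged. Using && instead of ; stops
# tar from running in the wrong place if a cd fails.
(cd pipe-demo/from-stuff && tar cf - .) | (cd pipe-demo/to-stuff && tar xf -)

# diff -r prints nothing and returns 0 when the two trees match.
diff -r pipe-demo/from-stuff pipe-demo/to-stuff && echo "trees match"
```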

cpio

cpio is used for creating and extracting archives. When creating an archive, a list of files is fed to its standard input (rather than specifying the files on the command line). This file list is typically created by ls, find or locate and then piped directly to cpio, but it can also first be filtered or edited with commands like grep, sed, sort and others. A pre-edited list stored in a file can also be used, either by using cat to feed the pipeline or simply by redirecting the shell's standard input (<).

cpio works in one of three modes:

  • cpio -o - Copy-Out mode: Files are copied out from the filesystem to create an archive. Usually the archive is created by simply using the shell to redirect cpio's output to a file (with >).
  • cpio -i - Copy-In mode: Files from an existing archive are restored/extracted, and copied back in to the filesystem.
  • cpio -p - Pass-Through mode: cpio is used to copy files from one location in the directory tree to another, without creating an actual archive.

In addition:

  • cpio -t - List archive: The content of an archive is listed without extracting it.
  • cpio -tv - Here the verbose-option (-v) will cause a "long listing", with permissions, size and ownership.

Adding the verbose option (-v) in Copy-In, Copy-Out or Pass-Through mode causes cpio to list the files as they are extracted, archived or copied.

Using ls to create an archive (verbosely) with all doc-files in the current directory:

$ ls *.doc | cpio -ov > word-docs.cpio

Using find to create an archive with all txt-files in and below the current directory:

$ find . -name "*.txt" | cpio -ov > text-files.cpio

Using find and fgrep to create an archive of just the txt-files containing the word wiki (any case):

$ find . -name "*.txt" -exec fgrep -l -i "wiki" {} \; | cpio -ov > wiki.cpio

For fgrep, the option -i means "ignore case", and the option -l causes it to list just the names of the files that match the pattern.

Using an existing list of files:

$ cpio -ov < file-list.txt > archive.cpio

Using several lists of files, after first sorting them and removing duplicates:

$ cat files1 files2 files3 | sort | uniq | cpio -ov > myfiles.cpio

To add more files, use the append-option (-A). Specify the file with the file-option (-F):

$ cat files4 | cpio -ovA -F myfiles.cpio

To extract files (being verbose):

$ cpio -iv < myfiles.cpio

cpio doesn't create directories by default, so use the option -d to make it create them.

To extract files, while creating directories as needed:

$ cpio -ivd < myfiles.cpio

To list the content of an archive, short listing:

$ cpio -t < myfiles.cpio

To list the content of an archive, long listing:

$ cpio -tv < myfiles.cpio
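
Pass-Through mode (-p) can be sketched in the same spirit. The example below is only an illustration: cpio-demo, src and dest are hypothetical names, the -m option (keep modification times) is an addition not covered above, and the surrounding check is there only because cpio is not installed on every system.

```shell
# Sketch: copy a tree with cpio pass-through mode (-p).
# cpio-demo/, src/ and dest/ are made-up names for this illustration.
if command -v cpio >/dev/null 2>&1; then
    rm -rf cpio-demo
    mkdir -p cpio-demo/src/notes cpio-demo/dest
    echo "alpha" > cpio-demo/src/a.txt
    echo "beta" > cpio-demo/src/notes/b.txt

    # find prints relative names; cpio -p recreates them under ../dest.
    # -d creates missing directories, -m keeps modification times.
    (cd cpio-demo/src && find . -print | cpio -pdm ../dest)

    ls -R cpio-demo/dest
else
    echo "cpio is not installed; skipping the sketch"
fi
```

Running find from inside the source directory keeps the names relative, so the tree is recreated directly under dest rather than under a copy of the full source path.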

pax

pax is like "tar" but with a different command-line syntax. Because "pax" does not default to a tape device, some prefer it to "tar".

bzip2

bzip2 and bunzip2 are similar to "gzip"/"gunzip" but with a different compression method. Compression is generally better but slower than "gzip". Decompression is faster than compression, but still slower than "gunzip".

An option of -1 through -9 can be used to specify how well bzip2 should compress. The number sets the size of the "chunks" (blocks), in steps of 100kB, that are compressed at a time, so bzip2 -5 foo.bar will compress foo.bar in chunks of 500kB each. Generally, larger chunks mean better compression (but probably slower). Only undamaged chunks can be recovered with bzip2recover from a damaged bzip2 file, so if you compressed with 900kB chunks, you'll lose 900kB of your file if one chunk is damaged, but only 100kB if you used 100kB chunks (bzip2 -1). By default bzip2 uses 900kB chunks for the best possible compression.

bzcat is the same as bunzip2 -c, which is bzip2 -dc.
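
The effect of the chunk size can be explored with a sketch like the one below. The directory bzip2-demo and the generated sample file are hypothetical, the resulting sizes depend heavily on the input, and the surrounding check is there only because bzip2 is not installed on every system.

```shell
# Sketch: compress one file with 100 kB chunks (-1) and 900 kB chunks (-9).
# bzip2-demo/ and sample.txt are made-up names for this illustration.
if command -v bzip2 >/dev/null 2>&1; then
    rm -rf bzip2-demo
    mkdir bzip2-demo

    # Roughly 1 MB of text, well over the 100 kB chunk size used by -1.
    i=1
    while [ "$i" -le 20000 ]; do
        echo "line $i: the quick brown fox jumps over the lazy dog"
        i=$((i + 1))
    done > bzip2-demo/sample.txt

    # -c writes to standard output, so sample.txt is kept.
    bzip2 -1 -c bzip2-demo/sample.txt > bzip2-demo/small-chunks.bz2
    bzip2 -9 -c bzip2-demo/sample.txt > bzip2-demo/large-chunks.bz2

    # Sizes in bytes; how much the two differ depends on the input.
    wc -c bzip2-demo/sample.txt bzip2-demo/small-chunks.bz2 bzip2-demo/large-chunks.bz2

    # bzip2 -dc decompresses to standard output; cmp checks the round trip.
    bzip2 -dc bzip2-demo/large-chunks.bz2 | cmp - bzip2-demo/sample.txt && echo "round trip ok"
else
    echo "bzip2 is not installed; skipping the sketch"
fi
```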

zip

zip creates archives whose members are compressed individually. (Imagine running gzip on every file before tar-ing them, but with a different format.) The "zip" format is a common archive file format on Microsoft Windows PCs.

As with gzip, the compression level can be specified by giving a number between 1 and 9 as an option (e.g. zip -5). 1 is quickest, but compresses least. 9 compresses most, but is slowest. In addition, 0 can be used (i.e. zip -0) to specify that the files should just be "stored" and not compressed (a compression of 0%), which makes it possible to use zip to make uncompressed archives.

Note that a zip archive contains individually compressed files collected into a single file. This is the opposite of how it's done for most other compressed Unix archives (e.g. tar.gz and tar.bz2), where the files and directories are first collected into a single file, an archive (e.g. cpio or tar), and then this single file is compressed (e.g. using gzip or bzip2).
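
The difference between storing and compressing can be seen with a sketch like the following. It assumes the Info-ZIP zip and unzip programs; the -r (recurse), -q (quiet) and unzip -v (verbose listing) options are additions not covered above, and zip-demo and docs are hypothetical names. The surrounding check is there only because zip is not installed on every system.

```shell
# Sketch: build the same archive with zip -0 (store) and zip -9 (best).
# zip-demo/ and docs/ are made-up names for this illustration.
if command -v zip >/dev/null 2>&1 && command -v unzip >/dev/null 2>&1; then
    rm -rf zip-demo
    mkdir -p zip-demo/docs

    # A repetitive, easily compressed text file.
    i=1
    while [ "$i" -le 2000 ]; do
        echo "line $i: the quick brown fox jumps over the lazy dog"
        i=$((i + 1))
    done > zip-demo/docs/sample.txt

    # -r recurses into docs/, -q suppresses the per-file progress output.
    (cd zip-demo && zip -q -r -0 stored.zip docs)
    (cd zip-demo && zip -q -r -9 best.zip docs)

    # unzip -v lists each member with its own method and compressed size.
    unzip -v zip-demo/stored.zip
    unzip -v zip-demo/best.zip
else
    echo "zip or unzip is not installed; skipping the sketch"
fi
```

The per-member method column in the listing reflects the point made above: each file in the archive carries its own compression, rather than the archive being compressed as a whole.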

compress

compress is a compression program, with a file format of the same name, that is popular on UNIX systems. Files compressed with compress have a ".Z" extension appended to their names.