Archiving and compression

The traditional Unix archiving and compression tools are separated according to the Unix philosophy:

  • A file archiver combines several files into one archive file, e.g. tar.
  • A compression tool compresses and decompresses data, e.g. gzip.

These tools are often used in sequence: first an archive file is created, then it is compressed.
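As a minimal sketch of that two-step workflow, the commands below archive two sample files with tar and then compress the result with gzip. The file names (file1, file2, archive.tar, piped.tar.gz) and the temporary directory are placeholders chosen for illustration.

```shell
# Work in a throwaway directory so nothing existing is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'hello\n' > file1
printf 'world\n' > file2

# Step 1: combine the files into a single uncompressed archive.
tar -cf archive.tar file1 file2

# Step 2: compress the archive; gzip replaces archive.tar with archive.tar.gz.
gzip archive.tar

# The same result in one pipeline, without the intermediate file.
tar -cf - file1 file2 | gzip > piped.tar.gz
```

The pipeline form avoids writing the uncompressed archive to disk, which can matter for large inputs.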

There are also tools that do both; these tend to additionally offer encryption, error detection and recovery.

Archiving only

Name | Package | Manuals | Description
GNU tar | tar | tar(1), info | Core utility for manipulating the ubiquitous tar archives (tarballs), which are used by pacman and the AUR.
libarchive | libarchive | bsdtar(1), bsdcpio(1) | Implementation of tar and cpio that also offers a library. Used by pacman and mkinitcpio.
ar | binutils | ar(1) | Legacy Unix archiver that predates tar. Today only used for creating static library files.
cpio | cpio | cpio(1) | File archiver via stdin/stdout, supports cpio and tar formats.
DAR | dar (AUR) |  | Archiver to back up large live filesystems, takes care of hard links, extended attributes, sparse files and inode types.

Tip: Both GNU and BSD tar automatically delegate decompression for bzip2, compress, gzip, lzip, lzma, lzop, zstd and xz compressed archives. Only BSD tar supports lz4 natively (GNU tar can do the equivalent with --use-compress-program=lz4 or -I lz4). When creating archives, both support the -a switch, which filters the created archive through the right compression program based on the file extension. While BSD tar recognizes the compression format from the data itself, GNU tar only guesses based on the file extension.
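To illustrate the tip, this sketch lets tar pick the compressor from the file extension via -a and then extracts without naming any compression option. It assumes gzip is installed; notes.txt, notes.tar.gz and the extracted directory are placeholder names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'sample data\n' > notes.txt

# -a selects the compression program from the .gz suffix when creating.
tar -caf notes.tar.gz notes.txt

# On extraction no compression option is given; tar delegates to gzip.
mkdir extracted
tar -xf notes.tar.gz -C extracted
```

Changing the archive name to, say, notes.tar.xz would make -a select xz instead, provided that program is installed.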

See also #Archiving only usage.

Compression tools

Compression only

These compression programs implement their own file format.

Name | Package | Manual | Ext | Tar ext | Description | Parallel implementations
bzip2 |  |  | .bz2, .bz | .tbz2, .tbz | Uses the Burrows–Wheeler algorithm. | pbzip2
gzip |  |  | .gz, .z | .tgz, .taz | GNU zip, based on DEFLATE algorithm. |
lrzip |  | lrzip(1) | .lrz |  | Improved version of rzip, uses multiple algorithms. | is multithreaded
LZ4 |  |  | .lz4 |  | Written in C, focused on compression and decompression speed. | is multithreaded
lzip |  |  | .lz |  | Uses LZMA. |
lzop | lzop |  | .lzop | .tzo | Uses the LZO library. |
xz |  |  | .xz, .lzma | .txz, .tlz | Uses LZMA, default for GNU and kernel archive files. | is multithreaded, pxz-git (AUR)
zstd |  |  | .zst |  | Uses Zstandard algorithm. | is multithreaded

  • Parallel implementations offer improved speeds by using multiple CPU cores.
  • Tar extensions refer to compressed archives where both tar and the compression tool are used, e.g. .tzo is a tar archive compressed with lzop.
  • See also #Compression only usage.
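As a rough illustration of the built-in multithreading noted in the table, xz accepts -T (--threads), where 0 means one thread per available CPU core. The sketch is guarded so that it does nothing when xz is not installed; sample.txt is a generated placeholder input.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Generate a compressible placeholder input of a few hundred kilobytes.
i=0
while [ "$i" -lt 20000 ]; do
    echo "line $i of sample input"
    i=$((i + 1))
done > sample.txt

if command -v xz >/dev/null 2>&1; then
    # -T0 uses as many threads as there are cores; -k keeps the input file.
    xz -T0 -k sample.txt
    # Integrity check of the compressed result.
    xz -t sample.txt.xz
fi
```

Threaded xz splits the input into blocks, so a speed-up only shows on inputs large enough to span several blocks; an input this small is likely handled by a single thread.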

Archiving and compression

Name | Packages | Manuals | Ext | Description
p7zip |  |  | .7z | The third-party POSIX port of 7-zip's command-line.
7-Zip |  | - |  | The upstream Linux version of a file archiver with a high compression ratio.
RAR |  |  | .rar | Both the format and the rar utility are proprietary.
ZIP | zip |  | .zip | Widely used outside of the Linux world.
Unarchiver |  | unar(1) | many | Command-line tool of a Mac application, supports over 40 archive formats.
ZPAQ |  |  | .zpaq | A high compression ratio archiver written in C++, uses several algorithms.
LHa |  | lha(1) | .lzh (on Amiga: .lha) | LZH/LHA archiver, supports the lh7-method.

See also #Archiving and compression usage.

Feature charts

Some of the tools above are capable of handling multiple formats, allowing for fewer installed packages.

Decompress

Name | gzip | bzip2 | ZIP | LHa/LZH | RAR | compress | CAB | ARJ

  1. gunzip can only decompress single-member ZIP files.

Usage comparison

Archiving only usage

Name | Create archive | Extract archive | List content
tar(1) |  |  | tar -tvf archive.tar
cpio(1) |  |  |
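For tar, the three operations named in the table header might look as follows. This is a sketch with placeholder file names: -c creates, -t lists and -x extracts, with -v for verbose output and -f naming the archive.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one\n' > file1
printf 'two\n' > file2

tar -cvf archive.tar file1 file2    # create archive
tar -tvf archive.tar                # list content
mkdir out
tar -xvf archive.tar -C out         # extract into out/
```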

Compression only usage

Name | Compress | Decompress | Decompress to stdout
bzip2 |  | bzip2 -d file.bz2 |
lrzip(1) |  | lrzip -d file.lrz |
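Most compression-only tools follow the same three-operation pattern. As a sketch using gzip, which is chosen here only because it is widely installed (file is a placeholder name):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some text\n' > file

gzip file            # compress: replaces file with file.gz
gzip -dc file.gz     # decompress to stdout; file.gz is left in place
gzip -d file.gz      # decompress: replaces file.gz with file
```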

Archiving and compression usage

Name | Compress | Decompress | Decompress to stdout | List content
7z | 7z a archive.7z file1 file2 |  |  |
rar | rar a archive.rar file1 file2 |  |  |
lha(1) | lha ao7 archive.lzh file1 file2 |  |  | minimal: verbose:

Convenience tools

  • atool: Script for managing file archives of various types.
https://www.nongnu.org/atool/ || atool

Determining archive format

To extract an archive, its file format needs to be determined. If the file is properly named you can deduce its format from the file extension.

Otherwise you can use the file tool, see file(1).
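Format detection of this kind relies on signature bytes at the start of the file. As a minimal sketch of the idea, the snippet below checks for the gzip signature 1f 8b using od. The file name mystery.bin is a deliberately misleading placeholder, and a real detection tool knows many more signatures.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Create a gzip stream under a name that hides its format.
printf 'payload\n' | gzip > mystery.bin

# Read the first two bytes as hex; gzip streams start with 1f 8b.
magic=$(head -c 2 mystery.bin | od -An -tx1 | tr -d ' \n')
if [ "$magic" = "1f8b" ]; then
    echo "mystery.bin looks like gzip data"
fi
```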

Esoteric, rare or deprecated tools

Name | Packages | Ext | Description
ARC | arc (AUR) | .arc, .ark | Was very popular during the early days of the dial-up BBS. Superseded by ZIP.
ARJ |  | .arj | An archiver used on DOS/Windows in the mid-1990s. This is an open source clone.
compress |  | .Z | The classic Unix compression utility which can handle the ancient .Z archive.
PAR2 |  | .par2 | Parity archiver for increased data integrity. See also Parchive.
shar |  | .shar | Creates self-extracting archives that are valid shell scripts.
Zoo |  | .zoo | Was mostly popular on the OpenVMS operating system before PKZIP became popular.

Device mapper compression

There is work being done to mainline (integrate into the Linux kernel project) the recently open-sourced VDO project, which provides a deduplication and compression device mapper layer in the interest of increasing storage efficiency. The following packages are available:

  • kvdo: A pair of kernel modules which provide pools of deduplicated and/or compressed block storage.
https://github.com/dm-vdo/kvdo || kvdo-dkms (AUR)

Compression libraries

  • zlib: Compression library implementing the DEFLATE compression method found in gzip and PKZIP.
https://www.zlib.net/ || zlib

Troubleshooting

Garbled Japanese filenames

Japanese versions of Windows encode file names in ZIP archives with Shift-JIS. By default, these names come out as mojibake when the archive is extracted. To extract them properly, run unzip on the command line with the -O option set to shift-jis:

    $ unzip -O shift-jis nihongo.zip

See also

This article is issued from Archlinux. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.