INTRODUCTION
-This is wimlib version 1.7.0-BETA (June 2014). wimlib is a C library for
+This is wimlib version 1.8.1 (May 2015). wimlib is a C library for
creating, modifying, extracting, and mounting files in the Windows Imaging
-Format (WIM files). These files are normally created using the ImageX
-(imagex.exe) or Dism (Dism.exe) utilities on Windows, but wimlib is distributed
-with a free implementation of ImageX called "wimlib-imagex" for both UNIX-like
-systems and Windows.
+Format (WIM files). wimlib and its command-line frontend 'wimlib-imagex'
+provide a free and cross-platform alternative to Microsoft's WIMGAPI, ImageX,
+and DISM.
INSTALLATION
-To install wimlib and wimlib-imagex on Windows you simply need to download and
-extract the ZIP file containing the latest binaries from the SourceForge page
-(http://sourceforge.net/projects/wimlib/), which you may have already done.
+To install wimlib and wimlib-imagex on Windows, simply download and extract the
+ZIP file containing the latest binaries from the SourceForge page
+(http://sourceforge.net/projects/wimlib/). You have probably already done this!
To install wimlib and wimlib-imagex on UNIX-like systems (with Linux being the
primary supported and tested platform), you must compile the source code, which
COMPRESSION RATIO
-wimlib (and wimlib-imagex) can create XPRESS, LZX, or LZMS compressed WIM
-archives. wimlib includes its own compression codecs and does not use the
-compression API available on some versions of Windows. The below table provides
-the results (file size, in bytes, and time to create, in seconds) of capturing a
-WIM containing an x86 Windows PE image, using various compression types and
-options. When applicable, the results with the equivalent Microsoft
-implementation in WIMGAPI, which is the library used by ImageX and Dism, are
-included.
-
- ===========================================================================
- | Compression type || wimlib (v1.6.1) | WIMGAPI (Windows 8) |
- ===========================================================================
- | None [1] || 531,979,435 in 18s | 531,980,333 in 24s |
- | XPRESS [2] || 207,369,912 in 22s | 209,886,010 in 39s |
- | LZX (quick) [3] || 194,876,901 in 29s | N/A |
- | LZX (normal) [4] || 187,962,713 in 158s | 188,163,523 in 125s |
- | LZX (slow) [5] || 186,913,423 in 358s | N/A |
- | LZMS (non-solid) [6] || 176,880,594 in 182s | N/A |
- | LZMS (solid) [7] || 136,507,304 in 494s | 126,735,608 in 623s |
- ===========================================================================
+wimlib (and wimlib-imagex) can create XPRESS, LZX, or LZMS compressed WIM files.
+wimlib's compression codecs usually outperform and outcompress their Microsoft
+equivalents. Although results will vary depending on the data being compressed,
+the table below shows results for a common use case: creating an x86 Windows PE
+image ("boot.wim"). Each row shows the compression type, the size of the
+resulting WIM file in bytes, and the time it took to create the file. When
+possible, the results with the Microsoft equivalent are included.
+
+ =============================================================================
+ | Compression || wimlib (v1.8.0) | WIMGAPI (Windows 8.1) |
+ =============================================================================
+ | None [1] || 361,314,224 in 2.4s | 361,315,338 in 4.5s |
+ | XPRESS [2] || 138,218,750 in 3.0s | 140,457,436 in 6.0s |
+ | XPRESS (slow) [3] || 135,173,511 in 8.9s | N/A |
+ | LZX (quick) [4] || 130,207,195 in 3.8s | N/A |
+ | LZX (normal) [5] || 126,522,539 in 10.4s | 127,293,240 in 19.2s |
+ | LZX (slow) [6] || 126,042,313 in 17.3s | N/A |
+ | LZMS (non-solid) [7] || 116,150,682 in 25.3s | N/A |
+ | LZMS (solid) [8] || 88,107,484 in 61.7s | 88,769,830 in 102.3s |
+ | "WIMBoot" [9] || 167,023,719 in 3.5s | 169,109,211 in 10.4s |
+ | "WIMBoot" (slow) [10] || 165,027,583 in 7.9s | N/A |
+ =============================================================================
Notes:
- [1] '--compress=none' for wimlib-imagex;
- '/compress none' or no option for ImageX.
-
- [2] '--compress=fast' or '--compress=XPRESS' for wimlib-imagex;
- '/compress fast' or no option for ImageX.
- Compression chunk size is 32768 (the default for XPRESS).
-
- [3] No compression option specified to wimlib-imagex; no known equivalent for
- WIMGAPI (ImageX uses XPRESS compression if no option specified).
- Compression chunk size is 32768 (the default for LZX).
-
- [4] '--compress=maximum' or '--compress=LZX' for wimlib-imagex;
- '/compress maximum' for ImageX.
- Compression chunk size is 32768 (the default for LZX).
-
- [5] '--compress=maximum --compress-slow' for wimlib-imagex;
- no known equivalent for WIMGAPI.
- Compression chunk size is 32768 (the default for LZX).
-
- [6] '--compress=recovery' or '--compress=LZMS' for wimlib-imagex;
- no known way to create the equivalent with WIMGAPI.
- Compression chunk size is 131072 (the default for LZMS). Note: this
- compression type is not generally recommended due to its limited
- compatibility with the MS implementations.
-
- [7] '--compress=recovery --solid' or '--compress=LZMS --solid' for
- wimlib-imagex; WIMCreateFile with WIM_COMPRESSION_LZMS and flag
- 0x20000000 for WIMGAPI. Compression chunk size in packed resources is
- 33554432 for wimlib, 67108864 for WIMGAPI. Note: this compression type
- is not generally recommended due to its limited compatibility with the MS
- implementations. Also, due to the large chunk size, wimlib uses about
- 500MB of memory per thread when compressing in this format.
-
-The above timings were done on Windows 8 (x86) so that side-by-side comparisons
-with the Microsoft implementation would be possible; however, wimlib may have
-even better performance on other operating systems such as Linux. The system
-had 2 CPUs and 2 GiB of memory available. All times were done with the page
-cache warmed, so the times primarily measure the performance of the compression
-algorithms and not the time to read data from disk, which presumably is similar
-in each implementation.
-
-Below are results for compressing the Canterbury corpus using wimlib (v1.6.1),
-WIMGAPI (Windows 8), and some other formats/programs, including the archive size
-only. Note that the Canterbury corpus includes no duplicate files or hard
-links, which WIM handles better than most other formats by storing only distinct
-data streams.
-
- =================================================
- | Format | Size (bytes) |
- =================================================
- | tar | 2,826,240 |
- | WIM (WIMGAPI, None) | 2,814,278 |
- | WIM (wimlib, None) | 2,813,856 |
- | WIM (WIMGAPI, XPRESS) | 825,410 |
- | WIM (wimlib, XPRESS) | 792,024 |
- | tar.gz (gzip, default) | 738,796 |
- | ZIP (Info-ZIP, default) | 735,334 |
- | tar.gz (gzip, -9) | 733,971 |
- | ZIP (Info-ZIP, -9) | 732,297 |
- | WIM (wimlib, LZX quick) | 722,196 |
- | WIM (WIMGAPI, LZX) | 651,766 |
- | WIM (wimlib, LZX normal) | 639,464 |
- | WIM (wimlib, LZX slow) | 633,144 |
- | WIM (wimlib, LZMS non-solid) | 590,252 |
- | tar.bz2 (bzip, default) | 565,008 |
- | tar.bz2 (bzip, -9) | 565,008 |
- | WIM (wimlib, LZMS solid) | 534,218 |
- | WIM (wimlib, LZMS solid, slow) | 529,904 |
- | WIM (WIMGAPI, LZMS solid) | 521,232 |
- | tar.xz (xz, default) | 486,916 |
- | tar.xz (xz, -9) | 486,904 |
- | 7z (7-zip, default) | 484,700 |
- | 7z (7-zip, -9) | 483,239 |
- =================================================
+ [1] '--compress=none' for wimlib-imagex; '/compress:none' for DISM.
+
+ [2] '--compress=XPRESS' for wimlib-imagex; '/compress:fast' for DISM.
+ Compression chunk size defaults to 32768 bytes in both cases.
+
+ [3] '--compress=XPRESS:80' for wimlib-imagex; no known equivalent for DISM.
+ Compression chunk size defaults to 32768 bytes.
+
+ [4] '--compress=LZX:20' for wimlib-imagex; no known equivalent for DISM.
+ Compression chunk size defaults to 32768 bytes.
+
+ [5] '--compress=LZX' or '--compress=LZX:50' or no option for wimlib-imagex;
+ '/compress:maximum' for DISM.
+ Compression chunk size defaults to 32768 bytes in both cases.
+
+ [6] '--compress=LZX:100' for wimlib-imagex; no known equivalent for DISM.
+ Compression chunk size defaults to 32768 bytes.
+
+ [7] '--compress=LZMS' for wimlib-imagex; no known equivalent for DISM.
+ Compression chunk size defaults to 131072 bytes.
+
+ [8] '--solid' for wimlib-imagex. The DISM equivalent should be
+ '/compress:recovery', but it only works with /Export-Image, not
+ /Capture-Image. Compression chunk size in solid resources defaults to
+ 67108864 bytes in both cases.
+
+ [9] '--wimboot' for wimlib-imagex; '/wimboot' for DISM.
+ This is really XPRESS compression with 4096-byte chunks, so the same as
+ '--compress=XPRESS --chunk-size=4096'.
+
+ [10] '--wimboot --compress=XPRESS:80' for wimlib-imagex;
+ no known equivalent for DISM.
+ Same format as [9], but trying harder to get a good compression ratio.
+
+Note: wimlib-imagex's --compress option also accepts the "fast", "maximum", and
+"recovery" aliases for XPRESS, LZX, and LZMS, respectively.
+
+Testing environment:
+
+ - 64-bit binaries
+ - Windows 8.1 virtual machine running on Linux with VT-x
+ - 4 CPUs and 4 GiB memory given to virtual machine
+ - SSD-backed virtual disk
+ - All tests done with page cache warmed
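
The figures in the table above can be restated as ratios. The following Python sketch is only an illustration: the data layout and function names are made up for this example and are not part of wimlib. It takes the sizes and times from the rows that have a WIMGAPI counterpart and computes, for this particular test only, how small the wimlib result is relative to the uncompressed image and how it compares with WIMGAPI.

```python
# Illustrative only: figures copied from the table above (boot.wim test).
# The data layout and function names are assumptions made for this sketch.

UNCOMPRESSED_BYTES = 361_314_224  # wimlib "None" row

# compression type -> ((wimlib bytes, wimlib secs), (WIMGAPI bytes, WIMGAPI secs))
RESULTS = {
    "XPRESS": ((138_218_750, 3.0), (140_457_436, 6.0)),
    "LZX (normal)": ((126_522_539, 10.4), (127_293_240, 19.2)),
    "LZMS (solid)": ((88_107_484, 61.7), (88_769_830, 102.3)),
    "WIMBoot": ((167_023_719, 3.5), (169_109_211, 10.4)),
}


def compare(wimlib, wimgapi):
    """Return (fraction of uncompressed size, size saving vs WIMGAPI, speedup)."""
    (w_bytes, w_secs), (m_bytes, m_secs) = wimlib, wimgapi
    fraction = w_bytes / UNCOMPRESSED_BYTES
    saving = 1 - w_bytes / m_bytes
    speedup = m_secs / w_secs
    return fraction, saving, speedup


def report():
    """Build one summary line per compression type."""
    lines = []
    for name, (wimlib, wimgapi) in RESULTS.items():
        fraction, saving, speedup = compare(wimlib, wimgapi)
        lines.append(
            f"{name:<13} {fraction:6.1%} of original, "
            f"{saving:5.2%} smaller than WIMGAPI, {speedup:.1f}x faster"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

As the README notes, results vary with the data being compressed, so these ratios describe this one benchmark rather than a general guarantee.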
+
+The compression ratio provided by wimlib is also competitive with commonly used
+archive formats. Below are file sizes that result when the Canterbury corpus is
+compressed with wimlib (v1.8.0), WIMGAPI (Windows 8.1), and some other
+formats/programs:
+
+ =====================================================
+ | Format | Size (bytes) |
+ =====================================================
+ | tar | 2,826,240 |
+ | WIM (WIMGAPI, None) | 2,814,254 |
+ | WIM (wimlib, None) | 2,814,216 |
+ | WIM (WIMGAPI, XPRESS) | 825,536 |
+ | WIM (wimlib, XPRESS) | 789,296 |
+ | tar.gz (gzip, default) | 738,796 |
+ | ZIP (Info-ZIP, default) | 735,334 |
+ | tar.gz (gzip, -9) | 733,971 |
+ | ZIP (Info-ZIP, -9) | 732,297 |
+ | WIM (wimlib, LZX quick) | 690,110 |
+ | WIM (WIMGAPI, LZX) | 651,866 |
+ | WIM (wimlib, LZX normal) | 624,634 |
+ | WIM (wimlib, LZX slow) | 620,728 |
+ | WIM (wimlib, LZMS non-solid) | 581,046 |
+ | tar.bz2 (bzip2, default) | 565,008 |
+ | tar.bz2 (bzip2, -9) | 565,008 |
+ | WIM (WIMGAPI, LZMS solid) | 521,366 |
+ | WIM (wimlib, LZMS solid) | 515,800 |
+ | tar.xz (xz, default) | 486,916 |
+ | tar.xz (xz, -9) | 486,904 |
+ | 7z (7-zip, default) | 484,700 |
+ | 7z (7-zip, -9) | 483,239 |
+ =====================================================
+
+Note: WIM does even better on directory trees containing duplicate files, which
+the Canterbury corpus doesn't have.
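
To illustrate why duplicate files help: a WIM stores each distinct data stream only once, identified by its SHA-1 message digest. The Python sketch below is a simplified model of that idea, not wimlib's actual code. The function name and the in-memory dictionaries are assumptions made for the example.

```python
import hashlib


def store_streams(files):
    """Model single-instance storage: keep one copy of each distinct content.

    `files` maps a path to its contents (bytes). Returns (streams, index),
    where `streams` maps SHA-1 digest -> contents and `index` maps
    path -> digest.
    """
    streams = {}
    index = {}
    for path, data in files.items():
        digest = hashlib.sha1(data).hexdigest()
        streams.setdefault(digest, data)  # stored only the first time it is seen
        index[path] = digest
    return streams, index


if __name__ == "__main__":
    tree = {
        "a/readme.txt": b"hello world\n",
        "b/readme.txt": b"hello world\n",  # duplicate of a/readme.txt
        "c/notes.txt": b"something else\n",
    }
    streams, index = store_streams(tree)
    logical = sum(len(data) for data in tree.values())
    stored = sum(len(data) for data in streams.values())
    print(f"{len(tree)} files, {len(streams)} distinct streams")
    print(f"{logical} logical bytes, {stored} stored bytes")
```

Formats that compress each file independently, or that only find repeats within a limited window, do not get this saving for free, which is why a corpus without duplicates understates the advantage.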
NTFS SUPPORT
dependencies were already included and this section is irrelevant.
* libxml2 (required)
- This is a commonly used free library to read and write XML files. You
- likely already have it installed as a dependency for some other program.
- For more information see http://xmlsoft.org/.
-
-* libfuse (optional but highly recommended)
- Unless configured with --without-fuse, wimlib requires a non-ancient
- version of libfuse to be installed. Most Linux distributions already
- include this, but make sure you have the libfuse package installed, and
- also libfuse-dev if your distribution distributes header files
- separately. FUSE also requires a kernel module. If the kernel module
- is available it will automatically be loaded if you try to mount a WIM
- file. For more information see http://fuse.sourceforge.net/. FUSE is
- also available for FreeBSD.
-
-* libntfs-3g (optional but highly recommended)
- Unless configured with --without-ntfs-3g, wimlib requires the library
- and headers for libntfs-3g version 2011-4-12 or later to be installed.
- Versions dated 2010-3-6 and earlier do not work because they are missing
- the header xattrs.h (and the file xattrs.c, which contains functions we
- need). libntfs-3g version 2013-1-13 is compatible only with wimlib
- 1.2.4 and later.
+ This is a commonly used free library to read and write XML documents.
+ Almost all Linux distributions should include this; however, you may
+ need to install the header files, which might be in a package named
+ "libxml2-dev" or similar. For more information see http://xmlsoft.org/.
+
+* libfuse (optional but recommended)
+ Unless configured --without-fuse, wimlib requires a non-ancient version
+ of libfuse. Most Linux distributions already include this, but make
+ sure you have the libfuse package installed, and also libfuse-dev if
+ your distribution distributes header files separately. FUSE also
+ requires a kernel module. If the kernel module is available it should
+ automatically be loaded if you try to mount a WIM image. For more
+ information see http://fuse.sourceforge.net/.
+
+* libattr (optional but recommended)
+ Unless configured --without-fuse, wimlib also requires libattr. Almost
+ all Linux distributions should include this; however, you may need to
+ install the header files, which might be in a package named "attr-dev",
+ "libattr1-dev", or similar.
+
+* libntfs-3g (optional but recommended)
+ Unless configured --without-ntfs-3g, wimlib requires the library and
+ headers for libntfs-3g version 2011-4-12 or later to be installed.
* OpenSSL / libcrypto (optional)
- wimlib can use the SHA1 message digest code from OpenSSL instead of
- compiling in yet another SHA1 implementation. (See LICENSE section.)
+ wimlib can use the SHA-1 message digest implementation from libcrypto
+ (usually provided by OpenSSL) instead of compiling in yet another SHA-1
+ implementation.
* cdrkit (optional)
* mtools (optional)
--without-ntfs-3g
If libntfs-3g is not available or is not version 2011-4-12 or later,
wimlib can be built without it, in which case it will not be possible to
- apply or capture images directly to/from NTFS volumes.
+ capture or apply WIM images directly from/to NTFS volumes.
---without-fuse
- If libfuse or the FUSE kernel module is not available, wimlib can be
- compiled with --without-fuse. This will remove the ability to mount and
- unmount WIM files.
-
---without-libcrypto
- Build in functions for SHA1 rather than using external SHA1 functions
- from libcrypto (part of OpenSSL). The default is to use libcrypto if it
- is found on the system.
+ The default is --with-ntfs-3g when building for any UNIX-like system,
+ and --without-ntfs-3g when building for Windows.
---disable-multithreaded-compression
- By default, data will be compressed using multiple threads when writing
- a WIM, unless only 1 processor is detected. Specify this option to
- disable support for this.
+--without-fuse
+ The --without-fuse option completely disables support for mounting WIM
+ images. This removes dependencies on libfuse, librt, and libattr. The
+ wimmount, wimmountrw, and wimunmount commands will not work.
---enable-ssse3-sha1
- Use a very fast assembly language implementation of SHA1 from Intel.
- Only use this if the build target supports the SSSE3 instructions.
+ The default is --with-fuse when building for Linux, and --without-fuse
+ otherwise.
---disable-error-messages
- Save some space by removing all error messages from the library.
+--without-libcrypto
+ Build in functions for SHA-1 rather than using external SHA-1 functions
+ from libcrypto (usually provided by OpenSSL).
---disable-assertions
- Remove assertions included by default.
+ The default is to use libcrypto if it is found on your system.
PORTABILITY
-wimlib has primarily been tested on Linux and Windows (primarily Windows 7, but
-also Windows XP and Windows 8).
+wimlib works on both UNIX-like systems (Linux, Mac OS X, FreeBSD, etc.) and
+Windows (XP and later).
+
+As much code as possible is shared among all supported platforms, but there
+necessarily are some differences in what features are supported on each platform
+and how they are implemented. Most notable is that file tree scanning and
+extraction are implemented separately for Windows, UNIX, and UNIX (NTFS-3g
+mode), to ensure a fast and feature-rich implementation for each platform/mode.
-wimlib may work on FreeBSD and Mac OS X. However, this is not well tested. If
-you do not have libntfs-3g 2011-4-12 or later available, you must configure
-wimlib with --without-ntfs-3g. On FreeBSD, before mounting a WIM you need to
-load the POSIX message queue module (run `kldload mqueuefs').
+wimlib is mainly used on x86 and x86_64 CPUs, but it should also work on a
+number of other GCC-supported 32-bit or 64-bit architectures. It has been
+tested on the ARM architecture.
-The code has primarily been tested on x86 and x86_64 CPUs, but it's written to
-be portable to other architectures and I've also tested it on ARM. However,
-although the code is written to correctly deal with endianness, it has not yet
-actually been tested on a big-endian architecture.
+Currently, gcc and clang are the only supported compilers. A few nonstandard
+extensions are used in the code.
REFERENCES
The WIM file format is partially specified in a document that can be found in
the Microsoft Download Center. However, this document really only provides an
-overview of the format and is not a formal specification.
+overview of the format and is not a formal specification. It also does not
+cover later extensions of the format, such as solid resources.
With regards to the supported compression formats:
- Microsoft has official documentation for XPRESS that is of reasonable quality.
-- Microsoft has official documentation for LZX but it contains errors.
+- Microsoft has official documentation for LZX, but in two different documents,
+ neither of which is completely applicable to its use in the WIM format, and
+ the first of which contains multiple errors.
- There does not seem to be any official documentation for LZMS, so my comments
- and code in src/lzms-decompress.c may in fact be the best documentation
+ and code in src/lzms_decompress.c may in fact be the best documentation
available for this particular compression format.
+The algorithms used by wimlib's compression and decompression codecs are
+inspired by a variety of sources, including open source projects and computer
+science papers.
+
The code in ntfs-3g_apply.c and ntfs-3g_capture.c uses the NTFS-3g library,
which is a library for reading and writing to NTFS filesystems (the filesystem
used by recent versions of Windows). See
http://www.tuxera.com/community/ntfs-3g-download/ for more information.
-The LZX decompressor (lzx-decompress.c) was originally based on code from the
-cabextract project (http://www.cabextract.org.uk). The LZX compressor
-(lzx-compress.c) was originally based on code written by Matthew Russotto
-(www.russotto.net/chm/). However I have since rewritten and made many
-improvements to both the decompressor and compressor.
-
-lz_hash.c contains LZ77 match-finding code that uses hash chains. It is based
-on code from zlib but I have since rewritten it.
-
-lz_bt.c contains LZ77 match-finding code that uses binary trees. It is based on
-code from liblzma but I have since rewritten it.
-
A limited number of other free programs can handle some parts of the WIM
file format:
other archive formats). However, wimlib is designed specifically to handle
WIM files and provides features previously only available in Microsoft's
implementation, such as the ability to mount WIMs read-write as well as
- read-only, the ability to create compressed WIMs, and the correct handling
- of security descriptors and hard links.
+ read-only, the ability to create compressed WIMs, the correct handling of
+ security descriptors and hard links, support for LZMS compression, and
+ support for solid archives.
* ImagePyX (https://github.com/maxpat78/ImagePyX) is a Python program that
provides similar capabilities to wimlib-imagex. One thing to note, though,
is that it does not support compression and decompression by itself, but
wimlib comes with no warranty whatsoever. Please submit a bug report (to
ebiggers3@gmail.com) if you find a bug in wimlib and/or wimlib-imagex.
-
-Be aware that some parts of the WIM file format are poorly documented or even
-completely undocumented, so I've just had to do the best I can to read and write
-WIMs that appear to be compatible with Microsoft's software.