X-Git-Url: https://wimlib.net/git/?p=wimlib;a=blobdiff_plain;f=README;h=86cde18751c810092d56205b0d92237d358e3e24;hp=5173583f7cba782554a853a9783f5b89db5dadbc;hb=ba077a8185be13cf296043a828c20e08d5407af7;hpb=70297d04c28e8134366b359076bb1f3d73c667c2

diff --git a/README b/README
index 5173583f..86cde187 100644
--- a/README
+++ b/README
@@ -1,17 +1,16 @@
 INTRODUCTION
 
-This is wimlib version 1.7.0-BETA (June 2014). wimlib is a C library for
+This is wimlib version 1.7.4-BETA (December 2014). wimlib is a C library for
 creating, modifying, extracting, and mounting files in the Windows Imaging
-Format (WIM files). These files are normally created using the ImageX
-(imagex.exe) or Dism (Dism.exe) utilities on Windows, but wimlib is distributed
-with a free implementation of ImageX called "wimlib-imagex" for both UNIX-like
-systems and Windows.
+Format (WIM files). wimlib and its command-line frontend 'wimlib-imagex'
+provide a free and cross-platform alternative to Microsoft's WIMGAPI, ImageX,
+and DISM.
 
 INSTALLATION
 
-To install wimlib and wimlib-imagex on Windows you simply need to download and
-extract the ZIP file containing the latest binaries from the SourceForge page
-(http://sourceforge.net/projects/wimlib/), which you may have already done.
+To install wimlib and wimlib-imagex on Windows, simply download and extract the
+ZIP file containing the latest binaries from the SourceForge page
+(http://sourceforge.net/projects/wimlib/). You probably have already done this!
 
 To install wimlib and wimlib-imagex on UNIX-like systems (with Linux being the
 primary supported and tested platform), you must compile the source code, which
@@ -65,102 +64,117 @@ commands and their syntax. For additional documentation:
 
 COMPRESSION RATIO
 
-wimlib (and wimlib-imagex) can create XPRESS, LZX, or LZMS compressed WIM
-archives. wimlib includes its own compression codecs and does not use the
-compression API available on some versions of Windows. The below table provides
-the results (file size, in bytes, and time to create, in seconds) of capturing a
-WIM containing an x86 Windows PE image, using various compression types and
-options. When applicable, the results with the equivalent Microsoft
-implementation in WIMGAPI, which is the library used by ImageX and Dism, are
-included.
-
-    =====================================================================
-    | Compression type     || wimlib (v1.6.1)     | WIMGAPI (Windows 8) |
-    =====================================================================
-    | None             [1] || 531,979,435 in 18s  | 531,980,333 in 24s  |
-    | XPRESS           [2] || 207,369,912 in 22s  | 209,886,010 in 39s  |
-    | LZX (quick)      [3] || 194,876,901 in 29s  | N/A                 |
-    | LZX (normal)     [4] || 187,962,713 in 158s | 188,163,523 in 125s |
-    | LZX (slow)       [5] || 186,913,423 in 358s | N/A                 |
-    | LZMS (non-solid) [6] || 176,880,594 in 182s | N/A                 |
-    | LZMS (solid)     [7] || 136,507,304 in 494s | 126,735,608 in 623s |
-    =====================================================================
+wimlib (and wimlib-imagex) can create XPRESS, LZX, or LZMS compressed WIM files.
+wimlib includes its own compression codecs and does not use the compression API
+available on some versions of Windows.
+
+I have gradually been improving the compression codecs in wimlib. For XPRESS
+and LZX, they now usually outperform and outcompress the equivalent Microsoft
+implementations. Although results will vary depending on the data being
+compressed, in the table below I present the results for a common use case:
+compressing an x86 Windows PE image. Each row displays the compression type,
+the size of the resulting WIM file in bytes, and how many seconds it took to
+create the file. When applicable, the results with the equivalent Microsoft
+implementation in WIMGAPI is included.
+
+    ==========================================================================
+    | Compression            || wimlib (v1.7.4)      | WIMGAPI (Windows 8.1) |
+    ==========================================================================
+    | None             [1]   || 361,314,224 in 2.4s  | 361,315,338 in 4.5s   |
+    | XPRESS           [2]   || 138,218,750 in 3.0s  | 140,457,436 in 6.0s   |
+    | XPRESS (slow)    [3]   || 135,173,511 in 8.9s  | N/A                   |
+    | LZX (quick)      [4]   || 130,332,007 in 4.1s  | N/A                   |
+    | LZX (normal)     [5]   || 126,714,807 in 12.5s | 127,293,240 in 19.2s  |
+    | LZX (slow)       [6]   || 126,150,743 in 20.5s | N/A                   |
+    | LZMS (non-solid) [7]   || 121,909,792 in 11.9s | N/A                   |
+    | LZMS (solid)     [8]   || 93,650,936 in 45.0s  | 88,771,192 in 109.2   |
+    | "WIMBoot"        [9]   || 167,023,719 in 3.5s  | 169,109,211 in 10.4s  |
+    | "WIMBoot" (slow) [10]  || 165,027,583 in 7.9s  | N/A                   |
+    ==========================================================================
 
 Notes:
-    [1] '--compress=none' for wimlib-imagex;
-        '/compress none' or no option for ImageX.
-
-    [2] '--compress=fast' or '--compress=XPRESS' for wimlib-imagex;
-        '/compress fast' or no option for ImageX.
-        Compression chunk size is 32768 (the default for XPRESS).
-
-    [3] No compression option specified to wimlib-imagex; no known equivalent for
-        WIMGAPI (ImageX uses XPRESS compression if no option specified).
-        Compression chunk size is 32768 (the default for LZX).
-
-    [4] '--compress=maximum' or '--compress=LZX' for wimlib-imagex;
-        '/compress maximum' for ImageX.
-        Compression chunk size is 32768 (the default for LZX).
-
-    [5] '--compress=maximum --compress-slow' for wimlib-imagex;
-        no known equivalent for WIMGAPI.
-        Compression chunk size is 32768 (the default for LZX).
-
-    [6] '--compress=recovery' or '--compress=LZMS' for wimlib-imagex;
-        no known way to create the equivalent with WIMGAPI.
-        Compression chunk size is 131072 (the default for LZMS). Note: this
-        compression type is not generally recommended due to its limited
-        compatibility with the MS implementations.
-
-    [7] '--compress=recovery --solid' or '--compress=LZMS --solid' for
-        wimlib-imagex; WIMCreateFile with WIM_COMPRESSION_LZMS and flag
-        0x20000000 for WIMGAPI. Compression chunk size in packed resources is
-        33554432 for wimlib, 67108864 for WIMGAPI. Note: this compression type
-        is not generally recommended due to its limited compatibility with the MS
-        implementations. Also, due to the large chunk size, wimlib uses about
-        500MB of memory per thread when compressing in this format.
-
-The above timings were done on Windows 8 (x86) so that side-by-side comparisons
-with the Microsoft implementation would be possible; however, wimlib may have
-even better performance on other operating systems such as Linux. The system
-had 2 CPUs and 2 GiB of memory available. All times were done with the page
-cache warmed, so the times primarily measure the performance of the compression
-algorithms and not the time to read data from disk, which presumably is similar
-in each implementation.
-
-Below are results for compressing the Canterbury corpus using wimlib (v1.6.1),
-WIMGAPI (Windows 8), and some other formats/programs, including the archive size
-only. Note that the Canterbury corpus includes no duplicate files or hard
-links, which WIM handles better than most other formats by storing only distinct
-data streams.
-
-    =================================================
-    | Format                         | Size (bytes) |
-    =================================================
-    | tar                            | 2,826,240    |
-    | WIM (WIMGAPI, None)            | 2,814,278    |
-    | WIM (wimlib, None)             | 2,813,856    |
-    | WIM (WIMGAPI, XPRESS)          | 825,410      |
-    | WIM (wimlib, XPRESS)           | 792,024      |
-    | tar.gz (gzip, default)         | 738,796      |
-    | ZIP (Info-ZIP, default)        | 735,334      |
-    | tar.gz (gzip, -9)              | 733,971      |
-    | ZIP (Info-ZIP, -9)             | 732,297      |
-    | WIM (wimlib, LZX quick)        | 722,196      |
-    | WIM (WIMGAPI, LZX)             | 651,766      |
-    | WIM (wimlib, LZX normal)       | 639,464      |
-    | WIM (wimlib, LZX slow)         | 633,144      |
-    | WIM (wimlib, LZMS non-solid)   | 590,252      |
-    | tar.bz2 (bzip, default)        | 565,008      |
-    | tar.bz2 (bzip, -9)             | 565,008      |
-    | WIM (wimlib, LZMS solid)       | 534,218      |
-    | WIM (wimlib, LZMS solid, slow) | 529,904      |
-    | WIM (WIMGAPI, LZMS solid)      | 521,232      |
-    | tar.xz (xz, default)           | 486,916      |
-    | tar.xz (xz, -9)                | 486,904      |
-    | 7z (7-zip, default)            | 484,700      |
-    | 7z (7-zip, -9)                 | 483,239      |
-    =================================================
+    [1] '--compress=none' for wimlib-imagex; '/compress:none' for DISM.
+
+    [2] '--compress=XPRESS' for wimlib-imagex; '/compress:fast' for DISM.
+        Compression chunk size defaults to 32768 bytes in both cases.
+
+    [3] '--compress=XPRESS:80' for wimlib-imagex; no known equivalent for DISM.
+        Compression chunk size defaults to 32768 bytes.
+
+    [4] '--compress=LZX:20' for wimlib-imagex; no known equivalent for DISM.
+        Compression chunk size defaults to 32768 bytes.
+
+    [5] '--compress=LZX' or '--compress=LZX:50' or no option for wimlib-imagex;
+        '/compress:maximum' for DISM.
+        Compression chunk size defaults to 32768 bytes in both cases.
+
+    [6] '--compress=LZX:100' for wimlib-imagex; no known equivalent for DISM.
+        Compression chunk size defaults to 32768 bytes.
+
+    [7] '--compress=LZMS' for wimlib-imagex; no known equivalent for DISM.
+        Compression chunk size defaults to 131072 bytes.
+
+    [8] '--solid' for wimlib-imagex. Should be '/compress:recovery' for DISM,
+        but only works for /Export-Image, not /Capture-Image. Compression chunk
+        size in solid blocks defaults to 33554432 for wimlib, 67108864 for DISM.
+
+    [9] '--wimboot' for wimlib-imagex; '/wimboot' for DISM.
+        This is really XPRESS compression with 4096 byte chunks, so the same as
+        '--compress=XPRESS --chunk-size=4096'.
+
+    [10] '--wimboot --compress=XPRESS:80' for wimlib-imagex;
+         no known equivalent for DISM.
+         Same format as [9], but trying harder to get a good compression ratio.
+
+Note: wimlib-imagex's --compress option also accepts the "fast", "maximum", and
+"recovery" aliases for XPRESS, LZX, and LZMS, respectively.
+
+Testing environment:
+
+    - 64 bit binaries
+    - Windows 8.1 virtual machine running on Linux with VT-x
+    - 4 CPUs and 4 GiB memory given to virtual machine
+    - SSD-backed virtual disk
+    - All tests done with page cache warmed
+
+The compression ratio provided by wimlib is also competitive with commonly used
+archive formats. Below are file sizes that result when the Canterbury corpus is
+compressed with wimlib (v1.7.2), WIMGAPI (Windows 8.1), and some other
+formats/programs:
+
+    =====================================================
+    | Format                             | Size (bytes) |
+    =====================================================
+    | tar                                | 2,826,240    |
+    | WIM (WIMGAPI, None)                | 2,814,254    |
+    | WIM (wimlib, None)                 | 2,814,216    |
+    | WIM (WIMGAPI, XPRESS)              | 825,536      |
+    | WIM (wimlib, XPRESS)               | 790,016      |
+    | tar.gz (gzip, default)             | 738,796      |
+    | ZIP (Info-ZIP, default)            | 735,334      |
+    | tar.gz (gzip, -9)                  | 733,971      |
+    | ZIP (Info-ZIP, -9)                 | 732,297      |
+    | WIM (wimlib, LZX quick)            | 704,006      |
+    | WIM (WIMGAPI, LZX)                 | 651,866      |
+    | WIM (wimlib, LZX normal)           | 632,614      |
+    | WIM (wimlib, LZX slow)             | 625,050      |
+    | WIM (wimlib, LZMS non-solid)       | 581,960      |
+    | tar.bz2 (bzip, default)            | 565,008      |
+    | tar.bz2 (bzip, -9)                 | 565,008      |
+    | WIM (wimlib, LZX solid)            | 532,700      |
+    | WIM (wimlib, LZMS solid)           | 525,990      |
+    | WIM (wimlib, LZX solid, slow)      | 525,140      |
+    | WIM (wimlib, LZMS solid, slow)     | 523,728      |
+    | WIM (WIMGAPI, LZMS solid)          | 521,366      |
+    | WIM (wimlib, LZX solid, very slow) | 520,832      |
+    | tar.xz (xz, default)               | 486,916      |
+    | tar.xz (xz, -9)                    | 486,904      |
+    | 7z (7-zip, default)                | 484,700      |
+    | 7z (7-zip, -9)                     | 483,239      |
+    =====================================================
+
+Note: WIM does even better on directory trees containing duplicate files, which
+the Canterbury corpus doesn't have.
 
 NTFS SUPPORT
 
@@ -221,31 +235,34 @@ downloaded the Windows binary distribution of wimlib and wimlib-imagex then all
 dependencies were already included and this section is irrelevant.
 
 * libxml2 (required)
-        This is a commonly used free library to read and write XML files. You
-        likely already have it installed as a dependency for some other program.
-        For more information see http://xmlsoft.org/.
-
-* libfuse (optional but highly recommended)
-        Unless configured with --without-fuse, wimlib requires a non-ancient
-        version of libfuse to be installed. Most Linux distributions already
-        include this, but make sure you have the libfuse package installed, and
-        also libfuse-dev if your distribution distributes header files
-        separately. FUSE also requires a kernel module. If the kernel module
-        is available it will automatically be loaded if you try to mount a WIM
-        file. For more information see http://fuse.sourceforge.net/. FUSE is
-        also available for FreeBSD.
-
-* libntfs-3g (optional but highly recommended)
-        Unless configured with --without-ntfs-3g, wimlib requires the library
-        and headers for libntfs-3g version 2011-4-12 or later to be installed.
-        Versions dated 2010-3-6 and earlier do not work because they are missing
-        the header xattrs.h (and the file xattrs.c, which contains functions we
-        need). libntfs-3g version 2013-1-13 is compatible only with wimlib
-        1.2.4 and later.
+        This is a commonly used free library to read and write XML documents.
+        Almost all Linux distributions should include this; however, you may
+        need to install the header files, which might be in a package named
+        "libxml2-dev" or similar. For more information see http://xmlsoft.org/.
+
+* libfuse (optional but recommended)
+        Unless configured --without-fuse, wimlib requires a non-ancient version
+        of libfuse. Most Linux distributions already include this, but make
+        sure you have the libfuse package installed, and also libfuse-dev if
+        your distribution distributes header files separately. FUSE also
+        requires a kernel module. If the kernel module is available it should
+        automatically be loaded if you try to mount a WIM image. For more
+        information see http://fuse.sourceforge.net/.
+
+* libattr (optional but recommended)
+        Unless configured --without-fuse, wimlib also requires libattr. Almost
+        all Linux distributions should include this; however, you may need to
+        install the header files, which might be in a package named "attr-dev",
+        "libattr1-dev", or similar.
+
+* libntfs-3g (optional but recommended)
+        Unless configured --without-ntfs-3g, wimlib requires the library and
+        headers for libntfs-3g version 2011-4-12 or later to be installed.
 
 * OpenSSL / libcrypto (optional)
-        wimlib can use the SHA1 message digest code from OpenSSL instead of
-        compiling in yet another SHA1 implementation. (See LICENSE section.)
+        wimlib can use the SHA-1 message digest implementation from libcrypto
+        (usually provided by OpenSSL) instead of compiling in yet another SHA-1
+        implementation.
 
 * cdrkit (optional)
 * mtools (optional)
@@ -268,79 +285,69 @@ This section documents the most important options that may be passed to the
 --without-ntfs-3g
         If libntfs-3g is not available or is not version 2011-4-12 or later, wimlib
         can be built without it, in which case it will not be possible to
-        apply or capture images directly to/from NTFS volumes.
+        capture or apply WIM images directly from/to NTFS volumes.
 
---without-fuse
-        If libfuse or the FUSE kernel module is not available, wimlib can be
-        compiled with --without-fuse. This will remove the ability to mount and
-        unmount WIM files.
-
---without-libcrypto
-        Build in functions for SHA1 rather than using external SHA1 functions
-        from libcrypto (part of OpenSSL). The default is to use libcrypto if it
-        is found on the system.
+        The default is --with-ntfs-3g when building for any UNIX-like system,
+        and --without-ntfs-3g when building for Windows.
 
---disable-multithreaded-compression
-        By default, data will be compressed using multiple threads when writing
-        a WIM, unless only 1 processor is detected. Specify this option to
-        disable support for this.
+--without-fuse
+        The --without-fuse option completely disables support for mounting WIM
+        images. This removes dependencies on libfuse, librt, and libattr. The
+        wimmount, wimmountrw, and wimunmount commands will not work.
 
---enable-ssse3-sha1
-        Use a very fast assembly language implementation of SHA1 from Intel.
-        Only use this if the build target supports the SSSE3 instructions.
+        The default is --with-fuse when building for Linux, and --without-fuse
+        otherwise.
 
---disable-error-messages
-        Save some space by removing all error messages from the library.
+--without-libcrypto
+        Build in functions for SHA-1 rather than using external SHA-1 functions
+        from libcrypto (usually provided by OpenSSL).
 
---disable-assertions
-        Remove assertions included by default.
+        The default is to use libcrypto if it is found on your system.
 
 PORTABILITY
 
-wimlib has primarily been tested on Linux and Windows (primarily Windows 7, but
-also Windows XP and Windows 8).
+wimlib works on both UNIX-like systems (Linux, Mac OS X, FreeBSD, etc.) and
+Windows (XP and later).
+
+As much code as possible is shared among all supported platforms, but there
+necessarily are some differences in what features are supported on each platform
+and how they are implemented. Most notable is that file tree scanning and
+extraction are implemented separately for Windows, UNIX, and UNIX (NTFS-3g
+mode), to ensure a fast and feature-rich implementation of each platform/mode.
 
-wimlib may work on FreeBSD and Mac OS X. However, this is not well tested. If
-you do not have libntfs-3g 2011-4-12 or later available, you must configure
-wimlib with --without-ntfs-3g. On FreeBSD, before mounting a WIM you need to
-load the POSIX message queue module (run `kldload mqueuefs').
+wimlib is mainly used on x86 and x86_64 CPUs, but it should also work on a
+number of other GCC-supported 32-bit or 64-bit architectures. It has been
+tested on the ARM architecture.
 
-The code has primarily been tested on x86 and x86_64 CPUs, but it's written to
-be portable to other architectures and I've also tested it on ARM. However,
-although the code is written to correctly deal with endianness, it has not yet
-actually been tested on a big-endian architecture.
+Currently, gcc and clang are the only supported compilers. A few nonstandard
+extensions are used in the code.
 
 REFERENCES
 
 The WIM file format is partially specified in a document that can be found in
 the Microsoft Download Center. However, this document really only provides an
-overview of the format and is not a formal specification.
+overview of the format and is not a formal specification. It also does not
+cover later extensions of the format, such as solid blocks.
 
 With regards to the supported compression formats:
 
 - Microsoft has official documentation for XPRESS that is of reasonable quality.
-- Microsoft has official documentation for LZX but it contains errors.
+- Microsoft has official documentation for LZX, but in two different documents,
+  neither of which is completely applicable to its use in the WIM format, and
+  the first of which contains multiple errors.
 - There does not seem to be any official documentation for LZMS, so my comments
-  and code in src/lzms-decompress.c may in fact be the best documentation
+  and code in src/lzms_decompress.c may in fact be the best documentation
   available for this particular compression format.
 
+The algorithms used by wimlib's compression and decompression codecs are
+inspired by a variety of sources, including open source projects and computer
+science papers.
+
 The code in ntfs-3g_apply.c and ntfs-3g_capture.c uses the NTFS-3g library,
 which is a library for reading and writing to NTFS filesystems (the filesystem
 used by recent versions of Windows). See
 http://www.tuxera.com/community/ntfs-3g-download/ for more information.
 
-The LZX decompressor (lzx-decompress.c) was originally based on code from the
-cabextract project (http://www.cabextract.org.uk). The LZX compressor
-(lzx-compress.c) was originally based on code written by Matthew Russotto
-(www.russotto.net/chm/). However I have since rewritten and made many
-improvements to both the decompressor and compressor.
-
-lz_hash.c contains LZ77 match-finding code that uses hash chains. It is based
-on code from zlib but I have since rewritten it.
-
-lz_bt.c contains LZ77 match-finding code that uses binary trees. It is based on
-code from liblzma but I have since rewritten it.
-
 A limited number of other free programs can handle some parts of the WIM file
 format:
 
@@ -348,8 +355,9 @@ file format:
   other archive formats). However, wimlib is designed specifically to handle
   WIM files and provides features previously only available in Microsoft's
   implementation, such as the ability to mount WIMs read-write as well as
-  read-only, the ability to create compressed WIMs, and the correct handling
-  of security descriptors and hard links.
+  read-only, the ability to create compressed WIMs, the correct handling of
+  security descriptors and hard links, support for LZMS compression, and
+  support for solid archives.
 * ImagePyX (https://github.com/maxpat78/ImagePyX) is a Python program that
   provides similar capabilities to wimlib-imagex. One thing to note, though, is
   that it does not support compression and decompression by itself, but
@@ -377,7 +385,3 @@ functionality.
 
 wimlib comes with no warranty whatsoever. Please submit a bug report (to
 ebiggers3@gmail.com) if you find a bug in wimlib and/or wimlib-imagex.
-
-Be aware that some parts of the WIM file format are poorly documented or even
-completely undocumented, so I've just had to do the best I can to read and write
-WIMs that appear to be compatible with Microsoft's software.