- * LZX is a LZ77 and Huffman-code based compression format that has many
- * similarities to the DEFLATE format used in zlib. The compression ratio is as
- * good or better than DEFLATE. However, in WIM files only up to 32768 bytes of
- * data can ever compressed be in the same LZX block, so a .tar.gz file could
- * potentially be smaller than a WIM file that uses LZX compression because it
- * can use a larger LZ77 window size.
- *
- * Some notes on the LZX compression format as used in Windows Imaging (WIM)
- * files:
- *
- * A compressed WIM resource consists of a table of chunk offsets followed by
- * the compressed chunks themselves. All compressed chunks except possibly the
- * last decompress to WIM_CHUNK_SIZE (= 32768) bytes. This is quite similar to
- * the cabinet (.cab) file format, but they are not the same. According to the
- * cabinet format documentation, the LZX block size is independent from the
- * CFDATA blocks, and a LZX block may span several CFDATA blocks. However, in
- * WIMs, LZX blocks do not appear to ever span multiple WIM chunks. Note that
- * this means any WIM chunk may be decompressed or compressed independently from
- * any other chunk, which is convenient.
- *
- * A LZX compressed WIM chunk contains one or more LZX blocks of the aligned,
- * verbatim, or uncompressed block types. For aligned and verbatim blocks, the
- * size of the block in uncompressed bytes is specified by a bit following the 3
- * bits that specify the block type, possibly followed by an additional 16 bits.
- * '1' means to use the default block size (equal to 32768, the size of a WIM
- * chunk--- and this seems to only be valid for the first LZX block in a WIM
- * chunk), while '0' means that the block size is provided by the next 16 bits.
- *
- * The cabinet format, as documented, allows for the possibility that a
- * compressed CFDATA chunk is up to 6144 bytes larger than the data it
- * uncompresses to. However, in the WIM format it appears that every chunk that
- * would be 32768 bytes or more when compressed is actually stored fully
- * uncompressed.
- *
- * The 'e8' preprocessing step that changes x86 call instructions to use
- * absolute offsets instead of relative offsets relies on a filesize parameter.
- * There is no such parameter for this in the WIM files (even though the size of
- * the file resource could be used for this purpose), and instead a magic file
- * size of 12000000 is used. The 'e8' preprocessing is always done, and there
- * is no bit to indicate whether it is done or not.
+ * LZX is an LZ77 and Huffman-code based compression format that has many
+ * similarities to DEFLATE (the format used by zlib/gzip). The compression
+ * ratio is as good as or better than that of DEFLATE. See lzx-compress.c for
+ * a format overview, and see https://en.wikipedia.org/wiki/LZX_(algorithm)
+ * for a historical overview. Here I make some pragmatic notes.
+ *
+ * The old specification for LZX is the document "Microsoft LZX Data Compression
+ * Format" (1997). It defines the LZX format as used in cabinet files. Allowed
+ * window sizes are 2^n where 15 <= n <= 21. However, this document contains
+ * several errors, so don't read too much into it...
+ *
+ * The new specification for LZX is the document "[MS-PATCH]: LZX DELTA
+ * Compression and Decompression" (2014). It defines the LZX format as used by
+ * Microsoft's binary patcher. It corrects several errors in the 1997 document
+ * and extends the format in several ways, namely optional reference data,
+ * windows of up to 2^25 bytes, and longer match lengths.
+ *
+ * WIM files use a more restricted form of LZX. No LZX DELTA extensions are
+ * present, the window is not "sliding", E8 preprocessing is done
+ * unconditionally with a fixed file size, and the maximum window size is always
+ * 2^15 bytes (equal to the size of each "chunk" in a compressed WIM resource).
+ * This code is primarily intended to implement this form of LZX. However, it
+ * also supports maximum window sizes up to 2^21 bytes, although windows larger
+ * than 2^15 bytes are not compatible with WIMGAPI.
+ *
+ * TODO: Add support for window sizes up to 2^25 bytes.