Eric Biggers [Thu, 3 Jul 2014 23:25:13 +0000 (18:25 -0500)]
Place common decompression/compression code in public domain
Not much except Huffman coding in these files anymore, and that should be
completely free especially since it's been over 60 years since it was
invented...
Eric Biggers [Tue, 24 Jun 2014 01:04:57 +0000 (20:04 -0500)]
win32_apply.c: Don't use BEGIN_STREAM_STATUS_SKIP_STREAM
This doesn't work correctly when extracting the stream from a pipe or a
solid block. Just read the data and don't do anything with it --- at
least this double checks that it's actually valid.
Eric Biggers [Mon, 16 Jun 2014 02:49:27 +0000 (21:49 -0500)]
sha1-ssse3.asm: Fix building on Windows
For some reason the Intel original doesn't actually build for Windows
because it requests too high alignment per section. It should be
sufficient to retain the alignment directives in the code itself.
Eric Biggers [Sun, 15 Jun 2014 16:34:57 +0000 (11:34 -0500)]
Re-visit SHA-1 code
- Fixed build failures when configured with --enable-ssse3-sha1.
- Actually calculate the message digest correctly in the SSSE3-optimized
version! The Intel code just does block transformations, not arbitrary
updates; the previous code did not reflect this.
- Use an appropriate fallback when the CPU does not support SSSE3
instructions; don't just call abort()!
- Improve sha1_update() and sha1_final(). They should now be slightly
faster, as well as easier to understand.
- Use beXX_to_cpu() and cpu_to_beXX() macros instead of hard-coding
endian conversions.
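A minimal sketch of what such conversion helpers can look like for the
32-bit case. The sketch_* names and the runtime endianness probe are
assumptions made for illustration; the actual beXX_to_cpu() and
cpu_to_beXX() macros are not shown in this log and may be implemented
differently (for example, selected at compile time).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t
sketch_bswap32(uint32_t v)
{
	return ((v & 0x000000FFu) << 24) |
	       ((v & 0x0000FF00u) << 8) |
	       ((v & 0x00FF0000u) >> 8) |
	       ((v & 0xFF000000u) >> 24);
}

/* Hypothetical helper: detect the host byte order at run time. */
static int
sketch_cpu_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first_byte;

	memcpy(&first_byte, &probe, 1);
	return first_byte == 1;
}

/* Convert a host-order value to big-endian representation. */
static uint32_t
sketch_cpu_to_be32(uint32_t v)
{
	return sketch_cpu_is_little_endian() ? sketch_bswap32(v) : v;
}

/* Convert a big-endian value to host order (the same operation). */
static uint32_t
sketch_be32_to_cpu(uint32_t v)
{
	return sketch_cpu_is_little_endian() ? sketch_bswap32(v) : v;
}
```

Routing every conversion through one pair of helpers keeps the SHA-1 code
free of hand-written shifts at each call site.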
Eric Biggers [Sat, 14 Jun 2014 04:46:58 +0000 (23:46 -0500)]
finish_write(): Don't use old integrity table if already overwritten
When updating a WIM in-place without modifying the lookup table, the new
XML data will spill into the old integrity table if the new XML data is
longer than the old XML data. Temporarily fix this by not using the old
integrity table in this case.
Eric Biggers [Sat, 14 Jun 2014 04:41:22 +0000 (23:41 -0500)]
dentry.c: Cast name length u16 => u32 whenever adding 2
Due to integer promotion this won't make a difference if an 'int' is 4+
bytes anyway, but make the intention clear: this computation should not
overflow.
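A small sketch of the pattern, using a hypothetical helper name that is
not taken from dentry.c. It is assumed here that the 2 being added
accounts for a 2-byte null terminator. The second function shows the
wraparound that would occur if the sum were truncated back to 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: widen to 32 bits before adding, so the sum
 * cannot wrap even for the largest 16-bit name length. */
static uint32_t
sketch_name_nbytes_plus_2(uint16_t name_nbytes)
{
	return (uint32_t)name_nbytes + 2;
}

/* For contrast: truncating the sum to 16 bits wraps around. */
static uint16_t
sketch_truncated_plus_2(uint16_t name_nbytes)
{
	return (uint16_t)(name_nbytes + 2);
}
```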
Eric Biggers [Sat, 14 Jun 2014 04:33:08 +0000 (23:33 -0500)]
make_canonical_huffman_code(): Stricter validation of max_codeword_len
max_codeword_len must be long enough to give a distinct codeword to each
symbol. As we also check that num_syms >= 2, use this check instead of
max_codeword_len > 0.
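A sketch of such a check, assuming the condition is expressed as
num_syms <= 2^max_codeword_len (a binary prefix code with codewords of at
most L bits has at most 2^L codewords). The function name is hypothetical
and the real check in make_canonical_huffman_code() may be written
differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validation helper.  Because num_syms >= 2 is required,
 * max_codeword_len == 0 is rejected automatically (2 > 2^0). */
static bool
sketch_codeword_len_ok(uint32_t num_syms, unsigned max_codeword_len)
{
	if (num_syms < 2)
		return false;
	if (max_codeword_len >= 32)
		return true;	/* 2^32 exceeds any 32-bit symbol count */
	return num_syms <= ((uint32_t)1 << max_codeword_len);
}
```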
Eric Biggers [Thu, 12 Jun 2014 04:15:40 +0000 (23:15 -0500)]
Add aligned malloc and free
Don't make this dependent on OS support, since that would break the custom
memory allocation functions and also would need to be different between
UNIX and Windows anyway.
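One common way to do this without OS support is to over-allocate with the
ordinary allocator and stash the original pointer just before the aligned
address. The sketch below illustrates that technique only; the function
names are hypothetical, plain malloc() and free() stand in for the custom
allocation functions, and the actual implementation in this commit may
differ.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: 'alignment' must be a power of 2. */
static void *
sketch_aligned_malloc(size_t size, size_t alignment)
{
	size_t extra;
	void *raw;
	uintptr_t start, aligned;

	if (alignment == 0 || (alignment & (alignment - 1)) != 0)
		return NULL;

	/* Room for the saved pointer plus worst-case padding. */
	extra = sizeof(void *) + alignment - 1;
	if (size > SIZE_MAX - extra)
		return NULL;

	raw = malloc(size + extra);	/* stand-in for the custom allocator */
	if (raw == NULL)
		return NULL;

	/* Skip past the saved-pointer slot, then round up. */
	start = (uintptr_t)raw + sizeof(void *);
	aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

	/* Remember the original pointer so it can be freed later. */
	((void **)aligned)[-1] = raw;
	return (void *)aligned;
}

static void
sketch_aligned_free(void *ptr)
{
	if (ptr != NULL)
		free(((void **)ptr)[-1]);
}
```

Because only the ordinary allocator is called, the same code works
unchanged on UNIX and Windows.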
Eric Biggers [Wed, 11 Jun 2014 01:55:10 +0000 (20:55 -0500)]
Don't exclude out-of-tree absolute symlinks in reparse point fix mode
Excluding such links may not be the expected behavior. In addition,
Microsoft's documentation for ImageX seems to be incorrect when it states
that ImageX excludes such links. Actually, it does not. So, for the
sake of consistency and also doing something that may make more sense anyway,
just retain such links, but leave their targets as-is.
Eric Biggers [Sat, 7 Jun 2014 21:46:39 +0000 (16:46 -0500)]
lzx-compress.c: Honor cache_limit
This ensures the match cache is never overrun. If for some reason we
average more than LZX_CACHE_PER_POS (currently 8) matches per position,
_excluding skipped positions_, just don't return any more matches.
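A rough sketch of the idea of a bounded match cache. All struct and
function names here are hypothetical and the layout is an assumption; the
real cache in lzx-compress.c is not shown in this log. In this sketch, a
position whose matches would not fit below the limit simply gets no
matches recorded.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached match entry. */
struct sketch_match {
	uint32_t len;
	uint32_t offset;
};

/* Hypothetical cache: 'next' is the next free slot, 'limit' is one
 * past the last usable slot. */
struct sketch_match_cache {
	struct sketch_match *next;
	struct sketch_match *limit;
};

/* Store the matches found at one position.  Returns the number of
 * matches stored, or 0 if storing them would overrun the cache. */
static size_t
sketch_cache_matches(struct sketch_match_cache *cache,
		     const struct sketch_match *found, size_t num_found)
{
	size_t room = (size_t)(cache->limit - cache->next);
	size_t i;

	if (num_found > room)
		return 0;	/* honor the limit: report no matches */

	for (i = 0; i < num_found; i++)
		cache->next[i] = found[i];
	cache->next += num_found;
	return num_found;
}
```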
Eric Biggers [Mon, 2 Jun 2014 03:01:14 +0000 (22:01 -0500)]
Switch from suffix array match-finder to binary tree match-finder
This uses less memory (8 bytes overhead per position vs. 14), is faster,
requires less code (no libdivsufsort), and in some tests actually results
in a better compression ratio.
A binary tree match-finder was used in wimlib v1.5.2 but it didn't seem
as good then, probably because it was combined with a slow block division
algorithm for LZX.
Repeat offsets are now handled differently. The binary tree match-finder
itself doesn't have any logic for repeat offsets at all; instead, the
match-choosing code (now implemented separately for LZX and LZMS) now
does special checks for matches at repeat offsets.
Since less memory is used for match-finding now, increase the default
LZMS pack chunk size from 2^25 (33554432) to 2^26 (67108864).