Eric Biggers [Tue, 24 Jun 2014 01:04:57 +0000 (20:04 -0500)]
win32_apply.c: Don't use BEGIN_STREAM_STATUS_SKIP_STREAM
This doesn't work correctly when extracting the stream from a pipe or a
solid block. Just read the data and don't do anything with it --- at
least this double checks that it's actually valid.
Eric Biggers [Mon, 16 Jun 2014 02:49:27 +0000 (21:49 -0500)]
sha1-ssse3.asm: Fix building on Windows
For some reason the Intel original doesn't actually build for Windows
because it requests too high an alignment per section. It should be
sufficient to retain the alignment directives in the code itself.
Eric Biggers [Sun, 15 Jun 2014 16:34:57 +0000 (11:34 -0500)]
Re-visit SHA-1 code
- Fixed build failures when configured with --enable-ssse3-sha1.
- Actually calculate the message digest correctly in the SSSE3-optimized
version! The Intel code just does block transformations, not arbitrary
updates; the previous code did not reflect this.
- Use an appropriate fallback when the CPU does not support SSSE3
instructions; don't just call abort()!
- Improve sha1_update() and sha1_final(). They should now be slightly
faster, as well as easier to understand.
- Use beXX_to_cpu() and cpu_to_beXX() macros instead of hard-coding
endian conversions.
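To illustrate the point above about block transformations versus arbitrary updates, here is a minimal sketch of an update routine that buffers partial input and only ever hands whole 64-byte blocks to the transform. All names (sha1_ctx_sketch, transform_blocks, sha1_update_sketch) are hypothetical, and the transform is a stand-in that only counts blocks; it is not the wimlib or Intel code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical context; blocks_done stands in for the real hash state. */
struct sha1_ctx_sketch {
    uint64_t bytecount;   /* total bytes fed so far */
    uint8_t buffer[64];   /* holds a partial block between calls */
    size_t blocks_done;
};

/* Stand-in for a block-only transform: accepts whole 64-byte blocks. */
void transform_blocks(struct sha1_ctx_sketch *ctx, const uint8_t *data,
                      size_t nblocks)
{
    (void)data;
    ctx->blocks_done += nblocks;
}

/* Accept input of any length, passing only whole blocks to the transform. */
void sha1_update_sketch(struct sha1_ctx_sketch *ctx, const void *data,
                        size_t len)
{
    const uint8_t *p = data;
    size_t filled = (size_t)(ctx->bytecount % 64);

    ctx->bytecount += len;

    if (filled) {
        /* Top up the partial block left over from the previous call. */
        size_t take = 64 - filled;
        if (len < take) {
            memcpy(&ctx->buffer[filled], p, len);
            return;
        }
        memcpy(&ctx->buffer[filled], p, take);
        transform_blocks(ctx, ctx->buffer, 1);
        p += take;
        len -= take;
    }

    if (len >= 64) {
        /* Process as many whole blocks as possible directly from the input. */
        size_t nblocks = len / 64;
        transform_blocks(ctx, p, nblocks);
        p += nblocks * 64;
        len -= nblocks * 64;
    }

    if (len)
        memcpy(ctx->buffer, p, len);   /* save the tail for the next call */
}
```

A real implementation would replace the stand-in with the actual block transform and add a final step that pads the last block and appends the message length.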
Eric Biggers [Sat, 14 Jun 2014 04:46:58 +0000 (23:46 -0500)]
finish_write(): Don't use old integrity table if already overwritten
When updating a WIM in-place without modifying the lookup table, the new
XML data will spill into the old integrity table if the new XML data is
longer than the old XML data. Temporarily fix this by not using the old
integrity table in this case.
Eric Biggers [Sat, 14 Jun 2014 04:41:22 +0000 (23:41 -0500)]
dentry.c: Cast name length u16 => u32 whenever adding 2
Due to integer promotion this won't make a difference if an 'int' is 4+
bytes anyway, but make the intention clear: this computation should not
overflow.
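A minimal sketch of the widening described above. The helper name is hypothetical and is not taken from dentry.c; it assumes the 2 extra bytes are being added to a 16-bit name length.

```c
#include <stdint.h>

/* Hypothetical helper: name length in bytes plus 2 extra bytes. */
uint32_t name_nbytes_plus_2(uint16_t name_nbytes)
{
    /* Widen before adding so the sum is computed in 32 bits even on a
     * platform where 'int' is narrower than 4 bytes. */
    return (uint32_t)name_nbytes + 2;
}
```

With the cast, the largest u16 value 0xFFFF yields 0x10001 rather than wrapping.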
Eric Biggers [Sat, 14 Jun 2014 04:33:08 +0000 (23:33 -0500)]
make_canonical_huffman_code(): Stricter validation of max_codeword_len
max_codeword_len must be long enough to give a distinct codeword to each
symbol. As we also check that num_syms >= 2, use this check instead of
max_codeword_len > 0.
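One way to express this check, as a sketch only: the function name is hypothetical and the actual condition in make_canonical_huffman_code() may be written differently. With num_syms >= 2, requiring num_syms <= 2^max_codeword_len already implies max_codeword_len > 0.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical validation: can every symbol get a distinct codeword of at
 * most max_codeword_len bits? */
bool max_codeword_len_ok(unsigned num_syms, unsigned max_codeword_len)
{
    if (num_syms < 2)
        return false;

    /* Avoid shifting by the full width of 'unsigned' (undefined behavior);
     * a limit that large can always accommodate num_syms. */
    if (max_codeword_len >= sizeof(unsigned) * CHAR_BIT)
        return true;

    return num_syms <= (1u << max_codeword_len);
}
```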
Eric Biggers [Thu, 12 Jun 2014 04:15:40 +0000 (23:15 -0500)]
Add aligned malloc and free
Don't make this dependent on OS support, since that would break the custom
memory allocation functions and also would need to be different between
UNIX and Windows anyway.
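A portable aligned allocator can be layered on any malloc-style function by over-allocating and remembering the original pointer. The sketch below assumes that approach; the function names are hypothetical, and plain malloc()/free() stand in for the library's custom allocation functions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: 'alignment' must be a nonzero power of 2. */
void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Room for the saved pointer plus worst-case padding. */
    size_t extra = sizeof(void *) + alignment - 1;
    if (size > SIZE_MAX - extra)
        return NULL;

    unsigned char *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    /* Skip past the slot for the saved pointer, then round up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    unsigned char *ptr = raw + (aligned - (uintptr_t)raw);

    /* Store the original pointer just before the aligned block. */
    ((void **)ptr)[-1] = raw;
    return ptr;
}

void aligned_free_sketch(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

Because only malloc() and free() are used, swapping in custom allocation functions requires changing just those two calls.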
Eric Biggers [Wed, 11 Jun 2014 01:55:10 +0000 (20:55 -0500)]
Don't exclude out-of-tree absolute symlinks in reparse point fix mode
Excluding such links may not be the expected behavior. In addition,
Microsoft's documentation for ImageX seems to be incorrect when it states
that ImageX excludes such links. Actually, it does not. So, for the
sake of consistency and also doing something that may make more sense anyway,
just retain such links, but leave their targets as-is.
Eric Biggers [Sat, 7 Jun 2014 21:46:39 +0000 (16:46 -0500)]
lzx-compress.c: Honor cache_limit
This ensures the match cache is never overrun. If for some reason we
average more than LZX_CACHE_PER_POS (currently 8) matches per position,
_excluding skipped positions_, just don't return any more matches.
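The idea of a hard limit on the match cache can be sketched as follows. The structures and function are hypothetical and far simpler than the real lzx-compress.c code; the sketch assumes that when the found matches do not fit, none are returned for that position.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical match record and cache. */
struct match_sketch {
    unsigned len;
    unsigned offset;
};

struct match_cache_sketch {
    struct match_sketch *next;    /* next free slot */
    struct match_sketch *limit;   /* one past the last usable slot */
};

/* Copy matches into the cache; return how many were stored. If they would
 * overrun the limit, store nothing and report zero matches. */
unsigned cache_matches_sketch(struct match_cache_sketch *cache,
                              const struct match_sketch *found,
                              unsigned num_found)
{
    size_t room = (size_t)(cache->limit - cache->next);

    if (num_found > room)
        return 0;

    if (num_found)
        memcpy(cache->next, found, num_found * sizeof(*found));
    cache->next += num_found;
    return num_found;
}
```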
Eric Biggers [Mon, 2 Jun 2014 03:01:14 +0000 (22:01 -0500)]
Switch from suffix array match-finder to binary tree match-finder
This uses less memory (8 bytes overhead per position vs. 14), is faster,
requires less code (no libdivsufsort), and in some tests actually results
in a better compression ratio.
A binary tree match-finder was used in wimlib v1.5.2 but it didn't seem
as good then, probably because it was combined with a slow block division
algorithm for LZX.
Repeat offsets are now handled differently. The binary tree match-finder
itself doesn't have any logic for repeat offsets at all; instead, the
match-choosing code (now implemented separately for LZX and LZMS) now
does special checks for matches at repeat offsets.
Since less memory is used for match-finding now, increase the default
LZMS pack chunk size from 2^25 (33554432) to 2^26 (67108864).
Eric Biggers [Sat, 31 May 2014 15:12:28 +0000 (10:12 -0500)]
inode_fixup.c: Fix check for directory hard links
We shouldn't assume that the attributes are consistent, so we should
check both ways for directory hard links. Specifically, in the case
where the being-inserted dentry is not marked as a directory but for some
reason it shares an inode number with a dentry marked as a directory, we
want to detect that as a directory hard link.
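A minimal sketch of checking "both ways", assuming the test is applied to two dentries already known to share an inode number. The function name is hypothetical; FILE_ATTRIBUTE_DIRECTORY is the standard Windows attribute bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define FILE_ATTRIBUTE_DIRECTORY 0x00000010

/* Hypothetical check for two dentries that share an inode number: treat the
 * pair as a directory hard link if EITHER one is marked as a directory,
 * rather than trusting that their attributes agree. */
bool is_directory_hard_link(uint32_t new_attributes,
                            uint32_t existing_attributes)
{
    return ((new_attributes | existing_attributes) &
            FILE_ATTRIBUTE_DIRECTORY) != 0;
}
```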
Eric Biggers [Thu, 29 May 2014 05:42:54 +0000 (00:42 -0500)]
Optimize Huffman code generation
This commit significantly improves the performance of length-limited
canonical Huffman code generation by introducing several optimizations
based on the 7-Zip implementation.
Although Huffman code generation is not the main bottleneck of any of
the compression algorithms implemented here, an optimized implementation
can still improve overall performance by several percent. In addition,
it significantly speeds up LZMS decompression, which requires frequent
rebuilding of adaptive Huffman codes.
Some performance comparisons:
- The average time taken to generate all three LZX codes (main,
length, and aligned offset) when compressing a Windows PE
filesystem with LZX decreased from 73.9 us to 17.3 us.
- The time taken to compress a Windows PE filesystem with LZX, using
the "fast" LZX algorithm (hash chain match finder and lazy
parsing), decreased from 12.3 s to 11.6 s (a 5.7% improvement).
- The time taken to decompress a Windows PE filesystem with LZMS,
using solid blocks, decreased from 12.3 s to 9.0 s (a 27%
improvement).
If the position footer is unconditionally calculated as the match offset
minus the position base value, the (ultimately unused) position footer
for repeat matches can overflow the number of bits in which it is stored
in the intermediate representation used by this implementation. For now,
use the old version, which would set the position footers of repeat
matches to 0.
Eric Biggers [Tue, 27 May 2014 21:51:24 +0000 (16:51 -0500)]
Faster Huffman symbol decoding
When decoding a codeword short enough for a direct mapping, we can read
the codeword length at the same time we read the symbol itself. This
speeds up Huffman decoding slightly.
This commit also updates and improves the comments for
make_huffman_decode_table().
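The direct-mapping lookup described in this commit can be sketched as a table whose entries pack the symbol and the codeword length together, so one load yields both. The entry layout (length in the low 4 bits) and the function names are assumptions made for illustration, not the layout used by make_huffman_decode_table().

```c
#include <stdint.h>

#define LEN_BITS 4                        /* low bits hold the codeword length */
#define LEN_MASK ((1u << LEN_BITS) - 1)

/* Fill every table slot whose leading 'len' bits equal 'codeword'.
 * The table has 2^table_bits entries and len must be 1..table_bits. */
void fill_direct_entries(uint16_t *table, unsigned table_bits,
                         unsigned codeword, unsigned len, unsigned sym)
{
    unsigned first = codeword << (table_bits - len);
    unsigned count = 1u << (table_bits - len);

    for (unsigned i = 0; i < count; i++)
        table[first + i] = (uint16_t)((sym << LEN_BITS) | len);
}

/* 'next_bits' holds the next table_bits bits of input, first bit most
 * significant. A single lookup returns the symbol and, through len_ret,
 * the number of bits to consume. */
unsigned decode_direct(const uint16_t *table, unsigned next_bits,
                       unsigned *len_ret)
{
    uint16_t entry = table[next_bits];

    *len_ret = entry & LEN_MASK;
    return entry >> LEN_BITS;
}
```

For example, with the code {0, 10, 11} and table_bits = 2, the input bits 01 decode to the first symbol with length 1, leaving the second bit for the next lookup.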