Eric Biggers [Sat, 2 Aug 2014 00:52:40 +0000 (19:52 -0500)]
win32_capture.c: Ignore unnamed data stream of reparse point
This data will not be archived; only the reparse data will. So there's
no need to create a 'struct wim_lookup_table_entry' for it.
If a reparse point had an unnamed data stream, the assertion introduced
in commit 321750354891a0968ca0c3664417eae73b9414aa would be triggered.
This is just as well, because this case wasn't being handled in the most
logical way.
These are essentially updated versions of
WIMLIB_PROGRESS_MSG_EXTRACT_{DIR_STRUCTURE,TIMESTAMPS} which were removed
in v1.7.0. The new versions will fire periodically, so they can be used
as cancellation points where the library user provides
WIMLIB_PROGRESS_STATUS_ABORT.
Eric Biggers [Fri, 25 Jul 2014 04:35:44 +0000 (23:35 -0500)]
lz_hash_chains.c: Hand-inline do_search()
On x86_64 this improves performance slightly (XPRESS compression ~3%
faster). Probably a lot of the difference is due to redundant checks
against 'max_len' getting removed.
Eric Biggers [Sat, 26 Jul 2014 05:58:40 +0000 (00:58 -0500)]
lzms-common.c: Don't process first byte in x86 filter
Microsoft's LZMS compressor and decompressor seemingly skip this byte.
This resulted in different behavior on the following uncompressed data:
0000000 e8 6b e8 fb ff 5e 1f 03 e8 63 e8 fb ff 5e c0 02
0000016 e8 5b e8 fb ff 5e 51 01 e8 53 e8 fb ff 5e c4 00
wimlib would do a translation following the e8 byte at offset 16, since
it would enable x86 translations following the identification of matching
absolute addresses following the potential opcodes at offsets 0 and 8.
But as far as I can tell, the Microsoft implementation just skips byte 0
entirely and doesn't consider it as beginning a potential instruction.
Eric Biggers [Fri, 25 Jul 2014 01:04:29 +0000 (20:04 -0500)]
Check return value of wimlib_global_init() when called in lib
On Windows, wimlib_global_init() can fail if functions are missing from
ntdll. It's best to fail fast in this case rather than plowing ahead and
assuming the user would have already called wimlib_global_init()
themselves if they cared.
Eric Biggers [Sat, 19 Jul 2014 22:11:59 +0000 (17:11 -0500)]
Merge compression updates
- New internal match-finding API (might release as stand-alone library
sometime)
- Add some new match-finding algorithms
- Get rid of lz_hash.c / lz_analyze_block()
- Add optimal parsing to XPRESS
- Optimize get_matches() / skip_bytes() calls in XPRESS and LZX
compressors
- Get rid of decompressor parameters
- Get rid of compressor parameters exposed in API (use compression levels
instead)
Eric Biggers [Thu, 3 Jul 2014 23:25:13 +0000 (18:25 -0500)]
Place common decompression/compression code in public domain
Not much except Huffman coding in these files anymore, and that should be
completely free especially since it's been over 60 years since it was
invented...
Eric Biggers [Tue, 24 Jun 2014 01:04:57 +0000 (20:04 -0500)]
win32_apply.c: Don't use BEGIN_STREAM_STATUS_SKIP_STREAM
This doesn't work correctly when extracting the stream from a pipe or a
solid block. Just read the data and don't do anything with it; at
least this double-checks that it's actually valid.
Eric Biggers [Mon, 16 Jun 2014 02:49:27 +0000 (21:49 -0500)]
sha1-ssse3.asm: Fix building on Windows
For some reason the Intel original doesn't actually build for Windows
because it requests too high an alignment per section. It should be
sufficient to retain the alignment directives in the code itself.
Eric Biggers [Sun, 15 Jun 2014 16:34:57 +0000 (11:34 -0500)]
Re-visit SHA-1 code
- Fixed build failures when configured with --enable-ssse3-sha1.
- Actually calculate the message digest correctly in the SSSE3-optimized
version! The Intel code just does block transformations, not arbitrary
updates; the previous code did not reflect this.
- Use an appropriate fallback when the CPU does not support SSSE3
instructions; don't just call abort()!
- Improve sha1_update() and sha1_final(). They should now be slightly
faster, as well as easier to understand.
- Use beXX_to_cpu() and cpu_to_beXX() macros instead of hard-coding
endian conversions.
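
The point above about block transformations versus arbitrary updates can be sketched as follows. This is a minimal illustration of the buffering an update function needs when the underlying routine only accepts whole 64-byte blocks; it is not wimlib's actual code. `struct block_ctx`, `transform_blocks()` and `ctx_update()` are hypothetical names, and the transform is a stand-in that only counts blocks instead of hashing them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64	/* SHA-1 consumes input in 64-byte blocks */

/* Hypothetical context: buffering state only, no real hash state. */
struct block_ctx {
	uint8_t buf[BLOCK_SIZE];	/* partial block carried between updates */
	size_t buffered;		/* number of valid bytes in buf */
	uint64_t blocks_done;		/* stand-in for the real hash state */
};

/* Stand-in for a blocks-only transform (assumption: it just counts). */
static void transform_blocks(struct block_ctx *ctx, const uint8_t *data,
			     size_t nblocks)
{
	(void)data;
	ctx->blocks_done += nblocks;
}

/* Accept input of any length and feed only whole blocks to the transform. */
static void ctx_update(struct block_ctx *ctx, const void *data, size_t len)
{
	const uint8_t *p = data;

	/* Top up a partially filled buffer first. */
	if (ctx->buffered) {
		size_t n = BLOCK_SIZE - ctx->buffered;

		if (n > len)
			n = len;
		memcpy(ctx->buf + ctx->buffered, p, n);
		ctx->buffered += n;
		p += n;
		len -= n;
		if (ctx->buffered < BLOCK_SIZE)
			return;	/* still not a full block */
		transform_blocks(ctx, ctx->buf, 1);
		ctx->buffered = 0;
	}

	/* Hand all remaining whole blocks directly to the transform. */
	if (len >= BLOCK_SIZE) {
		size_t nblocks = len / BLOCK_SIZE;

		transform_blocks(ctx, p, nblocks);
		p += nblocks * BLOCK_SIZE;
		len -= nblocks * BLOCK_SIZE;
	}

	/* Save the tail for the next call. */
	memcpy(ctx->buf, p, len);
	ctx->buffered = len;
}
```

A real implementation would additionally pad the final partial block and append the message length when producing the digest.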
Eric Biggers [Sat, 14 Jun 2014 04:46:58 +0000 (23:46 -0500)]
finish_write(): Don't use old integrity table if already overwritten
When updating a WIM in-place without modifying the lookup table, the new
XML data will spill into the old integrity table if the new XML data is
longer than the old XML data. Temporarily fix this by not using the old
integrity table in this case.
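
A minimal sketch of the overlap condition described above, assuming the integrity table is stored after the XML data in the file. `old_integrity_table_intact()` and its parameters are hypothetical names chosen for illustration, not wimlib's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: the old integrity table can only be reused if the
 * newly written XML data ends at or before the offset where the old
 * table begins. Offsets and sizes are in bytes from the start of the file.
 */
static bool old_integrity_table_intact(uint64_t new_xml_offset,
				       uint64_t new_xml_size,
				       uint64_t old_integrity_offset)
{
	/* Assumes new_xml_offset + new_xml_size does not overflow 64 bits. */
	return new_xml_offset + new_xml_size <= old_integrity_offset;
}
```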
Eric Biggers [Sat, 14 Jun 2014 04:41:22 +0000 (23:41 -0500)]
dentry.c: Cast name length u16 => u32 whenever adding 2
Due to integer promotion this won't make a difference if an 'int' is 4+
bytes anyway, but make the intention clear: this computation should not
overflow.
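
The intent can be illustrated with a small sketch. Both helper names below are hypothetical and not taken from dentry.c; the first widens the 16-bit length before adding, and the second shows, for contrast, what happens if the sum is stored back into 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: name length in bytes plus a 2-byte terminator. */
static uint32_t name_size_with_terminator(uint16_t name_nbytes)
{
	/* Widen first, then add: the sum cannot wrap at 16 bits. */
	return (uint32_t)name_nbytes + 2;
}

/* For contrast: truncating the sum back to 16 bits silently wraps. */
static uint16_t name_size_truncated(uint16_t name_nbytes)
{
	return (uint16_t)(name_nbytes + 2);
}
```

With a name length of 65534, the first helper yields 65536, while the truncating version yields 0.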
Eric Biggers [Sat, 14 Jun 2014 04:33:08 +0000 (23:33 -0500)]
make_canonical_huffman_code(): Stricter validation of max_codeword_len
max_codeword_len must be long enough to give a distinct codeword to each
symbol. As we also check that num_syms >= 2, use this check instead of
max_codeword_len > 0.
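
One way to express this requirement, as a sketch only: a prefix code whose codewords are at most max_codeword_len bits long can distinguish at most 2^max_codeword_len symbols. `codeword_len_is_valid()` is a hypothetical name, and the exact form of the check in make_canonical_huffman_code() may differ.

```c
#include <stdbool.h>

/*
 * Hypothetical validation: true if every one of num_syms symbols can get
 * a distinct codeword of at most max_codeword_len bits.
 */
static bool codeword_len_is_valid(unsigned num_syms, unsigned max_codeword_len)
{
	if (num_syms < 2)
		return false;

	/* Avoid an out-of-range shift; 32 or more bits covers any
	 * symbol count that fits in 32 bits. */
	if (max_codeword_len >= 32)
		return true;

	/* Since num_syms >= 2, this also rejects max_codeword_len == 0. */
	return num_syms <= (1ULL << max_codeword_len);
}
```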