Eric Biggers [Sun, 15 Jun 2014 16:34:57 +0000 (11:34 -0500)]
Re-visit SHA-1 code
- Fixed build failures when configured with --enable-ssse3-sha1.
- Actually calculate the message digest correctly in the SSSE3-optimized
version! The Intel code just does block transformations, not arbitrary
updates; the previous code did not reflect this.
- Use an appropriate fallback when the CPU does not support SSSE3
instructions; don't just call abort()!
- Improve sha1_update() and sha1_final(). They should now be slightly
faster, as well as easier to understand.
- Use beXX_to_cpu() and cpu_to_beXX() macros instead of hard-coding
endian conversions.
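To illustrate the last item, here is a minimal sketch of byte-order-independent big-endian helpers of the kind SHA-1 needs for its 32-bit words. The names `sketch_get_be32()` and `sketch_put_be32()` are hypothetical and are not the beXX_to_cpu()/cpu_to_beXX() macros themselves; they only show the conversion those macros encapsulate.

```c
#include <stdint.h>

/* Hypothetical helper: read 4 bytes stored in big-endian order as a native
 * 32-bit value, regardless of the host's byte order. */
static inline uint32_t sketch_get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: store a native 32-bit value as 4 big-endian bytes. */
static inline void sketch_put_be32(uint32_t v, unsigned char *p)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

Centralizing the conversion in one place keeps the hashing code free of hand-written shifts and makes it behave the same on little-endian and big-endian hosts.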
Eric Biggers [Sun, 15 Jun 2014 05:15:24 +0000 (00:15 -0500)]
avl_tree.c: Remove avl_set_balance_factor()
Eric Biggers [Sat, 14 Jun 2014 20:44:47 +0000 (15:44 -0500)]
Speed up LZ77 match copying
Eric Biggers [Sat, 14 Jun 2014 17:16:11 +0000 (12:16 -0500)]
xpress-decompress.c: Store 'len_hdr' and 'offset_bsr' in unsigned ints
This speeds up XPRESS decompression by about 2%.
Eric Biggers [Sat, 14 Jun 2014 06:10:45 +0000 (01:10 -0500)]
finish_write(): Read old integrity table into memory if needed
This is a better fix for the problem, since it doesn't prevent the old
table from being used when it can be.
Eric Biggers [Sat, 14 Jun 2014 04:46:58 +0000 (23:46 -0500)]
finish_write(): Don't use old integrity table if already overwritten
When updating a WIM in-place without modifying the lookup table, the new
XML data will spill into the old integrity table if the new XML data is
longer than the old XML data. Temporarily fix this by not using the old
integrity table in this case.
Eric Biggers [Sat, 14 Jun 2014 04:45:56 +0000 (23:45 -0500)]
extract.c: Don't compile unneeded code when WITH_NTFS_3G undefined
Eric Biggers [Sat, 14 Jun 2014 04:44:01 +0000 (23:44 -0500)]
extract.c: Do endian conversion when checking pipable WIM header
Eric Biggers [Sat, 14 Jun 2014 04:41:22 +0000 (23:41 -0500)]
dentry.c: Cast name length u16 => u32 whenever adding 2
Due to integer promotion this won't make a difference if an 'int' is 4+
bytes anyway, but make the intention clear: this computation should not
overflow.
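A minimal sketch of the pattern, assuming a 16-bit name length to which 2 is added. The function name `sketch_name_len_plus_2()` is hypothetical; the point is only that widening before the addition makes the no-overflow intent explicit instead of relying on integer promotion.

```c
#include <stdint.h>

/* Hypothetical helper: add 2 to a 16-bit name length. The explicit cast to
 * a 32-bit type documents that the sum must not wrap, even on a platform
 * where 'int' is narrower than 32 bits. */
static inline uint32_t sketch_name_len_plus_2(uint16_t name_nbytes)
{
    return (uint32_t)name_nbytes + 2;
}
```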
Eric Biggers [Sat, 14 Jun 2014 04:40:34 +0000 (23:40 -0500)]
read_dentry_tree(): Use dentry_set_name()
Does the same thing but is shorter.
Eric Biggers [Sat, 14 Jun 2014 04:39:35 +0000 (23:39 -0500)]
compress_serial.c: Don't store compressed length in context
It can be a local variable.
Eric Biggers [Sat, 14 Jun 2014 04:39:00 +0000 (23:39 -0500)]
compress_parallel.c: Don't bail if not all threads can be created
If pthread_create() fails but at least 2 threads were created, use them
instead of falling back to serial compression.
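The policy described above can be sketched as follows. This is not the code from compress_parallel.c: the callback type `sketch_start_fn` and all function names are hypothetical, and the callback merely mirrors the pthread_create() convention of returning 0 on success.

```c
#include <stddef.h>

/* Hypothetical thread-start callback: returns 0 on success and nonzero on
 * failure, mirroring the pthread_create() return convention. */
typedef int (*sketch_start_fn)(unsigned index, void *ctx);

/* Try to start 'wanted' workers; stop at the first failure and return how
 * many were actually started. */
static unsigned sketch_start_workers(sketch_start_fn start, void *ctx,
                                     unsigned wanted)
{
    unsigned created = 0;

    while (created < wanted && start(created, ctx) == 0)
        created++;
    return created;
}

/* Parallel compression is still worthwhile with a partial set of workers,
 * as long as there are at least 2 of them. */
static int sketch_should_use_parallel(unsigned created)
{
    return created >= 2;
}

/* Demo stub: pretend the system can only create *ctx threads. */
static int sketch_start_limited(unsigned index, void *ctx)
{
    return index < *(const unsigned *)ctx ? 0 : 1;
}
```

With fewer than 2 workers, a caller would clean up whatever was started and fall back to serial compression.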
Eric Biggers [Sat, 14 Jun 2014 04:37:37 +0000 (23:37 -0500)]
compress_parallel.c: Use more appropriate type for shift
Eric Biggers [Sat, 14 Jun 2014 04:36:09 +0000 (23:36 -0500)]
chunk_compressor: Use u32 for chunk uncompressed size
All chunk sizes must fit in a 32-bit integer because they cannot exceed
'out_chunk_size'.
Eric Biggers [Sat, 14 Jun 2014 04:33:08 +0000 (23:33 -0500)]
make_canonical_huffman_code(): Stricter validation of max_codeword_len
max_codeword_len must be long enough to give a distinct codeword to each
symbol. As we also check that num_syms >= 2, use this check instead of
max_codeword_len > 0.
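One way to express this condition, as a sketch: a code with codewords of at most max_codeword_len bits can distinguish at most 2^max_codeword_len symbols. The function name `sketch_codeword_len_ok()` is hypothetical, and the actual check in make_canonical_huffman_code() may be written differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can 'num_syms' symbols each get a distinct codeword
 * of at most 'max_codeword_len' bits? */
static bool sketch_codeword_len_ok(uint32_t num_syms, unsigned max_codeword_len)
{
    /* Avoid an undefined shift; 2^32 exceeds any 32-bit symbol count. */
    if (max_codeword_len >= 32)
        return true;
    return num_syms <= ((uint32_t)1 << max_codeword_len);
}
```

Because num_syms >= 2 is checked separately, a max_codeword_len of 0 fails this test automatically, so no separate max_codeword_len > 0 check is needed.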
Eric Biggers [Sat, 14 Jun 2014 04:31:33 +0000 (23:31 -0500)]
compress.c: Always include base compressor size
When the struct compressor_ops does not provide get_needed_memory(),
still include the size of the struct wimlib_compressor.
Eric Biggers [Sat, 14 Jun 2014 04:30:24 +0000 (23:30 -0500)]
{de,}compress.c: Sort ops by type number
Eric Biggers [Sat, 14 Jun 2014 04:29:28 +0000 (23:29 -0500)]
capture_common.c: Fix comment
Eric Biggers [Fri, 13 Jun 2014 04:54:24 +0000 (23:54 -0500)]
Update NEWS
Eric Biggers [Fri, 13 Jun 2014 04:34:39 +0000 (23:34 -0500)]
dentry.c: Fix comment
Eric Biggers [Fri, 13 Jun 2014 04:24:36 +0000 (23:24 -0500)]
wimlib.h: Make beginning of docs a bit more friendly
Eric Biggers [Fri, 13 Jun 2014 04:05:54 +0000 (23:05 -0500)]
wimlib.h: Update docs
Eric Biggers [Fri, 13 Jun 2014 04:04:05 +0000 (23:04 -0500)]
Remove WIMLIB_COMPRESSION_TYPE_INVALID from library
This is actually only used in wimlib-imagex
Eric Biggers [Fri, 13 Jun 2014 04:00:22 +0000 (23:00 -0500)]
wimlib_get_compressor_needed_memory(): Include struct wimlib_compressor
Eric Biggers [Fri, 13 Jun 2014 01:43:22 +0000 (20:43 -0500)]
make-windows-release: Include all COPYING files
Eric Biggers [Fri, 13 Jun 2014 01:34:27 +0000 (20:34 -0500)]
Update Makefile.am
Eric Biggers [Fri, 13 Jun 2014 01:24:50 +0000 (20:24 -0500)]
Update Linux packaging
Eric Biggers [Fri, 13 Jun 2014 01:22:49 +0000 (20:22 -0500)]
Update README
Eric Biggers [Fri, 13 Jun 2014 01:21:16 +0000 (20:21 -0500)]
Update library license: Add LGPLv3 exception
Eric Biggers [Fri, 13 Jun 2014 01:17:28 +0000 (20:17 -0500)]
Place programs in examples/ in public domain
Eric Biggers [Thu, 12 Jun 2014 05:17:37 +0000 (00:17 -0500)]
lzms-compress.c: Don't do redundant work in cost calculations
Eric Biggers [Thu, 12 Jun 2014 03:42:46 +0000 (22:42 -0500)]
lzx-compress.c: Don't compute match/literal array before actually needed
Eric Biggers [Thu, 12 Jun 2014 03:39:29 +0000 (22:39 -0500)]
lzx-compress.c: Don't do redundant work in cost calculations
Eric Biggers [Thu, 12 Jun 2014 03:34:33 +0000 (22:34 -0500)]
lzx.h: Align 'struct lzx_lru_queue' on x86_64
Eric Biggers [Thu, 12 Jun 2014 04:19:08 +0000 (23:19 -0500)]
{lzx,lzms-decompress.c}: Allocate context with DECODE_TABLE_ALIGNMENT
Eric Biggers [Thu, 12 Jun 2014 04:15:40 +0000 (23:15 -0500)]
Add aligned malloc and free
Don't make dependent on OS support, since that would break the custom
memory allocation functions and also would need to be different between
UNIX and Windows anyway.
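A common OS-independent approach, shown here only as a sketch, is to over-allocate with the ordinary allocator and stash the original pointer just below the aligned address. The names `sketch_aligned_malloc()` and `sketch_aligned_free()` are hypothetical, and plain malloc()/free() stand in for whatever allocation functions the library is configured to use.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate 'size' bytes aligned to 'alignment', which
 * must be a power of 2. Returns NULL on failure or invalid alignment. */
static void *sketch_aligned_malloc(size_t size, size_t alignment)
{
    size_t overhead;
    void *raw;
    uintptr_t aligned;

    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;

    /* Room for the saved pointer plus worst-case alignment padding. */
    overhead = sizeof(void *) + alignment - 1;
    if (size > SIZE_MAX - overhead)
        return NULL;

    raw = malloc(size + overhead);
    if (raw == NULL)
        return NULL;

    /* Round up past the saved-pointer slot to the next aligned address. */
    aligned = ((uintptr_t)raw + sizeof(void *) + alignment - 1) &
              ~(uintptr_t)(alignment - 1);

    /* Remember the real allocation so it can be freed later. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Hypothetical helper: free memory from sketch_aligned_malloc(). */
static void sketch_aligned_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

Since everything goes through the underlying allocator, this works unchanged with custom allocation functions and needs no posix_memalign() or _aligned_malloc() equivalents.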
Eric Biggers [Wed, 11 Jun 2014 02:32:38 +0000 (21:32 -0500)]
xml.c: Don't count reparse point data and directory streams in <TOTALBYTES>
Eric Biggers [Wed, 11 Jun 2014 01:55:10 +0000 (20:55 -0500)]
Don't exclude out-of-tree absolute symlinks in reparse point fix mode
Excluding such links may not be the expected behavior. In addition,
Microsoft's documentation for ImageX seems to be incorrect when it states
that ImageX excludes such links. Actually, it does not. So, for the
sake of consistency and also doing something that may make more sense anyway,
just retain such links, but leave their targets as-is.
Eric Biggers [Wed, 11 Jun 2014 01:40:46 +0000 (20:40 -0500)]
win32_apply.c: Fix reparse point fixup of device-direct links (no trailing slash)
After extraction these should point to the capture directory.
This was broken in 1.6.2 as well, but in a different way!
Eric Biggers [Wed, 11 Jun 2014 01:20:56 +0000 (20:20 -0500)]
unix_apply.c: Honor i_not_rpfixed
This was a regression from 1.6.2
Eric Biggers [Sun, 8 Jun 2014 14:56:54 +0000 (09:56 -0500)]
extract.c: Remove unused internal flag
Eric Biggers [Sun, 8 Jun 2014 03:43:00 +0000 (22:43 -0500)]
Remove some dead assignments
Eric Biggers [Sun, 8 Jun 2014 03:28:30 +0000 (22:28 -0500)]
mount_image.c: Don't pass NULL to mq_send()
Eric Biggers [Sun, 8 Jun 2014 03:25:37 +0000 (22:25 -0500)]
xpress-decompress.c: Remove unused 'lens' parameter to xpress_lz_decode()
Eric Biggers [Sun, 8 Jun 2014 03:16:16 +0000 (22:16 -0500)]
wimboot.c, win32_apply.c: Bracket file by #ifdef __WIN32__
This makes compiling all non-Windows source files manually (e.g. with
clang-scan-build.sh) easier.
Eric Biggers [Sun, 8 Jun 2014 03:15:38 +0000 (22:15 -0500)]
decompress_common.c: Add util.h back
Eric Biggers [Sun, 8 Jun 2014 02:36:31 +0000 (21:36 -0500)]
Update date
Eric Biggers [Sun, 8 Jun 2014 02:32:06 +0000 (21:32 -0500)]
lzms-compress.c: Don't underrun window when checking recent offsets
In LZMS, not all recent offsets are initialized to 1, unlike in LZX.
Eric Biggers [Sun, 8 Jun 2014 02:30:10 +0000 (21:30 -0500)]
A few comment fixes
Eric Biggers [Sun, 8 Jun 2014 02:28:22 +0000 (21:28 -0500)]
Remove a few unnecessary includes
Eric Biggers [Sun, 8 Jun 2014 01:04:23 +0000 (20:04 -0500)]
Update public domain dedications
Eric Biggers [Sat, 7 Jun 2014 22:31:51 +0000 (17:31 -0500)]
wimexport: Document --wimboot option
Eric Biggers [Sat, 7 Jun 2014 22:09:11 +0000 (17:09 -0500)]
wimextract: Rename --no-wildcards to --no-globs; update man page
Eric Biggers [Sat, 7 Jun 2014 21:54:33 +0000 (16:54 -0500)]
wimextract: Suggest --nullglob
Eric Biggers [Sat, 7 Jun 2014 21:48:32 +0000 (16:48 -0500)]
Update NEWS
Eric Biggers [Sat, 7 Jun 2014 21:46:39 +0000 (16:46 -0500)]
lzx-compress.c: Honor cache_limit
This ensures the match cache is never overrun. If for some reason we
average more than LZX_CACHE_PER_POS (currently 8) matches per position,
_excluding skipped positions_, just don't return any more matches.
Eric Biggers [Sat, 7 Jun 2014 21:34:14 +0000 (16:34 -0500)]
Merge branch 'lz_bt'
Eric Biggers [Mon, 2 Jun 2014 03:01:14 +0000 (22:01 -0500)]
Switch from suffix array match-finder to binary tree match-finder
This uses less memory (8 bytes overhead per position vs. 14), is faster,
requires less code (no libdivsufsort), and in some tests actually results
in a better compression ratio.
A binary tree match-finder was used in wimlib v1.5.2 but it didn't seem
as good then, probably because it was combined with a slow block division
algorithm for LZX.
Repeat offsets are now handled differently. The binary tree match-finder
itself doesn't have any logic for repeat offsets at all; instead, the
match-choosing code (now implemented separately for LZX and LZMS) now
does special checks for matches at repeat offsets.
Since less memory is used for match-finding now, increase the default
LZMS pack chunk size from 2^25 (33554432) to 2^26 (67108864).
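As a rough worked example of the memory figures quoted in this commit, the per-position overheads can be multiplied out for the old and new chunk sizes. The function name `sketch_matchfinder_overhead()` is hypothetical, and the sketch assumes the match-finder window equals the chunk size and counts only the per-position overhead.

```c
#include <stdint.h>

/* Hypothetical helper: match-finder overhead in bytes for a window of
 * 'window_size' positions at 'bytes_per_pos' bytes of overhead each. */
static inline uint64_t sketch_matchfinder_overhead(uint64_t window_size,
                                                   unsigned bytes_per_pos)
{
    return window_size * bytes_per_pos;
}
```

Under these assumptions, the suffix array match-finder at 2^25 positions costs 14 * 33554432 = 469762048 bytes, while the binary tree match-finder at 2^26 positions costs 8 * 67108864 = 536870912 bytes, so doubling the chunk size keeps the overhead in the same range.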
Eric Biggers [Thu, 5 Jun 2014 13:10:18 +0000 (08:10 -0500)]
lzms-common.c, lzms-compress.c: Use pthread_once()
Eric Biggers [Thu, 5 Jun 2014 03:19:18 +0000 (22:19 -0500)]
Fix creating large solid blocks
Dumb bug that only appeared when at least 4 GiB of data was being
archived...!
Eric Biggers [Wed, 4 Jun 2014 02:09:19 +0000 (21:09 -0500)]
lzms-compress.c: Fix typo
Eric Biggers [Sun, 1 Jun 2014 20:42:02 +0000 (15:42 -0500)]
Share most e8 processing code between LZX compressor and decompressor
Eric Biggers [Sun, 1 Jun 2014 13:31:50 +0000 (08:31 -0500)]
lzx-compress.c: Clarify there must be at least one optimization pass
Eric Biggers [Sun, 1 Jun 2014 13:25:55 +0000 (08:25 -0500)]
lzx-compress.c: Use pointers in lzx_optimize_block()
This makes the performance less reliant on the compiler recognizing that
the 'struct lzx_block_spec' does not change throughout the loop.
Eric Biggers [Sat, 31 May 2014 15:12:28 +0000 (10:12 -0500)]
inode_fixup.c: Fix check for directory hard links
We shouldn't assume that the attributes are consistent, so we should
check both ways for directory hard links. Specifically, in the case
where the being-inserted dentry is not marked as a directory but for some
reason it shares an inode number with a dentry marked as a directory, we
want to detect that as a directory hard link.
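The symmetric check can be sketched like this. The struct and function names are hypothetical simplifications and not the types used in inode_fixup.c; 0x10 is the value of FILE_ATTRIBUTE_DIRECTORY.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ATTR_DIRECTORY 0x10u /* FILE_ATTRIBUTE_DIRECTORY */

/* Hypothetical, simplified stand-in for a dentry and its inode data. */
struct sketch_dentry {
    uint64_t ino;
    uint32_t attributes;
};

static bool sketch_is_dir(const struct sketch_dentry *d)
{
    return (d->attributes & SKETCH_ATTR_DIRECTORY) != 0;
}

/* Two dentries sharing an inode number form a directory hard link if
 * EITHER one is marked as a directory; the attributes are not assumed to
 * be consistent with each other. */
static bool sketch_is_directory_hard_link(const struct sketch_dentry *existing,
                                          const struct sketch_dentry *inserted)
{
    if (existing->ino != inserted->ino)
        return false;
    return sketch_is_dir(existing) || sketch_is_dir(inserted);
}
```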
Eric Biggers [Thu, 29 May 2014 05:42:54 +0000 (00:42 -0500)]
Optimize Huffman code generation
This commit significantly improves the performance of length-limited
canonical Huffman code generation by introducing several optimizations
based on the 7-Zip implementation.
Although Huffman code generation is not the main bottleneck of any of
the compression algorithms implemented here, an optimized implementation
can still improve overall performance by several percent. In addition,
it significantly speeds up LZMS decompression, which requires frequent
rebuilding of adaptive Huffman codes.
Some performance comparisons:
- The average time taken to generate all three LZX codes (main,
length, and aligned offset) when compressing a Windows PE
filesystem with LZX decreased from 73.9 us to 17.3 us.
- The time taken to compress a Windows PE filesystem with LZX, using
the "fast" LZX algorithm (hash chain match finder and lazy
parsing), decreased from 12.3 s to 11.6 s (a 5.7% improvement).
- The time taken to decompress a Windows PE filesystem with LZMS,
using solid blocks, decreased from 12.3 s to 9.0 s (a 27%
improvement).
Eric Biggers [Sat, 31 May 2014 00:34:34 +0000 (19:34 -0500)]
Revert "lzx-compress.c: Simplify calculation of position footer"
This reverts commit 3adc1ac1ebe221427857d8f6fd06cfb823b4bea6.
If the position footer is unconditionally calculated as the match offset
minus the position base value, the (ultimately unused) position footer
for repeat matches can overflow the number of bits in which it is stored
in the intermediate representation used by this implementation. For now,
use the old version, which would set the position footers of repeat
matches to 0.
Eric Biggers [Thu, 29 May 2014 00:16:33 +0000 (19:16 -0500)]
lzx-compress.c: Simplify output of length footer
Eric Biggers [Thu, 29 May 2014 00:05:19 +0000 (19:05 -0500)]
lzx-compress.c: Simplify calculation of position footer
Eric Biggers [Wed, 28 May 2014 15:44:03 +0000 (10:44 -0500)]
lzx-decompress.c: Simplify handling of recent offsets
The recent offsets can all be handled by the same code. This should
remove at least one branch from the generated code.
Eric Biggers [Wed, 28 May 2014 00:20:31 +0000 (19:20 -0500)]
Remove unused 'num_syms' argument to read_huffsym()
The updated decoder no longer requires that the number of symbols in the
alphabet be provided when decoding each symbol.
Eric Biggers [Tue, 27 May 2014 21:51:24 +0000 (16:51 -0500)]
Faster Huffman symbol decoding
When decoding a codeword short enough for a direct mapping, we can read
the codeword length at the same time we read the symbol itself. This
speeds up Huffman decoding slightly.
This commit also updates and improves the comments for
make_huffman_decode_table().
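The idea can be sketched with a toy direct-mapped table in which each entry packs both the symbol and its codeword length. The entry layout, the table size, and all names below are assumptions made for illustration; they are not the actual format built by make_huffman_decode_table().

```c
#include <stdint.h>

#define SKETCH_TABLE_BITS 3 /* toy table indexed by the next 3 input bits */
#define SKETCH_LEN_BITS   4 /* low bits of each entry hold the length */

/* Pack a symbol and its codeword length into one table entry. */
static inline uint16_t sketch_make_entry(unsigned sym, unsigned len)
{
    return (uint16_t)((sym << SKETCH_LEN_BITS) | len);
}

/* Fill every table slot whose index starts with 'codeword' (of 'len' bits,
 * len <= SKETCH_TABLE_BITS), so any lookup with that prefix hits 'sym'. */
static void sketch_fill(uint16_t *table, unsigned codeword, unsigned len,
                        unsigned sym)
{
    unsigned first = codeword << (SKETCH_TABLE_BITS - len);
    unsigned count = 1u << (SKETCH_TABLE_BITS - len);
    unsigned i;

    for (i = 0; i < count; i++)
        table[first + i] = sketch_make_entry(sym, len);
}

/* One lookup yields both the symbol and the number of bits to consume. */
static unsigned sketch_decode(const uint16_t *table, unsigned peeked_bits,
                              unsigned *len_ret)
{
    uint16_t entry = table[peeked_bits];

    *len_ret = entry & ((1u << SKETCH_LEN_BITS) - 1);
    return entry >> SKETCH_LEN_BITS;
}
```

Without the packed length, the decoder would need a second lookup into a separate lengths array to learn how many bits to remove from the bitstream.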
Eric Biggers [Tue, 27 May 2014 16:23:22 +0000 (11:23 -0500)]
lzx-decompress.c: One fewer branch in undo_call_insn_translation()
Eric Biggers [Tue, 27 May 2014 16:03:07 +0000 (11:03 -0500)]
LZX, XPRESS decompression: Return 0 bits on overrun
If the compressed data is invalid such that the compressed data buffer is
overrun, it's simpler to just return 0 bits instead of explicitly
checking the return value at every call site of bitstream_read_bits() and
read_huffsym().
This doesn't necessarily mean that invalid data will go undetected. Just
for LZX decompression, chances are there will be another problem if all
0's start being returned (e.g. invalid match or invalid Huffman tree).
For WIM operations like extraction, the uncompressed data is checked with
SHA-1 message digests anyway, so it's virtually impossible for corruption
to go undetected.
Also, the LZMS decompressor already does this.
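A minimal sketch of this behavior, assuming a simplified byte-oriented, most-significant-bit-first reader. The struct and function names are hypothetical, and the real bitstream_read_bits() works on a different input layout; the point is that running past the end feeds in zeroes rather than reporting an error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified input bitstream. */
struct sketch_bitstream {
    const uint8_t *next; /* next unread input byte */
    const uint8_t *end;  /* one past the last input byte */
    uint32_t bitbuf;     /* buffered bits, most recent in the low end */
    unsigned bitsleft;   /* number of valid bits in bitbuf */
};

/* Read 'n' bits (n <= 16). Past the end of the input, zero bytes are
 * supplied, so callers never need to check for an overrun. */
static uint32_t sketch_read_bits(struct sketch_bitstream *bs, unsigned n)
{
    while (bs->bitsleft < n) {
        uint32_t byte = (bs->next != bs->end) ? *bs->next++ : 0;

        bs->bitbuf = (bs->bitbuf << 8) | byte;
        bs->bitsleft += 8;
    }
    bs->bitsleft -= n;
    return (bs->bitbuf >> bs->bitsleft) & (((uint32_t)1 << n) - 1);
}
```

Corrupt input then surfaces later, for example as an invalid match or a SHA-1 mismatch, instead of at every call site.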
Eric Biggers [Tue, 27 May 2014 14:28:59 +0000 (09:28 -0500)]
lzx-decompress.c: Add SSE2 version of undo_call_insn_preprocessing()
Eric Biggers [Tue, 27 May 2014 02:25:29 +0000 (21:25 -0500)]
resource.c: Fix two comments
Eric Biggers [Tue, 27 May 2014 02:22:23 +0000 (21:22 -0500)]
resource.c: Don't call lseek() if not necessary
To be reading a pipable resource from a pipe, is_pipable must be set in
the 'struct wim_resource_spec', so check that first before calling
filedes_is_seekable().
Eric Biggers [Tue, 27 May 2014 01:51:24 +0000 (20:51 -0500)]
extract.c: Remove unused internal flag
Eric Biggers [Mon, 26 May 2014 23:39:12 +0000 (18:39 -0500)]
Update NEWS
Eric Biggers [Mon, 26 May 2014 23:31:03 +0000 (18:31 -0500)]
xml.c: Export <ARCH> and <WIMBOOT> nodes correctly
The XML data still needs to be handled better in general, but this fixes
the bug that this information was missing from clone_image_info().
Eric Biggers [Mon, 26 May 2014 22:52:58 +0000 (17:52 -0500)]
Check for case where too many identical files are being extracted
Hoping to add a real fix for this, but for now at least avoid the buffer
overflow in UNIX and NTFS-3g extraction modes.
Eric Biggers [Mon, 26 May 2014 21:21:47 +0000 (16:21 -0500)]
mount_image.c: Don't use tchar when not necessary
Eric Biggers [Mon, 26 May 2014 21:09:38 +0000 (16:09 -0500)]
mount_image.c: Add error.h outside WITH_FUSE conditional
Eric Biggers [Mon, 26 May 2014 20:42:50 +0000 (15:42 -0500)]
Fix file locking
- Lock in_fd only
- Unlock WIM immediately after committing mounted image (before
fuse_main() returns)
- Don't lock WIM for read-only mount
Eric Biggers [Mon, 26 May 2014 16:35:23 +0000 (11:35 -0500)]
test-imagex-mount: Always use tmp.mnt for mounts
Eric Biggers [Mon, 26 May 2014 15:32:42 +0000 (10:32 -0500)]
mount_image.c: Use setxattr wimfs.unmount_info, getxattr wimfs.unmount
This lets us return the unmount status directly, instead of sending it
over the message queue (which is subject to problems, like it being
full).
Also, if commit fails, leave the image mounted, unless doing a forced
unmount.
Eric Biggers [Mon, 26 May 2014 14:35:10 +0000 (09:35 -0500)]
mount_image.c: Fix comment
Eric Biggers [Mon, 26 May 2014 14:25:55 +0000 (09:25 -0500)]
inode.h: Fix comment
Eric Biggers [Mon, 26 May 2014 14:12:38 +0000 (09:12 -0500)]
Remove references to libwim9
Eric Biggers [Mon, 26 May 2014 14:06:52 +0000 (09:06 -0500)]
Revert "lzx-compress.c: Disable verification by default"
This reverts commit 5448b9cd60e9b1ebf4efcd2d1b2aac346b2e829c.
Switched from libdivsufsort to libdivsufsort-lite; should be the same,
but just in case I'm leaving verification on for the "slow" algorithm.
Eric Biggers [Mon, 26 May 2014 14:01:20 +0000 (09:01 -0500)]
tagged_items.c: Include header size when searching items
Eric Biggers [Mon, 26 May 2014 05:28:04 +0000 (00:28 -0500)]
Track divsufsort.h
Eric Biggers [Mon, 26 May 2014 05:15:16 +0000 (00:15 -0500)]
Use libdivsufsort-lite, not the full libdivsufsort
Eric Biggers [Mon, 26 May 2014 04:47:24 +0000 (23:47 -0500)]
Remove xattr configuration option
Eric Biggers [Mon, 26 May 2014 04:41:18 +0000 (23:41 -0500)]
lzx-compress.c: Disable verification by default
The algorithm seems to be sufficiently well tested now. And the data is
checked with SHA-1 message digests anyway. This slightly improves LZX
compression performance.
Eric Biggers [Mon, 26 May 2014 04:28:57 +0000 (23:28 -0500)]
dentry.h: Remove unneeded forward declarations
Eric Biggers [Mon, 26 May 2014 04:21:05 +0000 (23:21 -0500)]
Move wim_pathname_to_stream() to mount_image.c
Eric Biggers [Mon, 26 May 2014 04:04:20 +0000 (23:04 -0500)]
Remove unused function lte_filename_valid()
Eric Biggers [Mon, 26 May 2014 03:44:04 +0000 (22:44 -0500)]
struct wim_dentry: Union subdir_offset and tmp_list
Eric Biggers [Mon, 26 May 2014 03:40:49 +0000 (22:40 -0500)]
struct wim_dentry: Remove 'length' field