Eric Biggers [Sun, 14 Apr 2019 06:21:42 +0000 (23:21 -0700)]
lcpit_matchfinder: fix limiting nice_match_len
The "normal" mode of the lcp-interval tree matchfinder supports finding
matches up to LCP_MAX bytes. The "huge" mode, which is needed on
buffers larger than 64 MiB, supports up to HUGE_LCP_MAX bytes.
nice_match_len must be limited to the appropriate one of these values.
But nice_match_len is currently limited in lcpit_matchfinder_init(). That's
wrong, because it only knows whether huge mode *might* be used later,
based on max_bufsize. Which mode to use is actually decided on a
buffer-by-buffer basis by lcpit_matchfinder_load_buffer().
Thus, limit nice_match_len in lcpit_matchfinder_load_buffer() instead.
This fixes a crash or incorrect output during LZMS compression with a
compression level > 50 and a chunk size > 64 MiB.
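A minimal sketch of the per-buffer clamping described above. The struct, the function name, and the numeric limits are placeholders chosen for illustration and are not taken from lcpit_matchfinder.c; only the idea (pick the mode per buffer, then clamp against that mode's limit) comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder limits for illustration; the real values live in
 * lcpit_matchfinder.c and may differ. */
#define SKETCH_LCP_MAX            63u
#define SKETCH_HUGE_LCP_MAX       127u
#define SKETCH_MAX_NORMAL_BUFSIZE ((size_t)64 << 20)  /* 64 MiB */

struct mf_sketch {
    uint32_t requested_nice_len;  /* what the caller asked for */
    uint32_t nice_match_len;      /* clamped value used for this buffer */
    int huge_mode;                /* decided per buffer, not at init time */
};

/* Hypothetical stand-in for the per-buffer load step: choose the mode
 * from the actual buffer size, then clamp against that mode's limit. */
static void mf_sketch_load_buffer(struct mf_sketch *mf, size_t buf_size)
{
    uint32_t limit;

    mf->huge_mode = buf_size > SKETCH_MAX_NORMAL_BUFSIZE;
    limit = mf->huge_mode ? SKETCH_HUGE_LCP_MAX : SKETCH_LCP_MAX;
    mf->nice_match_len = mf->requested_nice_len < limit ?
                         mf->requested_nice_len : limit;
}
```

Because the requested value is kept separately, the same matchfinder can be reused for a small buffer after a huge one without carrying over the wrong limit.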
Eric Biggers [Wed, 28 Feb 2018 03:31:58 +0000 (19:31 -0800)]
split.c: fix finding extension of first split WIM part
Silly old bug: wimlib_split() considered the first dot in the SWM path
to begin the filename extension. But of course, there can be other dots
in the path; we need to look for the last dot in the last component.
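The lookup described above can be sketched as follows. This is an illustration, not the code in split.c: the function name is made up, and it assumes '/' is the only path separator.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the '.' that begins the extension of the last
 * path component, or NULL if that component contains no dot.
 * Assumption: '/' is the only path separator. */
static const char *find_extension_sketch(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;

    /* Search only within the last component, and take the last dot. */
    return strrchr(base, '.');
}
```

With this, a path such as "/tmp/my.dir/part.swm" yields ".swm", and "/tmp/my.dir/part" yields no extension even though a directory name contains a dot.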
Eric Biggers [Sun, 21 Jan 2018 21:47:10 +0000 (13:47 -0800)]
wimlib-imagex: add --include-integrity option
The --check option currently does two things: verify the integrity table
of the input WIM(s), and include an integrity table in the output
WIM(s). Some users would like to do the latter only, especially if
there are large input WIM(s).
Add an option --include-integrity which does this.
Eric Biggers [Sun, 21 Jan 2018 21:47:10 +0000 (13:47 -0800)]
wimlib-imagex: try harder to optimize out opening template WIM
As an optimization, 'wimcapture' and 'wimappend' don't separately open
the template WIM for --update-of if no filename is specified in that
option, which makes it default to either the single base WIM
(--delta-from), or the WIM being appended to.
Extend that optimization to cases where the filename is specified in
--update-of and it exactly matches the filename of the WIM being
appended to or any of the base WIMs.
Eric Biggers [Sun, 21 Jan 2018 21:47:10 +0000 (13:47 -0800)]
Make stream_hash() return NULL for unhashed streams
Otherwise it will return a bogus value from the union with ->back_inode
and ->back_stream_id. Most callers ensured this cannot happen, but a
couple did not. It should be explicitly prevented or handled.
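A simplified sketch of the hazard and the guard. The struct layout and names below are hypothetical stand-ins, not wimlib's actual definitions; they only model a hash that shares storage with back-reference fields.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: the hash shares storage with the
 * back-reference fields, so only one of them is valid at a time. */
struct blob_sketch {
    int unhashed;  /* nonzero: hash not computed yet */
    union {
        uint8_t hash[20];             /* valid only if !unhashed */
        struct {
            void *back_inode;
            uint32_t back_stream_id;
        } back;                       /* valid only if unhashed */
    } u;
};

/* Return the hash, or NULL when reading it would really be reading
 * the back-reference fields through the union. */
static const uint8_t *blob_hash_or_null(const struct blob_sketch *blob)
{
    if (blob == NULL || blob->unhashed)
        return NULL;
    return blob->u.hash;
}
```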
Eric Biggers [Sun, 21 Jan 2018 21:47:09 +0000 (13:47 -0800)]
Capture and apply extended attributes on Windows
DISM recently started supporting capturing and applying xattrs on
Windows (though it is broken when applying multiple xattrs per file).
Make wimlib support the same, using the same on-disk format. Unlike in
DISM, this is on by default rather than controlled by an option, since
there doesn't seem to be a good reason to make it optional.
Also deprecate the tagged item wimlib was using to store xattrs on Linux
and switch over to the format used by WIMGAPI to store xattrs on
Windows, so that new WIM images use the same xattr format on both
platforms. One caveat is that on Linux XATTR_SIZE_MAX is 65536 whereas
in the new WIM tagged item format we can only store up to 65535 bytes.
That is unlikely to matter though.
As future work, the NTFS-3G capture and apply backends should be updated
to support xattrs too.
Eric Biggers [Sun, 16 Jul 2017 06:26:33 +0000 (23:26 -0700)]
unaligned: use may_alias attribute
gcc 7 miscompiles the "undo" mode of translate_if_needed() in
lzms_common.c because the get_unaligned_le16() call was incorrectly
moved before the put_unaligned_le32() call. Fix it by marking the
special "unaligned" structs with the may_alias attribute.
Eric Biggers [Sun, 16 Jul 2017 06:26:33 +0000 (23:26 -0700)]
Use dynamically-sized path buffer when scanning files
This is needed to guarantee that no buffer overflow can occur when
scanning a deep directory structure. The new way also avoids using
PATH_MAX, which fixes a build error on systems that don't define it.
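One way to build paths without a fixed PATH_MAX-sized array is a buffer that grows on demand, sketched below. The type and function names are hypothetical and this is not the code used by the scanner.

```c
#include <stdlib.h>
#include <string.h>

struct path_buf_sketch {
    char *buf;   /* NUL-terminated path, or NULL before first use */
    size_t len;  /* length excluding the terminator */
    size_t cap;  /* allocated bytes */
};

/* Append "/name", growing the buffer as needed.
 * Returns 0 on success, -1 on allocation failure. */
static int path_buf_push(struct path_buf_sketch *p, const char *name)
{
    size_t name_len = strlen(name);
    size_t needed = p->len + 1 + name_len + 1;  /* '/' + name + NUL */

    if (needed > p->cap) {
        size_t new_cap = p->cap ? p->cap : 32;
        char *new_buf;

        while (new_cap < needed)
            new_cap *= 2;
        new_buf = realloc(p->buf, new_cap);
        if (new_buf == NULL)
            return -1;
        p->buf = new_buf;
        p->cap = new_cap;
    }
    p->buf[p->len] = '/';
    memcpy(p->buf + p->len + 1, name, name_len + 1);
    p->len += 1 + name_len;
    return 0;
}

/* Cut the path back to a previously saved length, e.g. when the scan
 * returns from a subdirectory. */
static void path_buf_truncate(struct path_buf_sketch *p, size_t len)
{
    p->len = len;
    if (p->buf != NULL)
        p->buf[len] = '\0';
}
```

A recursive scan would save the current length, push the child name, recurse, and then truncate back to the saved length.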
Mike Swanson [Sun, 4 Jun 2017 22:35:34 +0000 (15:35 -0700)]
update_image.c: Ignore Windows 10 Recycle Bin directories.
On Windows 10 (and possibly earlier versions), a \$RECYCLE.BIN or
\$Recycle.Bin directory is created in the root of a volume. List both
case variants so that they are also ignored when capturing an NTFS
volume from Linux; on Windows, the case makes no difference.
Eric Biggers [Wed, 19 Apr 2017 06:58:03 +0000 (23:58 -0700)]
Improved year 2038 safety
Make wimlib on 32-bit Windows year 2038 safe by doing the following:
- Build both the library and program with 64-bit time_t, being careful
to avoid changing the timespec struct exposed in the API.
- Update wimlib's API to include an extended seconds field in
wimlib_dir_entry for each timestamp, and set it when tv_sec is 32-bit.
- When needing the current time, call GetSystemTimeAsFileTime() instead
of MinGW's gettimeofday().
This also has the advantage that due to switching to the 64-bit time_t
functions, 32-bit wimlib-imagex.exe now prints timestamps prior to year
1970 correctly.
Unfortunately, despite the API improvement, we cannot at this time make
wimlib fully Y2038-safe on 32-bit UNIX, due to lack of OS support.
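The arithmetic behind these changes can be sketched in portable C. A Windows FILETIME counts 100-nanosecond ticks since 1601-01-01, so converting to seconds since 1970 needs 64-bit math. The helper names and the low/high split below are illustrative assumptions, not wimlib's API.

```c
#include <stdint.h>

#define FILETIME_TICKS_PER_SEC   INT64_C(10000000)
/* Ticks between 1601-01-01 and 1970-01-01. */
#define FILETIME_TO_UNIX_OFFSET  INT64_C(116444736000000000)

/* Convert a FILETIME tick count to whole seconds since the Unix epoch,
 * rounding toward negative infinity so pre-1970 times stay correct. */
static int64_t filetime_to_unix_seconds(uint64_t filetime)
{
    int64_t ticks = (int64_t)filetime - FILETIME_TO_UNIX_OFFSET;
    int64_t secs = ticks / FILETIME_TICKS_PER_SEC;

    if (ticks % FILETIME_TICKS_PER_SEC < 0)
        secs--;
    return secs;
}

/* Hypothetical split of a 64-bit seconds value into a 32-bit field plus
 * an "extended" field carrying the upper 32 bits. */
static void split_seconds(int64_t secs, uint32_t *low, uint32_t *high)
{
    uint64_t bits = (uint64_t)secs;

    *low = (uint32_t)bits;
    *high = (uint32_t)(bits >> 32);
}

/* Recombine the two halves (assumes two's-complement conversion). */
static int64_t join_seconds(uint32_t low, uint32_t high)
{
    return (int64_t)(((uint64_t)high << 32) | low);
}
```

A caller that only has a 32-bit seconds field can recover the full value by combining it with the extended field as in join_seconds().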
Eric Biggers [Sun, 29 Jan 2017 05:18:21 +0000 (21:18 -0800)]
avl_tree.h: avoid bad function pointer cast
Casting the 'cmp' function to a different type, while compiled
correctly under normal circumstances, was not technically correct and
was not compatible with some control flow integrity (CFI)
implementations.
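A generic illustration of the CFI-friendly pattern, not the actual change to avl_tree.h: the comparison callback is declared with exactly the type through which it is called, and recovers its containing item itself instead of relying on a function pointer cast. All names are hypothetical.

```c
#include <stddef.h>

struct node_sketch { struct node_sketch *left, *right; };

struct item_sketch {
    int key;
    struct node_sketch node;  /* embedded tree node */
};

/* The one and only callback type; no casts to or from it. */
typedef int (*node_cmp_fn)(const struct node_sketch *,
                           const struct node_sketch *);

/* Recover the containing item from its embedded node. */
static const struct item_sketch *item_of(const struct node_sketch *n)
{
    return (const struct item_sketch *)
        ((const char *)n - offsetof(struct item_sketch, node));
}

/* Defined with the exact callback type, so an indirect call through
 * node_cmp_fn matches the function's real signature. */
static int item_node_cmp(const struct node_sketch *a,
                         const struct node_sketch *b)
{
    int ka = item_of(a)->key, kb = item_of(b)->key;

    return (ka > kb) - (ka < kb);
}

/* Stand-in for tree code that calls the comparison indirectly. */
static int compare_nodes(node_cmp_fn cmp, const struct node_sketch *a,
                         const struct node_sketch *b)
{
    return cmp(a, b);
}
```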
Eric Biggers [Sun, 15 Jan 2017 21:34:36 +0000 (13:34 -0800)]
lzx_compress: optimize storing information in lzx_sequence
Pack the literal run length and match length ourselves instead of using
bitfields, and store the actual match length instead of the adjusted
match length. Also make matchlen=0 represent end-of-block, and store
the full main symbol, not just the match header.
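Manual packing of the two lengths into one integer might look like the sketch below. The 9-bit width for the match length and the helper names are assumptions made for illustration; only the ideas of packing by hand and of matchlen=0 marking end-of-block come from the message.

```c
#include <stdint.h>

/* Assumed width: 9 bits are enough for match lengths up to 511. */
#define SEQ_MATCHLEN_BITS 9
#define SEQ_MATCHLEN_MASK ((UINT32_C(1) << SEQ_MATCHLEN_BITS) - 1)

/* Pack literal run length (upper bits) and match length (low bits). */
static uint32_t seq_pack(uint32_t litrunlen, uint32_t matchlen)
{
    return (litrunlen << SEQ_MATCHLEN_BITS) | matchlen;
}

static uint32_t seq_litrunlen(uint32_t packed)
{
    return packed >> SEQ_MATCHLEN_BITS;
}

static uint32_t seq_matchlen(uint32_t packed)
{
    return packed & SEQ_MATCHLEN_MASK;
}

/* A match length of 0 marks the end of the block. */
static int seq_is_end_of_block(uint32_t packed)
{
    return seq_matchlen(packed) == 0;
}
```

Compared with bitfields, explicit shifts and masks let both fields be read or written with a single 32-bit access and make the layout independent of compiler bitfield rules.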
Eric Biggers [Sun, 15 Jan 2017 01:00:13 +0000 (17:00 -0800)]
Don't generate GUID in wimlib_create_new_wim()
It's not necessary to generate a GUID in wimlib_create_new_wim() because
one is generated later by wimlib_write(), and nothing seems to assume
that a WIMStruct not yet backed by a file has a valid GUID. This saves
a call to get_random_bytes(). Also remove some unnecessary
initializations to 0.
Eric Biggers [Sat, 14 Jan 2017 08:56:39 +0000 (00:56 -0800)]
lzx_compress: fix corruption with long literal run
The last round of updates to the LZX compressor allowed it to use
larger blocks, up to ~100KB. Unfortunately, it was overlooked that this
allows literal runs > 65535 bytes, while in one place the length of a
literal run was still being stored in a u16. Therefore, on
incompressible input data the length could wrap around, causing
incorrect compression. Fix this by enlarging the variable.
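The wraparound is plain modulo-65536 truncation, as this small illustration shows. The function name is made up; it only demonstrates the arithmetic.

```c
#include <stdint.h>

/* Storing a run length in 16 bits silently reduces it modulo 65536. */
static uint16_t store_run_len_u16(uint32_t run_len)
{
    return (uint16_t)run_len;
}
```

For example, a 70000-byte literal run would be recorded as 70000 - 65536 = 4464 bytes, so the emitted block would no longer describe the input.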
Eric Biggers [Sun, 8 Jan 2017 06:34:32 +0000 (22:34 -0800)]
hc_matchfinder: use well-defined initialization of best_matchptr
The initial value of best_matchptr is not truly used, but since we do
always compute 'in_next - best_matchptr', assign an initial value which
avoids undefined behavior.
Eric Biggers [Tue, 27 Dec 2016 23:24:55 +0000 (17:24 -0600)]
Add basic infrastructure for storing xattr items
Define a new tagged metadata item to hold a list of names and values of
Linux-style extended attributes, and prepare for supporting
capture/apply of extended attributes.
I considered making the xattrs a stream instead, referenced from the
tagged item which would just hold a hash. This would have allowed
xattrs to be deduplicated between files. However, I ultimately decided
against this because WIMGAPI and older versions of wimlib would discard
the streams on optimize/export, and extraction would be much more
complicated because xattr streams could come up for extraction before
other streams --- which would be especially problematic for symlinks.
Eric Biggers [Tue, 27 Dec 2016 23:24:55 +0000 (17:24 -0600)]
tagged_items updates
- Expose tagged_item functions in new header tagged_items.h
- Make object_id functions inline functions in object_id.h
- Make inode_get_tagged_item() return stored length, not aligned length
- Add a new function inode_set_tagged_data() which removes existing
items before setting the new one, and use it for inode_set_object_id()
- Make inode_add_tagged_item() append item rather than prepend
- Keep items 8-byte aligned in memory
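The distinction between stored length and aligned length can be sketched as follows. The 8-byte header size (a 32-bit tag plus a 32-bit length) is an assumption made for this example, as are the names.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header layout: 32-bit tag followed by 32-bit data length. */
#define TAGGED_ITEM_HEADER_SIZE 8u

/* Round n up to the next multiple of 8. */
static size_t align8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Bytes one item occupies in the list: header plus padded data.
 * Callers reading the item still see only stored_len bytes of data. */
static size_t tagged_item_total_size(uint32_t stored_len)
{
    return TAGGED_ITEM_HEADER_SIZE + align8(stored_len);
}
```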
Eric Biggers [Tue, 27 Dec 2016 02:27:29 +0000 (20:27 -0600)]
Improve random number generation
wimlib used rand() to generate random numbers, e.g. for GUIDs. This was
neither cryptographically secure nor thread-safe. Use getrandom(),
/dev/urandom, or RtlGenRandom() instead.
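A minimal sketch of the /dev/urandom path only, using standard I/O. It assumes a system that provides /dev/urandom; getrandom() and RtlGenRandom() are not shown, and the function name is illustrative.

```c
#include <stddef.h>
#include <stdio.h>

/* Fill buf with n bytes from the kernel's random source.
 * Returns 0 on success, -1 if the device is missing or the read is
 * short. Assumes /dev/urandom exists. */
static int fill_random_sketch(void *buf, size_t n)
{
    FILE *f = fopen("/dev/urandom", "rb");
    size_t got;

    if (f == NULL)
        return -1;
    got = fread(buf, 1, n, f);
    fclose(f);
    return got == n ? 0 : -1;
}
```

Unlike rand(), this keeps no shared generator state in the process, so concurrent callers do not interfere with each other.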
Eric Biggers [Sat, 17 Dec 2016 03:47:44 +0000 (19:47 -0800)]
join.c: clean up verify_swm_set()
UBSAN complained when the parts_to_swms array had 0 length. Clean this
up by sorting the parts first, making the verification simpler. Also
don't bother checking compression_type and chunk_size anymore; checking
guid should be sufficient, and it doesn't really matter if the
compression formats are different since now everything will be written
out correctly anyway.
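Sorting first reduces the verification to one linear pass, roughly as sketched below. The struct, the names, and the choice to reject an empty set are assumptions for illustration, not the code in join.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct part_sketch {
    unsigned char guid[16];
    unsigned part_number;  /* 1-based */
    unsigned total_parts;
};

static int cmp_part_number(const void *a, const void *b)
{
    const struct part_sketch *pa = a, *pb = b;

    return (pa->part_number > pb->part_number) -
           (pa->part_number < pb->part_number);
}

/* Sort by part number, then check that the parts are exactly 1..n,
 * agree on the total, and share one GUID. Returns 0 if valid. */
static int verify_parts_sketch(struct part_sketch *parts, size_t n)
{
    size_t i;

    if (n == 0)
        return -1;  /* assumption: an empty set is rejected */
    qsort(parts, n, sizeof(parts[0]), cmp_part_number);
    for (i = 0; i < n; i++) {
        if (parts[i].part_number != i + 1)
            return -1;  /* missing or duplicate part */
        if (parts[i].total_parts != n)
            return -1;
        if (memcmp(parts[i].guid, parts[0].guid, 16) != 0)
            return -1;
    }
    return 0;
}
```

After sorting, a duplicate or missing part shows up as a mismatch at a specific index, so no separate lookup array is needed.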
Eric Biggers [Thu, 15 Dec 2016 04:49:55 +0000 (20:49 -0800)]
Extract sparse files as sparse
When extracting a stream belonging to an inode with
FILE_ATTRIBUTE_SPARSE_FILE set, before writing any data, mark the
extracted stream as sparse if needed and skip preallocating space.
Then, skip writing zero regions. This makes it so that sparse files are
still sparse after extraction.
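The "skip writing zero regions" step can be sketched with standard I/O as below. This is a simplified stand-in rather than wimlib's extraction code: it does not mark the file as sparse or handle preallocation, the names are made up, and whether the skipped ranges become real holes depends on the file system.

```c
#include <stddef.h>
#include <stdio.h>

static int is_zero_block(const unsigned char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Write data to f in blocks, seeking over all-zero blocks instead of
 * writing them. block must be nonzero. Returns the number of bytes
 * actually written, or -1 on error. */
static long write_skipping_zeros(FILE *f, const unsigned char *data,
                                 size_t size, size_t block)
{
    long written = 0;
    size_t off;

    for (off = 0; off < size; off += block) {
        size_t n = size - off < block ? size - off : block;

        if (is_zero_block(data + off, n)) {
            if (off + n < size)
                continue;  /* interior zero block: leave a gap */
            /* Trailing zero block: write only the final byte so the
             * file still ends up with the right size. */
            if (fseek(f, (long)(size - 1), SEEK_SET) != 0 ||
                fputc(0, f) == EOF)
                return -1;
            written += 1;
            continue;
        }
        if (fseek(f, (long)off, SEEK_SET) != 0 ||
            fwrite(data + off, 1, n, f) != n)
            return -1;
        written += (long)n;
    }
    return written;
}
```

On POSIX systems an alternative for the trailing case is to set the final size with ftruncate() rather than writing a byte.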