Eric Biggers [Sun, 31 Jul 2022 02:03:42 +0000 (19:03 -0700)]
Use MIT license instead of CC0
CC0 has continued to fall out of favor due to the patents clause
(https://lwn.net/ml/fedora-legal/CAC1cPGw1xScGAXo-0NRs92zFB7ptRxTt=oCYi0BxfZDfAgUtYQ@mail.gmail.com).
Years ago I released some source files in this project (not the whole
project) under CC0. Use the MIT license for these files instead.
Note that this requires claiming copyright on the latest version of
these files. Of course, previous versions of these files remain public
domain where legally recognized; this is *not* in any way an attempt to
"revoke" the public domain status of previous versions.
wimlib_iterate_dir_tree() on a modified-but-not-committed image is very
slow because it checksums all unhashed blobs. This was originally
implemented by commit 681faad85f73 ("wimlib_iterate_dir_tree(): checksum
unhashed blobs"), presumably to make the sha1_hash field always valid.
However, I can't remember a real use case for this. The current
behavior is causing problems, so let's just revert it and update the
documentation accordingly.
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=572
Eric Biggers [Tue, 23 Nov 2021 01:55:42 +0000 (17:55 -0800)]
export_image.c: allow duplicate image names in source WIM
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=568. DISM can
create WIM files containing images with duplicate names, whereas wimlib
enforces unique image names in certain cases such as adding images,
exporting images, and changing image names. This behavior generally
seems fine, but the "export" check is too strict: an export of "all"
images will fail if the source WIM contains duplicate names.
Fix this by making wimlib_export_image() allow duplicate image names in
the source WIM, provided that they don't collide with image names that
already exist in the destination WIM.
Eric Biggers [Tue, 3 Aug 2021 04:53:42 +0000 (21:53 -0700)]
configure.ac: fix trailing newline issue
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=562.
m4_esyscmd() needs to be m4_esyscmd_s(), so that the version string
doesn't get a trailing newline. It works for me either way, but that's
probably because in autoconf 2.70, AC_INIT started trimming extra
whitespace from its arguments (as per the release notes at
https://lists.gnu.org/archive/html/autotools-announce/2020-12/msg00001.html).
So presumably this fix is needed for older versions of autoconf.
Eric Biggers [Sat, 10 Jul 2021 22:47:57 +0000 (17:47 -0500)]
configure.ac: generate version number from git commit and tags
This should hopefully make it less confusing when building from the git
repository. Previously, when doing so the version number would always
be that of the last official release.
Eric Biggers [Sat, 10 Jul 2021 22:51:26 +0000 (17:51 -0500)]
nasm.m4: use AS_MESSAGE_LOG_FD
Address the following warning when running autoreconf:
    configure.ac:191: warning: The macro `AC_FD_CC' is obsolete.
    configure.ac:191: You should run autoupdate.
    ./lib/autoconf/general.m4:399: AC_FD_CC is expanded from...
    m4/nasm.m4:4: AC_PROG_NASM is expanded from...
    configure.ac:191: the top level
Eric Biggers [Mon, 5 Jul 2021 06:03:50 +0000 (23:03 -0700)]
Warn rather than abort if SHA-1 is same but size is different
Assertions should only be used for bugs in wimlib, but this scenario can
also happen if there is a SHA-1 collision, or if the SHA-1 hash provided
by the filesystem for a WIM-backed file on Windows is wrong.
Eric Biggers [Tue, 29 Jun 2021 07:42:11 +0000 (00:42 -0700)]
win32: update WOF ioctl definitions
Use the "official" Microsoft struct and field names, and only define
things when they aren't already defined (since some of them were
recently added to MinGW's winioctl.h, causing build errors).
Eric Biggers [Fri, 2 Apr 2021 04:07:53 +0000 (21:07 -0700)]
Fix slow progress updating for wimsplit
wimsplit only prints a progress message when starting each WIM part.
That can be very infrequent, since each part can be gigabytes in size.
Fix it to update the progress regularly as data is written, like the
other wimlib-imagex commands do.
This required changing the library to report
WIMLIB_PROGRESS_MSG_WRITE_STREAMS messages from wimlib_split() and
include the completed compressed size in them.
Reported at https://www.reddit.com/r/pcmasterrace/comments/hagu4k/wimlibimagex_split_stuck_at_0
Eric Biggers [Tue, 27 Oct 2020 03:17:02 +0000 (20:17 -0700)]
win32_replacements.c: fix handle closing in win32_wglob()
The handle returned by FindFirstFileW() needs to be closed by
FindClose(), not by CloseHandle().
This is a very old bug, which presumably wasn't noticed before because
ordinarily it just leaked the handle. However, this bug caused a SEH
exception when wimlib was run under a debugger.
Eric Biggers [Sun, 23 Aug 2020 19:37:12 +0000 (12:37 -0700)]
COPYING: clarify the license
Some of the language in COPYING is potentially unclear. For example,
there is some ambiguity in when each license option of GPL and LGPL is
allowed. Clarify the language.
Note, this commit isn't intended to actually change the license at all.
It just clarifies what I intended.
Eric Biggers [Tue, 2 Jun 2020 04:26:04 +0000 (21:26 -0700)]
win32_capture: avoid unnecessary fallback to recursive scan
When doing the fast MFT scan (via FSCTL_QUERY_FILE_LAYOUT) and we find a
directory that needs to fall back to the standard scan, we actually only
need to fall back for the directory itself -- not also its children.
Optimize things accordingly.
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=533
Eric Biggers [Sun, 24 May 2020 18:22:36 +0000 (11:22 -0700)]
Remove obsolete Linux packaging files
There are now official Debian and Fedora packages for wimlib. So the
in-tree packaging files are redundant. Also I haven't tested them in a
long time, so there's a good chance they don't work properly anymore.
Eric Biggers [Fri, 22 May 2020 05:35:29 +0000 (22:35 -0700)]
Use memcpy() for unaligned accesses
For unaligned memory accesses, with modern compilers memcpy() is
compiled just as efficiently as __attribute__((packed)). This also
avoids using a nonstandard extension and potentially running into the
gcc 10 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994.
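A minimal sketch of the memcpy() approach described above. The helper
names load_u32_unaligned() and store_u32_unaligned() are made up for
this illustration and are not wimlib's actual helpers; values are read
and written in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Load a 32-bit value from a possibly unaligned address.
 * A fixed-size memcpy() like this is typically compiled to a single
 * load instruction on targets that permit unaligned access.
 */
static inline uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/* Store a 32-bit value to a possibly unaligned address. */
static inline void store_u32_unaligned(uint32_t v, void *p)
{
	assert(p != NULL);
	memcpy(p, &v, sizeof(v));
}
```

Because only standard C is used, no packed struct and no aliasing
attribute is involved.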
Eric Biggers [Sun, 14 Apr 2019 06:21:42 +0000 (23:21 -0700)]
lcpit_matchfinder: fix limiting nice_match_len
The "normal" mode of the lcp-interval tree matchfinder supports finding
matches up to LCP_MAX bytes. The "huge" mode, which is needed on
buffers larger than 64 MiB, supports up to HUGE_LCP_MAX bytes.
nice_match_len must be limited to the appropriate one of these values.
But nice_match_len is limited by lcpit_matchfinder_init(). That's
wrong, because it only knows whether huge mode *might* be used later,
based on max_bufsize. Which mode to use is actually decided on a
buffer-by-buffer basis by lcpit_matchfinder_load_buffer().
Thus, limit nice_match_len in lcpit_matchfinder_load_buffer() instead.
This fixes a crash or incorrect output during LZMS compression with a
compression level > 50 and a chunk size > 64 MiB.
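A rough sketch of the idea of clamping per buffer rather than at init
time. Everything here is hypothetical: the struct, the function name,
and the EXAMPLE_* limits are placeholders and are not the real values or
layout from lcpit_matchfinder.c. Only the 64 MiB threshold comes from
the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_LCP_MAX       63u   /* placeholder, not wimlib's value */
#define EXAMPLE_HUGE_LCP_MAX  127u  /* placeholder, not wimlib's value */
#define HUGE_MODE_THRESHOLD   ((size_t)64 << 20)  /* 64 MiB */

struct example_matchfinder {
	uint32_t requested_nice_len;  /* what the caller asked for */
	uint32_t nice_match_len;      /* effective value for this buffer */
	int huge_mode;
};

/*
 * The mode is only known once the actual buffer size is known, so the
 * clamp is applied here and not when the matchfinder is initialized.
 */
static void example_load_buffer(struct example_matchfinder *mf, size_t bufsize)
{
	uint32_t limit;

	assert(mf != NULL);
	mf->huge_mode = bufsize > HUGE_MODE_THRESHOLD;
	limit = mf->huge_mode ? EXAMPLE_HUGE_LCP_MAX : EXAMPLE_LCP_MAX;
	mf->nice_match_len = mf->requested_nice_len < limit ?
			     mf->requested_nice_len : limit;
}
```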
Eric Biggers [Wed, 28 Feb 2018 03:31:58 +0000 (19:31 -0800)]
split.c: fix finding extension of first split WIM part
Silly old bug: wimlib_split() considered the first dot in the SWM path
to begin the filename extension. But of course, there can be other dots
in the path; we need to look for the last dot in the last component.
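For illustration, a simplified version of "last dot in the last
component". The function find_extension() is hypothetical and assumes
'/' as the only path separator and plain char strings; the real code
would also have to deal with platform-specific separators and
character types.

```c
#include <assert.h>
#include <string.h>

/*
 * Return a pointer to the '.' that begins the filename extension,
 * or NULL if the last path component has no dot.
 */
static const char *find_extension(const char *path)
{
	const char *last_component;

	assert(path != NULL);
	/* Skip past the last separator so that dots in directory
	 * names are never considered. */
	last_component = strrchr(path, '/');
	last_component = last_component ? last_component + 1 : path;

	/* The extension starts at the last dot, not the first. */
	return strrchr(last_component, '.');
}
```

With this, a path such as "/mnt/my.backups/install.swm" yields ".swm",
whereas searching for the first dot would have matched inside
"my.backups".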
Eric Biggers [Sun, 21 Jan 2018 21:47:10 +0000 (13:47 -0800)]
wimlib-imagex: add --include-integrity option
The --check option currently does two things: verify the integrity table
of the input WIM(s), and include an integrity table in the output
WIM(s). Some users would like to do the latter only, especially if
there are large input WIM(s).
Add an option --include-integrity which does this.
Eric Biggers [Sun, 21 Jan 2018 21:47:10 +0000 (13:47 -0800)]
wimlib-imagex: try harder to optimize out opening template WIM
As an optimization, 'wimcapture' and 'wimappend' don't separately open
the template WIM for --update-of if no filename is specified in that
option, which makes it default to either the single base WIM
(--delta-from), or the WIM being appended to.
Extend that optimization to cases where the filename is specified in
--update-of and it exactly matches the filename of the WIM being
appended to or any of the base WIMs.
Eric Biggers [Sun, 21 Jan 2018 21:47:10 +0000 (13:47 -0800)]
Make stream_hash() return NULL for unhashed streams
Otherwise it will return a bogus value from the union with ->back_inode
and ->back_stream_id. Most callers ensured this cannot happen, but a
couple did not. It should be explicitly prevented or handled.
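A sketch of why the accessor needs the check. The struct below is a
hypothetical stand-in, not wimlib's real layout; only the idea that the
hash shares a union with back_inode and back_stream_id is taken from the
description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHA1_HASH_SIZE 20

struct example_blob {
	int unhashed;  /* nonzero: hash not computed yet */
	union {
		uint8_t hash[SHA1_HASH_SIZE];  /* valid only if !unhashed */
		struct {
			void *back_inode;
			uint32_t back_stream_id;
		} back;                        /* valid only if unhashed */
	} u;
};

/*
 * Return the hash, or NULL when the blob is unhashed. Without the
 * check, callers would read the bytes of 'back' as if they were a hash.
 */
static const uint8_t *example_blob_hash(const struct example_blob *blob)
{
	assert(blob != NULL);
	return blob->unhashed ? NULL : blob->u.hash;
}
```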
Eric Biggers [Sun, 21 Jan 2018 21:47:09 +0000 (13:47 -0800)]
Capture and apply extended attributes on Windows
DISM recently started supporting capturing and applying xattrs on
Windows (though it is broken when applying multiple xattrs per file).
Make wimlib support the same, using the same on-disk format. Unlike
DISM it is on by default, not controlled by an option, since there
doesn't seem to be a good reason to make it an option.
Also deprecate the tagged item wimlib was using to store xattrs on Linux
and switch over to the format used by WIMGAPI to store xattrs on
Windows, so that new WIM images use the same xattr format on both
platforms. One caveat is that on Linux XATTR_SIZE_MAX is 65536 whereas
in the new WIM tagged item format we can only store up to 65535 bytes.
That is unlikely to matter though.
As future work, the NTFS-3G capture and apply backends should be updated
to support xattrs too.
Eric Biggers [Sun, 16 Jul 2017 06:26:33 +0000 (23:26 -0700)]
unaligned: use may_alias attribute
gcc7 miscompiles the "undo" mode of translate_if_needed() in
lzms_common.c because the get_unaligned_le16() was incorrectly being
moved before the put_unaligned_le32(). Fix it by marking the special
"unaligned" structs with the may_alias attribute.
Eric Biggers [Sun, 16 Jul 2017 06:26:33 +0000 (23:26 -0700)]
Use dynamically-sized path buffer when scanning files
This is needed to guarantee that no buffer overflow can occur when
scanning a deep directory structure. The new way also avoids using
PATH_MAX, which fixes a build error on systems that don't define it.
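One way such a buffer could look, as a sketch only. The path_buf type
and its functions are invented for this example and do not reflect the
actual scanning code; '/' is assumed as the separator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct path_buf {
	char *buf;   /* NUL-terminated path, or NULL if empty */
	size_t len;  /* length without the terminating NUL */
	size_t cap;  /* allocated size in bytes */
};

/* Append "/name", growing the buffer as needed. Returns 0 on success. */
static int path_buf_append(struct path_buf *pb, const char *name)
{
	size_t name_len = strlen(name);
	size_t needed = pb->len + 1 + name_len + 1;  /* '/' + name + NUL */

	if (needed > pb->cap) {
		size_t new_cap = pb->cap ? pb->cap : 32;
		char *p;

		while (new_cap < needed)
			new_cap *= 2;
		p = realloc(pb->buf, new_cap);
		if (!p)
			return -1;
		pb->buf = p;
		pb->cap = new_cap;
	}
	pb->buf[pb->len++] = '/';
	memcpy(&pb->buf[pb->len], name, name_len + 1);
	pb->len += name_len;
	return 0;
}

/* Cut the path back to a previously saved length (leaving a directory). */
static void path_buf_truncate(struct path_buf *pb, size_t len)
{
	assert(len <= pb->len);
	pb->len = len;
	if (pb->buf)
		pb->buf[len] = '\0';
}
```

A recursive scan would save pb->len before descending into a child and
truncate back to it afterwards, so the depth is limited only by
available memory and not by PATH_MAX.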
Mike Swanson [Sun, 4 Jun 2017 22:35:34 +0000 (15:35 -0700)]
update_image.c: Ignore Windows 10 Recycle Bin directories.
On Windows 10 (and possibly earlier versions), a \$RECYCLE.BIN or
\$Recycle.Bin directory is created in the root of each volume.
Both case variants are listed so that capturing an NTFS volume from
Linux works; the extra variant makes no difference when capturing on
Windows.
Eric Biggers [Wed, 19 Apr 2017 06:58:03 +0000 (23:58 -0700)]
Improved year 2038 safety
Make wimlib on 32-bit Windows year 2038 safe by doing the following:
- Build both the library and program with 64-bit time_t, being careful
to avoid changing the timespec struct exposed in the API.
- Update wimlib's API to include an extended seconds field in
wimlib_dir_entry for each timestamp, and set it when tv_sec is 32-bit.
- When needing the current time, call GetSystemTimeAsFileTime() instead
of MinGW's gettimeofday().
This also has the advantage that due to switching to the 64-bit time_t
functions, 32-bit wimlib-imagex.exe now prints timestamps prior to year
1970 correctly.
Unfortunately, despite the API improvement, we cannot at this time make
wimlib fully Y2038-safe on 32-bit UNIX, due to lack of OS support.
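To illustrate why 64-bit seconds matter, here is a sketch that converts
a Windows FILETIME-style tick count (100-nanosecond intervals since
1601-01-01 UTC) to seconds since the UNIX epoch. The function name is
made up for this example; it is not part of wimlib's API.

```c
#include <assert.h>
#include <stdint.h>

#define TICKS_PER_SECOND  UINT64_C(10000000)    /* 100 ns ticks */
#define EPOCH_DIFF_SEC    INT64_C(11644473600)  /* 1601 to 1970 */

/*
 * Convert FILETIME-style ticks to UNIX seconds. The result is signed
 * 64-bit, so times before 1970 and after January 2038 are both
 * representable.
 */
static int64_t filetime_ticks_to_unix_seconds(uint64_t ticks)
{
	return (int64_t)(ticks / TICKS_PER_SECOND) - EPOCH_DIFF_SEC;
}
```

A timestamp one second past the 32-bit limit converts to 2147483648,
which does not fit in a 32-bit time_t; a tick count of 0 converts to a
negative value, i.e. a time before 1970.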
Eric Biggers [Sun, 29 Jan 2017 05:18:21 +0000 (21:18 -0800)]
avl_tree.h: avoid bad function pointer cast
Casting the type of the 'cmp' function, while under normal circumstances
compiled correctly, was not technically correct and was not compatible
with some control flow integrity (CFI) implementations.
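The same class of problem can be shown with qsort(), used here as a
stand-in since it also takes a generic comparison callback; this is not
the avl_tree.h code. Instead of casting a typed comparator to the
generic pointer type, the comparator is declared with exactly the
expected signature and converts its arguments internally.

```c
#include <assert.h>
#include <stdlib.h>

struct item {
	int key;
};

/*
 * Declared with the exact type qsort() expects, so no function
 * pointer cast is needed and the indirect call matches the callee's
 * real type.
 */
static int cmp_items(const void *a, const void *b)
{
	const struct item *x = a;
	const struct item *y = b;

	return (x->key > y->key) - (x->key < y->key);
}

static void sort_items(struct item *items, size_t n)
{
	assert(items != NULL);
	qsort(items, n, sizeof(items[0]), cmp_items);
}
```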
Eric Biggers [Sun, 15 Jan 2017 21:34:36 +0000 (13:34 -0800)]
lzx_compress: optimize storing information in lzx_sequence
Pack the literal run length and match length ourselves instead of using
bitfields, and store the actual match length instead of the adjusted
match length. Also make matchlen=0 represent end-of-block, and store
the full main symbol, not just the match header.
Eric Biggers [Sun, 15 Jan 2017 01:00:13 +0000 (17:00 -0800)]
Don't generate GUID in wimlib_create_new_wim()
It's not necessary to generate a GUID in wimlib_create_new_wim() because
one is generated later by wimlib_write(), and nothing seems to assume
that a WIMStruct not yet backed by a file has a valid GUID. This saves
a call to get_random_bytes(). Also remove some unnecessary
initializations to 0.
Eric Biggers [Sat, 14 Jan 2017 08:56:39 +0000 (00:56 -0800)]
lzx_compress: fix corruption with long literal run
The last round of updates to the LZX compressor allowed it to use
larger blocks, up to ~100KB. Unfortunately, it was overlooked that
this allows literal runs > 65535 bytes, while in one place the length
of a literal run was still being stored in a u16. Therefore, on
incompressible input data the length could wrap around, causing
incorrect compression. Fix this by enlarging the variable.
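The wraparound itself is easy to reproduce in isolation. The two
functions below are illustrative only and are not taken from
lzx_compress.c; they simply count a literal run with a 16-bit and a
32-bit counter.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow counter: silently wraps modulo 65536. */
static uint32_t literal_run_len_u16(uint32_t num_literals)
{
	uint16_t litrunlen = 0;
	uint32_t i;

	for (i = 0; i < num_literals; i++)
		litrunlen++;
	return litrunlen;
}

/* Widened counter: holds any run length a ~100KB block can produce. */
static uint32_t literal_run_len_u32(uint32_t num_literals)
{
	uint32_t litrunlen = 0;
	uint32_t i;

	for (i = 0; i < num_literals; i++)
		litrunlen++;
	return litrunlen;
}
```

For a run of 70000 literals, the 16-bit version reports 4464
(70000 - 65536), while the 32-bit version reports 70000.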
Eric Biggers [Sun, 8 Jan 2017 06:34:32 +0000 (22:34 -0800)]
hc_matchfinder: use well-defined initialization of best_matchptr
The initial value of best_matchptr is not truly used, but since we do
always compute 'in_next - best_matchptr', assign an initial value which
avoids undefined behavior.