Eric Biggers [Mon, 27 Mar 2023 00:25:46 +0000 (17:25 -0700)]
Improve fuzz testing
- Convert fuzzing scripts from afl-fuzz to libFuzzer
- Add xml and wim fuzzers, including malloc failure injection
- Fuzz for 2 minutes as part of the GitHub Actions CI
Eric Biggers [Mon, 27 Mar 2023 00:25:46 +0000 (17:25 -0700)]
README: remove an unnecessary notice
The "copyright years may be listed using range notation" notice is
recommended by the GNU project, but most other people don't consider it
to be necessary or meaningful.
Eric Biggers [Mon, 27 Mar 2023 00:25:46 +0000 (17:25 -0700)]
Improve the make-windows-release script
- Automatically bootstrap the repository if needed
- Add --no-zip and --no-docs options
- Add --install-msys2-packages to install the needed MSYS2 packages
- Autodetect the architecture when using MSYS2
- Use 'strip' instead of ${ARCH}-w64-mingw32-strip, for compatibility
with MSYS2
- Make it pass 'shellcheck'
- Other cleanups
Eric Biggers [Mon, 27 Mar 2023 00:25:46 +0000 (17:25 -0700)]
Eliminate the dependency on libxml2
libxml2 is the only remaining third-party library that Windows builds of
wimlib need. It's a bit of a pain to have to download it, build it, and
trick libtool into linking it into the resulting DLL. It then
constitutes a significant part of the size of the resulting DLL, even
with the minimal libxml2 configuration options being used.
In reality, WIM files only use a small subset of XML containing the most
commonly used XML features. Using a full-featured XML library (that
supports "features" like External Entities that we have to remember to
disable) is a bit dangerous and not actually necessary. 7-Zip's WIM
support, for example, just uses a very minimal home-brew XML processor.
Another issue is that the libxml2 API always uses UTF-8, which causes
the conversion UTF-16LE => UTF-8 => UTF-16LE to be needed on Windows.
This isn't really an "issue", per se, but it shouldn't be necessary.
Finally, wimlib was integrating with libxml2 at a low level via the tree
API, and it overlooked some things. For example, libxml2 trees have
separate CDATA and TEXT nodes, but wimlib was only looking at TEXT, so
CDATA was ignored. It was also possible for wimlib to create a document
containing control characters, which is not valid XML so it could not be
read. These weren't very important issues, but the point is, just using
an XML library doesn't solve quite as many problems as one would hope...
Therefore, just add a simple XML 1.0 processor directly in the source
code. It handles all XML features that are used in WIM files, plus a
bit more for futureproofing. It's also faster than libxml2.
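As an illustration of the control-character problem mentioned above, here is a minimal sketch of the "Char" production from the XML 1.0 specification, which defines which code points may appear in a document. The function name is hypothetical and this is not the processor's actual code; it only shows the kind of check involved.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the wimlib source): returns true if the
 * Unicode code point 'c' is allowed in an XML 1.0 document, following the
 * "Char" production of the XML 1.0 specification:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
bool is_valid_xml10_char(uint32_t c)
{
	if (c == 0x9 || c == 0xA || c == 0xD)
		return true;	/* tab, line feed, carriage return */
	if (c >= 0x20 && c <= 0xD7FF)
		return true;	/* excludes the other C0 control characters */
	if (c >= 0xE000 && c <= 0xFFFD)
		return true;	/* excludes surrogates, 0xFFFE and 0xFFFF */
	return c >= 0x10000 && c <= 0x10FFFF;
}
```

A writer that rejects or replaces any code point failing this check cannot produce a document that a conforming reader refuses to parse for this reason.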
Eric Biggers [Mon, 20 Mar 2023 03:59:17 +0000 (20:59 -0700)]
Consistently use _WIN32 instead of __WIN32__
_WIN32 works with all compilers, while __WIN32__ is MinGW-specific.
This project used __WIN32__ in files that only support MinGW, and _WIN32
in other files such as the library header and example programs. One
place even used WIN32. Avoid this unnecessary complication by just
always using _WIN32.
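A minimal sketch of the convention described above. The function name is hypothetical and exists only to show the preprocessor test; the point is that _WIN32 is the single macro checked everywhere.

```c
/*
 * Hypothetical example: select platform-specific code with _WIN32 only,
 * rather than mixing _WIN32, __WIN32__ and WIN32.
 */
const char *platform_name(void)
{
#ifdef _WIN32
	return "Windows";
#else
	return "non-Windows";
#endif
}
```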
Eric Biggers [Sat, 18 Mar 2023 07:17:54 +0000 (00:17 -0700)]
Call wimlib_global_init() when creating compressors and decompressors
All nontrivial API functions are supposed to call wimlib_global_init().
wimlib_create_compressor() and wimlib_create_decompressor() did not.
Make them do so, so that CPU feature detection can be moved to
wimlib_global_init().
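The pattern can be sketched as follows. All names here (lib_global_init, lib_create_compressor, struct compressor, init_count) are hypothetical stand-ins, not wimlib's real definitions, and the sketch is deliberately single-threaded; a real library would need to guard the initialization against concurrent callers.

```c
#include <stdbool.h>

static bool lib_initialized = false;
static int init_count = 0;	/* counts how often the one-time work ran */

/* Hypothetical one-time initialization; idempotent by design. */
int lib_global_init(void)
{
	if (lib_initialized)
		return 0;
	init_count++;		/* stands in for CPU feature detection */
	lib_initialized = true;
	return 0;
}

struct compressor {
	int level;
};

/* Hypothetical entry point: initialize the library before doing any work. */
int lib_create_compressor(int level, struct compressor *c)
{
	int ret = lib_global_init();

	if (ret)
		return ret;
	c->level = level;
	return 0;
}
```

Because the initialization function returns early once it has run, calling it from every entry point costs little and means callers never need to remember to initialize the library themselves.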
Eric Biggers [Sun, 31 Jul 2022 02:03:42 +0000 (19:03 -0700)]
Use MIT license instead of CC0
CC0 has continued to fall out of favor due to the patents clause
(https://lwn.net/ml/fedora-legal/CAC1cPGw1xScGAXo-0NRs92zFB7ptRxTt=oCYi0BxfZDfAgUtYQ@mail.gmail.com).
Years ago I released some source files in this project (not the whole
project) under CC0. Use the MIT license for these files instead.
Note that this requires claiming copyright on the latest version of
these files. Of course, previous versions of these files remain public
domain where legally recognized; this is *not* in any way an attempt to
"revoke" the public domain status of previous versions.
wimlib_iterate_dir_tree() on a modified-but-not-committed image is very
slow because it checksums all unhashed blobs. This was originally
implemented by commit 681faad85f73 ("wimlib_iterate_dir_tree(): checksum
unhashed blobs"), presumably to make the sha1_hash field always valid.
However, I can't remember a real use case for this. The current
behavior is causing problems, so let's just revert it and update the
documentation accordingly.
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=572
Eric Biggers [Tue, 23 Nov 2021 01:55:42 +0000 (17:55 -0800)]
export_image.c: allow duplicate image names in source WIM
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=568. DISM can
create WIM files containing images with duplicate names, whereas wimlib
enforces unique image names in certain cases such as adding images,
exporting images, and changing image names. This behavior generally
seems fine, but the "export" check is too strict: an export of "all"
images will fail if the source WIM contains duplicate names.
Fix this by making wimlib_export_image() allow duplicate image names in
the source WIM, provided that they don't collide with image names that
already exist in the destination WIM.
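The relaxed rule can be sketched as below. This is an assumption-laden illustration, not the code in export_image.c: the function names are hypothetical, names are treated as plain C strings compared exactly, and details such as unnamed images are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is 'name' present in the list 'names'? */
static bool name_in_list(const char *name, const char *const *names, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(name, names[i]) == 0)
			return true;
	return false;
}

/*
 * Hypothetical check: an export is acceptable if no source image name
 * collides with a name already in the destination.  Duplicates among
 * the source names themselves are deliberately not checked.
 */
bool export_names_ok(const char *const *src, size_t nsrc,
		     const char *const *dst, size_t ndst)
{
	for (size_t i = 0; i < nsrc; i++)
		if (name_in_list(src[i], dst, ndst))
			return false;
	return true;
}
```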
Eric Biggers [Tue, 3 Aug 2021 04:53:42 +0000 (21:53 -0700)]
configure.ac: fix trailing newline issue
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=562.
m4_esyscmd() needs to be m4_esyscmd_s(), so that the version string
doesn't get a trailing newline. It works for me either way, but that's
probably because in autoconf 2.70, AC_INIT started trimming extra
whitespace from its arguments (as per the release notes at
https://lists.gnu.org/archive/html/autotools-announce/2020-12/msg00001.html).
So presumably this fix is needed for older versions of autoconf.
Eric Biggers [Sat, 10 Jul 2021 22:47:57 +0000 (17:47 -0500)]
configure.ac: generate version number from git commit and tags
This should hopefully make it less confusing when building from the git
repository. Previously, when doing so the version number would always
be that of the last official release.
Eric Biggers [Sat, 10 Jul 2021 22:51:26 +0000 (17:51 -0500)]
nasm.m4: use AS_MESSAGE_LOG_FD
Address the following warning when running autoreconf:
configure.ac:191: warning: The macro `AC_FD_CC' is obsolete.
configure.ac:191: You should run autoupdate.
./lib/autoconf/general.m4:399: AC_FD_CC is expanded from...
m4/nasm.m4:4: AC_PROG_NASM is expanded from...
configure.ac:191: the top level
Eric Biggers [Mon, 5 Jul 2021 06:03:50 +0000 (23:03 -0700)]
Warn rather than abort if SHA-1 is same but size is different
Assertions should only be used for bugs in wimlib, but this scenario can
also happen if there is a SHA-1 collision, or if the SHA-1 hash provided
by the filesystem for a WIM-backed file on Windows is wrong.
Eric Biggers [Tue, 29 Jun 2021 07:42:11 +0000 (00:42 -0700)]
win32: update WOF ioctl definitions
Use the "official" Microsoft struct and field names, and only define
things when they aren't already defined (since some of them were
recently added to MinGW's winioctl.h, causing build errors).
Eric Biggers [Fri, 2 Apr 2021 04:07:53 +0000 (21:07 -0700)]
Fix slow progress updating for wimsplit
wimsplit only prints a progress message when starting each WIM part.
That could be very infrequent, since each part can be gigabytes in size.
Fix it to update the progress regularly as data is written, like the
other wimlib-imagex commands do.
This required changing the library to report
WIMLIB_PROGRESS_MSG_WRITE_STREAMS messages from wimlib_split() and
include the completed compressed size in them.
Reported at https://www.reddit.com/r/pcmasterrace/comments/hagu4k/wimlibimagex_split_stuck_at_0
Eric Biggers [Tue, 27 Oct 2020 03:17:02 +0000 (20:17 -0700)]
win32_replacements.c: fix handle closing in win32_wglob()
The handle returned by FindFirstFileW() needs to be closed by
FindClose(), not by CloseHandle().
This is a very old bug, which presumably wasn't noticed before because
ordinarily it just leaked the handle. However, this bug caused a SEH
exception when wimlib was run under a debugger.
Eric Biggers [Sun, 23 Aug 2020 19:37:12 +0000 (12:37 -0700)]
COPYING: clarify the license
Some of the language in COPYING is potentially unclear. For example,
there is some ambiguity in when each license option of GPL and LGPL is
allowed. Clarify the language.
Note, this commit isn't intended to actually change the license at all.
It just clarifies what I intended.
Eric Biggers [Tue, 2 Jun 2020 04:26:04 +0000 (21:26 -0700)]
win32_capture: avoid unnecessary fallback to recursive scan
When doing the fast MFT scan (via FSCTL_QUERY_FILE_LAYOUT) and we find a
directory that needs to fall back to the standard scan, we actually only
need to fall back for the directory itself -- not also its children.
Optimize things accordingly.
Reported at https://wimlib.net/forums/viewtopic.php?f=1&t=533
Eric Biggers [Sun, 24 May 2020 18:22:36 +0000 (11:22 -0700)]
Remove obsolete Linux packaging files
There are now official Debian and Fedora packages for wimlib. So the
in-tree packaging files are redundant. Also I haven't tested them in a
long time, so there's a good chance they don't work properly anymore.
Eric Biggers [Fri, 22 May 2020 05:35:29 +0000 (22:35 -0700)]
Use memcpy() for unaligned accesses
For unaligned memory accesses, with modern compilers memcpy() is
compiled just as efficiently as __attribute__((packed)). This also
avoids using a nonstandard extension and potentially running into the
gcc 10 bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994.
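A minimal sketch of the memcpy() approach, assuming helper names that are illustrative rather than taken from the source. The values are read and written in the host's native byte order; any endianness conversion would be a separate step.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative helpers (names are assumptions): read or write a 32-bit
 * value at an address that may not be 4-byte aligned.  memcpy() has
 * well-defined behavior here, and modern compilers typically turn it
 * into a single load or store on targets that allow unaligned access.
 */
static inline uint32_t load_u32_unaligned(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

static inline void store_u32_unaligned(uint32_t v, void *p)
{
	memcpy(p, &v, sizeof(v));
}
```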
Eric Biggers [Sun, 14 Apr 2019 06:21:42 +0000 (23:21 -0700)]
lcpit_matchfinder: fix limiting nice_match_len
The "normal" mode of the lcp-interval tree matchfinder supports finding
matches up to LCP_MAX bytes. The "huge" mode, which is needed on
buffers larger than 64 MiB, supports up to HUGE_LCP_MAX bytes.
nice_match_len must be limited to the appropriate one of these values.
But nice_match_len is currently limited by lcpit_matchfinder_init(). That's
wrong, because it only knows whether huge mode *might* be used later,
based on max_bufsize. Which mode to use is actually decided on a
buffer-by-buffer basis by lcpit_matchfinder_load_buffer().
Thus, limit nice_match_len in lcpit_matchfinder_load_buffer() instead.
This fixes a crash or incorrect output during LZMS compression with a
compression level > 50 and a chunk size > 64 MiB.
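The shape of the fix can be sketched as follows. The struct, the function names, and the numeric values of LCP_MAX and HUGE_LCP_MAX are assumptions made for illustration; only the 64 MiB threshold and the idea of clamping per buffer come from the description above.

```c
#include <stddef.h>
#include <stdint.h>

#define LCP_MAX			63u	/* placeholder value, assumed */
#define HUGE_LCP_MAX		127u	/* placeholder value, assumed */
#define MAX_NORMAL_BUFSIZE	((size_t)64 << 20)	/* 64 MiB */

/* Hypothetical, heavily simplified matchfinder state. */
struct matchfinder {
	uint32_t orig_nice_match_len;	/* value requested by the caller */
	uint32_t nice_match_len;	/* value in effect for this buffer */
	int huge_mode;
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/* Init only records the request; the mode is not known yet. */
void matchfinder_init(struct matchfinder *mf, uint32_t nice_match_len)
{
	mf->orig_nice_match_len = nice_match_len;
	mf->nice_match_len = nice_match_len;
	mf->huge_mode = 0;
}

/* The mode is chosen per buffer, so the limit is applied here. */
void matchfinder_load_buffer(struct matchfinder *mf, size_t size)
{
	mf->huge_mode = size > MAX_NORMAL_BUFSIZE;
	mf->nice_match_len = min_u32(mf->orig_nice_match_len,
				     mf->huge_mode ? HUGE_LCP_MAX : LCP_MAX);
}
```

Keeping the originally requested value separate from the effective one lets each buffer be clamped independently, so a small buffer following a large one still gets the tighter limit.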