
Consistent data corruption bug with LZX and non-default (>32K) chunk sizes

Posted: Sat Jan 14, 2017 8:13 am
by chungy
I've discovered something alarming: a WIM file can be corrupt immediately after a capture if LZX compression is used with a chunk size greater than 32K. It doesn't happen for every stream of data, but it seems to trigger most frequently on already-compressed files. For example, linux-4.9.3.tar.xz is a file that will not be stored in a WIM correctly. Just put that file into a directory and run "wimcapture --chunk-size 64K linux linux.wim" and you can see the bug triggered. Larger chunk sizes trigger the error too.
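For reference, a full reproduction along these lines might look as follows (just a sketch; it assumes wimcapture and wimverify from wimlib are on PATH, that the tarball has already been downloaded, and that the path to it is filled in):

    mkdir linux
    cp /path/to/linux-4.9.3.tar.xz linux/
    # capture with LZX (the default compression) and a 64K chunk size
    wimcapture --chunk-size 64K linux linux.wim
    # if the bug is present, verification should report a checksum mismatch
    wimverify linux.wim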

Re: Consistent data corruption bug with LZX and non-default (>32K) chunk sizes

Posted: Sat Jan 14, 2017 9:39 am
by synchronicity
Aargh, thanks for reporting this. The bug was caused by a change to the LZX
compressor introduced in wimlib v1.10.0. Yes, it only affects highly
incompressible data (random or already compressed data). I guess that not many
people have actually been using the large chunk sizes, otherwise someone would
have noticed it sooner. And I guess I didn't do enough testing of large chunk
sizes myself.

I've posted wimlib-1.11.0-BETA7 which fixes the bug.

You can also use 'wimverify' if you want to identify any WIM archives that
contain corrupted file data due to this bug. Unfortunately, any corrupted files
in the archives are not recoverable, since the bug caused parts of the file
data to be omitted. However, any uncorrupted files within the archives are
still recoverable.
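For anyone with a number of archives to check, a simple loop over wimverify would do it (a sketch only; it assumes the .wim files are in the current directory and that wimverify exits with a nonzero status when verification fails):

    for f in *.wim; do
        # print the name of any archive that fails verification
        wimverify "$f" > /dev/null 2>&1 || echo "verification failed: $f"
    done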

Sorry for any inconvenience this caused!

Re: Consistent data corruption bug with LZX and non-default (>32K) chunk sizes

Posted: Sat Jan 14, 2017 12:39 pm
by chungy
Thanks for the quick turnaround, much appreciated!