Consistent data corruption bug with LZX and non-default (>32K) chunk sizes

chungy
Posts: 30
Joined: Mon Feb 15, 2016 3:40 am

Post by chungy »

I've discovered something alarming: a WIM file can be corrupt immediately after a capture when using LZX compression with a chunk size greater than 32K. It doesn't happen for every stream of data, but it seems to trigger most often on already-compressed files. For example, linux-4.9.3.tar.xz is one such file that will not be stored in a WIM correctly. Just put that file into a directory named "linux" and run "wimcapture --chunk 64k linux linux.wim" to trigger the bug. Larger chunk sizes trigger the error too.
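
The reproduction above can be sketched as a small shell function. This is only an illustration under some assumptions: the function name `repro_lzx_chunk_bug` and its return codes are made up for this example, a file of random bytes stands in for the already-compressed tarball, and `wimverify` is assumed to exit nonzero when it finds corrupted data. It uses the long option names `--compress=LZX` and `--chunk-size` rather than the abbreviated `--chunk` from the post.

```shell
#!/bin/sh
# Hypothetical helper (not part of wimlib): capture incompressible data
# with LZX and a given chunk size, then check the resulting WIM.
# Returns 0 if the WIM verifies, 1 if verification fails,
# 2 if setup or the capture itself fails.
repro_lzx_chunk_bug() {
    chunk_size=${1:-64K}
    workdir=$(mktemp -d) || return 2
    mkdir "$workdir/src" || return 2

    # 4 MiB of random bytes, standing in for an already-compressed file
    dd if=/dev/urandom of="$workdir/src/random.bin" bs=1024 count=4096 2>/dev/null

    status=0
    if ! wimcapture "$workdir/src" "$workdir/test.wim" \
            --compress=LZX --chunk-size="$chunk_size" >/dev/null; then
        status=2
    elif ! wimverify "$workdir/test.wim" >/dev/null; then
        status=1
    fi

    rm -rf "$workdir"
    return "$status"
}
```

Calling `repro_lzx_chunk_bug 64K` on an affected version would be expected to return 1, though random data is not guaranteed to hit the bug on every run.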
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Re: Consistent data corruption bug with LZX and non-default (>32K) chunk sizes

Post by synchronicity »

Aargh, thanks for reporting this. The bug was caused by a change to the LZX
compressor introduced in wimlib v1.10.0. Yes, it only affects highly
incompressible data (random or already compressed data). I guess that not many
people have actually been using the large chunk sizes, otherwise someone would
have noticed it sooner. And I guess I didn't do enough testing of large chunk
sizes myself.

I've posted wimlib-1.11.0-BETA7 which fixes the bug.

You can also use 'wimverify' if you want to identify any WIM archives that
contain corrupted file data due to this bug. Unfortunately any corrupted files
in the archives are not really recoverable as the bug resulted in parts of the
file data being omitted. However, any uncorrupted files within the archives
would still be recoverable.
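
As a rough sketch of how one might run 'wimverify' over a whole directory tree of archives: the function name `scan_wims` is hypothetical, it assumes 'wimverify' exits nonzero for an archive that fails verification, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper (not part of wimlib): print the path of every
# *.wim file under a directory (default: current directory) that
# fails wimverify.
scan_wims() {
    find "${1:-.}" -type f -name '*.wim' | while IFS= read -r wim; do
        # stdin is redirected so the verifier cannot consume the file list
        wimverify "$wim" >/dev/null 2>&1 </dev/null || printf '%s\n' "$wim"
    done
}
```

For example, `scan_wims /path/to/archives` would list only the archives that need attention; empty output means every archive verified.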

Sorry if any inconvenience was caused!
chungy
Posts: 30
Joined: Mon Feb 15, 2016 3:40 am

Re: Consistent data corruption bug with LZX and non-default (>32K) chunk sizes

Post by chungy »

Thanks for the quick turnaround, much appreciated!