[Solved] WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count

Vulpix
Posts: 14
Joined: Fri Jan 25, 2019 7:01 am

Post by Vulpix »

Hello! I love wimlib because I have a bunch of full snapshots of a certain directory, and thanks to wimlib's (or rather, WIM's) single-instance storage, the archives are very small.

However, recently when working on an archive, I stumbled upon a consistently reproducible bug: the software keeps crashing. In some cases when it doesn't crash, it produces a corrupted WIM file (wimverify complains, and even 7-Zip shows some mismatches, so the file is definitely corrupted).

How can I help with analysing this?

I have a high-end Threadripper system with a lot of RAM, so I generally compress like this:

wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16

But it crashed on me recently, and then again, so I went to investigate and ran a bunch of them at the same time (with different target names, of course).

This one worked:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16

This one worked too:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=4

This one crashed:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=8

This one crashed too:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=1

I haven't had any stability issues on my system; I ran a memory test and a stress test for a day and didn't notice any problems.

Is there perhaps some flag I can pass to wimlib to generate a dump or something like that to better understand what's happening?

Thanks!

EDIT: It turns out that while everything was fine with the system when I built it, when I re-ran my stability tests yesterday to make sure everything was OK, I got a few memory "decay" errors (data read back from memory some time after being written no longer matched). The test failed, which clearly means there was a problem with the memory. I tracked down and replaced a faulty memory module, and all of the commands above now work without issues.
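
For readers unfamiliar with the term, a "decay" test of the kind described in this edit writes a known pattern, waits, and then reads it back. The following is only a toy illustration of that idea in Python. The function names, the default pattern, and the parameters are made up for the example, and a user-space script like this is no substitute for a dedicated memory tester, since the operating system decides which physical memory backs the buffer.

```python
import time


def fill(size_bytes, pattern=0xA5):
    """Allocate a buffer in which every byte holds the given pattern."""
    return bytearray([pattern]) * size_bytes


def count_mismatches(buf, pattern=0xA5):
    """Return how many bytes no longer hold the pattern."""
    # bytearray.count() accepts an integer byte value and runs in C.
    return len(buf) - buf.count(pattern)


def decay_check(size_bytes, hold_seconds, pattern=0xA5):
    """Write the pattern, wait, read it back; hypothetical helper, illustration only."""
    buf = fill(size_bytes, pattern)
    time.sleep(hold_seconds)      # give marginal memory cells time to lose their contents
    return count_mismatches(buf, pattern)
```

Calling `decay_check(64 * 1024 * 1024, 60.0)` would hold a 64 MiB buffer for a minute and report the number of bytes that changed; on healthy memory the expected result is 0.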
Last edited by Vulpix on Wed May 15, 2019 12:34 pm, edited 1 time in total.
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Post by synchronicity »

A few questions:

- How much memory does the system have?
- In the --threads=1 case, does it crash all the time or just some of the time? And when it crashes, does it always happen at the same time (same amount of data written so far) or does it happen at different times?
- Does it still reproduce if you change --solid-compress=LZMS:100 to --solid-compress=LZMS:50?
- Does it still reproduce if you change --solid-chunk-size=128M to --solid-chunk-size=64M?
Vulpix
Posts: 14
Joined: Fri Jan 25, 2019 7:01 am

Post by Vulpix »

Hi!

The system has 128GB of RAM. I'll perform the tests you requested and report back once I have the results.
Vulpix
Posts: 14
Joined: Fri Jan 25, 2019 7:01 am

Post by Vulpix »

OK, here are the test results:

with: --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16
Attempt 1: crash at 6442457292B (~6GB, almost exactly)
Attempt 2: crash at 4294973644B (~4GB, almost exactly)
Attempt 3: crash at 2147489996B (~2GB, almost exactly)
Attempt 4: crash at 6442457292B (~6GB, almost exactly; the exact same number as attempt 1)
Attempt 5: crash at 15032391884B (~14GB, almost exactly)

with: --include-integrity --recompress --solid --solid-compress=LZMS:50 --solid-chunk-size=128M --threads=16
Attempt 6: crash at 20803754188B, which is no longer close to a multiple of 2GB, so something did change

with: --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=64M --threads=16
Attempt 7: Finished successfully, 205315593909B
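
A quick way to look for a pattern in these numbers is to reduce each crash offset modulo 2 GiB and modulo the chunk size. The sketch below does only that arithmetic on the byte counts reported above. The names are illustrative, and reading "128M" as 128 MiB (2^27 bytes) is an assumption.

```python
GIB = 1 << 30
CHUNK = 128 * (1 << 20)   # assumption: "128M" means 128 MiB

# Byte counts copied from the attempts above (attempt number -> offset at crash).
crash_offsets = {
    1: 6442457292,
    2: 4294973644,
    3: 2147489996,
    4: 6442457292,
    5: 15032391884,
    6: 20803754188,       # the LZMS:50 run
}


def describe(offset):
    """Return (whole chunks before the offset, remainder mod chunk, remainder mod 2 GiB)."""
    return offset // CHUNK, offset % CHUNK, offset % (2 * GIB)


for attempt, offset in crash_offsets.items():
    chunks, rem_chunk, rem_2gib = describe(offset)
    print(f"attempt {attempt}: {chunks} chunks + {rem_chunk} B, mod 2 GiB = {rem_2gib} B")
```

Under that assumption, every listed offset leaves the same remainder of 6348 bytes modulo 128 MiB. That includes attempt 6, which is not near a multiple of 2 GiB but still falls just past a chunk-sized boundary. The thread does not say exactly what the byte counter measures, so this is an observation about the numbers rather than a diagnosis.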
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Post by synchronicity »

FYI I had asked for --threads=1 too. That would reveal whether the problem always occurs at the same place in the data.

Anyway, if it works reliably with --solid-compress=LZMS:100 --solid-chunk-size=64M as indicated by your Attempt 7, that's useful to know too; it suggests the problem is exclusive to --solid-chunk-size=128M. Some code is different for chunk sizes > 64M and not many people use it, so the bug could be in there...

Also just to clarify, is this with the Windows build of wimlib and if so which version? Or is this on Linux?

[Edit: please make sure to also run wimverify after all successful tests to check the resulting file.]
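
One way to run the variations requested in this thread without retyping them is a small driver script. This is a sketch, not a wimlib feature: the helper names, the output file naming, and the particular parameter values are assumptions, while the wimexport options are the ones already used in the commands above. It runs wimverify after every export that exits successfully.

```python
import itertools
import subprocess


def build_export_cmd(source, target, level, chunk, threads):
    """Build the wimexport argument list used in this thread for one variation."""
    return [
        "wimexport", source, "all", target,
        "--include-integrity", "--recompress", "--solid",
        f"--solid-compress=LZMS:{level}",
        f"--solid-chunk-size={chunk}",
        f"--threads={threads}",
    ]


def variations(levels=(100, 50), chunks=("128M", "64M"), threads=(1, 16)):
    """Return every (level, chunk, threads) combination to try."""
    return itertools.product(levels, chunks, threads)


def run_all(source="source.wim"):
    """Run each variation, then wimverify the result if the export succeeded."""
    results = {}
    for level, chunk, nthreads in variations():
        target = f"target-L{level}-{chunk}-T{nthreads}.wim"   # hypothetical naming
        export = subprocess.run(build_export_cmd(source, target, level, chunk, nthreads))
        if export.returncode != 0:
            results[target] = f"export failed ({export.returncode})"
            continue
        verify = subprocess.run(["wimverify", target])
        results[target] = "ok" if verify.returncode == 0 else "verify failed"
    return results
```

Running the variations one after another, rather than in parallel, keeps each result independent of memory pressure from the other runs.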
Vulpix
Posts: 14
Joined: Fri Jan 25, 2019 7:01 am

Post by Vulpix »

Hi!

My apologies, I somehow missed that.

I have now run tests with --threads=1 as well, and no, it does not crash at the same point: the first run crashed at 8.5GB and the second at 30.5GB. I didn't run a third.

Also, this is the latest stable build of wimlib for Windows x86_64 (v1.13.0), running on a fully updated (as of today) Windows 10 Pro 1703.

EDIT: I also ran wimverify on the file produced with LZMS:100 and a 64M chunk size, and it came back without issues.
Vulpix
Posts: 14
Joined: Fri Jan 25, 2019 7:01 am

Post by Vulpix »

I'm still having this issue.

Here is me trying to capture a rather large folder:

C:\Windows\System32>wimcapture e:\Temp\ D:\Full.wim --check --solid --solid-compress=LZMS:100 --solid-chunk-size=128M
Scanning "e:\Temp\UNGUARDED\ReBax\"
1603 GiB scanned (3129988 files, 271159 directories)
Using LZMS compression with 32 threads
Archiving file data: 244 GiB of 245 GiB (99%) done
C:\Windows\System32>
-> It crashed. At 99% :D Maximum sadness.

C:\Windows\System32>wimcapture e:\Temp\ D:\Full.wim --check --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16
Scanning "e:\Temp\UNGUARDED\ReBax\"
1603 GiB scanned (3129988 files, 271159 directories)
Using LZMS compression with 16 threads
Archiving file data: 97 GiB of 1392 GiB (7%) done
C:\Windows\System32>
This time it crashed at 7%. I tried using fewer threads this time, but the result was essentially the same. The "GB" totals change because this folder contains full backups, so many files are identical and the actual amount of data to archive shrinks as wimlib proceeds through it.
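
The shrinking totals follow from identical file contents being stored only once. As a rough illustration (not wimlib's actual code), the sketch below hashes each file's contents with SHA-1, the digest type that shows up in wimlib's error output, and compares the total size with the size of the unique contents. The function name and the in-memory file mapping are assumptions made for the example.

```python
import hashlib


def dedup_sizes(files):
    """Return (total_bytes, unique_bytes) for a mapping of path -> file contents."""
    seen = {}                      # SHA-1 digest -> size of that content
    total = 0
    for data in files.values():
        total += len(data)
        # Only the first occurrence of each digest contributes to the unique size.
        seen.setdefault(hashlib.sha1(data).digest(), len(data))
    return total, sum(seen.values())


snapshots = {
    "backup1/report.txt": b"quarterly numbers" * 1000,
    "backup2/report.txt": b"quarterly numbers" * 1000,   # identical copy
    "backup2/notes.txt": b"new notes",
}
total, unique = dedup_sizes(snapshots)
print(f"{total} bytes scanned, {unique} bytes of unique data")
```

The second copy of report.txt adds to the scanned total but not to the unique data, which is the same effect as the gap between the "scanned" and "Archiving file data" figures in the output above.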

I am now running the compression with --solid-chunk-size=64M, which worked before.
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Post by synchronicity »

I've found and fixed a bug that may have caused this. It resulted in a crash or incorrect output when LZMS compression was used with both a compression level and a chunk size greater than the defaults.

The fix is in wimlib-1.13.1-BETA1. Can you try it from the Downloads page?

However, if the information you reported is accurate, there may still be another problem as well. Compression level 50 (the default level) isn't affected by the bug I fixed, but you mentioned you still saw a crash with --solid-compress=LZMS:50. Also, the bug I fixed would cause a crash at the end (e.g. 99% done), not earlier as you reported in some cases.

So please let me know if it still crashes or doesn't pass wimverify, and in exactly what cases.
Vulpix
Posts: 14
Joined: Fri Jan 25, 2019 7:01 am

Post by Vulpix »

Hi!

Thanks for the info, I'll test this.

I've now run into a very strange problem, several times, so I am not sure what to think.

I create a WIM, it checks out fine via wimverify, and I even used wimapply to extract its contents and verify all of their SHA-512 checksums. Everything matched.

However, I then ran wimexport to recompress it into a new wim, and it suddenly complained about corruption at the first gigabyte of data:

Verifying integrity of "X:\Archive-up-to-2019Q1.wim": 1 GiB of 176 GiB (0%) done
ERROR: Exiting with error code 13:
       The WIM file is corrupted (failed integrity check).

[ERROR] A WIM resource is corrupted!
        WIM file: "X:\Archive-up-to-2019Q1.wim"
        Blob uncompressed size: 48130417
        Resource offset in WIM: 2904036
        Resource uncompressed size: 274417702442
        Resource size in WIM: 189775265486
        Resource flags: 0x10
        Resource compression type: LZX
        Resource compression chunk size: 2097152
        Expected SHA-1: 62b1651e052012bb9b85d7abf8a15ade997e2aaf
        Actual SHA-1: 91a73fd5342033ab653c421e7d780891f5733cd2
This is really strange to me, because I had verified this file using wimverify before and had also verified its contents using regular SHA-512 checksums, and they all matched.

I'm now running wimapply on the file again to get its data back to see if there is something damaged or not. If there is, I don't understand how it could have happened. The file has a timestamp of when it was created, and there was no reason for it to become damaged.

I'll post my findings here, but the contents are large, so it may take a day or two to export and re-check them.
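
The SHA-512 comparison mentioned in this post can be scripted along these lines. It is a minimal sketch under assumptions: the function names are hypothetical, and the "manifest" is simply a dictionary mapping each relative path to its hex digest, built once for the original directory and once for the directory produced by wimapply.

```python
import hashlib
import os


def sha512_file(path, bufsize=1 << 20):
    """Hash one file in 1 MiB pieces so large files are not loaded into memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def manifest(root):
    """Map each file's path relative to root to its SHA-512 hex digest."""
    result = {}
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            full = os.path.join(dirpath, name)
            result[os.path.relpath(full, root)] = sha512_file(full)
    return result


def compare(original_root, extracted_root):
    """Return the sorted relative paths that are missing, extra, or differ."""
    a, b = manifest(original_root), manifest(extracted_root)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty list from `compare` means every file exists on both sides with the same digest. It compares file contents only, not timestamps, attributes, or empty directories.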
Vulpix
Posts: 14
Joined: Fri Jan 25, 2019 7:01 am

Post by Vulpix »

Interestingly, with the beta version I didn't get a crash at all, and the data checks out fine.

Microsoft Windows [Version 10.0.15063]
(c) 2017 Microsoft Corporation. All rights reserved.

d:\Programs\Programs\wimlib>wimcapture X:\ReBax\ Y:\Full.wim --check --solid --solid-compress=LZMS:100 --solid-chunk-size=128M
Scanning "X:\ReBax\"
1673 GiB scanned (3258938 files, 286253 directories)
Using LZMS compression with 32 threads
Archiving file data: 255 GiB of 255 GiB (100%) done
Calculating integrity table for WIM: 168 GiB of 168 GiB (100%) done

d:\Programs\Programs\wimlib>wimverify Y:\Full.wim
Verifying integrity of "Y:\Full.wim": 168 GiB of 168 GiB (100%) done
Verifying metadata for image 1 of 1
Verifying file data: 255 GiB of 255 GiB (100%) done

"Y:\Full.wim" was successfully verified.

d:\Programs\Programs\wimlib>wimapply Y:\Full.wim X:\ReBax --check
Verifying integrity of "Y:\Full.wim": 168 GiB of 168 GiB (100%) done
Applying image 1 ("ReBax") from "Y:\Full.wim" to directory "X:\ReBax"
Creating files: 3545191 of 3545191 (100%) done
Extracting file data: 1673 GiB of 1673 GiB (100%) done
Applying metadata to files: 3545191 of 3545191 (100%) done
Done applying WIM image.

d:\Programs\Programs\wimlib>wimverify Y:\Full.wim
Verifying integrity of "Y:\Full.wim": 168 GiB of 168 GiB (100%) done
Verifying metadata for image 1 of 1
Verifying file data: 255 GiB of 255 GiB (100%) done
I'll retry some of the other things that failed previously (recompressing via wimexport), but these results are encouraging!