[Solved] WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
Hello! I love wimlib because I have a bunch of full snapshots of a certain directory, and thanks to the single-instance storage of wimlib (or rather, of the WIM format in general), the archives are very small.
However, recently while working on an archive, I stumbled upon an always-reproducible bug: the software keeps crashing. In some cases when it doesn't crash, it produces a corrupted WIM file (wimverify complains, and even 7-Zip reports some mismatches, so the file is definitely corrupted).
How can I help with analysing this?
I have a high-end Threadripper system with a lot of RAM, so I generally compress like this:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16
But it crashed on me recently, and then again, so to investigate I ran several of them at the same time (with different target names, of course):
This one worked:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16
This one worked too:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=4
This one crashed:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=8
This one crashed too:
wimexport source.wim all target.wim --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=1
I haven't had any stability issues on this system; I ran a memory test and a day-long stress test and didn't notice any problems.
Is there perhaps some flag I can pass to wimlib to generate a dump or something like that to better understand what's happening?
Thanks!
EDIT: It turns out that while everything was fine with the system when I built it, when I re-ran my stability tests yesterday to make sure everything was still OK, I got a few memory "decay" errors (memory read back incorrectly a certain period of time after it was written). The test failed, which clearly means there was a problem with the memory. I narrowed it down to a faulty memory module, replaced it, and all of the commands above now work without issues.
Last edited by Vulpix on Wed May 15, 2019 12:34 pm, edited 1 time in total.
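The "decay" test described in the EDIT boils down to writing a known pattern, waiting, and reading it back. The Python sketch below illustrates only that procedure; the function name `retention_test` and the byte patterns are illustrative choices, not taken from any real memory tester. A Python process works on virtual memory and cannot choose which physical RAM it uses, so this is no substitute for a dedicated tester that runs outside the operating system.

```python
import time

def retention_test(size_bytes: int, wait_seconds: float,
                   patterns: tuple[int, ...] = (0x55, 0xAA, 0x00, 0xFF)) -> list[tuple[int, int, int]]:
    """Toy write/wait/read check; returns (pattern, offset, value) for every mismatch."""
    mismatches = []
    buf = bytearray(size_bytes)
    for pattern in patterns:
        expected = bytes([pattern]) * size_bytes
        buf[:] = expected          # write the pattern to every byte
        time.sleep(wait_seconds)   # let time pass: the "decay" part of the test
        if buf != expected:        # read everything back and compare
            mismatches.extend((pattern, i, v) for i, v in enumerate(buf) if v != pattern)
    return mismatches
```

For example, `retention_test(256 * 1024 * 1024, wait_seconds=60)` would hold 256 MiB per pattern for a minute before checking it; an empty list means nothing was read back wrong.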
-
- Site Admin
- Posts: 474
- Joined: Sun Aug 02, 2015 10:31 pm
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
A few questions:
- How much memory does the system have?
- In the --threads=1 case, does it crash all the time or just some of the time? And when it crashes, does it always happen at the same time (same amount of data written so far) or does it happen at different times?
- Does it still reproduce if you change --solid-compress=LZMS:100 to --solid-compress=LZMS:50?
- Does it still reproduce if you change --solid-chunk-size=128M to --solid-chunk-size=64M?
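One way to work through these questions systematically is to generate every combination up front. The sketch below builds the command lines for the compression levels, chunk sizes, and thread counts mentioned in this thread. It calls `wimlib-imagex export` (the long form of `wimexport`) so it does not depend on how the short alias is installed; the target-naming scheme and the helper names are hypothetical.

```python
import subprocess
from itertools import product

SOURCE = "source.wim"  # source file name from the original post

def build_command(level: int, chunk: str, threads: int) -> list[str]:
    # Hypothetical naming scheme: one target file per parameter combination.
    target = f"target-lzms{level}-{chunk}-t{threads}.wim"
    return [
        "wimlib-imagex", "export", SOURCE, "all", target,
        "--include-integrity", "--recompress", "--solid",
        f"--solid-compress=LZMS:{level}",
        f"--solid-chunk-size={chunk}",
        f"--threads={threads}",
    ]

def build_matrix() -> list[list[str]]:
    # Every combination of the settings asked about above.
    levels = (100, 50)
    chunks = ("128M", "64M")
    thread_counts = (1, 16)
    return [build_command(lv, ch, th) for lv, ch, th in product(levels, chunks, thread_counts)]

def run_all(commands: list[list[str]]) -> dict[str, int]:
    # Run each export one after another and record its exit code (0 means success).
    results = {}
    for cmd in commands:
        results[cmd[4]] = subprocess.run(cmd).returncode
    return results
```

`run_all(build_matrix())` runs the eight exports sequentially, which keeps memory use predictable; each output can then be checked with wimverify.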
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
Hi!
The system has 128GB of RAM. I'll perform the tests you requested and report back once I have the results.
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
OK, here are the test results:
with: --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16
Attempt 1: crashed at 6442457292 bytes (~6 GiB, almost exactly)
Attempt 2: crashed at 4294973644 bytes (~4 GiB, almost exactly)
Attempt 3: crashed at 2147489996 bytes (~2 GiB, almost exactly)
Attempt 4: crashed at 6442457292 bytes (~6 GiB, almost exactly; the exact same number as Attempt 1)
Attempt 5: crashed at 15032391884 bytes (~14 GiB, almost exactly)
with: --include-integrity --recompress --solid --solid-compress=LZMS:50 --solid-chunk-size=128M --threads=16
Attempt 6: crashed at 20803754188 bytes; not a multiple of 2 GiB anymore, so something did change
with: --include-integrity --recompress --solid --solid-compress=LZMS:100 --solid-chunk-size=64M --threads=16
Attempt 7: finished successfully, 205315593909 bytes
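A little arithmetic on the reported offsets puts the "multiple of 2 GB" observation in context. The sketch below splits each crash offset into whole 128 MiB units (the `--solid-chunk-size` used in Attempts 1 to 6) plus a remainder. Every offset, including Attempt 6, comes out as a whole number of 128 MiB units plus the same 6348 bytes, so Attempt 6 no longer lines up with 2 GiB but still lines up with the chunk size. What the constant 6348-byte remainder corresponds to is not established in the thread.

```python
GIB = 1 << 30
CHUNK = 128 << 20  # 128 MiB, the --solid-chunk-size used in Attempts 1-6

# Crash offsets (bytes) reported in Attempts 1-6 above.
crash_offsets = [6442457292, 4294973644, 2147489996, 6442457292, 15032391884, 20803754188]

def decompose(offset: int, unit: int) -> tuple[int, int]:
    """Split an offset into (number of whole units, remainder in bytes)."""
    return divmod(offset, unit)

for off in crash_offsets:
    chunks, rest = decompose(off, CHUNK)
    print(f"{off:>12} = {chunks:>3} x 128 MiB + {rest} bytes ({off / GIB:.3f} GiB)")
```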
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
FYI, I had also asked about --threads=1. That would reveal whether the problem always occurs at the same place in the data.
Anyway, if it works reliably with --solid-compress=LZMS:100 --solid-chunk-size=64M, as your Attempt 7 indicates, that's useful to know too; it suggests the problem is specific to --solid-chunk-size=128M. Some code is different for chunk sizes larger than 64M, and not many people use those sizes, so the bug could be in there...
Also just to clarify, is this with the Windows build of wimlib and if so which version? Or is this on Linux?
[Edit: please make sure to also run wimverify after all successful tests to check the resulting file.]
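Checking each result as requested can be scripted. This sketch wraps `wimlib-imagex verify` (the long form of `wimverify`) and treats exit status 0 as a pass. The helper names are hypothetical, and the `run` parameter exists only so the check can be exercised without wimlib installed.

```python
import subprocess

def verify_wim(path: str, run=subprocess.run) -> bool:
    """Return True if 'wimlib-imagex verify' reports success for the given WIM."""
    # wimlib-imagex exits with status 0 on success and a nonzero code on failure;
    # later in this thread, wimexport exits with error code 13 on a corrupted WIM.
    proc = run(["wimlib-imagex", "verify", path])
    return proc.returncode == 0

def verify_all(paths: list[str], run=subprocess.run) -> dict[str, bool]:
    # Verify every output file and map each path to its pass/fail result.
    return {p: verify_wim(p, run) for p in paths}
```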
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
Hi!
My apologies, I somehow missed that.
I've now run tests with --threads=1 as well, and no, it doesn't crash at the same place: one run crashed at 8.5 GB, the second at 30.5 GB. I didn't run a third.
Also, this is the latest stable build of wimlib for Windows x86_64 (v1.13.0), running on Windows 10 Pro 1703 with the latest updates as of today.
EDIT: I also ran wimverify on the LZMS:100 64M file, and it came back without issues.
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
I'm still having this issue.
Here I am trying to capture a rather large folder:
Code:
C:\Windows\System32>wimcapture e:\Temp\ D:\Full.wim --check --solid --solid-compress=LZMS:100 --solid-chunk-size=128M
Scanning "e:\Temp\UNGUARDED\ReBax\"
1603 GiB scanned (3129988 files, 271159 directories)
Using LZMS compression with 32 threads
Archiving file data: 244 GiB of 245 GiB (99%) done
C:\Windows\System32>
It crashed at 99%. Maximum sadness.
Code:
C:\Windows\System32>wimcapture e:\Temp\ D:\Full.wim --check --solid --solid-compress=LZMS:100 --solid-chunk-size=128M --threads=16
Scanning "e:\Temp\UNGUARDED\ReBax\"
1603 GiB scanned (3129988 files, 271159 directories)
Using LZMS compression with 16 threads
Archiving file data: 97 GiB of 1392 GiB (7%) done
C:\Windows\System32>
This time it crashed at 7%. I used fewer threads this time, but the result was the same. The GiB totals change because this folder contains full backups, so many files are identical, and the actual amount of data shrinks as wimlib works through it.
I am now running the compression with --solid-chunk-size=64M, which worked before.
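The shrinking totals follow from how WIM deduplicates file data: each distinct file content is stored once and identified by its SHA-1 hash, so 1603 GiB of largely identical backups can reduce to a few hundred GiB of unique data. The sketch below estimates that ratio for a directory tree by hashing every file. The function names are hypothetical, and it reads each file whole for simplicity, which would be slow on a tree this large.

```python
import hashlib
from pathlib import Path
from typing import Iterable

def dedup_stats_from_contents(contents: Iterable[bytes]) -> tuple[int, int]:
    """Return (total_bytes, unique_bytes), counting identical contents only once."""
    seen = set()
    total = unique = 0
    for data in contents:
        total += len(data)
        digest = hashlib.sha1(data).digest()  # WIM identifies file data by SHA-1
        if digest not in seen:
            seen.add(digest)
            unique += len(data)
    return total, unique

def dedup_stats(root: str) -> tuple[int, int]:
    # Feed the contents of every regular file under root into the counter.
    files = (p.read_bytes() for p in Path(root).rglob("*") if p.is_file())
    return dedup_stats_from_contents(files)
```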
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
I've found and fixed a bug that may have caused this. It resulted in a crash or incorrect output when LZMS compression was used with both a compression level and a chunk size greater than the defaults.
The fix is in wimlib-1.13.1-BETA1. Can you try it from the Downloads page?
However, if the information you reported is accurate, there may still be another problem as well. Compression level 50 (the default level) isn't affected by the bug I fixed, but you mentioned you still saw a crash with --solid-compress=LZMS:50. Also, the bug I fixed would cause a crash at the end (e.g. 99% done), not earlier as you reported in some cases.
So please let me know if it still crashes or doesn't pass wimverify, and in exactly what cases.
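Reading the description of the fixed bug as a simple condition shows why the LZMS:50 crash stands out. The sketch below encodes "level and chunk size both greater than the defaults", taking level 50 from the post above and assuming a default solid chunk size of 64M (the size the earlier reply singles out). It paraphrases the description; it is not wimlib's actual code, and the exact boundary may differ.

```python
MiB = 1024 * 1024
DEFAULT_LZMS_LEVEL = 50         # "Compression level 50 (the default level)"
DEFAULT_SOLID_CHUNK = 64 * MiB  # assumption: the 64M boundary mentioned earlier

def hit_by_fixed_bug(level: int, chunk_size: int) -> bool:
    # Affected only when BOTH settings exceed the defaults.
    return level > DEFAULT_LZMS_LEVEL and chunk_size > DEFAULT_SOLID_CHUNK

# Settings from the test results earlier in the thread.
runs = {
    "LZMS:100, 128M": (100, 128 * MiB),
    "LZMS:50, 128M": (50, 128 * MiB),
    "LZMS:100, 64M": (100, 64 * MiB),
}
for name, (level, chunk) in runs.items():
    print(name, "->", hit_by_fixed_bug(level, chunk))
```

Only the first combination matches, so the crash reported with LZMS:50 and 128M falls outside the fixed bug, which is the open question raised above.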
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
Hi!
Thanks for the info, I'll test this.
I've now run into a very strange problem, several times, so I'm not sure what to think.
I created a WIM, it checked out fine with wimverify, and I even used wimapply to extract its contents and verified all of their SHA-512 checksums. Everything matched.
However, I then ran wimexport to recompress it into a new WIM, and it suddenly complained about corruption within the first gigabyte of data:
Code:
Verifying integrity of "X:\Archive-up-to-2019Q1.wim": 1 GiB of 176 GiB (0%) done
ERROR: Exiting with error code 13:
The WIM file is corrupted (failed integrity check).
[ERROR] A WIM resource is corrupted!
WIM file: "X:\Archive-up-to-2019Q1.wim"
Blob uncompressed size: 48130417
Resource offset in WIM: 2904036
Resource uncompressed size: 274417702442
Resource size in WIM: 189775265486
Resource flags: 0x10
Resource compression type: LZX
Resource compression chunk size: 2097152
Expected SHA-1: 62b1651e052012bb9b85d7abf8a15ade997e2aaf
Actual SHA-1: 91a73fd5342033ab653c421e7d780891f5733cd2
This is really strange to me, because I had verified this file with wimverify before, and I had also verified its extracted contents with plain SHA-512 checksums, and they all matched.
I'm now running wimapply on the file again to extract its data and check whether anything is damaged. If something is, I don't understand how it could have happened: the file's timestamp is still the one from when it was created, and there was no reason for it to become damaged.
I'll post my findings here, but the contents are large, so it may take a day or two to export and re-check them.
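The "failed integrity check" message refers to the WIM integrity table (written by options such as --include-integrity): a list of SHA-1 hashes, one per fixed-size chunk of the file. Verification re-hashes each chunk and compares. The sketch below shows that idea on an in-memory stream; the helper names are hypothetical, and the chunk size is a parameter rather than the format's real value. A mismatch only says the bytes read back differ from the bytes hashed at write time; it cannot tell whether the file changed on disk or was read incorrectly.

```python
import hashlib
import io
from typing import BinaryIO, List, Optional

def build_table(data: bytes, chunk_size: int) -> List[bytes]:
    """Hash fixed-size chunks of data, as an integrity table would at write time."""
    return [hashlib.sha1(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]

def first_bad_chunk(stream: BinaryIO, table: List[bytes], chunk_size: int) -> Optional[int]:
    """Return the byte offset of the first chunk whose SHA-1 differs, or None."""
    offset = 0
    for expected in table:
        data = stream.read(chunk_size)
        if hashlib.sha1(data).digest() != expected:
            return offset
        offset += len(data)
    return None

if __name__ == "__main__":
    original = bytes(range(256)) * 100  # 25600 bytes of sample data
    table = build_table(original, 1000)
    damaged = bytearray(original)
    damaged[2500] ^= 0xFF  # flip bits inside the third chunk
    print(first_bad_chunk(io.BytesIO(original), table, 1000))        # None
    print(first_bad_chunk(io.BytesIO(bytes(damaged)), table, 1000))  # 2000
```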
Re: WimLib random crash when compressing @ LZMS:100, solid, 128M chunk with varying thread count
Interestingly, with the beta version I didn't get a crash at all, and the data checks out fine.
I'll try some of the other things that failed before (recompressing via wimexport). But these results are encouraging!
Code:
Microsoft Windows [Version 10.0.15063]
(c) 2017 Microsoft Corporation. All rights reserved.
d:\Programs\Programs\wimlib>wimcapture X:\ReBax\ Y:\Full.wim --check --solid --solid-compress=LZMS:100 --solid-chunk-size=128M
Scanning "X:\ReBax\"
1673 GiB scanned (3258938 files, 286253 directories)
Using LZMS compression with 32 threads
Archiving file data: 255 GiB of 255 GiB (100%) done
Calculating integrity table for WIM: 168 GiB of 168 GiB (100%) done
d:\Programs\Programs\wimlib>wimverify Y:\Full.wim
Verifying integrity of "Y:\Full.wim": 168 GiB of 168 GiB (100%) done
Verifying metadata for image 1 of 1
Verifying file data: 255 GiB of 255 GiB (100%) done
"Y:\Full.wim" was successfully verified.
d:\Programs\Programs\wimlib>wimapply Y:\Full.wim X:\ReBax --check
Verifying integrity of "Y:\Full.wim": 168 GiB of 168 GiB (100%) done
Applying image 1 ("ReBax") from "Y:\Full.wim" to directory "X:\ReBax"
Creating files: 3545191 of 3545191 (100%) done
Extracting file data: 1673 GiB of 1673 GiB (100%) done
Applying metadata to files: 3545191 of 3545191 (100%) done
Done applying WIM image.
d:\Programs\Programs\wimlib>
d:\Programs\Programs\wimlib>
d:\Programs\Programs\wimlib>
d:\Programs\Programs\wimlib>wimverify Y:\Full.wim
Verifying integrity of "Y:\Full.wim": 168 GiB of 168 GiB (100%) done
Verifying metadata for image 1 of 1
Verifying file data: 255 GiB of 255 GiB (100%) done