LZMS solid multithreading issue (or bug)

ST83
Posts: 2
Joined: Tue Apr 01, 2025 7:25 pm

Post by ST83 »

Hello there, and thanx for releasing wimlib!

I am currently using the latest wimlib 1.14.4 64-bit Windows binaries with an Intel i7-4790K processor (4C/8T) and 32 GB DDR3-2400 (31.9 GB usable, 4 GB reserved for a RAM Disk) + 96 GB virtual memory. I've noticed that multi-threading is disabled if LZMS solid compression is used with 256M, 512M or 1G solid chunk sizes.

I've made some compression tests and analyzed the memory usage with Process Explorer with this command line:

wimoptimize setup.wim --recompress --compress=LZMS:203 --chunk-size=1G --solid --threads=8 --solid-chunk-size=256M
  • Using --solid-chunk-size=1G and --threads=8 or --threads=4 or --threads=2:
    Threads = 1, Private bytes = 15'238'104 K first, then 14'712'880 K, Virtual size = 15'247'400 K first, then 14'729'248 K.
  • Using --solid-chunk-size=512M and --threads=8 or --threads=4 or --threads=2:
    Threads = 1, Private bytes = 7'883'708 K first, then 7'358'388 K, Virtual size = 7'907'368 K first, then 7'383'072 K.
  • Using --solid-chunk-size=256M and --threads=8 or --threads=4 or --threads=2:
    Threads = 1, Private bytes = 4'206'516 K first, then 3'681'284 K, Virtual size = 4'237'352 K first, then 3'719'200 K.
  • Using --solid-chunk-size=128M and --threads=8 (the maximum solid chunk size with multi-threading):
    Threads = 8, Private bytes = 15'245'212 K first, then 14'719'988 K, Virtual size = 15'276'184 K first, then 14'758'032 K.
With 1G, 512M and 256M, wimoptimize always runs single-threaded (printing "Using LZMS compression with 1 thread").
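
As a rough cross-check of the figures above, the peak "Private bytes" readings can be fitted with a straight line in the total amount of chunk data held at once (threads × solid chunk size). This is only a back-of-the-envelope sketch built from the measurements in this post, not wimlib's actual memory accounting; the function names are made up for illustration.

```python
# Peak "Private bytes" (in K) from the single-threaded runs above,
# keyed by solid chunk size in MiB.
single_thread_peak_k = {256: 4_206_516, 512: 7_883_708, 1024: 15_238_104}

def fit_line(points):
    """Fit y = slope * x + intercept through the smallest and largest x."""
    x0, x1 = min(points), max(points)
    slope = (points[x1] - points[x0]) / (x1 - x0)
    intercept = points[x0] - slope * x0
    return slope, intercept

def predict_peak_k(threads, chunk_mib, slope, intercept):
    """Estimate peak private bytes (K), ASSUMING memory scales with threads * chunk size."""
    return slope * threads * chunk_mib + intercept

slope, intercept = fit_line(single_thread_peak_k)
# slope is in K per MiB of chunk data; dividing by 1024 gives bytes per byte.
print(f"about {slope / 1024:.1f} bytes of RAM per byte of chunk data")
print(f"fixed overhead of about {intercept / 1024:.0f} MiB")
print(f"8 threads x 128M: about {predict_peak_k(8, 128, slope, intercept):,.0f} K")
```

Under that assumption, the estimate for 8 threads × 128M lands well within 1% of the 15'245'212 K observed, which suggests 8 threads at 128M need about as much memory as 1 thread at 1G.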

While a lack of physical RAM might justify this for a 1G solid chunk size (about 14.5 GB per thread), I don't understand why multi-threading is still disabled with 512M or 256M solid chunk sizes, even after reducing the threads to 6, 4 or 2, where the physical RAM required is lower than for 1G. Worse, virtual memory is not taken into account at all, and my PC has already maxed out all available RAM slots (32 GB is the chipset's limit).

Plus, ALL of these options below were completely useless:
  • Omitting or changing the non-solid chunk size;
  • Omitting or changing the LZMS level to default 50, 100 or even the worst LZMS:1 (no doubt worse than LZX);
  • Decompressing the archive first with --compress=NONE and then compressing (--recompress slows down the process and can increase RAM usage).
I don't need compatibility with Microsoft software; wimlib is all I use for managing WIM archives and applying Windows images from scratch, so I'm after the best possible compression ratio.

Please take a look at this and fix it if possible, thanx!
synchronicity
Site Admin
Posts: 490
Joined: Sun Aug 02, 2015 10:31 pm

Re: LZMS solid multithreading issue (or bug)

Post by synchronicity »

That's strange. Is it possible that your setup.wim has an uncompressed size between 128 MiB and 256 MiB? That's the only case that I can think of where this would happen. wimlib checks whether the total uncompressed size of the file data is less than or equal to the chunk size (the chunk size that will be used, so that would be the --solid-chunk-size since you're using --solid), and if it is, it sets the number of threads to 1 since it knows there will only be 1 chunk. Otherwise it sets the number of threads according to the number of CPUs, available memory, and memory needed per thread, without considering how many chunks there will be. That would explain the jump from 1 to 8 threads when going from 256 MiB to 128 MiB. But if it's not that, I don't know how it could happen.
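
To make the rule described above concrete, here is a minimal sketch of that decision in Python. It is not wimlib's actual code: the function name, parameter names, and the 2 GiB per-thread figure are assumptions chosen purely for illustration.

```python
def choose_num_threads(total_uncompressed, chunk_size, requested_threads,
                       num_cpus, available_memory, memory_per_thread):
    """Hypothetical model of the thread-count decision described in the post."""
    # Only one chunk will exist, so extra threads would have nothing to do.
    if total_uncompressed <= chunk_size:
        return 1
    # Start from the requested thread count, or the CPU count if none was given.
    wanted = requested_threads if requested_threads > 0 else num_cpus
    # Cap by how many per-thread working sets fit in memory (never below 1).
    fit_in_memory = max(1, available_memory // memory_per_thread)
    return min(wanted, fit_in_memory)

MiB = 1024 * 1024
GiB = 1024 * MiB
data = 200 * MiB  # hypothetical uncompressed size between 128 MiB and 256 MiB
for chunk in (256 * MiB, 128 * MiB):
    n = choose_num_threads(data, chunk, 8, 8, 32 * GiB, 2 * GiB)
    print(chunk // MiB, "MiB chunks ->", n, "thread(s)")
```

With these made-up inputs, the 256 MiB case falls into the single-chunk branch, while the 128 MiB case is limited only by the requested count and the memory cap.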
ST83
Posts: 2
Joined: Tue Apr 01, 2025 7:25 pm

Re: LZMS solid multithreading issue (or bug)

Post by ST83 »

Hello there, and thanx for the reply!

My apologies, my first WIM reference was a stripped out image #1 of an authentic Windows 10 ESD named "Windows Setup Media" (the files needed for booting into Windows PE Setup mode before launching Setup), which had an uncompressed size of 250.87 MiB.

I immediately switched to the BOOT.WIM and INSTALL.WIM from the latest Windows 10 ISO (updated 2024-11), which have uncompressed sizes of 1237.50 MiB and 11836.90 MiB, respectively, but nothing much changes: although I can use 3 threads with a 256 MiB solid chunk size on BOOT.WIM, with 1G it still forces 1 thread because physical memory is not enough (14.50 to 16.10 GiB is still needed, and there is no way to push some of that into the pagefile).

I ran a test compression of BOOT.WIM this morning (before going to work) with this command line, shown along with its output:

wimoptimize BOOT.WIM --recompress --compress=LZMS:203 --threads=8 --solid --solid-chunk-size=1G

"R:\BOOT.WIM" original size: 795708 KiB
[WARNING] Wanted to use 8 threads, but limiting to 1 to fit in available memory!
Using LZMS compression with 1 thread
Archiving file data: 1237 MiB of 1237 MiB (100%) done
"R:\BOOT.WIM" optimized size: 356158 KiB
Space saved: 439550 KiB
RAM usage: Private bytes 14'717'964 K, virtual size 14'751'584 K

In this case the original WIM contained some old unreferenced files (the [DELETED] folder was present), so at first glance the ratio looks huge (44.76% of the previous size, i.e. 55.24% saved), but the fair comparison is against the already LZX:50 compressed size of 543438 KiB (65.54% of that size, i.e. 34.46% saved). As expected, because only physical RAM is considered for the process, it dropped to a single thread and execution time increased to 00:16:17.766. And because the progress bar advances only when chunks are written to the file, it reported 0 MiB for almost the entire run, except for the very last seconds.
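
For anyone who wants to re-check the percentages, the arithmetic can be reproduced from the sizes quoted in this post. This is just a small sketch; the helper name is made up.

```python
def ratio_percent(new_kib, old_kib):
    """Size of the new file as a percentage of the old one."""
    return 100.0 * new_kib / old_kib

optimized = 356_158  # KiB, LZMS solid result reported by wimoptimize
original = 795_708   # KiB, size before optimizing (includes unreferenced files)
lzx50 = 543_438      # KiB, LZX:50 size quoted in the post

print("space saved:", original - optimized, "KiB")
for label, base in (("original", original), ("LZX:50", lzx50)):
    r = ratio_percent(optimized, base)
    print(f"vs {label}: {r:.2f}% of previous size, {100 - r:.2f}% saved")
```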

As stated previously, a different LZMS level (even the lowest, LZMS:1) gives identical results, from the compressed size to execution times and progress bar updates.

I've made another test compression for BOOT.WIM with this command line and output, this time from its uncompressed form:

wimoptimize boot.wim --recompress --compress=LZMS:203 --threads=8 --solid --solid-chunk-size=256M

"R:\boot.wim" original size: 1278528 KiB
[WARNING] Wanted to use 8 threads, but limiting to 7 to fit in available memory!
Using LZMS compression with 7 threads
Archiving file data: 1237 MiB of 1237 MiB (100%) done
"R:\boot.wim" optimized size: 363719 KiB
Space saved: 914809 KiB
RAM usage: Private bytes 25'756'784 K, Virtual size 25'776'064 K

In this case, compression time was 00:04:37.521 (multi-threading was applied, although with only 7 threads due to physical memory), but the progress bar still reported 0 MiB for almost the entire run, except for the last 10-15 seconds.

Is there no way to use virtual memory, even though I have a same-spec system with a RAID controller and 16x 931 GiB (1.0 TB) spinning disks in a huge RAID-60 volume (I avoid RAID-0 for virtual memory by choice), which outclasses and outperforms physical RAM speed? (RAM reaches about 30 GiB/s, while the RAID-60 goes from 56 GiB/s in the fastest zone to 42 GiB/s in the middle zone, not to mention the further increase if I upgrade these disks to SSDs one day.)

I don't know whether I should open a new topic for the progress bar issue or keep it here, but: is there no way to display some information about in-memory compression progress, instead of only the MiB of chunks already written to the file (which takes half an eternity to update)?

Whenever I have some spare time for testing multi-threading compression for INSTALL.WIM, I'll post here as soon as possible.

In the meantime, hope to find a fix for this! Thanx a lot for your help in everything.
synchronicity
Site Admin
Posts: 490
Joined: Sun Aug 02, 2015 10:31 pm

Re: LZMS solid multithreading issue (or bug)

Post by synchronicity »

A lot of the memory used during compression is accessed randomly, not sequentially. So latency is super important, not just throughput. Having it be paged in and out of disk would be extremely slow. Probably multiple orders of magnitude slower which would easily defeat the point of using any more threads.
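
To give a feel for the scale involved, here is a toy estimate in Python. The latency figures are order-of-magnitude assumptions (roughly 100 ns for a RAM access, roughly 10 ms for a random read that has to be paged in from a spinning disk), not measurements from wimlib or from the system discussed in this thread.

```python
def slowdown(ram_latency_s, page_fault_latency_s, fault_fraction):
    """Average random-access cost relative to pure RAM, for a given share of page faults."""
    average = ((1 - fault_fraction) * ram_latency_s
               + fault_fraction * page_fault_latency_s)
    return average / ram_latency_s

RAM_LATENCY = 100e-9   # ASSUMED: ~100 ns per random RAM access
DISK_LATENCY = 10e-3   # ASSUMED: ~10 ms per page fault served from a spinning disk

for fraction in (0.001, 0.01, 0.1):
    factor = slowdown(RAM_LATENCY, DISK_LATENCY, fraction)
    print(f"{fraction:.1%} of accesses paged in -> about {factor:,.0f}x slower")
```

Under these assumed numbers, even a small share of accesses going to disk dominates the average, regardless of how high the sequential throughput of the array is.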

As you noticed, the progress basically gets updated after each chunk, not during each chunk. Doing intra-chunk progress updates would require plumbing the progress update down into the compression code, which would be complex and would add additional overhead. Given that this only matters for very large chunk sizes which most users don't use, and it only affects the progress status updates, I don't currently have any plans to do this. But it's something I'll keep in mind.
Basto
Posts: 3
Joined: Fri Jun 06, 2025 1:38 pm

Re: LZMS solid multithreading issue (or bug)

Post by Basto »

Wimlib reduces the number of LZMS threads when the memory needed per thread, which grows with the solid chunk size, does not fit in available physical RAM. With very large solid chunks this can mean falling back to a single thread. Virtual memory isn't counted because swapping hurts performance. This is by design, not a bug. To keep more threads, use LZX or a smaller solid chunk size (128 MB allowed all 8 threads in the first test above).