The true potential of the LZX compression algorithm

LZX
Posts: 5
Joined: Mon Jul 08, 2019 6:41 am

Post by LZX » Mon Jul 08, 2019 9:02 am

Hi everyone!

I ran a few compression tests and was VERY surprised. I took my favorite portable version of Yet Another Terminal 2.0.0 (61 files) and tried compressing it in different ways. Here are my results.

Uncompressed data: 17 713 398 bytes in 61 files.

7-Zip v18.05 ZIP (Deflate), Ultra compression, word size 258: 7 106 212 bytes.

Wimlib v1.13.1 LZX level 500 compression: 7 081 092 bytes.

Wimlib v1.13.1 LZMS level 500 no-solid compression: 6 436 972 bytes.

WinCab v3.0 with 21 bits LZX compression window (LZX:21): 3 929 150 bytes !!!

The question is: what's wrong with WIM LZX compression? Why is it using the weakest LZX:15 compression instead of LZX:21, like in WinCab? Even the CHM format uses 17 bits... How can I use LZX:21 (or even better) compression in WIM?
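
To put the figures above on a common scale, here is a small Python sketch that converts each result to MiB and to a percentage of the uncompressed size. The byte counts are copied from this post; the helper names (`mib`, `percent_of_original`) are only illustrative.

```python
# Sizes in bytes, copied from the results listed in this post.
ORIGINAL = 17_713_398
RESULTS = {
    "7-Zip ZIP (Deflate, Ultra)": 7_106_212,
    "wimlib LZX level 500": 7_081_092,
    "wimlib LZMS level 500, non-solid": 6_436_972,
    "WinCab LZX:21": 3_929_150,
}


def mib(size):
    """Convert a byte count to MiB (1 MiB = 2**20 bytes)."""
    return size / 2**20


def percent_of_original(size, original=ORIGINAL):
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * size / original


# Print the results from largest to smallest archive.
for name, size in sorted(RESULTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:34s} {size:>10,d} B  {mib(size):5.2f} MiB  "
          f"{percent_of_original(size):5.1f}% of original")
```

The gap between the LZX:21 CAB and the other three results stands out much more clearly as a percentage than as raw byte counts.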

synchronicity
Site Admin
Posts: 296
Joined: Sun Aug 02, 2015 10:31 pm

Re: The true potential of the LZX compression algorithm

Post by synchronicity » Mon Jul 08, 2019 3:17 pm

Why is it using the weakest LZX:15 compression instead of LZX:21, like in WinCab?
Because that's the only LZX setting that's compatible with Microsoft's WIM software (WIMGAPI, DISM, ImageX).

If you don't mind your archives being compatible with wimlib only, you can use:

wimcapture --solid --solid-compress=lzx:100 --solid-chunk-size=2m
Also consider using LZMS solid-mode compression, which usually gives a better compression ratio and is compatible with Microsoft's WIM software:

wimcapture --solid

Re: The true potential of the LZX compression algorithm

Post by LZX » Mon Jul 08, 2019 4:09 pm

synchronicity wrote:
Mon Jul 08, 2019 3:17 pm
Because that's the only LZX setting that's compatible with Microsoft's WIM software (WIMGAPI, DISM, ImageX).
Microsoft's policy here is very strange... VERY strange. Why don't they use the full power of LZX? Is it some kind of stupid marketing again?
synchronicity wrote:
Mon Jul 08, 2019 3:17 pm
If you don't mind your archives being compatible with wimlib only, you can use:

wimcapture --solid --solid-compress=lzx:100 --solid-chunk-size=2m
I tried these options, but the result is still much worse than the non-solid CAB LZX:21 archive...
wimlib solid LZX: 4 621 528 bytes.
WinCab LZX:21: 3 929 150 bytes.
synchronicity wrote:
Mon Jul 08, 2019 3:17 pm
Also consider using LZMS solid-mode compression, which usually gives a better compression ratio and is compatible with Microsoft's WIM software:

wimcapture --solid
The result is much better: 3 405 478 bytes. But unfortunately there is a very big problem with using solid archives in R/W mode. Also, they are not compatible with my favorite Windows 7. Is there some kind of update for Win7 that adds support for LZMS WIM archives?

Re: The true potential of the LZX compression algorithm

Post by synchronicity » Mon Jul 08, 2019 4:31 pm

CAB has an advantage in compression ratio because it doesn't divide the data into independently decompressible chunks, whereas WIM does. But if you don't need independently decompressible chunks, you might as well use LZMS compression with its much larger chunks.
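
The cost of independently decompressible chunks can be seen with a minimal Python sketch. It uses zlib (Deflate) purely as a stand-in, since the standard library has no LZX codec, and the sample data, chunk size, and function names are assumptions made for illustration. The same input is compressed once as a single stream and once as separate chunks that each start with an empty history.

```python
import random
import zlib

ALPHABET = b"abcdefghijklmnopqrstuvwxyz"


def sample_data(n_words=20_000, vocab_size=200, seed=0):
    """Build repetitive text from a fixed vocabulary of random 8-letter words."""
    rng = random.Random(seed)
    vocab = [bytes(rng.choice(ALPHABET) for _ in range(8))
             for _ in range(vocab_size)]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


def compress_whole(data, level=9):
    """Compressed size when the data is one continuous stream."""
    return len(zlib.compress(data, level))


def compress_chunked(data, chunk_size, level=9):
    """Total compressed size when every chunk is compressed on its own.

    Each chunk can be decompressed independently, but it cannot reuse
    matches from the chunks that came before it.
    """
    total = 0
    for start in range(0, len(data), chunk_size):
        total += len(zlib.compress(data[start:start + chunk_size], level))
    return total


data = sample_data()
print("single stream :", compress_whole(data), "bytes")
print("2 KiB chunks  :", compress_chunked(data, 2048), "bytes")
```

The chunked total comes out larger because every chunk has to re-learn the vocabulary from scratch. That is the price of being able to decompress any chunk without touching the others.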

I don't believe DISM in Windows 7 ever supported LZMS compression. But you can use wimlib on Windows 7.

Re: The true potential of the LZX compression algorithm

Post by LZX » Wed Jul 10, 2019 8:16 pm

synchronicity wrote:
Mon Jul 08, 2019 4:31 pm
CAB has an advantage in compression ratio because it doesn't divide the data into independently decompressible chunks, whereas WIM does. But if you don't need independently decompressible chunks, you might as well use LZMS compression with its much larger chunks.
Hm... I don't think that is quite correct, because the difference between 3.74 MiB and 6.75 MiB is TOO big. I ran one more test and compressed my YAT with the LZX:15 method. The result is a 6.66 MiB CAB file. Comparing it with the 6.75 MiB WIM file, we can see the small 90 KiB overhead of the WIM compression method (independently decompressible chunks). Here's a screenshot of the WinCab compression options:
[Attachment: WinCab.png (6.81 KiB)]
So the Achilles' heel of the WIM format is that it can't use larger compression windows, only 15 bits. That's SAD.
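
The effect of window size alone can be checked with a rough Python sketch. It uses raw LZMA2 from the standard `lzma` module as a stand-in for LZX (an assumption, since Python ships no LZX codec) and synthetic data whose repeats lie 128 KiB apart, so a 2^15-byte window cannot reach them while a 2^21-byte window can. The function names are illustrative only.

```python
import lzma
import random


def raw_lzma2_size(data, window_bits):
    """Compressed size using raw LZMA2 with a 2**window_bits byte dictionary."""
    filters = [{"id": lzma.FILTER_LZMA2, "dict_size": 1 << window_bits}]
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters))


def repeated_random_block(block_size=128 * 1024, repeats=4, seed=0):
    """One incompressible block repeated several times.

    The only redundancy is the repetition itself, at a distance of
    block_size bytes.
    """
    rng = random.Random(seed)
    block = rng.getrandbits(8 * block_size).to_bytes(block_size, "little")
    return block * repeats


data = repeated_random_block()
small = raw_lzma2_size(data, 15)   # 32 KiB window: repeats are out of reach
large = raw_lzma2_size(data, 21)   # 2 MiB window: repeats are found
print(f"input: {len(data)} bytes")
print(f"15-bit window: {small} bytes")
print(f"21-bit window: {large} bytes")
```

This data is deliberately the worst case for a small window. Real files sit somewhere in between, so the gap on a real file set depends on how far apart its repeated content lies.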

Re: The true potential of the LZX compression algorithm

Post by LZX » Wed Jul 10, 2019 8:58 pm

About LZX compression window in MSDN: https://docs.microsoft.com/en-us/opensp ... c775976fcc

Interesting topic about LZX: https://encode.ru/threads/2665-Super-Microsoft-LZX
Zyzzyva wrote there:
At some point I might update wimlib to support 2^25 byte windows for LZX. Currently it only supports up to 2^21 bytes, though that's already more than Microsoft's WIM software which only supports 2^15 bytes.
The question is WHY Microsoft's WIM software doesn't support more than 2^15 bytes?! I think it is really abnormal.
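
For reference, the window sizes mentioned in this thread translate from bits to bytes as follows. This is a small Python sketch; the labels just restate what is said in the posts, and the helper names are illustrative.

```python
def window_bytes(bits):
    """An LZX:N setting means a window of 2**N bytes."""
    return 1 << bits


def human(n):
    """Format a power-of-two byte count as KiB or MiB."""
    for unit, size in (("MiB", 2**20), ("KiB", 2**10)):
        if n >= size:
            return f"{n // size} {unit}"
    return f"{n} B"


# Window settings mentioned in the thread.
SETTINGS = [
    ("Microsoft WIM software", 15),
    ("CHM (per the first post)", 17),
    ("WinCab LZX:21 / current wimlib maximum", 21),
    ("possible future wimlib maximum", 25),
]

for label, bits in SETTINGS:
    print(f"LZX:{bits:<2d} = {human(window_bytes(bits)):>7s}  ({label})")
```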

Re: The true potential of the LZX compression algorithm

Post by synchronicity » Thu Jul 11, 2019 2:07 am

wimlib supports LZX window sizes up to 2 MiB. You simply need to use the options I suggested. But as I said, WIM uses independently decompressible chunks, which can result in a worse compression ratio than CAB at the same LZX window size, even though wimlib's LZX compressor (given a high enough compression level parameter) is marginally better on equal footing.

I don't know why Microsoft doesn't support larger LZX chunk sizes in WIM. Maybe they just wanted one format suitable for mounting with random access and one highly compressed format, not something in between.

Re: The true potential of the LZX compression algorithm

Post by LZX » Thu Jul 11, 2019 6:23 am

Thank you for your replies!
So, Microsoft's logic is clear: think about the money, not about the users. Sad. Very sad.
They could at least add DECOMPRESSION support. But nope, money is more important to them.
P.S. Wimlib rules! It's not like those M$ products, it's really cool! Thanx again!
