
Adapting wimlib to CAB LZX

Posted: Sat Dec 31, 2022 3:42 pm
by ncommander
Hi,

I've been trying to use wimlib to compress data using the LZX format. However, I require the compressed data to be readable by libraries like mspack.
As far as I know, there are no CAB LZX compressor implementations out there except lzxcomp by Matthew T. Russotto, which is slow (over 1 minute to compress a 50 MB file, with no compression level or other options available).

So far, I've modified how the block headers are written to match the CAB format, as well as writing the LZX header that determines whether Intel E8 is used.
I compress the data 32KB at a time (LZX frame) and mspack can decode the output perfectly!
There's just one issue: the decoder needs to be reset for each LZX frame, or the MAINTREE and LENGTH tables fail to build. It seems the lengths read from one frame affect another. Do you have any idea how this could be fixed, and if so, would the changes to wimlib's aligned/verbatim compression be deep? Alternatively, as a workaround, is there a valid LZX block that could clear these Huffman lengths?
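
For illustration, here is a minimal Python sketch of why lengths from one frame can affect the next. It assumes, as I understand the LZX format, that each code length is transmitted as a difference modulo 17 from the length the same symbol had in the previous block, and it leaves out the run-length symbols of the real pretree. The function names and sample lengths are made up for the example, and it is only a guess that this is what goes wrong in the modified wimlib code.

```python
def encode_lengths(prev_lens, new_lens):
    """Delta-code new_lens against prev_lens, modulo 17 (simplified: no run-length symbols)."""
    return [(p - n) % 17 for p, n in zip(prev_lens, new_lens)]


def decode_lengths(prev_lens, deltas):
    """Invert encode_lengths using the lengths the decoder currently remembers."""
    return [(p - d) % 17 for p, d in zip(prev_lens, deltas)]


# Hypothetical code lengths for two consecutive frames.
frame1 = [5, 5, 4, 0, 3, 7]
frame2 = [6, 5, 0, 2, 3, 8]
zeros = [0] * len(frame1)

# Encoder that forgets its state and codes frame 2 against all-zero lengths,
# while the decoder still remembers the lengths from frame 1.
deltas_reset = encode_lengths(zeros, frame2)
mismatched = decode_lengths(frame1, deltas_reset)

# Encoder that carries the frame 1 lengths over, matching the decoder's state.
deltas_carried = encode_lengths(frame1, frame2)
matched = decode_lengths(frame1, deltas_carried)

print(mismatched == frame2, matched == frame2)
```

If that assumption holds, the encoder would need to keep the previous lengths between calls instead of starting from zeros each time.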

I totally understand if you can't provide any answers as the topic clearly falls outside of wimlib, but I would appreciate any pointers to which functions would require changing, if additional state would have to be introduced, or an alternative to wimlib for the task.

Thanks for your time! :)

Re: Adapting wimlib to CAB LZX

Posted: Sat Dec 31, 2022 8:06 pm
by synchronicity
The LZX compression code in wimlib has been heavily optimized for the case where the data to compress is in a single buffer, **and** matches can be made with the whole buffer. That's what the WIM format uses.

In CAB, there is instead a long stream (potentially gigabytes, I think?), where matches can be made with the last 32KiB only. That's quite a bit different.

The minimal change to get it working would be to support compressing such a stream in a single buffer, with the matchfinder changed to find matches in the last 32KiB only. This is feasible (it's what I do for DEFLATE in https://github.com/ebiggers/libdeflate), but it would still be a significant change.
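
As a rough sketch of what "matches in the last N bytes only" means, the brute-force search below looks for the longest match whose start lies inside a limited window behind the current position. It is not how wimlib or libdeflate find matches (they use hash chains or trees); the function name, `window_size`, and the `min_len`/`max_len` defaults are assumptions chosen for illustration.

```python
def longest_match(data, pos, window_size, min_len=2, max_len=257):
    """Return (offset, length) of the longest match for data[pos:] that starts
    within the previous window_size bytes, or None if no match is long enough."""
    best_off, best_len = 0, 0
    start = max(0, pos - window_size)          # oldest position still in the window
    limit = min(max_len, len(data) - pos)      # cannot match past the end of the buffer
    for cand in range(start, pos):
        n = 0
        while n < limit and data[cand + n] == data[pos + n]:
            n += 1
        # ">=" prefers the closest candidate among equally long matches
        if n >= min_len and n >= best_len:
            best_off, best_len = pos - cand, n
    return (best_off, best_len) if best_len else None


data = b"abcdefgh" + b"x" * 100 + b"abcdefgh"
pos = 108  # start of the second "abcdefgh"
print(longest_match(data, pos, window_size=200))
print(longest_match(data, pos, window_size=32))
```

With the large window the earlier copy of the string is reachable; with the small one it has already slid out of range, so nothing is found.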

To get it working **properly** would require adding streaming support, so that data can be fed in incrementally and potentially gigabytes of memory aren't needed. That would be a much larger change, and I've stayed away from that sort of thing in all the compressors I've written.

Re: Adapting wimlib to CAB LZX

Posted: Mon Jun 02, 2025 12:57 am
by oneeighthundred
Old post, sorry, but just FYI: I am attempting to do this in a fork:
https://github.com/elasota/wimlib

So far it is working but I still need to finish adding cross-block matchfinding.

The maximum LZX window size is only 2 MB for CAB and 64 MB for LZX DELTA, and both have to do full bitstream flushes every 32 KB anyway. So I'm going to try adding streaming by adding a buffer that prepends the window to the input block, and then not resetting the match finder between blocks. This isn't ideal, but it should work as long as the prepend buffer doesn't have to be relocated too often.
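
A minimal Python sketch of the prepend-buffer idea, not the code in the fork: each 32 KB frame is handed over together with the retained history, so a match finder could reach back into earlier frames. `frames_with_history` and its parameters are hypothetical names, and this version copies the history on every frame, which is exactly the relocation cost a real implementation would want to keep low.

```python
FRAME_SIZE = 32 * 1024  # frame size mentioned in the thread


def frames_with_history(data, window_size, frame_size=FRAME_SIZE):
    """Yield (buffer, start) pairs: buffer is the retained window followed by the
    next frame, and start is the offset in buffer where the new frame begins."""
    history = b""
    for off in range(0, len(data), frame_size):
        frame = data[off:off + frame_size]
        buffer = history + frame
        yield buffer, len(history)
        # keep only the most recent window_size bytes as history for the next frame
        history = buffer[max(0, len(buffer) - window_size):]


data = bytes(range(256)) * 512  # 128 KiB of sample input, i.e. four frames
chunks = list(frames_with_history(data, window_size=64 * 1024))
for buffer, start in chunks:
    print(len(buffer), start)
```

A compressor would only emit output for `buffer[start:]`, while being allowed to match against `buffer[:start]`.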

I asked about this earlier, and it sounds like there is no interest in upstreaming it since it's out of scope for wimlib, but I will see if I can get it into gcab or Wine's cabinet.dll eventually.

Re: Adapting wimlib to CAB LZX

Posted: Wed Jun 25, 2025 1:02 pm
by Basto
You’re right that mspack expects the decoder to reset Huffman tables each frame. Wimlib’s LZX code assumes a continuous stream, so it doesn’t reset tables per frame by default. Fixing this means deep changes in how wimlib handles Huffman state across blocks.
There’s no special LZX block to reset Huffman tables; you have to reset the decoder yourself between frames.
Your best bet is to keep resetting the decoder per frame externally or modify wimlib to do that internally, which is complex. Otherwise, tools like lzxcomp, despite being slow, might be easier to use for CAB-compatible LZX.