Adapting wimlib to CAB LZX

ncommander
Posts: 1
Joined: Sat Dec 31, 2022 3:21 pm

Post by ncommander »

Hi,

I've been trying to use wimlib to compress data using the LZX format. However, I require the compressed data to be readable by libraries like mspack.
As far as I know, there are no other CAB LZX compressor implementations out there apart from lzxcomp by Matthew T. Russotto, which is slow (over 1 minute to compress a 50 MB file) and offers no compression level or other options.

So far, I've modified how the block headers are written to match the CAB format, and I now write the LZX header that indicates whether Intel E8 translation is used.
I compress the data one 32 KB LZX frame at a time, and mspack can decode the output perfectly!
There's just one issue: the decoder needs to be reset for each LZX frame; otherwise the MAINTREE and LENGTH tables fail to build. It seems the code lengths read in one frame affect the next. Do you have any idea how this could be fixed, and if so, how deep would the changes to wimlib's aligned/verbatim block compression be? Alternatively, as a workaround, is there a valid LZX block that would clear these Huffman lengths?
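
One way to see why lengths from one frame can affect the next: in LZX as decoded by libmspack, each Huffman code length is transmitted as a difference from the length the same symbol had previously, modulo 17, and those previous lengths only return to zero when the decoder state is reset. The sketch below is a simplified Python illustration of that delta rule only. It leaves out the pretree and the run-length symbols 17 to 19, and the function names and length values are made up for this example. It compares an encoder that starts every frame from all-zero lengths with one that remembers the previous frame's lengths, as seen by a decoder that was not reset.

```python
# Simplified model of LZX code-length delta coding (illustrative names).
# Real LZX also Huffman-codes these symbols with a pretree and has
# run-length symbols 17-19; both are omitted here.

def encode_length_deltas(prev_lens, new_lens):
    """Symbol written for each code length: (previous - new) mod 17."""
    return [(p - n) % 17 for p, n in zip(prev_lens, new_lens)]

def decode_length_deltas(prev_lens, symbols):
    """Inverse of the above: new = (previous - symbol) mod 17."""
    return [(p - s) % 17 for p, s in zip(prev_lens, symbols)]

frame1_lens = [1, 2, 3, 3, 0]   # made-up code lengths used by frame 1
frame2_lens = [2, 2, 2, 0, 2]   # made-up code lengths wanted for frame 2
zeros = [0] * len(frame1_lens)

# Encoder that treats every frame as independent (previous lengths = 0).
independent = encode_length_deltas(zeros, frame2_lens)
# Encoder that remembers the lengths it emitted for the previous frame.
carried = encode_length_deltas(frame1_lens, frame2_lens)

# A decoder that was NOT reset still holds frame 1's lengths.
misread = decode_length_deltas(frame1_lens, independent)
correct = decode_length_deltas(frame1_lens, carried)
```

Here `misread` differs from `frame2_lens` while `correct` matches it, which suggests the missing piece is encoder-side state (the previous block's lengths) rather than a special block that clears the tables.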

I totally understand if you can't provide any answers, as the topic clearly falls outside wimlib's scope, but I would appreciate any pointers on which functions would need changing, whether additional state would have to be introduced, or an alternative to wimlib for the task.

Thanks for your time! :)
synchronicity
Site Admin
Posts: 490
Joined: Sun Aug 02, 2015 10:31 pm

Post by synchronicity »

The LZX compression code in wimlib has been heavily optimized for the case where the data to compress is in a single buffer, **and** matches can be made with the whole buffer. That's what the WIM format uses.

In CAB, there is instead a long stream (potentially gigabytes, I think?), where matches can be made with the last 32KiB only. That's quite a bit different.

The minimal change to get it working would be to support compressing such a stream in a single buffer, with the matchfinder changed to find matches in the last 32KiB only. This is feasible (it's what I do for DEFLATE in https://github.com/ebiggers/libdeflate), but it would still be a significant change.
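
As a rough illustration of what "find matches in the last N bytes only" means, here is a brute-force Python sketch. It is not how wimlib or libdeflate find matches (both use far faster hash-based structures); the function name and parameters are invented for this example, and `max_len=257` assumes the LZX maximum match length.

```python
def longest_match(data, pos, window_size, max_len=257):
    """Return (length, distance) of the longest match for data[pos:]
    that starts within the previous `window_size` bytes, or (0, 0)."""
    best_len, best_dist = 0, 0
    window_start = max(0, pos - window_size)
    # Scan the nearest candidates first so ties keep the smaller distance.
    for cand in range(pos - 1, window_start - 1, -1):
        length = 0
        while (length < max_len
               and pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - cand
    return best_len, best_dist

data = b"hello world, hello there"
wide = longest_match(data, 13, window_size=32)    # can reach offset 0
narrow = longest_match(data, 13, window_size=8)   # earlier "hello" is out of range
```

With the wide window the second "hello " matches the first one; with the narrow window the same position finds nothing, because the earlier occurrence has fallen out of the window.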

To get it working **properly** would require adding streaming support, so that data can be fed in incrementally and potentially gigabytes of memory aren't needed. That would be a much larger change, and I've stayed away from that sort of thing in all the compressors I've written.
oneeighthundred
Posts: 2
Joined: Sun Jun 01, 2025 6:15 pm

Post by oneeighthundred »

Sorry to revive an old post, but just FYI, I am attempting to do this in a fork:
https://github.com/elasota/wimlib

So far it is working, but I still need to finish adding cross-block matchfinding.

The maximum LZX window size is only 2 MB for CAB and 64 MB for LZX DELTA, and both require a full bitstream flush every 32 KB anyway. So I'm going to try adding streaming with a buffer that prepends the window to the input block, and avoid resetting the match finder between blocks. This isn't ideal, but it should work as long as the prepend buffer doesn't have to be relocated too often.
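
A minimal Python sketch of the prepend-buffer idea, under simplifying assumptions: the class and method names are hypothetical and not taken from the fork, and this version rebuilds the buffer on every block, whereas the plan above is to avoid relocating it and to keep the match finder's state between blocks.

```python
class WindowedInput:
    """Keeps the last `window_size` bytes so each new block can be
    presented to a matchfinder with its history in front of it."""

    def __init__(self, window_size):
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self.history = b""

    def next_buffer(self, block):
        """Return (buffer, start): `buffer` is history + block, and
        `start` is the offset in `buffer` where the new block begins."""
        buffer = self.history + block
        start = len(self.history)
        # Retain only the trailing window for the next call.
        self.history = buffer[-self.window_size:]
        return buffer, start

stream = WindowedInput(window_size=4)
first = stream.next_buffer(b"abcdef")
second = stream.next_buffer(b"gh")
```

The compressor would then encode only `buffer[start:]`, while matches are allowed to point back into `buffer[:start]`.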

I asked about this earlier, and it sounds like there is no interest in upstreaming it since it's out of scope for wimlib, but I will see if I can get it into gcab or Wine's cabinet.dll eventually.
Basto
Posts: 3
Joined: Fri Jun 06, 2025 1:38 pm

Post by Basto »

You’re right that mspack expects the decoder to reset Huffman tables each frame. Wimlib’s LZX code assumes a continuous stream, so it doesn’t reset tables per frame by default. Fixing this means deep changes in how wimlib handles Huffman state across blocks.
There’s no special LZX block to reset Huffman tables; you have to reset the decoder yourself between frames.
Your best bet is to keep resetting the decoder per frame externally or modify wimlib to do that internally, which is complex. Otherwise, tools like lzxcomp, despite being slow, might be easier to use for CAB-compatible LZX.