zstd

Post by **bliblubli** » Tue May 01, 2018 2:50 pm

Hi,

Would it be possible to add zstd to wimlib, or could you give some hints on how to do so? As wimlib already supports parameters that are not compatible with the Microsoft specs, and I think most users here use the WIMs they generate with wimlib, I guess it would be very helpful to many people.

Regards

Post by **synchronicity** » Wed May 02, 2018 4:40 am

Well, adding a new compression format to wimlib would be pretty straightforward, and Zstandard is actually something I had tested before. You can find an experimental patch that adds Zstandard support on the "zstd_support" branch (https://wimlib.net/git/?p=wimlib;a=comm ... td_support). To use it on Linux you need to have libzstd installed and pass --with-zstd to wimlib's configure script. To specify Zstandard compression with wimlib-imagex, use --compress=zstd or --solid-compress=zstd. I set the default chunk size to 131072 bytes and the default compression level to 50 (libzstd level 10; the wimlib levels are 5 times higher), but they can be changed through the usual means.

The real question is whether it's useful enough to add, and I'm not sure it is. Note that libzstd would have to be built into the Windows binary, which would basically double its size. The Zstandard format is actually pretty similar to LZX in terms of what time/space tradeoffs it allows, and wimlib already includes a very well optimized LZX compressor, with support for window sizes up to 2 MB. Did you want to use Zstandard primarily for fast compression with window sizes over 2 MB? If so, it might be sufficient to add support for Microsoft's "LZX DELTA" extension to LZX, which allows window sizes up to 32 MB [1]. wimlib's LZX compressor also isn't particularly optimized for speed with large window sizes, as I've focused more on compression ratio in that scenario; that could be improved too.

Anyway, feel free to test it and compare it with wimlib's current algorithms. Please consider it experimental though; I won't support the API and on-disk format unless it's merged to the main branch. (Note that it's also risky to reserve a new compression type flag and number, since Microsoft "owns" the format and could use the same values for something else.)

[1] https://msdn.microsoft.com/en-us/librar ... g.80).aspx

Post by **bliblubli** » Sat May 05, 2018 11:24 am

I also thought your optimizations would reduce the gap, but it will be interesting to compare, especially with a generated dictionary and small files.
1) I just pulled the zstd branch, but somehow the configure script is missing. It is in the source zip on the main download page, but as that one is from the main branch, it doesn't work. The readme in the git branch still says to use ./configure. Is a file missing, or does the readme have to be updated?
2) Regarding the type flag, I have no idea, but isn't it possible to skip some numbers? For example, if none is 0, XPRESS 1, LZX 2 and LZMS 3, let zstd be 7 so that MS can still add 3 other compression types.
I will happily report performance comparisons.

Post by **synchronicity** » Sat May 05, 2018 3:57 pm

It's intentional that the configure file isn't in git, since it's a generated file. The README file assumes that you're building from a release tarball. You need to run the 'bootstrap' script which requires that you have autoconf, automake, and libtool installed. That will generate 'configure'. (Most autoconf-using projects work the same way, though a minority do check their generated 'configure' script into source control.)
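The build sequence described above could be scripted roughly as follows. This is only a sketch: checking for libtool through its `libtoolize` program and finishing with a plain `make` are assumptions about a typical Linux setup, and `missing_tools` and `build_commands` are hypothetical helper names, not part of wimlib.

```python
import shutil

# Tools that the 'bootstrap' script needs, per the post above.
# Assumption: libtool is detected through its 'libtoolize' program,
# which is how it is usually installed on Linux.
REQUIRED_TOOLS = ("autoconf", "automake", "libtoolize")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def build_commands(with_zstd=True):
    """Return the commands to run, in order, from a git checkout."""
    configure = ["./configure"]
    if with_zstd:
        # Only meaningful on the experimental zstd_support branch.
        configure.append("--with-zstd")
    # Assumption: a plain 'make' is the final step.
    return [["./bootstrap"], configure, ["make"]]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Install these first:", ", ".join(absent))
    for command in build_commands():
        # Printed rather than executed, so the sketch is safe to run anywhere.
        print(" ".join(command))
```

The commands are only printed; run them by hand from the top of the checkout once the listed tools are installed.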

As for the type number, yes it doesn't need to be consecutive with the existing ones. However, a header flag is needed as well, and space for those is limited, unless I use one of the reserved fields instead. Of course, I can still add stuff if I really want to. But I need to be sure it's worthwhile. Edit: using a different number in the 'version' field of the header may be a good solution. For pipable WIMs I actually used different magic characters, but a different version number could work for wimlib-specific stuff too.

Post by **bliblubli** » Sun May 06, 2018 6:20 am

OK, it works now. I'm testing and will report ASAP.
I definitely feel wimlib would greatly benefit from taking more freedom from the MS version, as it offers nearly 100% of the functionality and thus can replace it. Only the possibility to easily navigate images by mounting or opening them in a UI is missing, but one can install Linux for that (as 7zip support is buggy). Of course, keeping compatibility modes available and explaining to users how to ensure compatibility is good, and that's already the case.

Edit: on a 128 GB SSD with a mean read speed of 170 MB/s (lots of small files on the partition) and an i7 6700K at stock speed (4 GHz):
- with default compression (no parameter at all), it compresses my test partition (64 GB Windows partition, captured from Linux) in 13 min 25 s to a 28.7 GB file. The CPU is at 100% most of the time (on all 8 virtual cores) and the SSD idles a lot.
- with --compress=zstd, it compresses in 6 min 58 s to a 29.0 GB file. The CPU is at about 40-60% (also on all 8 virtual cores) and the SSD is working all the time.
Tweaking parameters may reduce the gaps or increase them. At least on this machine and with zstd, there is room to use a better compression ratio while keeping the SSD active all the time, thus keeping about the same compression time.
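To put the two timings next to the quoted SSD read speed, they can be turned into average input throughput. This is a rough sketch that assumes the whole 64 GB partition was actually read in both runs (the post does not say how full it was) and treats 1 GB as 1000 MB; the function names are made up for illustration.

```python
def seconds(minutes, secs):
    """Convert a 'X min Y s' duration to seconds."""
    return minutes * 60 + secs


def throughput_mb_per_s(input_gb, elapsed_s):
    """Average input throughput in MB/s (1 GB taken as 1000 MB)."""
    return input_gb * 1000 / elapsed_s


# Assumption: the full 64 GB partition was read in both runs.
INPUT_GB = 64
default_rate = throughput_mb_per_s(INPUT_GB, seconds(13, 25))
zstd_rate = throughput_mb_per_s(INPUT_GB, seconds(6, 58))

print(f"default: {default_rate:.0f} MB/s")
print(f"zstd:    {zstd_rate:.0f} MB/s")
print(f"speedup: {zstd_rate / default_rate:.2f}x")
```

Under that assumption the zstd run averages roughly 150 MB/s, close to the 170 MB/s mean read speed of the SSD, which fits the observation that the drive was busy the whole time.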

For decompression, Linux is way too slow when writing to NTFS (30 MB/s mean on this same SSD, probably a limitation of ntfs-3g). I would have to build wimlib for Windows and test on Windows PE, but I don't have access to a machine with VS right now. If someone can provide a Windows build, I could compare decompression speed as well.

Post by **bliblubli** » Sun May 06, 2018 12:07 pm

OK, tested with lots of different tweaks to chunk size and level. LZX:20 with 256K chunks compresses as fast as zstd (using more CPU time, but the SSD is still the bottleneck in this case) and gives a 27.8 GB file, compared to 28.5 GB with ZSTD:65 at the same chunk size and 28.7 GB with LZX:50 and 128K chunks. So LZX is about 2.5% smaller while being as fast. Whatever parameters I choose for zstd, the difference in final size is pretty small (2% at most), so there is no real benefit to changing the defaults. The only thing I noticed is that both LZX and ZSTD seem to be faster and more efficient at 256K, in a scenario where the SSD is the limitation.
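The "about 2.5% smaller" figure can be checked from the three reported sizes. A small sketch, with `percent_smaller` as a made-up helper name and the chunk size of the LZX:20 run read as 256K:

```python
def percent_smaller(size, reference):
    """How much smaller `size` is than `reference`, in percent."""
    return (reference - size) / reference * 100


# Output sizes in GB, as reported in the post above.
sizes_gb = {
    "LZX:20, 256K chunks": 27.8,
    "ZSTD:65, 256K chunks": 28.5,
    "LZX:50, 128K chunks": 28.7,
}

reference = sizes_gb["ZSTD:65, 256K chunks"]
for label, size in sizes_gb.items():
    # Positive means smaller than the ZSTD:65 result, negative means larger.
    print(f"{label}: {percent_smaller(size, reference):+.1f}% vs ZSTD:65")
```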

I'll try on more datasets later to check if those trends are confirmed or not.

Post by **bliblubli** » Tue May 15, 2018 5:25 pm

So after trying different data sets with different settings, the conclusion is that zstd is very good for very fast compression (at least an SSD is needed to see the difference, as LZX is already pretty fast), but the user has to tweak the defaults. With zstd:1 and chunk-size=256K, it's 2 times faster than the fastest LZX settings while producing files of about the same size (+/- 2% max in my tests), when I/O is not the limiting factor. Using levels higher than 1 doesn't change the size much until 60. Above 60 the gain becomes significant, but LZX is way more efficient there. The best setting I found with an SSD was lzx:20 and chunk-size=64K; it produced files that were sometimes even smaller than with the default level 50 and 32K chunk size, while being 2 to 3 times faster than the default.

In the end, I compressed a 9 GB folder with some big, highly compressible files and small JPGs:
- in 16 s to 4 GB with zstd:1 and 256K chunks. Strangely, smaller chunks give both slower compression and a bigger file...
- in 33 s to 3.6 GB with lzx:20 and 64K chunks. LZX doesn't like big chunks.

The different compression levels of zstd in wimlib don't really match the benchmarks found on the net. The compression speed decreases pretty fast when using higher levels, but the ratio stays pretty low.

It was all done with zstd 1.3.4 from the official Arch Linux repos.

I hope it helps someone else.

Post by **synchronicity** » Fri May 18, 2018 6:47 am

Were you using non-solid compression? Remember that with non-solid compression, every file has to be compressed independently, which can result in there being much less of a difference when increasing the chunk size. That may explain some of your unusual observations. If you want to test solid compression you need to use the appropriate options, e.g.:

wimcapture test test.wim --solid --solid-compress=zstd --solid-chunk-size=16m
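When running many benchmark variants, it can help to generate these command lines instead of typing them. The sketch below only assembles argument lists from options that appear in this thread (`--compress`, `--chunk-size`, `--solid`, `--solid-compress`, `--solid-chunk-size`); `capture_args` is a hypothetical helper, values are passed through unchecked, and `zstd` is only accepted by a build of the experimental branch.

```python
def capture_args(source, wim, compress="zstd", level=None,
                 chunk_size=None, solid=False):
    """Build an argument list for wimcapture.

    `compress`, `level` and `chunk_size` are passed through unchecked;
    "zstd" is only accepted by a build of the experimental branch.
    """
    ctype = compress if level is None else f"{compress}:{level}"
    args = ["wimcapture", source, wim]
    if solid:
        args += ["--solid", f"--solid-compress={ctype}"]
        if chunk_size is not None:
            args.append(f"--solid-chunk-size={chunk_size}")
    else:
        args.append(f"--compress={ctype}")
        if chunk_size is not None:
            args.append(f"--chunk-size={chunk_size}")
    return args


# The solid example from the post above.
print(" ".join(capture_args("test", "test.wim", solid=True, chunk_size="16m")))
# A non-solid run similar to the ones benchmarked earlier in the thread.
print(" ".join(capture_args("test", "test.wim", level=1, chunk_size="256K")))
```

The resulting list can be handed to `subprocess.run` as-is if the runs should be automated and timed.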

When comparing with external zstd benchmarks, also remember that in the wimlib patch I multiplied the zstd levels by 5 so that their range is comparable to wimlib's other algorithms.
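To line wimlib levels up with external zstd benchmarks, the factor of 5 can be applied in either direction. This is a minimal sketch of that arithmetic only: how the patch rounds or clamps wimlib levels that are not multiples of 5 is not stated here, so the reverse mapping is left unrounded, and the function names are made up.

```python
ZSTD_LEVEL_SCALE = 5  # "I multiplied the zstd levels by 5"


def wimlib_level_from_zstd(zstd_level):
    """wimlib level that corresponds to a given libzstd level."""
    return zstd_level * ZSTD_LEVEL_SCALE


def zstd_level_from_wimlib(wimlib_level):
    """libzstd level implied by a wimlib level, as an unrounded number.

    How the patch rounds or clamps values that are not multiples of 5
    is not stated in the thread, so no rounding is applied here.
    """
    return wimlib_level / ZSTD_LEVEL_SCALE


for wimlib_level in (1, 50, 60, 65):
    print(f"wimlib zstd:{wimlib_level} -> libzstd level "
          f"{zstd_level_from_wimlib(wimlib_level):g}")
```

Read this way, a wimlib level of 60 corresponds to libzstd level 12 and 65 to level 13, while wimlib level 1 falls below libzstd level 1, so its actual behaviour depends on rounding that the thread does not describe.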

As you noticed, wimlib's LZX compression at low levels (< 35) is not very efficient on large chunk sizes. But it's actually not mainly a problem with the LZX format itself (at least up until 2MB chunks), but rather I haven't really optimized the compressor for that portion of the parameter space yet. Which I do need to do, it's just the parameters aren't the default, so I haven't focused on it previously. Remember you're not comparing just the compression *formats* but rather also the compression *implementations*.

As Mark Twain once wrote:

There are three kinds of lies: lies, damned lies, and compression algorithm benchmarks.

Post by **bliblubli** » Tue May 22, 2018 6:54 am

I used non-solid compression. It's much better for random access. I benchmarked with the official zstd command-line binary from Arch Linux, and indeed it doesn't really offer a high compression ratio, even at max settings. And wimlib's zstd version is much faster, due to the better multithreading (I know it's the same zstd library, but zstd's -T#numthreads option was 2 to 3 times slower than wimlib with the same number of threads, compressing the same folder).
Making LZX faster on big chunk sizes could make it really hard to beat.

Regarding your quote, it's very true.

I guess they don't lie either, so I wonder which file types make zstd reach such high compression ratios. I can confirm the speed, even more so with dictionaries, in all my tests, but I could not reach the high ratios they advertise on their GitHub. And the compute cost made all compression levels above the first one not that interesting...
