Switch from suffix array match-finder to binary tree match-finder
authorEric Biggers <ebiggers3@gmail.com>
Mon, 2 Jun 2014 03:01:14 +0000 (22:01 -0500)
committerEric Biggers <ebiggers3@gmail.com>
Sat, 7 Jun 2014 21:32:04 +0000 (16:32 -0500)
commitb7071062542143113ad654d89ee6b0603b23b524
tree72b999e353366af3a61c8c36a87fdc00b76a7008
parenta20052d2eaf44eb0466972826f8f9e0c3bcb92d2
Switch from suffix array match-finder to binary tree match-finder

This uses less memory (8 bytes overhead per position vs. 14), is faster,
requires less code (no libdivsufsort), and in some tests actually results
in a better compression ratio.

A binary tree match-finder was used in wimlib v1.5.2 but it didn't seem
as good then, probably because it was combined with a slow block division
algorithm for LZX.

Repeat offsets are now handled differently.  The binary tree match-finder
itself doesn't have any logic for repeat offsets at all; instead, the
match-choosing code (now implemented separately for LZX and LZMS) now
does special checks for matches at repeat offsets.

Since less memory is used for match-finding now, increase the default
LZMS pack chunk size from 2^25 (33554432) to 2^26 (67108864).
16 files changed:
Makefile.am
README
doc/man1/imagex-capture.1.in
include/wimlib.h
include/wimlib/divsufsort.h [deleted file]
include/wimlib/lz_bt.h [new file with mode: 0644]
include/wimlib/lz_optimal.h [deleted file]
include/wimlib/lz_sarray.h [deleted file]
include/wimlib/lzx.h
programs/imagex.c
src/divsufsort.c [deleted file]
src/lz_bt.c [new file with mode: 0644]
src/lz_sarray.c [deleted file]
src/lzms-compress.c
src/lzx-compress.c
src/wim.c