+ * - In LZX, match offsets 0 through 2 actually represent entries in an LRU
+ * queue of match offsets. This is very useful for certain types of files,
+ * such as binary files that have repeating records.
+ *
+ * ----------------------------------------------------------------------------
+ *
+ * Algorithmic Overview
+ *
+ * At a high level, any implementation of LZX compression must operate as
+ * follows:
+ *
+ * 1. Preprocess the input data to translate the targets of 32-bit x86 call
+ * instructions to absolute offsets.  (This is required by the WIM format,
+ * but might not be required in other places LZX is used.)
+ *
+ * 2. Find a sequence of LZ77-style matches and literal bytes that expands to
+ * the preprocessed data.
+ *
+ * 3. Divide the match/literal sequence into one or more LZX blocks, each of
+ * which may be "uncompressed", "verbatim", or "aligned".
+ *
+ * 4. Output each LZX block.
+ *
+ * Step (1) is fairly straightforward. It requires looking for 0xe8 bytes in
+ * the input data and performing a translation on the 4 bytes following each
+ * one.
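+ *
+ * To illustrate, below is a simplified sketch of that translation.  It is a
+ * sketch only, not necessarily identical to the actual preprocessing code;
+ * the constant 12000000 is the fixed "file size" that the WIM variant of LZX
+ * uses for this translation, and the helper names are made up:
+ *
+ *	#include <stdint.h>
+ *
+ *	#define LZX_WIM_MAGIC_FILESIZE 12000000
+ *
+ *	static uint32_t get_le32(const uint8_t *p)
+ *	{
+ *		return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
+ *		       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
+ *	}
+ *
+ *	static void put_le32(uint8_t *p, uint32_t v)
+ *	{
+ *		p[0] = v; p[1] = v >> 8; p[2] = v >> 16; p[3] = v >> 24;
+ *	}
+ *
+ *	static void e8_translate(uint8_t *data, int32_t size)
+ *	{
+ *		// the last few bytes are left untranslated
+ *		for (int32_t i = 0; i < size - 10; i++) {
+ *			if (data[i] != 0xe8)
+ *				continue;
+ *			int32_t rel = (int32_t)get_le32(&data[i + 1]);
+ *			if (rel >= -i && rel < LZX_WIM_MAGIC_FILESIZE) {
+ *				int32_t abs_off = (rel >= 0)
+ *					? rel + i
+ *					: rel + LZX_WIM_MAGIC_FILESIZE;
+ *				put_le32(&data[i + 1], (uint32_t)abs_off);
+ *			}
+ *			i += 4;  // skip the 4 translated bytes
+ *		}
+ *	}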
+ *
+ * Step (4) is complicated, but it is mostly determined by the LZX format. The
+ * only real choice we have is what algorithm to use to build the length-limited
+ * canonical Huffman codes. See lzx_write_all_blocks() for details.
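+ *
+ * (Once the code lengths themselves have been chosen, assigning the canonical
+ * codewords is mechanical.  Here is a sketch of one standard way to do it,
+ * the DEFLATE-style procedure; the names are illustrative, and 16 matches
+ * LZX's maximum codeword length for its main codes:
+ *
+ *	#include <stdint.h>
+ *
+ *	#define MAX_CODE_LEN 16
+ *
+ *	// Given len[sym] for each of 'n' symbols (0 = unused), fill in
+ *	// codeword[sym] with canonical codewords: shorter codes first,
+ *	// symbols of equal length ordered by symbol value.
+ *	static void
+ *	make_canonical_code(const uint8_t len[], uint32_t codeword[],
+ *			    unsigned n)
+ *	{
+ *		unsigned len_count[MAX_CODE_LEN + 1] = {0};
+ *		uint32_t next_code[MAX_CODE_LEN + 1];
+ *		uint32_t code = 0;
+ *
+ *		for (unsigned sym = 0; sym < n; sym++)
+ *			len_count[len[sym]]++;
+ *		len_count[0] = 0;
+ *
+ *		for (unsigned l = 1; l <= MAX_CODE_LEN; l++) {
+ *			code = (code + len_count[l - 1]) << 1;
+ *			next_code[l] = code;
+ *		}
+ *		for (unsigned sym = 0; sym < n; sym++)
+ *			if (len[sym] != 0)
+ *				codeword[sym] = next_code[len[sym]]++;
+ *	}
+ *
+ * The part with real algorithmic freedom is choosing the length-limited code
+ * lengths themselves, which is what lzx_write_all_blocks() is concerned
+ * with.)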
+ *
+ * That leaves steps (2) and (3) as where all the hard stuff happens. Focusing
+ * on step (2), we need to do LZ77-style parsing on the input data, or "window",
+ * to divide it into a sequence of matches and literals. Each position in the
+ * window might have multiple matches associated with it, and we need to choose
+ * which one, if any, to actually use. Therefore, the problem can really be
+ * divided into two areas of concern: (a) finding matches at a given position,
+ * which we shall call "match-finding", and (b) choosing whether to use a
+ * match or a literal at a given position, and if using a match, which one (if
+ * there is more than one available). We shall call this "match-choosing". We
+ * first consider match-finding, then match-choosing.
+ *
+ * ----------------------------------------------------------------------------
+ *
+ * Match-finding
+ *
+ * Given a position in the window, we want to find LZ77-style "matches": that
+ * is, occurrences at previous window positions of the string that begins at
+ * the current position.  With LZX, the minimum match length is 2 and the
+ * maximum match length is 257.  The only restriction on offsets is that LZX
+ * does not allow the last 2 bytes of the window to match the beginning of the
+ * window.
+ *
+ * Depending on how good a compression ratio we want (see the "Match-choosing"
+ * section), we may want to find: (a) all matches, or (b) just the longest
+ * match, or (c) just some "promising" matches that we are able to find quickly,
+ * or (d) just the longest match that we're able to find quickly. Below we
+ * introduce the match-finding methods that the code currently uses or has
+ * previously used:
+ *
+ * - Hash chains. Maintain a table that maps hash codes, computed from
+ * fixed-length byte sequences, to linked lists containing previous window
+ * positions. To search for matches, compute the hash for the current
+ * position in the window and search the appropriate hash chain. When
+ * advancing to the next position, prepend the current position to the
+ * appropriate hash list.  This is a good approach for producing matches with
+ * strategy (d) and is useful for fast compression.  Therefore, we provide an
+ * option to use this method for LZX compression.  See lz_hash.c for the
+ * implementation and the sketch following this list for an illustration.
+ *
+ * - Binary trees. Similar to hash chains, but each hash bucket contains a
+ * binary tree of previous window positions rather than a linked list. This
+ * is a good approach for producing matches with strategy (c) and is useful for
+ * achieving a good compression ratio. Therefore, we provide an option to use
+ * this method; see lz_bt.c for the implementation.
+ *
+ * - Suffix arrays. This code previously used this method to produce matches
+ * with strategy (c), but I've dropped it because it was slower than the binary
+ * trees approach, used more memory, and did not improve the compression ratio
+ * enough to compensate. Download wimlib v1.6.2 if you want the code.
+ * For reference, the suffix array method was basically as follows.  Build the
+ * suffix array for the entire window. The suffix array contains each
+ * possible window position, sorted by the lexicographic order of the strings
+ * that begin at those positions. Find the matches at a given position by
+ * searching the suffix array outwards, in both directions, from the suffix
+ * array slot for that position. This produces the longest matches first, but
+ * "matches" that actually occur at later positions in the window must be
+ * skipped. To do this skipping, use an auxiliary array with dynamically
+ * constructed linked lists. Also, use the inverse suffix array to quickly
+ * find the suffix array slot for a given position without doing a binary
+ * search.
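+ *
+ * As promised in the hash chains item above, here is a minimal sketch of that
+ * method.  It is illustrative only: fixed 3-byte hashing, arbitrary table
+ * sizes and hash multiplier, and the caller must ensure at least 3 bytes are
+ * readable at 'pos' and that 'avail' bytes remain in the window:
+ *
+ *	#include <stdint.h>
+ *	#include <string.h>
+ *
+ *	#define WINDOW_SIZE	(1 << 15)	// example size only
+ *	#define HASH_BITS	14
+ *	#define NO_POS		UINT32_MAX
+ *
+ *	static uint32_t hash_tab[1 << HASH_BITS];  // hash -> newest pos
+ *	static uint32_t prev_tab[WINDOW_SIZE];     // pos  -> older pos
+ *
+ *	static void init_tabs(void)
+ *	{
+ *		memset(hash_tab, 0xff, sizeof(hash_tab));  // all NO_POS
+ *	}
+ *
+ *	static uint32_t hash3(const uint8_t *p)
+ *	{
+ *		uint32_t seq = p[0] | (p[1] << 8) | (p[2] << 16);
+ *		return (seq * 0x9E3779B1u) >> (32 - HASH_BITS);
+ *	}
+ *
+ *	// Walk at most 'max_search' links of the chain for 'pos', then
+ *	// insert 'pos' at the head of its chain.  Returns the longest
+ *	// match length found (0 if none); its offset goes in *offset_ret.
+ *	static uint32_t
+ *	find_and_insert(const uint8_t *win, uint32_t pos, uint32_t avail,
+ *			uint32_t max_search, uint32_t *offset_ret)
+ *	{
+ *		uint32_t h = hash3(&win[pos]);
+ *		uint32_t cur = hash_tab[h];
+ *		uint32_t best_len = 0;
+ *
+ *		while (cur != NO_POS && max_search--) {
+ *			uint32_t len = 0;
+ *			while (len < avail && win[cur + len] == win[pos + len])
+ *				len++;
+ *			if (len > best_len) {
+ *				best_len = len;
+ *				*offset_ret = pos - cur;
+ *			}
+ *			cur = prev_tab[cur];
+ *		}
+ *		prev_tab[pos] = hash_tab[h];
+ *		hash_tab[h] = pos;
+ *		return best_len;
+ *	}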
+ *
+ * ----------------------------------------------------------------------------
+ *
+ * Match-choosing
+ *
+ * Usually, choosing the longest match is best because it encodes the most data
+ * in that one item. However, sometimes the longest match is not optimal
+ * because (a) choosing a long match now might prevent using an even longer
+ * match later, or (b) more generally, what we actually care about is the number
+ * of bits it will ultimately take to output each match or literal, which
+ * depends on the entropy encoding used by the underlying compression format.
+ * A longer match usually, but not always, takes fewer bits to encode than the
+ * multiple shorter matches or literals that would cover the same data.  (For
+ * example, a length-2 match at a large offset may cost more bits than the two
+ * literals it replaces.)
+ *
+ * This problem of choosing the truly best match/literal sequence is probably
+ * impossible to solve efficiently when combined with entropy encoding. If we
+ * knew how many bits it takes to output each match/literal, then we could
+ * choose the optimal sequence using shortest-path search a la Dijkstra's
+ * algorithm. However, with entropy encoding, the chosen match/literal sequence
+ * affects its own encoding. Therefore, we can't know how many bits it will
+ * take to actually output any one match or literal until we have actually
+ * chosen the full sequence of matches and literals.
+ *
+ * Notwithstanding the entropy encoding problem, we also aren't guaranteed to
+ * choose the optimal match/literal sequence unless the match-finder (see the
+ * "Match-finding" section) provides the match-chooser with all possible
+ * matches at each position.  However, this is not computationally efficient.  For
+ * example, there might be many matches of the same length, and usually (but not
+ * always) the best choice is the one with the smallest offset. So in practice,
+ * it's fine to only consider the smallest offset for a given match length at a
+ * given position. (Actually, for LZX, it's also worth considering repeat
+ * offsets.)
+ *
+ * In addition, as mentioned earlier, in LZX we have the choice of using
+ * multiple blocks, each of which resets the Huffman codes. This expands the
+ * search space even further.  Therefore, to simplify the problem, we currently
+ * don't attempt to choose the LZX blocks based on the data.
+ * Instead, we just divide the data into fixed-size blocks of LZX_DIV_BLOCK_SIZE
+ * bytes each, and always use verbatim or aligned blocks (never uncompressed).
+ * A previous version of this code recursively split the input data into
+ * equal-sized blocks, up to a maximum depth, and chose the lowest-cost block
+ * divisions. However, this made compression much slower and did not actually
+ * help very much. It remains an open question whether a sufficiently fast and
+ * useful block-splitting algorithm is possible for LZX. Essentially the same
+ * problem also applies to DEFLATE.  The Microsoft LZX compressor seemingly
+ * does perform block splitting, although I don't know how fast or effective
+ * it is.
+ *
+ * Now, back to the entropy encoding problem. The "solution" is to use an
+ * iterative approach to compute a good, but not necessarily optimal,
+ * match/literal sequence. Start with a fixed assignment of symbol costs and
+ * choose an "optimal" match/literal sequence based on those costs, using
+ * shortest-path search a la Dijkstra's algorithm.  Then, for each iteration of
+ * the optimization, update the costs based on the entropy encoding of the
+ * current match/literal sequence, then choose a new match/literal sequence
+ * based on the updated costs. Usually, the actual cost to output the current
+ * match/literal sequence will decrease in each iteration until it converges on
+ * a fixed point. This result may not be the truly optimal match/literal
+ * sequence, but it usually is much better than one chosen by doing a "greedy"
+ * parse where we always choose the longest match.
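+ *
+ * Structurally, the iteration looks something like the following sketch.  All
+ * of these names are hypothetical stand-ins, not the actual types and
+ * functions in this file:
+ *
+ *	struct lzx_costs costs;	// hypothetical per-symbol cost model
+ *	struct lzx_freqs freqs;	// hypothetical symbol-frequency tallies
+ *
+ *	set_default_costs(&costs);	// fixed initial cost assignment
+ *	for (unsigned pass = 0; pass < num_passes; pass++) {
+ *		// min-cost path through the graph of matches/literals,
+ *		// where edge weights come from the current cost model
+ *		num_items = choose_min_cost_items(window, window_size,
+ *						  &costs, items);
+ *		// tally how often each symbol would be used, then
+ *		// re-derive per-symbol costs from the Huffman codes
+ *		// that would be built for those frequencies
+ *		tally_item_frequencies(items, num_items, &freqs);
+ *		update_costs_from_frequencies(&costs, &freqs);
+ *	}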
+ *
+ * An alternative to both greedy parsing and iterative, near-optimal parsing is
+ * "lazy" parsing. Briefly, "lazy" parsing considers just the longest match at
+ * each position, but it waits to choose that match until it has also examined
+ * the next position. This is actually a useful approach; it's used by zlib,
+ * for example. Therefore, for fast compression we combine lazy parsing with
+ * the hash chain match-finder.  For normal/high compression we combine
+ * near-optimal parsing with the binary tree match-finder.
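+ *
+ * For reference, a minimal sketch of lazy parsing.  Here longest_match() is a
+ * stand-in for whichever match-finder is in use, and the output_* helpers and
+ * MIN_MATCH_LEN constant are hypothetical:
+ *
+ *	uint32_t pos = 0;
+ *	while (pos < size) {
+ *		uint32_t offset, len = longest_match(pos, &offset);
+ *		if (len < MIN_MATCH_LEN) {
+ *			output_literal(data[pos++]);
+ *			continue;
+ *		}
+ *		uint32_t next_offset;
+ *		uint32_t next_len = longest_match(pos + 1, &next_offset);
+ *		if (next_len > len) {
+ *			// the match at pos + 1 is longer; emit a literal
+ *			// now and decide again at the next position
+ *			output_literal(data[pos++]);
+ *		} else {
+ *			output_match(len, offset);
+ *			pos += len;
+ *		}
+ *	}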
+ *
+ * Anyway, if you've read through this comment, you should hopefully have a
+ * better idea of why things are done in a certain way in this LZX compressor,
+ * as well as in other compressors for LZ77-based formats (including
+ * third-party ones).  In my opinion, the phrase "compression algorithm" is
+ * often misused in place of "compression format", since there can be many
+ * different algorithms that all generate compressed data in the same format.
+ * The challenge is to design an algorithm that is efficient but still gives a
+ * good compression ratio.