- * Depending on how good a compression ratio we want (see the "Match-choosing"
- * section), we may want to find: (a) all matches, or (b) just the longest
- * match, or (c) just some "promising" matches that we are able to find quickly,
- * or (d) just the longest match that we're able to find quickly. Below we
- * introduce the match-finding methods that the code currently uses or has
- * previously used:
- *
- * - Hash chains. Maintain a table that maps hash codes, computed from
- * fixed-length byte sequences, to linked lists containing previous window
- * positions. To search for matches, compute the hash for the current
- * position in the window and search the appropriate hash chain. When
- * advancing to the next position, prepend the current position to the
- * appropriate hash list. This is a good approach for producing matches with
- * stategy (d) and is useful for fast compression. Therefore, we provide an
- * option to use this method for LZX compression. See lz_hash.c for the
- * implementation.
- *
- * - Binary trees. Similar to hash chains, but each hash bucket contains a
- * binary tree of previous window positions rather than a linked list. This
- * is a good approach for producing matches with stategy (c) and is useful for
- * achieving a good compression ratio. Therefore, we provide an option to use
- * this method; see lz_bt.c for the implementation.
- *
- * - Suffix arrays. This code previously used this method to produce matches
- * with stategy (c), but I've dropped it because it was slower than the binary
- * trees approach, used more memory, and did not improve the compression ratio
- * enough to compensate. Download wimlib v1.6.2 if you want the code.
- * However, the suffix array method was basically as follows. Build the
- * suffix array for the entire window. The suffix array contains each
- * possible window position, sorted by the lexicographic order of the strings
- * that begin at those positions. Find the matches at a given position by
- * searching the suffix array outwards, in both directions, from the suffix
- * array slot for that position. This produces the longest matches first, but
- * "matches" that actually occur at later positions in the window must be
- * skipped. To do this skipping, use an auxiliary array with dynamically
- * constructed linked lists. Also, use the inverse suffix array to quickly
- * find the suffix array slot for a given position without doing a binary
- * search.
+ * There are a number of algorithms that can be used for this, including hash
+ * chains, binary trees, and suffix arrays. Binary trees generally work well
+ * for LZX compression since it uses medium-size windows (2^15 to 2^21 bytes).
+ * However, when compressing in a fast mode where many positions are skipped
+ * (not searched for matches), hash chains are faster.
+ *
+ * Since the match-finders are not specific to LZX, I will not explain them in
+ * detail here. Instead, see lz_hash_chains.c and lz_binary_trees.c.