- * - LZX does not have static Huffman blocks; however it does have two types of
- * dynamic Huffman blocks ("aligned offset" and "verbatim").
- * - LZX has a minimum match length of 2 rather than 3 (as in DEFLATE).
- * - In LZX, match offsets 0 through 2 actually represent entries in an LRU
- * queue of match offsets. This is very useful for certain types of files,
- * such as binary files that have repeating records.
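- *
- * The repeat-offset queue described above can be sketched in C as follows.
- * The three-entry queue and its update rules (swap-to-front on reuse,
- * push-front for an explicitly coded offset) follow the usual LZX
- * description; the names are illustrative, not wimlib's actual code:

```c
#include <stdint.h>

/* Sketch of a three-entry recent-offsets queue as used by LZX.
 * Encoded offsets 0-2 select R[0]-R[2]. */
struct lzx_lru_queue {
	uint32_t R[3];
};

/* Reuse the queue entry in 'slot' (0, 1, or 2): swap it to the front
 * and return the real match offset it represents.  (slot 0 is a no-op
 * swap, so reusing the most recent offset leaves the queue unchanged.) */
static uint32_t
lzx_lru_use(struct lzx_lru_queue *q, int slot)
{
	uint32_t offset = q->R[slot];

	q->R[slot] = q->R[0];
	q->R[0] = offset;
	return offset;
}

/* Record an explicitly coded (non-repeat) offset: push it onto the
 * front of the queue, evicting the oldest entry. */
static void
lzx_lru_push(struct lzx_lru_queue *q, uint32_t offset)
{
	q->R[2] = q->R[1];
	q->R[1] = q->R[0];
	q->R[0] = offset;
}
```

- * Records in a binary file tend to reuse the same few distances over and
- * over, so their offsets keep hitting this queue and can be coded in just
- * a couple of bits.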
- *
- * Algorithms
- * ==========
- *
- * There are actually two distinct overall algorithms implemented here. We
- * shall refer to them as the "slow" algorithm and the "fast" algorithm. The
- * "slow" algorithm spends more time compressing to achieve a higher compression
- * ratio compared to the "fast" algorithm. More details are presented below.
- *
- * Slow algorithm
- * --------------
- *
- * The "slow" algorithm to generate LZX-compressed data is roughly as follows:
- *
- * 1. Preprocess the input data to translate the targets of x86 call
- * instructions to absolute offsets.
- *
- * 2. Build the suffix array and inverse suffix array for the input data. The
- * suffix array contains the indices of all suffixes of the input data,
- * sorted lexicographically by the corresponding suffixes. The "position" of
- * a suffix is the index of that suffix in the original string, whereas the
- * "rank" of a suffix is the index at which that suffix's position is found
- * in the suffix array.
- *
- * 3. Build the longest common prefix array corresponding to the suffix array.
- *
- * 4. For each suffix, find the nearest lower-ranked suffix that has a lower
- * position, the nearest higher-ranked suffix that has a lower position, and
- * the length of the common prefix shared with each. This information is
- * later used to link suffix ranks into a doubly-linked list for searching
- * the suffix array.
- *
- * 5. Set a default cost model for matches/literals.
- *
- * 6. Determine the lowest cost sequence of LZ77 matches ((offset, length)
- * pairs) and literal bytes to divide the input into. Raw match-finding is
- * done by searching the suffix array using a linked list to avoid
- * considering any suffixes that start after the current position. Each run
- * of the match-finder returns the approximate lowest-cost longest match as
- * well as any shorter matches that have even lower approximate costs. Each
- * such run also adds the suffix rank of the current position into the linked
- * list being used to search the suffix array. Parsing, or match-choosing,
- * is solved as a minimum-cost path problem using a forward "optimal parsing"
- * algorithm based on the Deflate encoder from 7-Zip. This algorithm moves
- * forward calculating the minimum cost to reach each byte until either a
- * very long match is found or until a position is found at which no matches
- * start or overlap.
- *
- * 7. Build the Huffman codes needed to output the matches/literals.
- *
- * 8. Up to a certain number of iterations, use the resulting Huffman codes to
- * refine a cost model and go back to Step #6 to determine an improved
- * sequence of matches and literals.
- *
- * 9. Output the resulting block using the match/literal sequences and the
- * Huffman codes that were computed for the block.
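- *
- * Steps 2 and 3 can be sketched as follows.  This toy version builds the
- * suffix array with a naive comparison sort over NUL-terminated input (the
- * real compressor uses divsufsort) and computes the LCP array with the
- * Kasai et al. algorithm; the function names are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Toy O(n^2 log n) suffix-array construction, for illustration only.
 * SA[r] holds the position of the rank-r suffix; rank[] is the
 * inverse permutation of SA[]. */
static const char *sa_text;

static int
suffix_cmp(const void *a, const void *b)
{
	return strcmp(sa_text + *(const int *)a, sa_text + *(const int *)b);
}

static void
build_sa_and_rank(const char *text, int n, int SA[], int rank[])
{
	sa_text = text;
	for (int i = 0; i < n; i++)
		SA[i] = i;
	qsort(SA, n, sizeof(SA[0]), suffix_cmp);
	for (int r = 0; r < n; r++)
		rank[SA[r]] = r;
}

/* LCP computation from Kasai et al. (2001): LCP[r] is the length of
 * the longest common prefix of the suffixes ranked r and r - 1,
 * computed in O(n) by reusing work between adjacent positions. */
static void
build_lcp(const char *text, int n, const int SA[], const int rank[],
	  int LCP[])
{
	int h = 0;

	LCP[0] = 0;
	for (int i = 0; i < n; i++) {
		if (rank[i] > 0) {
			int j = SA[rank[i] - 1];

			while (i + h < n && j + h < n &&
			       text[i + h] == text[j + h])
				h++;
			LCP[rank[i]] = h;
			if (h > 0)
				h--;
		} else {
			h = 0;
		}
	}
}
```

- * For "banana", for example, the suffix array is {5, 3, 1, 0, 4, 2}
- * ("a", "ana", "anana", "banana", "na", "nana").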
- *
- * Note: the algorithm does not yet attempt to split the input into multiple LZX
- * blocks, instead using a series of blocks of LZX_DIV_BLOCK_SIZE bytes.
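- *
- * The minimum-cost path formulation in step 6 can be sketched with a toy
- * forward dynamic program.  The flat per-literal and per-match costs and
- * the brute-force search for earlier non-overlapping occurrences below are
- * simplifying assumptions; the real parser derives costs from the current
- * Huffman code lengths and gets its matches from the suffix-array
- * match-finder:

```c
#include <string.h>

#define LIT_COST   9	/* assumed flat cost of a literal, in bits */
#define MATCH_COST 24	/* assumed flat cost of any match, in bits */
#define MIN_MATCH  2	/* LZX minimum match length */

/* Toy forward "optimal parse": cost[i] is the cheapest cost, in bits,
 * of encoding the first i bytes as a sequence of literals and matches. */
static int
toy_parse_cost(const unsigned char *data, int n)
{
	int cost[257];

	if (n > 256)
		return -1;
	cost[0] = 0;
	for (int i = 1; i <= n; i++) {
		/* Option 1: encode data[i - 1] as a literal. */
		cost[i] = cost[i - 1] + LIT_COST;

		/* Option 2: a match of length 'len' ending at i, copied
		 * from some earlier, non-overlapping occurrence. */
		for (int len = MIN_MATCH; len <= i; len++) {
			int start = i - len;

			for (int p = 0; p + len <= start; p++) {
				if (!memcmp(&data[p], &data[start], len)) {
					if (cost[start] + MATCH_COST < cost[i])
						cost[i] = cost[start] + MATCH_COST;
					break;
				}
			}
		}
	}
	return cost[n];
}
```

- * With these assumed costs, a length-2 match (24 bits) is never cheaper
- * than two literals (18 bits), which mirrors why a real cost model, not
- * raw match length, must drive the parse.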
- *
- * Fast algorithm
- * --------------
- *
- * The fast algorithm (and the only one available in wimlib v1.5.1 and earlier)
- * spends much less time on the main bottlenecks of the compression process ---
- * that is, the match finding and match choosing. Matches are found and chosen
- * with hash chains using a greedy parse with one position of look-ahead. No
- * block splitting is done; only compressing the full input into an aligned
- * offset block is considered.
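- *
- * A minimal hash-chain match-finder of the kind the fast algorithm relies
- * on might look like the sketch below.  The table sizes, the hash
- * function, and the absence of a chain-length cutoff and of the
- * one-position look-ahead are illustrative simplifications, not wimlib's
- * actual tuning:

```c
#include <string.h>

#define HASH_SIZE   (1 << 12)
#define MAX_WINDOW  4096
#define MIN_MATCH   2

/* hash[] maps a 2-byte hash to the most recent position beginning
 * with those bytes; prev[] chains back through older positions that
 * hashed to the same slot. */
struct matchfinder {
	int hash[HASH_SIZE];
	int prev[MAX_WINDOW];
};

static unsigned
hash2(const unsigned char *p)
{
	return (p[0] * 251 + p[1]) % HASH_SIZE;
}

static void
mf_init(struct matchfinder *mf)
{
	for (int i = 0; i < HASH_SIZE; i++)
		mf->hash[i] = -1;
}

/* Return the length of the longest match beginning at 'pos' against
 * any earlier position (0 if shorter than MIN_MATCH), storing its
 * offset in *offset_ret, then insert 'pos' into the chains. */
static int
mf_longest_match(struct matchfinder *mf, const unsigned char *data,
		 int pos, int n, int *offset_ret)
{
	int best_len = 0;

	if (pos + MIN_MATCH <= n) {
		unsigned h = hash2(&data[pos]);

		for (int cand = mf->hash[h]; cand >= 0; cand = mf->prev[cand]) {
			int len = 0;

			while (pos + len < n && data[cand + len] == data[pos + len])
				len++;
			if (len >= MIN_MATCH && len > best_len) {
				best_len = len;
				*offset_ret = pos - cand;
			}
		}
		mf->prev[pos] = mf->hash[h];
		mf->hash[h] = pos;
	}
	return best_len;
}
```

- * A greedy parser calls mf_longest_match() at each position and emits the
- * match if one is found; the look-ahead mentioned above additionally
- * checks whether the next position offers a longer match before
- * committing.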
- *
- * API
- * ===
- *
- * The old API (retained for backward compatibility) consists of just one
- * function:
- *
- * wimlib_lzx_compress()
- *
- * The new compressor has more potential parameters and needs more memory, so
- * the new API ties up memory allocations and compression parameters into a
- * context:
- *
- * wimlib_lzx_alloc_context()
- * wimlib_lzx_compress2()
- * wimlib_lzx_free_context()
- * wimlib_lzx_set_default_params()
- *
- * Both wimlib_lzx_compress() and wimlib_lzx_compress2() are designed to
- * compress an in-memory buffer of up to the window size, which can be any power
- * of two between 2^15 and 2^21, inclusive. However, by default, the WIM
- * format uses 2^15, and this is seemingly the only value that is compatible
- * with WIMGAPI. In any case, the window is not a true "sliding window" since
- * no data is ever "slid out" of the window. This is needed for the WIM format,
- * which is designed such that chunks may be randomly accessed.
- *
- * Both wimlib_lzx_compress() and wimlib_lzx_compress2() return 0 if the data
- * could not be compressed to less than the size of the uncompressed data.
- * Again, this is suitable for the WIM format, which stores such data chunks
- * uncompressed.
- *
- * The functions in this LZX compression API are exported from the library,
- * although, with the possible exception of wimlib_lzx_set_default_params(),
- * they are exported only in case other programs happen to have uses for them
- * beyond the WIM reading/writing already handled by the rest of the library.
- *
- * Acknowledgments
- * ===============
- *
- * Thanks to several open-source projects and research papers that made it
- * possible to implement this code:
- *
- * - divsufsort (author: Yuta Mori), for the suffix array construction code,
- * located in a separate directory (divsufsort/).
- *
- * - "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its
- * Applications" (Kasai et al. 2001), for the LCP array computation.
- *
- * - "LPF computation revisited" (Crochemore et al. 2009) for the prev and next
- * array computations.