- * Copyright (c) 2014 Eric Biggers. All rights reserved.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * 1. Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- *
- * 2. Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in the
- * documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND
- * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
- * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE
- * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR
- * BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
- * WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
- * OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
- * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ * Author: Eric Biggers
+ * Year: 2014, 2015
+ *
+ * The author dedicates this file to the public domain.
+ * You can do whatever you want with this file.
+ *
+ * ---------------------------------------------------------------------------
+ *
+ * Algorithm
+ *
+ * This is a Hash Chains (hc) based matchfinder.
+ *
+ * The data structure is a hash table where each hash bucket contains a linked
+ * list (or "chain") of sequences whose first 3 bytes share the same hash code.
+ * Each sequence is identified by its starting position in the input buffer.
+ *
+ * The algorithm processes the input buffer sequentially. At each byte
+ * position, the hash code of the first 3 bytes of the sequence beginning at
+ * that position (the sequence being matched against) is computed. This
+ * identifies the hash bucket to use for that position. Then, this hash
+ * bucket's linked list is searched for matches. Then, a new linked list node
+ * is created to represent the current sequence and is prepended to the list.
+ *
+ * This algorithm has several useful properties:
+ *
+ * - It only finds true Lempel-Ziv matches; i.e., those where the matching
+ * sequence occurs prior to the sequence being matched against.
+ *
+ * - The sequences in each linked list are always sorted by decreasing starting
+ * position. Therefore, the closest (smallest offset) matches are found
+ * first, which in many compression formats tend to be the cheapest to encode.
+ *
+ * - Although fast running time is not guaranteed due to the possibility of the
+ * lists getting very long, the worst degenerate behavior can be easily
+ * prevented by capping the number of nodes searched at each position.
+ *
+ * - If the compressor decides not to search for matches at a certain position,
+ * then that position can be quickly inserted without searching the list.
+ *
+ * - The algorithm is adaptable to sliding windows: just store the positions
+ * relative to a "base" value that is updated from time to time, and stop
+ * searching each list when the sequences get too far away.
+ *
+ * ---------------------------------------------------------------------------
+ *
+ * Notes on usage
+ *
+ * You must define MATCHFINDER_MAX_WINDOW_ORDER before including this header
+ * because that determines which integer type to use for positions. Since
+ * 16-bit integers are faster than 32-bit integers due to reduced memory usage
+ * (and therefore reduced cache pressure), the code only uses 32-bit integers if
+ * they are needed to represent all possible positions.
+ *
+ * You must allocate the 'struct hc_matchfinder' on a
+ * MATCHFINDER_ALIGNMENT-aligned boundary, and its necessary allocation size
+ * must be gotten by calling hc_matchfinder_size().
+ *
+ * ----------------------------------------------------------------------------
+ *
+ * Optimizations
+ *
+ * The longest_match() and skip_positions() functions are inlined into the
+ * compressors that use them. This isn't just about saving the overhead of a
+ * function call. These functions are intended to be called from the inner
+ * loops of compressors, where giving the compiler more control over register
+ * allocation is very helpful. There is also significant benefit to be gained
+ * from allowing the CPU to predict branches independently at each call site.
+ * For example, "lazy"-style compressors can be written with two calls to
+ * longest_match(), each of which starts with a different 'best_len' and
+ * therefore has significantly different performance characteristics.
+ *
+ * Although any hash function can be used, a multiplicative hash is fast and
+ * works well.
+ *
+ * On some processors, it is significantly faster to extend matches by whole
+ * words (32 or 64 bits) instead of by individual bytes. For this to be the
+ * case, the processor must implement unaligned memory accesses efficiently and
+ * must have either a fast "find first set bit" instruction or a fast "find last
+ * set bit" instruction, depending on the processor's endianness.
+ *
+ * The code uses one loop for finding the first match and one loop for finding a
+ * longer match. Each of these loops is tuned for its respective task and in
+ * combination are faster than a single generalized loop that handles both
+ * tasks.
+ *
+ * The code also uses a tight inner loop that only compares the last and first
+ * bytes of a potential match. It is only when these bytes match that a full
+ * match extension is attempted.
+ *
+ * ----------------------------------------------------------------------------