+/*
+ * ---------------------------------------------------------------------
+ * make_canonical_huffman_code()
+ * ---------------------------------------------------------------------
+ *
+ * Given an alphabet and the frequency of each symbol in it, construct a
+ * length-limited canonical Huffman code.
+ *
+ * @num_syms
+ * The number of symbols in the alphabet. The symbols are the
+ * integers in the range [0, num_syms - 1]. This parameter must be
+ * at least 2 and can't be greater than (1 << NUM_SYMBOL_BITS).
+ *
+ * @max_codeword_len
+ * The maximum permissible codeword length.
+ *
+ * @freqs
+ * An array of @num_syms entries, each of which specifies the
+ * frequency of the corresponding symbol. It is valid for some,
+ * none, or all of the frequencies to be 0.
+ *
+ * @lens
+ * An array of @num_syms entries in which this function will return
+ * the length, in bits, of the codeword assigned to each symbol.
+ * Symbols with 0 frequency will not have codewords per se, but
+ * their entries in this array will be set to 0. No lengths greater
+ * than @max_codeword_len will be assigned.
+ *
+ * @codewords
+ * An array of @num_syms entries in which this function will return
+ * the codeword for each symbol, right-justified and padded on the
+ * left with zeroes. Codewords for symbols with 0 frequency will be
+ * undefined.
+ *
+ * ---------------------------------------------------------------------
+ *
+ * This function builds a length-limited canonical Huffman code.
+ *
+ * A length-limited Huffman code contains no codewords longer than some
+ * specified length, and has exactly (with some algorithms) or
+ * approximately (with the algorithm used here) the minimum weighted path
+ * length from the root, given this constraint.
+ *
+ * A canonical Huffman code satisfies the properties that a codeword
+ * never lexicographically precedes a shorter codeword, and the
+ * lexicographic ordering of codewords of the same length is the same as
+ * the lexicographic ordering of the corresponding symbols. A canonical
+ * Huffman code, or more generally a canonical prefix code, can be
+ * reconstructed from only a list containing the codeword length of each
+ * symbol.
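+ *
+ * For example (a hypothetical 4-symbol alphabet, not taken from any
+ * particular format), the codeword lengths A=2, B=1, C=3, D=3 yield
+ * the canonical codewords:
+ *
+ *	B (length 1):  0
+ *	A (length 2):  10
+ *	C (length 3):  110
+ *	D (length 3):  111
+ *
+ * Going from shortest to longest, each codeword is the previous one
+ * plus 1, shifted left once each time the length increases;
+ * equal-length codewords are assigned in symbol order.  Only the
+ * lengths are needed to regenerate this assignment.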
+ *
+ * The classic algorithm to generate a Huffman code creates a node for
+ * each symbol, then inserts these nodes into a min-heap keyed by symbol
+ * frequency. Then, repeatedly, the two lowest-frequency nodes are
+ * removed from the min-heap and added as the children of a new node
+ * having frequency equal to the sum of its two children, which is then
+ * inserted into the min-heap. When only a single node remains in the
+ * min-heap, it is the root of the Huffman tree. The codeword for each
+ * symbol is determined by the path needed to reach the corresponding
+ * node from the root. Descending to the left child appends a 0 bit,
+ * whereas descending to the right child appends a 1 bit.
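+ *
+ * In pseudocode (the textbook version, not the implementation used
+ * here), the classic algorithm is:
+ *
+ *	for each symbol 'sym' with freqs[sym] != 0:
+ *		heap_insert(new_leaf(sym, freqs[sym]))
+ *	while the heap contains more than one node:
+ *		a = heap_extract_min()
+ *		b = heap_extract_min()
+ *		heap_insert(new_internal(a.freq + b.freq, a, b))
+ *	the remaining node is the root; each symbol's codeword is
+ *	the path from the root down to its leaf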
+ *
+ * The classic algorithm is relatively easy to understand, but it is
+ * subject to a number of inefficiencies. In practice, it is fastest to
+ * first sort the symbols by frequency. (This itself can be subject to
+ * an optimization based on the fact that most frequencies tend to be
+ * low.) At the same time, we sort secondarily by symbol value, which
+ * aids the process of generating a canonical code. Then, during tree
+ * construction, no heap is necessary because both the leaf nodes and the
+ * unparented non-leaf nodes can be easily maintained in sorted order.
+ * Consequently, there can never be more than two possibilities for the
+ * next-lowest-frequency node.
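+ *
+ * In other words, the construction degenerates to the well-known
+ * two-queue method: the sorted leaves form one queue, and the
+ * non-leaves are appended to a second queue as they are created
+ * (which leaves that queue in nondecreasing frequency order).  Each
+ * step takes the smaller of the two front nodes, roughly:
+ *
+ *	next_min():
+ *		if non-leaf queue is empty or (leaf queue nonempty
+ *		    and leaf_front.freq <= nonleaf_front.freq):
+ *			return pop(leaf queue)
+ *		return pop(non-leaf queue)
+ *
+ * where the '<=' breaks frequency ties in favor of leaves, as
+ * discussed later in this comment.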
+ *
+ * In addition, because we're generating a canonical code, we actually
+ * don't need the leaf nodes of the tree at all, only the non-leaf nodes.
+ * This is because for canonical code generation we don't need to know
+ * where the symbols are in the tree. Rather, we only need to know how
+ * many leaf nodes have each depth (codeword length). And this
+ * information can, in fact, be quickly generated from the tree of
+ * non-leaves only.
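+ *
+ * To sketch why the non-leaves suffice: every non-leaf in a Huffman
+ * tree has exactly two children, so if num_internal[d] is the number
+ * of non-leaves at depth 'd', then the number of leaves at depth
+ * 'd + 1' is
+ *
+ *	num_leaves[d + 1] = 2 * num_internal[d] - num_internal[d + 1]
+ *
+ * which is exactly the per-length codeword count that canonical code
+ * generation needs.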
+ *
+ * Furthermore, we can build this stripped-down Huffman tree directly in
+ * the array in which the codewords are to be generated, provided that
+ * these array slots are large enough to hold a symbol and frequency
+ * value.
+ *
+ * Still furthermore, we don't even need to maintain explicit child
+ * pointers. We only need the parent pointers, and even those can be
+ * overwritten in-place with depth information as part of the process of
+ * extracting codeword lengths from the tree. So in summary, we do NOT
+ * need a big structure like:
+ *
+ * struct huffman_tree_node {
+ * unsigned int symbol;
+ * unsigned int frequency;
+ * unsigned int depth;
+ * struct huffman_tree_node *left_child;
+ * struct huffman_tree_node *right_child;
+ * };
+ *
+ * ... which often gets used in "naive" implementations of Huffman code
+ * generation.
+ *
+ * Most of these optimizations are based on the implementation in 7-Zip
+ * (source file: C/HuffEnc.c), which has been placed in the public domain
+ * by Igor Pavlov. But I've rewritten the code with extensive comments,
+ * as it took me a while to figure out what it was doing...!
+ *
+ * ---------------------------------------------------------------------
+ *
+ * NOTE: in general, the same frequencies can be used to generate
+ * different length-limited canonical Huffman codes. One choice we have
+ * is during tree construction, when we must decide whether to prefer a
+ * leaf or non-leaf when there is a tie in frequency. Another choice we
+ * have is how to deal with codewords that would exceed @max_codeword_len
+ * bits in length. Both of these choices affect the resulting codeword
+ * lengths, which otherwise can be mapped uniquely onto the resulting
+ * canonical Huffman code.
+ *
+ * Normally, there is no problem with choosing one valid code over
+ * another, provided that they produce similar compression ratios.
+ * However, the LZMS compression format uses adaptive Huffman coding. It
+ * requires that both the decompressor and compressor build a canonical
+ * code equivalent to that which can be generated by using the classic
+ * Huffman tree construction algorithm and always processing leaves
+ * before non-leaves when there is a frequency tie. Therefore, we make
+ * sure to do this. This method also has the advantage of sometimes
+ * shortening the longest codeword that is generated.
+ *
+ * There also is the issue of how codewords longer than @max_codeword_len
+ * are dealt with.  Fortunately, for LZMS this is irrelevant because
+ * for the LZMS alphabets no codeword can ever exceed
+ * LZMS_MAX_CODEWORD_LEN (= 15). Since the LZMS algorithm regularly
+ * halves all frequencies, the frequencies cannot become high enough for
+ * a length 16 codeword to be generated.  Specifically, I think that if
+ * ties are broken in favor of leaves (as we do), the lowest total
+ * frequency that would give a length-16 codeword would be the sum of the
+ * frequencies 1 1 1 3 4 7 11 18 29 47 76 123 199 322 521 843 1364, which
+ * is 3570. And in LZMS we can't get a frequency that high based on the
+ * alphabet sizes, rebuild frequencies, and scaling factors. This
+ * worst-case scenario is based on the following degenerate case (only
+ * the bottom of the tree shown):
+ *
+ * ...
+ * 17
+ * / \
+ * 10 7
+ * / \
+ * 6 4
+ * / \
+ * 3 3
+ * / \
+ * 2 1
+ * / \
+ * 1 1
+ *
+ * Excluding the first leaves (those with value 1), each leaf value
+ * must be greater than the value of the non-leaf reached by going up
+ * one level and then down two levels from it; otherwise that leaf
+ * would have taken precedence over that non-leaf and been combined
+ * with the leaf below, thereby decreasing the height compared to that
+ * shown.
+ *
+ * Interesting fact: if we were to instead prioritize non-leaves over
+ * leaves, then the worst case frequencies would be the Fibonacci
+ * sequence, plus an extra frequency of 1. In this hypothetical
+ * scenario, it would be slightly easier for longer codewords to be
+ * generated.
+ */
+void
+make_canonical_huffman_code(unsigned num_syms, unsigned max_codeword_len,
+ const u32 freqs[restrict],
+ u8 lens[restrict], u32 codewords[restrict])
+{
+ u32 *A = codewords;
+ unsigned num_used_syms;
+
+ /* Assumptions */
+ wimlib_assert2(num_syms >= 2);
+ wimlib_assert2(num_syms <= (1 << NUM_SYMBOL_BITS));
+ wimlib_assert2(max_codeword_len > 0);
+ wimlib_assert2(max_codeword_len <= 32);
+
+ /* We begin by sorting the symbols primarily by frequency and
+ * secondarily by symbol value. As an optimization, the array
+ * used for this purpose ('A') shares storage with the space in
+ * which we will eventually return the codewords. */
+
+ num_used_syms = sort_symbols(num_syms, freqs, lens, A);
+
+ /* 'num_used_syms' is the number of symbols with nonzero
+ * frequency. This may be less than @num_syms. 'num_used_syms'
+ * is also the number of entries in 'A' that are valid. Each
+ * entry consists of a distinct symbol and a nonzero frequency
+ * packed into a 32-bit integer. */
+
+ /* Handle special cases where only 0 or 1 symbols were used (had
+ * nonzero frequency). */
+
+ if (unlikely(num_used_syms == 0)) {
+ /* Code is empty. sort_symbols() already set all lengths
+ * to 0, so there is nothing more to do. */
+ return;
+ }