More lz_hash_chains, lz_binary_trees performance improvements
- Use a multiplicative hash function. In practice it's as good as the
CRC32-based hash function previously used, but it's faster to compute
and requires no static data.
- Use slightly different logic that avoids the need to special-case the
extension of matches from 'nice_len' to 'max_len'.
- Make skip_positions() faster by avoiding unnecessary calculations.
- Speed up match length extension on x86 by using unaligned word
accesses and a trailing zero count.
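
For illustration only, here is a minimal sketch of two of the techniques
listed above: a multiplicative hash and word-at-a-time match extension.
It is not the code from this change. The names lz_hash3() and lz_extend(),
the multiplier constant, and the choice of hashing 3 bytes are all
assumptions made for the example. The word path relies on the GCC/Clang
builtin __builtin_ctzll() and on a little-endian target; other targets
fall through to the byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical multiplier: a large odd constant. Not taken from the source. */
#define LZ_HASH_MULTIPLIER 0x1E35A7BDu

/*
 * Hash the 3 bytes at 'p' down to 'num_bits' bits (1 <= num_bits <= 32).
 * The high bits of the product are kept because they depend on all of
 * the input bits. No lookup table is needed, unlike a table-driven CRC32.
 */
static uint32_t lz_hash3(const uint8_t *p, unsigned num_bits)
{
    uint32_t v = (uint32_t)p[0] |
                 ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16);

    return (uint32_t)(v * LZ_HASH_MULTIPLIER) >> (32 - num_bits);
}

/*
 * Extend a match between 'a' and 'b' that is already known to be 'len'
 * bytes long, up to at most 'max_len' bytes. Returns the new length.
 */
static size_t lz_extend(const uint8_t *a, const uint8_t *b,
                        size_t len, size_t max_len)
{
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    /* Compare 8 bytes at a time; memcpy() expresses the unaligned load. */
    while (len + sizeof(uint64_t) <= max_len) {
        uint64_t x, y, diff;

        memcpy(&x, a + len, sizeof(x));
        memcpy(&y, b + len, sizeof(y));
        diff = x ^ y;
        if (diff != 0) {
            /*
             * On little-endian, the lowest set bit of 'diff' lies in
             * the first mismatching byte, so ctz / 8 is its offset.
             */
            return len + ((size_t)__builtin_ctzll(diff) >> 3);
        }
        len += sizeof(uint64_t);
    }
#endif
    /* Byte-at-a-time tail (and the whole loop on other targets). */
    while (len < max_len && a[len] == b[len])
        len++;
    return len;
}
```

In this sketch a single loop bounded by 'max_len' handles every case, so a
caller would not need a separate code path for matches that reach
'nice_len'. Whether the actual change is structured this way is an
assumption.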