[Out-of-bounds read] when avl-tree nodes are allocated in specific addresses

Comments, questions, bug reports, etc.
Post Reply
Haoxin Tu
Posts: 2
Joined: Fri Nov 14, 2025 2:10 pm

[Out-of-bounds read] when avl-tree nodes are allocated in specific addresses

Post by Haoxin Tu »

Dear developers,

We found an issue in the implementation of avl-tree in the wimlib library. Please kindly check the following details.

#### Reproducing Instructions

* The test driver (native_run.c) is as follows:

Code: Select all

#include <stdlib.h>
#include <stdint.h>
#include <stdio.h>
#include "wimlib/avl_tree.h"

#pragma pack(push, 1)
struct WrapAligned {
    char padding[3];
    struct Wrap {
        int key;
        struct avl_tree_node node;
    } wrap;
};
#pragma pack(pop)

static int cmp_nodes(const struct avl_tree_node *a, const struct avl_tree_node *b) {
    const struct Wrap *wa = avl_tree_entry(a, struct Wrap, node);
    const struct Wrap *wb = avl_tree_entry(b, struct Wrap, node);
    if (wa->key < wb->key) return -1;
    if (wa->key > wb->key) return 1;
    return 0;
}

static struct Wrap *make(void) { return (struct Wrap*)malloc(sizeof(struct Wrap)); }

int main(void) {

   // Heap manipulation code to ensure last two bits are 11
   struct WrapAligned *A_aligned = (struct WrapAligned*)aligned_alloc(256, 256);
   struct WrapAligned *B_aligned = (struct WrapAligned*)aligned_alloc(256, 256);
   
   struct Wrap *A = &A_aligned->wrap;
   struct Wrap *B = &B_aligned->wrap;
   
   printf("Address A: %p (last byte: 0x%02x, last two bits: %d%d)\n", 
          (void*)A, (unsigned int)(((uintptr_t)A) & 0xFF), 
          (int)((((uintptr_t)A) & 2) >> 1), (int)(((uintptr_t)A) & 1));
   printf("Address B: %p (last byte: 0x%02x, last two bits: %d%d)\n", 
          (void*)B, (unsigned int)(((uintptr_t)B) & 0xFF),
          (int)((((uintptr_t)B) & 2) >> 1), (int)(((uintptr_t)B) & 1));


    struct avl_tree_node *root = NULL;
    A->key=10; B->key=20;
    avl_tree_insert(&root,&A->node,cmp_nodes);
    avl_tree_insert(&root,&B->node,cmp_nodes);
    free(A_aligned);
    free(B_aligned);
    return 0;
}
To reproduce the issue, use the following:

Code: Select all

gcc -g native_run.c avl_tree.c -I./ -o native_run
gdb ./native_run
The output of gdb should be similar to this:

Code: Select all

…
Program received signal SIGSEGV, Segmentation fault.
0x0000555555555abb in avl_tree_rebalance_after_insert (root_ptr=0x7fffffffd6a0, inserted=0x55555555c307) at avl_tree.c:513
513                     if (node == parent->left)
… 
Testing environment:

Code: Select all

$ uname -a
Linux symbolism 6.8.0-52-generic #53~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 15 19:18:46 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 
* We also tested on `Ubuntu 24.04` with `GCC-13.3.0`, and were able to reproduce the issue.

#### Justification of the real-world impact

This issue highlights a subtle and critical flaw in the implementation of the avl-tree that could result in undefined behavior, data corruption, or other security vulnerabilities if similar allocation patterns were to arise. A user (or developer) of wimlib might interact with the implementation of avl-tree in a way that creates such allocation patterns, leading to hard-to-diagnose crashes or data integrity issues. Therefore, to enhance the robustness and reliability of the implementation of the avl-tree, we believe that it is important to address this issue in all possible usage scenarios.

#### Root cause analysis

We believe that the root cause of the crash is in the function `avl_get_parent`. The parent of a node is stored in the `parent_balance` field of the `avl_tree_node` struct, with the last two bits used to store balance information. The issue occurs when the value stored in the `parent_balance` field is an address where the last two bits are both 1. In this case, when the `avl_get_parent` function is called, it clears the last two bits of the address to retrieve the parent pointer. However, since the last two bits are 1, clearing them results in an address that points to an incorrect location. In our case, this eventually leads to an out-of-bounds read when accessing the parent node, resulting in a segmentation fault.

#### Affected versions

We found that this issue can be reproduced from a very early released version (i.e., v1.6.2 released on 2014-03-31) to the current v4.14.4.

#### Attached files

We have attached the reproducing script and the test driver code to help you reproduce and investigate the issue. The code of avl-tree implementation is extracted from wimlib-v1.14.4.

Thank you for your attention to this matter. We look forward to your response and will be glad to provide any additional information or assistance needed to resolve this issue.

Best regards,
Haoxin and David
Attachments
avl-tree-oob-reproducer.zip
(116.2 KiB) Downloaded 8 times
synchronicity
Site Admin
Posts: 491
Joined: Sun Aug 02, 2015 10:31 pm

Re: [Out-of-bounds read] when avl-tree nodes are allocated in specific addresses

Post by synchronicity »

This is not a bug. The avl_tree_nodes are aligned to at least 4-byte boundaries in the actual code, so this issue does not occur. Note that their natural alignment suffices to achieve this.
Haoxin Tu
Posts: 2
Joined: Fri Nov 14, 2025 2:10 pm

Re: [Out-of-bounds read] when avl-tree nodes are allocated in specific addresses

Post by Haoxin Tu »

Thanks for your reply.

We agree that the current AVL implementation in wimlib may not exhibit this issue under standard allocation, assuming that avl_tree_node has 4-byte natural alignment.

However, our concern is not with current usage, but with assumptions that are not enforced:
There is no check/assertion that guarantees avl_tree_node will always remain properly aligned.
External users may legitimately embed AVL nodes into packed (e.g., the test driver code does) or unaligned structures (e.g., through #pragma pack, network structs, custom allocators, or symbolic virtual memory systems).

For example, the rufus project (with 33.9k star) uses the avl-tree implementations, and their users may have their own strategies to use AVL nodes. In that case, the pointer-masking mechanism (e.g.,`parent_balance & ~3`) may produce an incorrect parent pointer, which can lead to out-of-bounds reads or segmentation faults.

While we understand that this is not a problem in the current wimlib implementation, it could be a future correctness risk, especially if the AVL node is reused in a different context or if a future refactor changes its alignment characteristics.

To make the assumption explicit, we suggest a lightweight guard, such as an assertion, before using a specific node pointer value to do a calculation (e.g., in `avl_tree_insert`).

Please feel free to let us know your further feedback. Thanks.
Post Reply