wim by imagex from deduplicated volume - extraction error

Mimos
Posts: 7
Joined: Fri Dec 02, 2016 12:58 am

wim by imagex from deduplicated volume - extraction error

Post by Mimos »

I created a wim with imagex (using verify and check, no compression) from a volume that had deduplication enabled (the one integrated into Windows Server 2012 R2). Extracting with verify and check failed badly after a few files. Extracting without verify and check succeeded, but most of the files were not readable: not random data, just a read error.

So I tried wimlib, which gave much better results: most of the files are readable. There are a few exceptions, though.

Next, I extracted a single unreadable file, of which I have another copy, into a .vhd and then opened the .vhd in a hex editor. There I could see that the file contents are completely correct, so the errors are probably due to some weird attribute or reparse point issue. I cannot provide the wim because of its size, but I can provide the .vhd, which is only 369 KiB when compressed with 7z. Maybe it could provide some insight into the issue. Are you interested, and what is the best way to provide the file?

I'd be happy if you could look into this so I can extract my data without too much hassle.
synchronicity
Site Admin
Posts: 474
Joined: Sun Aug 02, 2015 10:31 pm

Re: wim by imagex from deduplicated volume - extraction error

Post by synchronicity »

I haven't used Windows' deduplication support before, but it sounds like it replaces files with reparse points, the data is actually stored in chunk files in System Volume Information, and this is not transparent to applications that back up and restore reparse points directly. If that's the case, a WIM image created from "deduplicated" files will contain only the reparse points (you can verify this with wimdir --detailed). The only way you might be able to back up and restore deduplicated files would then be to create a WIM image of the whole filesystem, including the "System Volume Information" directory. To do that with wimlib-imagex, you'd need to provide a custom capture configuration file with the --config option, since the default configuration excludes System Volume Information.
Mimos
Posts: 7
Joined: Fri Dec 02, 2016 12:58 am

Re: wim by imagex from deduplicated volume - extraction error

Post by Mimos »

Thanks for your quick reply.
As far as I know, you are correct about how Windows' deduplication works.

I did not include System Volume Information, but I verified that the wim contains the contents of the problematic files, and that extraction writes this data out as well.
I assume the problem is that some additional metadata is also extracted and disrupts file access.

Edit:
I tested killing wimlib just before it finished extracting a single small-ish directory. Everything extracted up to that point was readable. When I let the extraction finish, the files were no longer readable.

I also tried your suggestion, running wimdir --detailed on one of the files, and the output looks like this:

----------------------------------------------------------------------------
Full Path           = "\bin\msvcr80.dll"
Attributes          = 0x00000620
    FILE_ATTRIBUTE_ARCHIVE is set
    FILE_ATTRIBUTE_SPARSE_FILE is set
    FILE_ATTRIBUTE_REPARSE_POINT is set
Security Descriptor = O:BAG:S-1-5-21-2484967678-4118065303-1390457190-513D:AI(A;ID;FA;;;BA)(A;ID;FA;;;S-1-5-21-2484967678-4118065303-1390457190-1005)(A;ID;FA;;;SY)
Creation Time       = Sun Aug 17 17:11:14 2014 UTC
Last Write Time     = Mon Jan 07 09:49:52 2013 UTC
Last Access Time    = Sun Aug 17 17:11:14 2014 UTC
Reparse Tag         = 0x80000013
Link Group ID       = 0x0000000000000000
Link Count          = 1
        Reparse point stream:
Hash              = 0xc396073516921183030b130e29411387e49668c7
Uncompressed size = 124 bytes
Compressed size   = 124 bytes
Offset in WIM     = 9546650943 bytes
Part Number       = 1
Reference Count   = 2
Flags             =
synchronicity
Site Admin
Posts: 474
Joined: Sun Aug 02, 2015 10:31 pm

Re: wim by imagex from deduplicated volume - extraction error

Post by synchronicity »

As it turns out, Windows treats Data Deduplication reparse points specially and actually provides the "dereferenced" file contents even when an application explicitly asks not to dereference the reparse point. However it doesn't hide the reparse point itself. This explains the behavior you observed: the files in the WIM image ended up with both the file contents *and* a Data Deduplication reparse point. After extraction, the reparse point took priority when trying to read from the file, causing errors because the target volume did not include the needed chunk store.

I decided that the best solution for wimlib on Windows is to just have it ignore the reparse point portion of deduplicated files, so that they are captured as normal files. I implemented this and posted wimlib-1.11.0-BETA2. If you try it, it should work in the way you expect.

- synchronicity
Mimos
Posts: 7
Joined: Fri Dec 02, 2016 12:58 am

Re: wim by imagex from deduplicated volume - extraction error

Post by Mimos »

Thanks for your quick work.

Sadly it doesn't help, the issue after extraction stays the same.

So I tried creating a small new wim (using imagex with verify, check and no compression) that reproduces the issue, and after a few iterations I succeeded.
It just contains two small, identical files. One of them cannot be opened after extraction with wimlib.
When I took a close look at the extracted files in my testing .vhd I saw the following:
Both files are extracted to different locations but somehow the unreadable one has the reparse point attribute set.

I prepared a .7z (137 KiB) containing the wim, the vhd and the two original files (even though they are identical). If you give me your email address, I will send it to you.

Edit: I read your post again and now I see why nothing changed for me: you only changed wim creation, not wim extraction, right?
synchronicity
Site Admin
Posts: 474
Joined: Sun Aug 02, 2015 10:31 pm

Re: wim by imagex from deduplicated volume - extraction error

Post by synchronicity »

Yes, it was easier to handle this case on capture rather than on apply, and then the resulting WIM image can be applied with other versions of wimlib or with other software without any problems. Can you try it again, creating new WIM images?
Mimos
Posts: 7
Joined: Fri Dec 02, 2016 12:58 am

Re: wim by imagex from deduplicated volume - extraction error

Post by Mimos »

OK, I did a small test (only a few files) with version 1.10.0 vs. version 1.11.0-BETA2.
The wim captured by 1.11 is a little smaller, about 1%, but this probably depends heavily on the use case.

             |         captured with
applied with | 1.10  | 1.11  | imagex | dism 6.3.9600.17031 (Server 2012R2)
-------------|-------|-------|--------|------
1.10         | works | works |  fails | fails
-------------|-------|-------|--------|------
1.11         | works | works |  fails | fails
-------------|-------|-------|--------|------
imagex       | fails | works |  fails | fails
-------------|-------|-------|--------|------
dism         | fails | works |  fails | fails
-------------|-------|-------|--------|------
7-zip        | fails | works |  fails | fails

So it looks like .wim files captured with wimlib 1.11.0-BETA2 can now be applied by all the tools I tested. From now on I will just use wimlib for capturing wims.

Sadly, though, my current use case still fails, because I'd like to apply images that were already captured with imagex.
synchronicity
Site Admin
Posts: 474
Joined: Sun Aug 02, 2015 10:31 pm

Re: wim by imagex from deduplicated volume - extraction error

Post by synchronicity »

You could delete the reparse points of the extracted files, e.g. with Powershell:

# Deletes every reparse point under C:\dir, including symlinks and junctions
Get-ChildItem C:\dir -Recurse -Force -Attributes ReparsePoint | ForEach-Object { fsutil reparsepoint delete $_.FullName }
Mimos
Posts: 7
Joined: Fri Dec 02, 2016 12:58 am

Re: wim by imagex from deduplicated volume - extraction error

Post by Mimos »

Good idea, but I only get "error: access denied" (running as admin).

I looked at the wimlib source code and found do_set_reparse_point() in win32_apply.c. Is this where all the reparse points are written after the data? If so, could I just replace the function body with return 0, get the toolchain running, and compile my own version that simply doesn't write any reparse points at all?
synchronicity
Site Admin
Posts: 474
Joined: Sun Aug 02, 2015 10:31 pm

Re: wim by imagex from deduplicated volume - extraction error

Post by synchronicity »

Hmm, well that command worked for me. But yes, compiling a copy of wimlib with do_set_reparse_point() hardcoded to return 0 would work too. In README.WINDOWS there are instructions for compiling a Windows binary on Windows.

Edit: you could even make it skip only Data Deduplication reparse points, so you don't break other reparse points such as symlinks:

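        /* Skip Data Deduplication reparse points (IO_REPARSE_TAG_DEDUP);
           all other reparse points, e.g. symlinks, are still restored. */
        if (dentry->d_inode->i_reparse_tag == 0x80000013)
                return 0;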
One reason I don't want to just do something like this myself yet is that wimlib also supports capturing WIM images on Linux using NTFS-3G, and neither NTFS-3G nor wimlib knows how to transparently dereference "Data Deduplication" reparse points. For images captured that way, restoring the reparse points would still be required.