fast MFT scanner recursion

Comments, questions, bug reports, etc.
christian
Posts: 3
Joined: Mon Jun 01, 2020 11:35 pm

fast MFT scanner recursion

Post by christian »

I am very interested in the fast MFT scan method. I see that when you find extended attributes, object IDs, or encrypted files, you switch back to the traditional method of getting most information. Otherwise you get everything but the security descriptor from the MFT, and you cache the security-ID mapping so that only one example file is queried for each distinct security ID, which should also be very fast.
This seems perfect for a backup application.
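
The caching idea described above can be sketched roughly as follows. This only illustrates the general technique and is not wimlib's actual code: `sd_cache`, `sd_cache_get`, the `load_fn` callback (standing in for the expensive per-file security descriptor query), and `fake_load` are all hypothetical names, and the small fixed-size table is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define SD_CACHE_CAPACITY 64 /* assumption: small fixed table for the sketch */

/* Hypothetical callback: performs the expensive query on one example file
 * and returns an opaque handle to its security descriptor. */
typedef const void *(*load_fn)(const char *example_path);

struct sd_cache {
    uint32_t ids[SD_CACHE_CAPACITY];
    const void *descriptors[SD_CACHE_CAPACITY];
    size_t count; /* number of distinct security IDs stored */
    size_t loads; /* number of expensive queries actually made */
};

/* Return the descriptor for security_id, calling load() on example_path
 * only the first time this ID is seen. Returns NULL if the table is full
 * or the load fails. */
const void *sd_cache_get(struct sd_cache *c, uint32_t security_id,
                         const char *example_path, load_fn load)
{
    for (size_t i = 0; i < c->count; i++)
        if (c->ids[i] == security_id)
            return c->descriptors[i];

    if (c->count == SD_CACHE_CAPACITY)
        return NULL;

    const void *sd = load(example_path);
    c->loads++;
    if (sd == NULL)
        return NULL;

    c->ids[c->count] = security_id;
    c->descriptors[c->count] = sd;
    c->count++;
    return sd;
}

/* Stand-in loader for demonstration only: returns the path itself as the
 * "descriptor" instead of touching the file system. */
const void *fake_load(const char *example_path)
{
    return example_path;
}
```

With this shape, a volume with thousands of files but only a handful of distinct security IDs would trigger only a handful of slow queries.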

I have two questions:
1. When you use the old method, why do you *recursively* scan that tree where e.g. an object ID appeared? Why not only the specific file or folder entry? I tried wimimage on my win10 computer and ran process monitor and it seems like my c:\ has an object ID. This seems to have spoiled it completely, I saw recursive information queries for every file. I checked with fsutil and my c:\ indeed has an object id.
Could this be considered a kind of bug, and might performance improve if the fallback were no longer recursive? Maybe it's just odd that my computer has an object ID for c:\ and nobody else has this performance disadvantage, or I just did not understand it correctly.

2. I tried to do similar things regarding MFT scanning in C#. For me the security IDs and owner IDs are always 0. I have also adjusted privileges. I have not yet tried to compile wimlib and add debug output to check it. If you know why this happens, it would be great to know. If I find my computer to produce zeros with wimlib too, I will report later!

Thanks for creating this absolutely marvelous piece of software, such an attention to detail and comprehensive solution, even parsing the registry, covering NTFS! Just wow!
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Re: fast MFT scanner recursion

Post by synchronicity »

1. I think I did it that way just because it was simpler, and at the time not many directories on a default Windows install had object IDs (or otherwise triggered using the recursive scan). It's possible that Windows has changed to start assigning object IDs to more directories and as a result the code is no longer working properly. It looks like this won't be too hard to fix in the way you suggest (use the fallback non-recursively rather than recursively), but I'll need to test it.

2. Just to clarify, you are using FSCTL_QUERY_FILE_LAYOUT and are getting the security ID from "FILE_LAYOUT_INFO_ENTRY", like wimlib does? If I recall correctly, Windows just gives each security descriptor a unique ID starting from 0. Is it possible that all your files have the same security descriptor? If not, then perhaps you aren't using the API correctly from C#?
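
The difference discussed in point 1 can be illustrated with a toy model. This is a sketch under stated assumptions, not wimlib's scanner: `struct node`, `needs_fallback`, and `count_slow_queries` are hypothetical names, and a "slow query" simply stands for one traditional per-file information request.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy directory tree entry (hypothetical). */
struct node {
    bool needs_fallback; /* e.g. has an object ID, EAs, or is encrypted */
    struct node **children;
    size_t num_children;
};

/* Count how many entries would be queried the slow, traditional way.
 * With recursive == true, a flagged entry drags its whole subtree onto
 * the slow path; with recursive == false, only flagged entries pay.
 * Pass inherited == false for the root. */
size_t count_slow_queries(const struct node *n, bool recursive, bool inherited)
{
    bool slow = n->needs_fallback || (recursive && inherited);
    size_t total = slow ? 1 : 0;

    for (size_t i = 0; i < n->num_children; i++)
        total += count_slow_queries(n->children[i], recursive, slow);
    return total;
}
```

In this model, a flagged root directory makes the recursive policy query every entry on the volume, while the non-recursive policy queries only the root itself.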
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Re: fast MFT scanner recursion

Post by synchronicity »

I've uploaded wimlib v1.13.3-BETA1 which contains the optimization you suggested to avoid unnecessarily falling back to a recursive scan. Can you try it out?
christian
Posts: 3
Joined: Mon Jun 01, 2020 11:35 pm

Re: fast MFT scanner recursion

Post by christian »

1.

Thanks for the new version!

I was not able to test it properly because my drive does not have enough space for a full backup, and I can't easily compare WIM files from the stable version and the beta (in case there are bugs with the tree building), but it looks great!
At least I timed the part of both versions until they start archiving, and I also ran Process Monitor again.
It clearly does not descend into every folder anymore, and I see far fewer OBJECT_ID requests with status NOT FOUND (before, it was always NOT FOUND because not many files on my computer have one). In phase 2 it now seems to process primarily the files with object IDs, so it only looks at files it needs to.

The time for scanning, before archiving, went down from 1 minute and 53 seconds to just 23 seconds with the new beta!

On my computer, "fsutil objectID query c:\" gives me an identical object ID and birth object ID, a birth volume ID, and an all-zero domain ID.
In a win 10 VM there is no object ID for c:\
By setting an object ID for c:\ it can surely be replicated.

I have no idea whatsoever who put that object ID there on my PC. The new version helps for my computer!

2.
I have recompiled wimlib-imagex and inserted a printf statement for the security ID. Different numbers appear, so everything with wimlib seems to be fine, and my computer and user accounts do not seem to exhibit strange behaviour either.

Yes, I use FSCTL_QUERY_FILE_LAYOUT from C# and parse the buffer. I get ExtraInfoOffset from offset 32 of each file entry, and then, relative to that pointer, I read offsets 36 and 40 for the owner ID and security ID; they are always zero for every file. I have double-checked the offsets and also enabled privileges. I also use FSCTL_QUERY_FILE_LAYOUT on the same path.

But I'll look at it again in a hex editor, and I will also try to load the security descriptor from C# in the traditional way and compare further with wimlib; maybe I'll figure it out. I only asked in case you had experienced the same problem and remembered it off the top of your head.

There seems to be no other usable reference to this FSCTL_QUERY_FILE_LAYOUT on the internet except for something called SwiftSearch on Github. No blogs, no docs...

3. When building and reading the build instructions, the cygwin installer did not have " - mingw64-x86_64-pkg-config" in devel, but "pkg-config" (also in devel)
christian
Posts: 3
Joined: Mon Jun 01, 2020 11:35 pm

Re: fast MFT scanner recursion

Post by christian »

2. It was an alignment issue in my C# code:

typedef struct {
    struct {
        u64 CreationTime;
        u64 LastAccessTime;
        u64 LastWriteTime;
        u64 ChangeTime;
        u32 FileAttributes;
        u32 padding; /* pads BasicInformation to a multiple of 8 bytes */
    } BasicInformation;
    u32 OwnerId;
    u32 SecurityId;
    s64 Usn;
} FILE_LAYOUT_INFO_ENTRY;

After FileAttributes there are 4 bytes of padding.
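
A minimal C sketch of that layout, assuming the field order shown above and using fixed-width standard types in place of the u32/u64 typedefs. The `layout_info_entry` struct and the `read_ids` helper are illustrative names, not part of wimlib or the Windows SDK. With the padding counted, BasicInformation occupies 40 bytes, so OwnerId sits at offset 40 and SecurityId at offset 44, rather than 36 and 40.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative mirror of the layout above. The explicit padding field
 * makes visible what the 8-byte alignment of the timestamps implies. */
struct layout_info_entry {
    struct {
        uint64_t creation_time;
        uint64_t last_access_time;
        uint64_t last_write_time;
        uint64_t change_time;
        uint32_t file_attributes;
        uint32_t padding;
    } basic_information;
    uint32_t owner_id;
    uint32_t security_id;
    int64_t usn;
};

/* Compile-time checks of the offsets discussed in the thread. */
static_assert(offsetof(struct layout_info_entry, owner_id) == 40,
              "owner_id expected at offset 40");
static_assert(offsetof(struct layout_info_entry, security_id) == 44,
              "security_id expected at offset 44");

/* Hypothetical helper: copy both IDs out of a raw buffer that starts at
 * the beginning of the entry. Returns 0 on success, or -1 if the buffer
 * is too short to hold a whole entry. */
int read_ids(const unsigned char *entry, size_t len,
             uint32_t *owner_id, uint32_t *security_id)
{
    if (len < sizeof(struct layout_info_entry))
        return -1;
    memcpy(owner_id,
           entry + offsetof(struct layout_info_entry, owner_id),
           sizeof *owner_id);
    memcpy(security_id,
           entry + offsetof(struct layout_info_entry, security_id),
           sizeof *security_id);
    return 0;
}
```

Reading through `offsetof` rather than hand-counted constants is one way to avoid this class of mistake when porting the layout to another language.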
synchronicity
Site Admin
Posts: 472
Joined: Sun Aug 02, 2015 10:31 pm

Re: fast MFT scanner recursion

Post by synchronicity »

> 3. When building and reading the build instructions, the cygwin installer did not have " - mingw64-x86_64-pkg-config" in devel, but "pkg-config" (also in devel)

Fixed by commit e8d0faeeaf7d

> There seems to be no other usable reference to this FSCTL_QUERY_FILE_LAYOUT on the internet

Did you check https://docs.microsoft.com/en-us/window ... ile-layout?