"Last Chance" Exclusion Callback

zipmagic
Posts: 61
Joined: Thu Aug 06, 2015 7:09 am

"Last Chance" Exclusion Callback

Post by zipmagic »

Is there a chance to have a "last chance callback" for excluding a file from compression?

wimlib's extensive callbacks are great, but the file callbacks are only sent during the initial scanning phase, right?

When operating with very limited disk space, where the WIM file grows by "eating up" the space freed as source files are deleted right after wimlib finishes compressing them, this becomes a necessity.

As free disk space grows, files which could not be *safely* compressed inside the WIM at the *start* of the process (initial scanning phase) can often be accommodated inside the WIM further along the process.
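
To make the arithmetic concrete, here is a minimal simulation of that scenario. It is not wimlib code. The function name, the sizes, and the fixed compression ratio are all hypothetical, and it assumes each original file is deleted immediately after it has been archived.

```python
def simulate(free, files, ratio=0.5):
    """Simulate archiving files in place on a nearly full disk.

    free  -- free bytes at the start (hypothetical figure)
    files -- list of (name, size) pairs in scan order
    ratio -- assumed compressed size / original size

    Returns (names archived in order, free bytes at the end).
    """
    archived = []
    pending = list(files)
    progress = True
    while pending and progress:
        progress = False
        for entry in list(pending):
            name, size = entry
            needed = int(size * ratio)  # how much the archive would grow
            if needed <= free:
                free -= needed          # archive grows while the original still exists
                free += size            # original is deleted afterwards
                archived.append(name)
                pending.remove(entry)
                progress = True
    return archived, free
```

With 150 bytes free, a 400-byte file cannot be taken first under these assumptions, but it fits once a 200-byte file has been archived and deleted.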

If such a "last chance exclusion callback" is not possible in wimlib...can the library consumer get "creative" with some tricks?

For example, what if the file is "swapped" at the last minute when I receive your progress callback for the file - would that work, as a hack of sorts?

In other words, all files would be signaled for inclusion during the regular exclusion callback. Then, when the progress callback arrives with the name of the file about to be compressed, if there isn't enough disk space to hold both the growth in the WIM and the original uncompressed file (which is still present during compression), a zero-byte file could be substituted manually: a poor man's "last chance exclusion" callback of sorts.

Would this break anything in the WIM or in wimlib, since wimlib probably sets up internal WIM structures well ahead of this phase?
synchronicity
Site Admin
Posts: 474
Joined: Sun Aug 02, 2015 10:31 pm

Re: "Last Chance" Exclusion Callback

Post by synchronicity »

The problem is that by the time wimlib starts reading the actual file data, it has already scanned all the files and set up internal structures which represent all the files and what data needs to be read from them, including all the file lengths. So hacking in a second exclusion callback would be very difficult and wouldn't really fit into the design. And no, it wouldn't work to "swap" in empty files.
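
As a rough illustration of why a late swap conflicts with a two-pass design, here is a toy model. It is not wimlib's implementation: the `scan`, `write_data`, and `demo` functions are hypothetical, and the exact way wimlib reacts to a changed file is not stated in this thread. The sketch only assumes that file lengths are recorded in the first pass and relied on in the second.

```python
import os
import tempfile

def scan(paths):
    """Pass 1: record each file's length without reading its data."""
    return [(p, os.path.getsize(p)) for p in paths]

def write_data(plan):
    """Pass 2: read exactly the number of bytes recorded in pass 1."""
    chunks = []
    for path, expected in plan:
        with open(path, "rb") as f:
            data = f.read(expected)
        if len(data) != expected:
            raise IOError(f"{path}: expected {expected} bytes, got {len(data)}")
        chunks.append(data)
    return chunks

def demo():
    """Swap in a zero-byte file between the two passes."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "a.bin")
        with open(path, "wb") as f:
            f.write(b"x" * 10)
        plan = scan([path])       # length 10 is now part of the plan
        open(path, "wb").close()  # truncate the file to zero bytes
        try:
            write_data(plan)
        except IOError:
            return "failed"
        return "ok"
```

In this model the second pass has no way to tell an intentional swap from a file that changed unexpectedly, so the mismatch is treated as an error.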

There is certainly an argument to be made that the file data really should be read and compressed/written incrementally, as the directories are scanned; a lot of other file archiving programs do that. But I decided a long time ago that separate "metadata" and "data" passes were the way to go, as this had a *lot* of implementation advantages. It also allows some interesting optimizations, such as reading files in the order in which they are laid out on-disk, or scanning the NTFS "Master File Table" via FSCTL_QUERY_FILE_LAYOUT and then opening all files by inode number, so that we don't have to read the directories at all. And it can be a usability improvement in some situations. For example, if some silly error occurs while calling all the various APIs to read all the various file metadata (which is not unheard of, especially on Windows), it is reported right away, rather than after you've already spent 30 minutes compressing gigabytes of data.
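
Two of these advantages can be sketched in a few lines. This is a simplified model, not wimlib's code: `scan` and `read_order` are hypothetical names, and sorting by inode number is only a crude stand-in for real on-disk layout, used here as an assumption for illustration.

```python
import os

def scan(paths):
    """Metadata pass: stat every file up front.

    Any OSError (missing file, permission problem) surfaces here,
    before a single byte of file data has been read or compressed.
    """
    return [(p, os.stat(p)) for p in paths]

def read_order(entries):
    """Choose the order for the data pass.

    Because all metadata is already known, the data pass can be
    reordered freely. Here it is sorted by inode number, which is
    an assumption standing in for on-disk layout.
    """
    return [p for p, st in sorted(entries, key=lambda e: e[1].st_ino)]
```

A single-pass archiver that compresses each file as it is discovered has neither option: it cannot reorder reads it has not planned yet, and a metadata error on the last file shows up only after everything before it has been processed.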
zipmagic
Posts: 61
Joined: Thu Aug 06, 2015 7:09 am

Re: "Last Chance" Exclusion Callback

Post by zipmagic »

Thank you.