Eric Biggers [Tue, 27 Dec 2016 23:24:55 +0000 (17:24 -0600)]
Add basic infrastructure for storing xattr items
Define a new tagged metadata item to hold a list of names and values of
Linux-style extended attributes, and prepare for supporting
capture/apply of extended attributes.
I considered making the xattrs a stream instead, referenced from the
tagged item which would just hold a hash. This would have allowed
xattrs to be deduplicated between files. However, I ultimately decided
against this because WIMGAPI and older versions of wimlib would discard
the streams on optimize/export, and extraction would be much more
complicated because xattr streams could come up for extraction before
other streams --- which would be especially problematic for symlinks.
Eric Biggers [Tue, 27 Dec 2016 23:24:55 +0000 (17:24 -0600)]
tagged_items updates
- Expose tagged_item functions in new header tagged_items.h
- Make object_id functions inline functions in object_id.h
- Make inode_get_tagged_item() return stored length, not aligned length
- Add a new function inode_set_tagged_data() which removes existing
items before setting the new one, and use it for inode_set_object_id()
- Make inode_add_tagged_item() append item rather than prepend
- Keep items 8-byte aligned in memory
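To illustrate the difference between the stored length and the aligned length, here is a minimal sketch. The header layout (a 32-bit tag followed by a 32-bit length) and the names align8() and tagged_item_footprint() are assumptions made for illustration; they are not taken from tagged_items.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: a tag and a byte length, followed by the item data. */
struct tagged_item_header {
    uint32_t tag;
    uint32_t length; /* stored length: bytes of data actually present */
};

/* Round n up to the next multiple of 8. */
size_t align8(size_t n)
{
    return (n + 7) & ~(size_t)7;
}

/* Bytes one item occupies in the list, including trailing padding
 * (hypothetical helper). */
size_t tagged_item_footprint(uint32_t stored_len)
{
    size_t total = sizeof(struct tagged_item_header) + align8(stored_len);

    assert(total % 8 == 0);
    return total;
}
```

Under these assumptions, an item with 13 bytes of data reports a stored length of 13 but occupies 24 bytes in memory (8 for the header plus 16 for the padded data).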
Eric Biggers [Tue, 27 Dec 2016 02:27:29 +0000 (20:27 -0600)]
Improve random number generation
wimlib used rand() to generate random numbers, e.g. for GUIDs. This was
neither cryptographically secure nor thread-safe. Use getrandom(),
/dev/urandom, or RtlGenRandom() instead.
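As a rough sketch of the /dev/urandom path only, the following fills a buffer with random bytes, e.g. for a 16-byte GUID. The function name fill_random_bytes() is hypothetical, and this is not wimlib's actual implementation; a real version would presumably try getrandom() first where available and use RtlGenRandom() on Windows.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill buf with n bytes read from /dev/urandom.
 * Returns 0 on success, -1 on failure. */
int fill_random_bytes(void *buf, size_t n)
{
    unsigned char *p = buf;
    int fd;

    assert(buf != NULL || n == 0);

    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;

    while (n > 0) {
        ssize_t r = read(fd, p, n);

        if (r < 0 && errno == EINTR)
            continue; /* interrupted by a signal: retry */
        if (r <= 0) {
            close(fd);
            return -1; /* read error or unexpected end of file */
        }
        p += r;
        n -= (size_t)r;
    }
    close(fd);
    return 0;
}
```

Unlike rand(), this keeps no shared generator state in the process, so concurrent callers do not interfere with each other.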
Eric Biggers [Sat, 17 Dec 2016 03:47:44 +0000 (19:47 -0800)]
join.c: clean up verify_swm_set()
UBSAN complained when the parts_to_swms array had 0 length. Clean this
up by sorting the parts first, making the verification simpler. Also
don't bother checking compression_type and chunk_size anymore; checking
guid should be sufficient, and it doesn't really matter if the
compression formats are different since now everything will be written
out correctly anyway.
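The sort-then-verify idea can be sketched as follows. The struct swm_part type, its fields, and verify_parts() are hypothetical stand-ins rather than the real join.c code; the sketch only shows why sorting first reduces verification to a single pass.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical description of one part of a split WIM. */
struct swm_part {
    unsigned char guid[16];
    unsigned part_number; /* 1-based */
    unsigned total_parts;
};

static int cmp_part_number(const void *a, const void *b)
{
    unsigned x = ((const struct swm_part *)a)->part_number;
    unsigned y = ((const struct swm_part *)b)->part_number;

    return (x > y) - (x < y);
}

/* Returns 0 if the parts form a complete set 1..n sharing one GUID,
 * otherwise -1. Sorts the array in place. */
int verify_parts(struct swm_part *parts, size_t n)
{
    if (n == 0)
        return -1;

    qsort(parts, n, sizeof(parts[0]), cmp_part_number);

    for (size_t i = 0; i < n; i++) {
        /* After sorting, part i must be numbered i + 1; a gap or a
         * duplicate shows up as a mismatch here. */
        if (parts[i].part_number != i + 1)
            return -1;
        if (parts[i].total_parts != n)
            return -1;
        if (memcmp(parts[i].guid, parts[0].guid, sizeof(parts[0].guid)) != 0)
            return -1;
    }
    return 0;
}
```

Because the check walks the sorted array directly, this version needs no auxiliary lookup array indexed by part number.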
Eric Biggers [Thu, 15 Dec 2016 04:49:55 +0000 (20:49 -0800)]
Extract sparse files as sparse
When extracting a stream belonging to an inode with
FILE_ATTRIBUTE_SPARSE_FILE set, before writing any data, mark the
extracted stream as sparse if needed and skip preallocating space.
Then, skip writing zero regions. As a result, sparse files remain
sparse after extraction.
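A minimal POSIX-only sketch of skipping zero regions is shown below. The function names and the fixed block granularity are assumptions for illustration, not the actual extraction code, and the Windows step of marking the stream as sparse is not shown. Blocks that are entirely zero are never written, and a final ftruncate() sets the file size so that a trailing zero region becomes a hole as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if all n bytes at p are zero, otherwise 0. */
int block_is_zero(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Hypothetical helper: write data to fd starting at offset 0, skipping
 * blocks that are entirely zero. Assumes fd refers to a new, empty
 * regular file. Returns 0 on success, -1 on failure. */
int write_skipping_zeros(int fd, const unsigned char *data, size_t len,
                         size_t block)
{
    size_t off = 0;

    assert(block > 0);

    while (off < len) {
        size_t n = (len - off < block) ? len - off : block;

        if (!block_is_zero(data + off, n)) {
            size_t done = 0;

            while (done < n) {
                ssize_t r = pwrite(fd, data + off + done, n - done,
                                   (off_t)(off + done));
                if (r < 0)
                    return -1;
                done += (size_t)r;
            }
        }
        off += n;
    }
    /* Set the final size; unwritten ranges remain holes on file systems
     * that support sparse files. */
    return ftruncate(fd, (off_t)len);
}
```

Whether the skipped ranges actually save disk space depends on the underlying file system.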
Eric Biggers [Sat, 8 Oct 2016 02:59:14 +0000 (19:59 -0700)]
mkwinpeimg: use case insensitive mode when updating boot.wim
It was reported that some Windows PE images have a system directory
called "windows" rather than "Windows". Use case insensitive mode to
ensure added files go to the right place.
Eric Biggers [Wed, 27 Jul 2016 00:10:05 +0000 (17:10 -0700)]
libattr is no longer needed
wimlib only uses the extended attributes interface on Linux, where it
appears it is now safe to assume the functions are present in libc (see:
http://lists.nongnu.org/archive/html/acl-devel/2012-04/msg00001.html).
Note: the setfattr program from the "attr" package is still required to
run the NTFS-3G test script.
Eric Biggers [Sat, 9 Jul 2016 17:13:23 +0000 (12:13 -0500)]
configure.ac: Do not check for <sys/param.h>
This header is conditionally included by <ntfs-3g/endians.h>. It defines
too much stuff on certain platforms, e.g. an ALIGN() macro on FreeBSD,
and it appears redundant with other methods of determining the
endianness.
Eric Biggers [Sat, 9 Jul 2016 17:12:14 +0000 (12:12 -0500)]
ntfs-3g_capture.c: include <ntfs-3g/compat.h> to get ENODATA definition
Some platforms, e.g. FreeBSD, do not define ENODATA. On such platforms,
libntfs-3g uses ENOENT instead, and <ntfs-3g/compat.h> defines ENODATA as
ENOENT.
Eric Biggers [Sat, 9 Jul 2016 15:01:25 +0000 (10:01 -0500)]
bitops: rename bit scan functions
Our bit scan functions use 0-based indices and do not allow zero inputs.
Rename them to 'bsr' and 'bsf' to match the x86 instructions and avoid
confusion with another common convention for 'fls' and 'ffs'.
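For illustration, 32-bit versions of the two functions might look like the sketch below. The names bsr32() and bsf32() and the use of the GCC/Clang builtins are assumptions, and the code assumes a 32-bit unsigned int. By contrast, POSIX ffs() uses 1-based indices and returns 0 for a zero input, which is the kind of convention the rename avoids confusion with.

```c
#include <assert.h>
#include <stdint.h>

/* Bit Scan Reverse: 0-based index of the most significant set bit.
 * The input must be nonzero. */
unsigned bsr32(uint32_t v)
{
    assert(v != 0);
    return 31 - (unsigned)__builtin_clz(v);
}

/* Bit Scan Forward: 0-based index of the least significant set bit.
 * The input must be nonzero. */
unsigned bsf32(uint32_t v)
{
    assert(v != 0);
    return (unsigned)__builtin_ctz(v);
}
```

For example, the value 10 (binary 1010) gives bsr32(10) == 3 and bsf32(10) == 1.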
Eric Biggers [Wed, 22 Jun 2016 01:01:57 +0000 (20:01 -0500)]
lz_extend: simplify lz_extend() slightly
Unrolling the first four word copies does not seem to give noticeably better
performance anymore, and on a recent Intel processor actually appears to
decrease the performance slightly.
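For context, a match-extension routine without any unrolling can be sketched as below. This is a simplified illustration under assumed names and parameters, not the real lz_extend(): it compares 8 bytes at a time in a plain loop and then finishes byte by byte, which also sidesteps any endianness concerns.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: given that the first start_len bytes of strptr
 * and matchptr already match, return the full match length, capped at
 * max_len. */
uint32_t lz_extend_sketch(const uint8_t *strptr, const uint8_t *matchptr,
                          uint32_t start_len, uint32_t max_len)
{
    uint32_t len = start_len;

    assert(start_len <= max_len);

    /* Word-at-a-time loop: stop at the first 8-byte word that differs. */
    while (max_len - len >= sizeof(uint64_t)) {
        uint64_t a, b;

        memcpy(&a, strptr + len, sizeof(a));
        memcpy(&b, matchptr + len, sizeof(b));
        if (a != b)
            break;
        len += sizeof(uint64_t);
    }

    /* Byte-at-a-time tail: find the exact mismatch position. */
    while (len < max_len && strptr[len] == matchptr[len])
        len++;

    return len;
}
```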
Eric Biggers [Sat, 25 Jun 2016 00:41:23 +0000 (19:41 -0500)]
Character encoding and string conversion updates
- Allow unpaired surrogates when translating between "UTF-8" and
"UTF-16LE". This allows Windows-style filenames to be processed
losslessly on UNIX-like systems, even if they are not valid UTF-16LE.
- Implement UTF-8 and UTF-16LE codecs ourselves and drop the iconv
requirement. This was necessary to allow surrogate codepoints, but it
also provides better performance and actually results in *fewer* lines
of code and a slightly smaller binary.
- Drop support for multibyte encodings other than UTF-8 on UNIX-like
systems. It is probably not worth the effort to support such
encodings. Interestingly, the support was entirely broken before
v1.9.1, yet no one ever complained...
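The surrogate handling in the first item can be sketched as follows. The function names are hypothetical and this is not the codec's actual code; it only shows the idea that a valid surrogate pair decodes to one supplementary code point, while an unpaired surrogate is passed through and encoded as three bytes instead of being rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode the next code point from UTF-16 code units (already converted
 * to host byte order). Advances *i past the units consumed. An unpaired
 * surrogate is returned as-is. */
uint32_t decode_utf16_lenient(const uint16_t *in, size_t n, size_t *i)
{
    uint32_t u;

    assert(*i < n);
    u = in[(*i)++];
    if (u >= 0xD800 && u < 0xDC00 && *i < n) {
        uint32_t lo = in[*i];

        if (lo >= 0xDC00 && lo < 0xE000) {
            (*i)++;
            return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
        }
    }
    return u;
}

/* Encode one code point as UTF-8, allowing surrogate code points.
 * Returns the number of bytes written to out (1 to 4). */
size_t encode_utf8_lenient(uint32_t c, unsigned char *out)
{
    assert(c <= 0x10FFFF);
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) { /* includes U+D800..U+DFFF */
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (c >> 18));
    out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (c & 0x3F));
    return 4;
}
```

With this scheme, a lone 0xD800 code unit becomes the bytes ED A0 80, so a name containing it can be converted without loss even though it is not valid UTF-16LE.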