Error : filename is not valid UTF-8

Post by **Skyblue** » Mon Jul 05, 2021 3:23 pm

Hi synchronicity,

The latest release of wimlib (v1.13.4) reports the following error while capturing a folder on Oracle Linux 5.8 (32-bit):

[ERROR] "/appdata/krbapp/prodappl/xxkrb/11.5.0/bin/file▒200.xls": filename is not valid UTF-8.  This is not supported.

Is there a way to archive those non-UTF-8 files with wimlib? Thanks.

Post by **synchronicity** » Thu Jul 08, 2021 9:37 pm

I'm afraid not, as the WIM file format stores filenames as Windows-style wide character strings (UTF-16LE, with unpaired surrogates allowed). As a result, there is no way to represent a UNIX-style filename (an arbitrary byte sequence) unless it is valid UTF-8 (with unpaired surrogates allowed).
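
A minimal Python sketch of the constraint described above, assuming the rule is exactly "decode the raw name as UTF-8 with lone surrogates tolerated, then store it as UTF-16LE". The function name `wim_name_from_unix` is hypothetical and this is not wimlib's actual (C) code; Python's `surrogatepass` error handler is used here to model the "unpaired surrogates allowed" part.

```python
def wim_name_from_unix(raw: bytes) -> bytes:
    """Convert a raw UNIX filename to UTF-16LE, the form a WIM stores.

    Raises UnicodeDecodeError if `raw` is not valid UTF-8. The
    'surrogatepass' handler lets UTF-8-encoded lone surrogates
    (e.g. b"\\xed\\xa0\\x80" for U+D800) through on decode and encode.
    """
    text = raw.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")
```

For example, `b"file\xe9200.xls"` (a hypothetical Latin-1 `é` standing in for the unknown byte shown as `▒` above) raises `UnicodeDecodeError`: `0xE9` starts a three-byte UTF-8 sequence but is followed by the ASCII digit `2`, so no UTF-16 form exists for it under this rule.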

Edit: in principle filenames with a well-defined encoding other than UTF-8, say ISO-8859-1, could be mapped to UTF-16 as well. Almost everyone uses UTF-8 now though, so there hasn't been a need to support this.
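
To illustrate the idea, here is a sketch of mapping filenames in some other well-defined encoding to UTF-16LE and back. The function names and the `encoding` parameter are hypothetical; this only shows the conversion, not an existing wimlib option. Decoding is strict, so bytes that the chosen encoding leaves undefined are still rejected.

```python
def wim_name_from_legacy(raw: bytes, encoding: str) -> bytes:
    # Capture side: decode with the declared legacy encoding (strict, so
    # undefined bytes raise UnicodeDecodeError), then re-encode as UTF-16LE.
    return raw.decode(encoding).encode("utf-16-le")


def unix_name_from_wim(utf16: bytes, encoding: str) -> bytes:
    # Extract side: raises UnicodeEncodeError if a character in the stored
    # name has no representation in `encoding`.
    return utf16.decode("utf-16-le").encode(encoding)
```

With `cp1252`, for instance, byte `0x80` maps to `€` (U+20AC), while `0x81` is undefined in Python's `cp1252` codec and raises `UnicodeDecodeError`. Whether such a scheme accepts every possible filename therefore depends on the encoding chosen.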

Post by **Skyblue** » Tue Jul 13, 2021 4:08 pm

Hi synchronicity,

Thanks for the info. Luckily, I had the option to delete the offending files, and wimlib did the job.
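
Before a capture, the offending files can be listed up front rather than found one error at a time. A minimal sketch, assuming wimlib's check matches the rule described earlier in the thread (valid UTF-8, lone surrogates tolerated); `find_non_utf8_names` is a hypothetical helper. Passing a `bytes` root makes `os.walk` return raw byte names, so nothing is lost to decoding.

```python
import os


def find_non_utf8_names(root: bytes) -> list[bytes]:
    """Return full raw paths under `root` whose final component is not
    valid UTF-8 (lone surrogates tolerated via 'surrogatepass')."""
    bad = []
    # A bytes root makes os.walk yield bytes for dirpath and every name.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8", errors="surrogatepass")
            except UnicodeDecodeError:
                bad.append(os.path.join(dirpath, name))
    return bad
```

Calling `find_non_utf8_names(b"/path/to/capture/root")` (placeholder path) returns the raw paths, which can then be renamed or deleted before running the capture again.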

Post by **chungy** » Wed Jul 28, 2021 7:58 am

synchronicity wrote (Thu Jul 08, 2021 9:37 pm):

> Edit: in principle filenames with a well-defined encoding other than UTF-8, say ISO-8859-1, could be mapped to UTF-16 as well. Almost everyone uses UTF-8 now though, so there hasn't been a need to support this.

It should even be possible to have a flag that interprets all filenames as ISO-8859-1, which would capture every possible filename, plus a corresponding flag on apply/extract. ISO-8859-1 has a particular advantage here: its entire character set is exactly the first 256 Unicode code points, so the conversion is trivial.
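
The ISO-8859-1 case is simple enough to write out without a codec: byte value `b` is Unicode code point `U+00bb`, so each byte becomes the UTF-16LE code unit `(b, 0x00)`. Below is a sketch of the capture and extract halves of such a flag pair; the function names are hypothetical and no such wimlib flag is implied to exist.

```python
def latin1_to_utf16le(raw: bytes) -> bytes:
    # Capture side: every byte is a valid ISO-8859-1 character, so this
    # never fails. Byte b becomes the little-endian code unit (b, 0x00).
    out = bytearray()
    for b in raw:
        out += bytes((b, 0))
    return bytes(out)


def utf16le_to_latin1(utf16: bytes) -> bytes:
    # Extract side: each code unit must be <= U+00FF, otherwise the name
    # cannot be written back byte-for-byte.
    if len(utf16) % 2:
        raise ValueError("odd-length UTF-16LE data")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        lo, hi = utf16[i], utf16[i + 1]
        if hi != 0:
            raise ValueError(f"code unit U+{hi:02X}{lo:02X} is outside ISO-8859-1")
        out.append(lo)
    return bytes(out)
```

Running all 256 byte values through both functions returns the input unchanged, which is the "capture every possible filename" property. The trade-off is surprise elsewhere: a name that really is UTF-8, such as `b"\xc3\xa9"` (`é`), would be stored as the two characters `Ã©` and look wrong when extracted on Windows.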

I'd generally agree that assuming UTF-8 is a safe default (especially given how long it took for this issue to arise), and it causes the fewest surprises: old archives, archives moved to and from Windows, etc.
