Error : filename is not valid UTF-8

Posted: Mon Jul 05, 2021 3:23 pm
by Skyblue
Hi synchronicity,

The latest release of wimlib (v1.13.4) reports the following error message while capturing a folder on Oracle Linux 5.8 32-bit:

[ERROR] "/appdata/krbapp/prodappl/xxkrb/11.5.0/bin/file▒200.xls": filename is not valid UTF-8.  This is not supported.
Is there a way to archive those non-UTF-8 files with wimlib? Thanks.

Re: Error : filename is not valid UTF-8

Posted: Thu Jul 08, 2021 9:37 pm
by synchronicity
I'm afraid not, as the WIM file format stores filenames as Windows-style wide character strings (UTF-16LE, with unpaired surrogates allowed). As a result there is no way to represent a UNIX-style arbitrary byte sequence filename unless it is valid UTF-8 (with unpaired surrogates allowed).
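
As a rough illustration of the constraint described above, here is a minimal Python sketch that scans a tree for names that would likely trigger this error before a capture is attempted. It is not wimlib's own check: the function names `is_wim_representable` and `find_bad_names` are made up for this example, and Python's `surrogatepass` error handler is used only as an approximation of "valid UTF-8 with unpaired surrogates allowed".

```python
import os


def is_wim_representable(name: bytes) -> bool:
    """Return True if `name` decodes as UTF-8, tolerating unpaired surrogates.

    Hypothetical helper: approximates the rule described in the post,
    not wimlib's actual validation code.
    """
    try:
        name.decode("utf-8", errors="surrogatepass")
        return True
    except UnicodeDecodeError:
        return False


def find_bad_names(root: bytes):
    """Yield full paths (as bytes) whose final component fails the check.

    Passing a bytes path makes os.walk return raw byte names, so no
    decoding happens before we inspect them.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_wim_representable(name):
                yield os.path.join(dirpath, name)
```

Something like `for p in find_bad_names(b"/appdata"): print(p)` would then list the offending entries so they can be renamed or excluded before capturing.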

Edit: in principle filenames with a well-defined encoding other than UTF-8, say ISO-8859-1, could be mapped to UTF-16 as well. Almost everyone uses UTF-8 now though, so there hasn't been a need to support this.

Re: Error : filename is not valid UTF-8

Posted: Tue Jul 13, 2021 4:08 pm
by Skyblue
Hi synchronicity,

Thanks for the info. Luckily, I had the option to delete the offending files, and wimlib then did the job.

Re: Error : filename is not valid UTF-8

Posted: Wed Jul 28, 2021 7:58 am
by chungy
synchronicity wrote: Thu Jul 08, 2021 9:37 pm Edit: in principle filenames with a well-defined encoding other than UTF-8, say ISO-8859-1, could be mapped to UTF-16 as well. Almost everyone uses UTF-8 now though, so there hasn't been a need to support this.
It should even be possible to have a flag that interprets all file names as ISO-8859-1, which would capture every possible file name that might be seen, along with a corresponding flag on apply/extract. This has a particular advantage: the first 256 code points in Unicode are exactly the character set of ISO-8859-1, so the conversion is pretty simple.
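
To show why that conversion is simple, here is a small Python sketch of the round trip. No such flag exists in wimlib as far as this thread says; the function names are hypothetical, and the byte `0xE9` in the usage note is only a stand-in, since the actual byte in the reported file name is not known.

```python
def latin1_name_to_utf16le(name: bytes) -> bytes:
    """Interpret raw file name bytes as ISO-8859-1 and encode as UTF-16LE.

    Every byte value 0x00-0xFF maps to the code point with the same
    number, so this never fails, whatever the bytes are.
    """
    return name.decode("latin-1").encode("utf-16-le")


def utf16le_to_latin1_name(data: bytes) -> bytes:
    """Reverse the mapping on extraction.

    Raises UnicodeEncodeError if the stored name contains a code point
    above U+00FF, i.e. it was not captured with the mapping above.
    """
    return data.decode("utf-16-le").encode("latin-1")
```

For example, `latin1_name_to_utf16le(b"file\xe9200.xls")` gives the UTF-16LE encoding of `fileé200.xls`, and passing that result to `utf16le_to_latin1_name` returns the original bytes unchanged.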

I'd generally agree that assuming UTF-8 is a safe default (especially given how long it took for this issue to arise), and it causes the fewest surprises with old archives, transfers to and from Windows, etc.