If you thought JavaScript was a mess, here's what it takes to pass a file path to a Windows API in C++: 🧵
- Windows uses UTF-16, but most modern software uses UTF-8
- Converting UTF-8 to UTF-16 requires calling MultiByteToWideChar twice (once to get the required buffer size, once to do the actual conversion)
- Alternatively, you can set the process code page to UTF-8 (via the activeCodePage setting in the application manifest) and call the 'A' variant APIs directly, but that only works sometimes, only on Windows 10 v1903+, and you might still have to change the system locale setting and reboot
(1/3)
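The "call it twice" pattern from the second bullet boils down to: ask for the output length, allocate, then convert. With the Win32 API that means calling MultiByteToWideChar(CP_UTF8, ...) once with a null output buffer and a size of 0 to get the length in UTF-16 code units, then calling it again to fill the buffer. The sketch below is not that API. It is a hand-rolled, portable stand-in with the same two-pass shape, so it also runs outside Windows. It writes into std::u16string rather than the std::wstring you'd hand to a 'W' API, and the names DecodeUtf8, Utf16Length and Utf8ToUtf16 are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Decode one UTF-8 sequence at s[i]. On success, store the code point in cp,
// advance i past the sequence, and return true. Rejects truncated sequences,
// overlong encodings, surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool DecodeUtf8(std::string_view s, std::size_t& i, char32_t& cp) {
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len;
    char32_t lowest;  // smallest code point allowed for this length (no overlongs)
    if (b0 < 0x80) { cp = b0; ++i; return true; }  // plain ASCII byte
    if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; lowest = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; lowest = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; lowest = 0x10000; }
    else return false;                      // stray continuation or invalid lead byte
    if (len > s.size() - i) return false;   // sequence runs past the end of input
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return false;  // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < lowest || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
    i += len;
    return true;
}

// Pass 1, like calling MultiByteToWideChar with no output buffer: count the
// UTF-16 code units needed, or return std::nullopt if the input is invalid.
std::optional<std::size_t> Utf16Length(std::string_view utf8) {
    std::size_t units = 0;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp;
        if (!DecodeUtf8(utf8, i, cp)) return std::nullopt;
        units += (cp >= 0x10000) ? 2 : 1;  // above U+FFFF needs a surrogate pair
    }
    return units;
}

// Pass 2: allocate exactly that many code units, then convert into them.
std::optional<std::u16string> Utf8ToUtf16(std::string_view utf8) {
    const std::optional<std::size_t> len = Utf16Length(utf8);
    if (!len) return std::nullopt;
    std::u16string out(*len, u'\0');
    std::size_t o = 0;
    for (std::size_t i = 0; i < utf8.size();) {
        char32_t cp = 0;
        DecodeUtf8(utf8, i, cp);  // cannot fail here: pass 1 validated the input
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out[o++] = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
            out[o++] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
        } else {
            out[o++] = static_cast<char16_t>(cp);
        }
    }
    return out;
}
```

Rejecting bad input with std::nullopt is a design choice in this sketch. It is loosely analogous to passing MB_ERR_INVALID_CHARS so that the real call fails instead of quietly substituting U+FFFD replacement characters.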
I assume this is not the case for Linux?
It’s better and worse at the same time: Linux mostly just doesn’t bother with encodings. If your file names contain UTF-8 characters and you run a program under a locale that uses an ISO-8859-whatever charset, they just display wrong. As long as a byte isn’t zero (NUL) or an ASCII forward slash, it’ll take it.
There’s still a path length limit, but it’s bigger: 255 bytes per file name and 4096 bytes for a whole path. That’s bytes, not characters, so if you stored names as UTF-16 like Windows does, you’d only fit half as many code units.
That said, file names are assumed to be UTF-8 these days and should be interpreted that way; hardly anyone uses non-UTF-8 locales anymore. But you technically can.
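Here is a minimal sketch that makes the Linux rules concrete. It checks a single name component and rejects empty names, the reserved "." and "..", anything containing NUL or '/', and anything over 255 bytes. The helper name IsValidFileName is hypothetical. The 255 is an assumption that matches ext4 and most common Linux filesystems; a real program would ask pathconf(path, _PC_NAME_MAX) about the mount it cares about. The check never looks at whether the bytes are valid UTF-8, because the kernel doesn't either.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Assumed per-component limit (NAME_MAX on ext4 and most Linux filesystems).
// A real program could query pathconf(path, _PC_NAME_MAX) instead.
constexpr std::size_t kAssumedNameMax = 255;

// Hypothetical helper: can `name` be used as a single path component?
// Accepts std::string too, via its implicit conversion to std::string_view.
bool IsValidFileName(std::string_view name) {
    // "." and ".." are reserved directory entries.
    if (name.empty() || name == "." || name == "..") return false;
    // The limit is in bytes: size() counts bytes, not characters.
    if (name.size() > kAssumedNameMax) return false;
    // The only bytes the kernel forbids in a name are NUL and '/'.
    for (char c : name) {
        if (c == '\0' || c == '/') return false;
    }
    // Deliberately no UTF-8 validation: any other byte sequence is accepted.
    return true;
}
```

A name made of 128 'é' characters is 256 bytes in UTF-8 (each 'é' is 0xC3 0xA9), so it fails this check even though it's only 128 characters long.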