curl-library

Re: Building libcurl on MS-Windows with UNICODE defined

From: Tom Bishop, Wenlin Institute <tangmu_at_wenlin.com>
Date: Tue, 1 Nov 2011 11:53:45 -0700

On Oct 15, 2011, at 2:57 PM, Daniel Stenberg wrote:

> On Mon, 10 Oct 2011, Tom Bishop, Wenlin Institute wrote:
>
> Thanks for your work and research!
>
>> I don't know whether this significantly affects the operation of libcurl as it is actually used. If libcurl needs any of these functions to handle non-Latin strings, it will presumably fail.
>
> ...
>
>> Is there any documentation of this issue for libcurl? I don't find any mention of it in the source code itself. Excuse me if the issue has already been addressed on this mailing list.
>
> No, I don't believe we've discussed this to any particular degree in the past. At least I can't recall it.
>
>> And, is there any interest in adding support for UNICODE? MS-Windows has supported Unicode since 1995, sixteen years ago. Half the world's population uses non-Latin scripts, and even for languages such as English, Unicode provides useful characters that aren't in the ANSI/Windows code pages.
>
> Well, does the current code cause some kind of problem? The way I read your mail is that you think there _might_ be problems, but I don't think anyone has reported/mentioned any up until now and you're not being very specific either so in my view this is not a critical issue.

Please excuse the lateness of this reply. I agree this does not appear to be a critical issue and I understand that a new version is coming soon, so I don't suggest making any immediate changes. I'm not aware of any problem with the current code, provided that it's built with the makefile (with UNICODE not defined), and assuming that user-names, etc., passed to these MS-Windows functions, are always ASCII in actual current usage (probably true since nobody has complained).

> But if we can fix problems by altering the code, and not cause backwards compatible problems, then I'm all for it!
>
> The only unicode related issue on windows that I can recall is people trying to use curl_formadd() and pass in a unicode file name path.
>
> I'll certainly appreciate a patch and I hope I can get more Windows-savvy people than me to help me review it for correctness.

Thank you! When I have more time to spare I'd like to study it further and possibly make some suggestions. The only change I'd suggest testing after the next version is to replace FormatMessage(), etc., with FormatMessageA(), etc., so that the code will compile and run correctly (for code points < U+0080) regardless of whether UNICODE is defined, for the benefit of people who compile CURL with different makefiles. (I've done that already in my own copy, and it compiles without warnings, but so far I'm using only a tiny part of CURL's functionality so I can't say it's well-tested. Still, the logic is simple: if UNICODE is not defined, then FormatMessage is defined as FormatMessageA anyway, so the replacement has no effect. If UNICODE is defined, then FormatMessage is defined as FormatMessageW, and it triggers a compiler warning and run-time failure if called with "char *". Therefore it's better to use the name FormatMessageA explicitly, as long as you're calling it with "char *".)
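As a minimal sketch of the replacement described above (not taken from the libcurl source): the helper name describe_error and its signature are made up for illustration, and the non-Windows branch exists only so the snippet compiles elsewhere. The point is that FormatMessageA always takes "char *", so the call behaves the same whether or not UNICODE is defined.

```c
#include <stdio.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, not part of libcurl: writes a text description of a
   system error code into buf and returns buf. */
static const char *describe_error(unsigned long code, char *buf, size_t buflen)
{
  if(!buf || buflen == 0)
    return "";
  buf[0] = '\0';
#ifdef _WIN32
  /* Naming the A variant explicitly keeps the buffer type ("char *") and the
     function in agreement. With plain FormatMessage and UNICODE defined, this
     call would resolve to FormatMessageW, which expects a wide buffer. */
  if(!FormatMessageA(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                     NULL, (DWORD)code, 0, buf, (DWORD)buflen, NULL))
    snprintf(buf, buflen, "Unknown error %lu", code);
#else
  /* Placeholder so the sketch also builds on non-Windows systems. */
  snprintf(buf, buflen, "Error %lu", code);
#endif
  return buf;
}
```

A caller would pass a plain char buffer, for example `char msg[256]; describe_error(5, msg, sizeof(msg));`. Messages containing characters outside the active ANSI code page may still be lossy with the A variant.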

Probably there are also potential improvements that would use the Unicode-capable versions of the MS-Windows functions, possibly supporting UTF-8 strings that get converted to UTF-16 so that CURL's API can still use "char *" but non-ASCII characters will get passed correctly to the MS-Windows functions.
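To illustrate the UTF-8 to UTF-16 idea, here is a rough sketch under stated assumptions. The helper utf8_to_utf16 is hypothetical and not part of libcurl. On MS-Windows it relies on MultiByteToWideChar with CP_UTF8; the other branch is a simplified hand-written decoder (it does not reject overlong encodings or encoded surrogates) included only so the example builds on other systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper, not part of libcurl: converts a NUL-terminated UTF-8
   string into a newly allocated, NUL-terminated UTF-16 string. Returns NULL
   on invalid input or allocation failure. The caller frees the result. */
static uint16_t *utf8_to_utf16(const char *src)
{
#ifdef _WIN32
  /* The first call only asks for the required length in UTF-16 units,
     including the terminating NUL (because the input length is -1). */
  int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, src, -1, NULL, 0);
  wchar_t *w;
  if(n <= 0)
    return NULL;
  w = malloc((size_t)n * sizeof(wchar_t));
  if(w && !MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, src, -1, w, n)) {
    free(w);
    w = NULL;
  }
  return (uint16_t *)w; /* wchar_t is a 16-bit type on MS-Windows */
#else
  const unsigned char *s = (const unsigned char *)src;
  /* Each UTF-8 byte produces at most one UTF-16 unit, so this is enough. */
  uint16_t *out = malloc((strlen(src) + 1) * sizeof(uint16_t));
  size_t o = 0;
  if(!out)
    return NULL;
  while(*s) {
    uint32_t cp;
    int extra; /* number of continuation bytes still expected */
    if(*s < 0x80) { cp = *s; extra = 0; }
    else if((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
    else if((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
    else if((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
    else { free(out); return NULL; }
    s++;
    while(extra-- > 0) {
      if((*s & 0xC0) != 0x80) { free(out); return NULL; }
      cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
    }
    if(cp > 0x10FFFF) { free(out); return NULL; }
    if(cp >= 0x10000) {
      /* Code points above the BMP become a surrogate pair. */
      cp -= 0x10000;
      out[o++] = (uint16_t)(0xD800 | (cp >> 10));
      out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    }
    else
      out[o++] = (uint16_t)cp;
  }
  out[o] = 0;
  return out;
#endif
}
```

With something along these lines, the public API could keep accepting "char *", and the converted buffer would be handed to the W variant of the MS-Windows function in question, then freed.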

Best wishes,

Tom

文林 Wenlin Institute, Inc. Software for Learning Chinese
E-mail: wenlin@wenlin.com Web: http://www.wenlin.com
Telephone: 1-877-4-WENLIN (1-877-493-6546)

Received on 2011-11-01