
Re: Sv: Re: Curl callback question, porting from OpenSSL 1.x to 3.x and from 32bit plain to 64bit UTF16

From: Andreas Mohr via curl-library <curl-library_at_lists.haxx.se>
Date: Wed, 3 Sep 2025 13:15:27 +0200

Hi,

sorry for the delay - I had missed your reply mail, and then was AWOL for a couple days ;-)


On Thu, Aug 21, 2025 at 09:47:35AM +0300, Anders Gustafsson wrote:
> Yes. I do make assumptions based on how this app is used and yes it is C++/MFC on windows going from an old
> version with no specified coding to UTF.

"no specified coding"
Indeed - this often means that
one "naturally"(?) gets ACP behaviour, IOW one encounters all sorts of different encodings (depending on
the current machine configuration, i.e. global state crap) - with
potential DATA CORRUPTION niceties ensuing -
as opposed to properly defined encoding boundaries, where
the various implementation layers have firmly defined - and capable! - encoding specs
(UTF8, *or* UTF16...... *or* specific legacy codepage stuff where somehow needed).

> The certificate data in this case is "regular" ie a PEM payload that is not UTF encoded, just:
>
> -----BEGIN CERTIFICATE-----
> xxxxxxx
> -----END CERTIFICATE-----
>
> Ie one byte per character. Filenames OTOH might be a different kettle of fish.

"one byte per character"
Imprecise [spec] characterization.
I'd prefer "ASCII" for that.
Since if that PEM spec weren't ASCII[-compatible] *), you'd see really funny characters on your monitor ;-)
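
To make the "ASCII" characterization checkable rather than assumed, here is a minimal sketch. The helper names (is_ascii, looks_like_pem_certificate) are made up for illustration and are not part of libcurl or OpenSSL, and the check is only a plausibility test (ASCII-only bytes plus the two armor lines), not a PEM parser:

```cpp
#include <string_view>

// True if every byte is within the 7-bit ASCII range.
inline bool is_ascii(std::string_view s) {
    for (unsigned char c : s) {
        if (c > 0x7F) return false;
    }
    return true;
}

// Hypothetical helper: rough plausibility check for a PEM certificate blob.
// It verifies ASCII-only content and the presence of both armor lines,
// nothing more (the base64 body is not decoded).
inline bool looks_like_pem_certificate(std::string_view s) {
    constexpr std::string_view begin = "-----BEGIN CERTIFICATE-----";
    constexpr std::string_view end = "-----END CERTIFICATE-----";
    if (!is_ascii(s)) return false;
    const auto b = s.find(begin);
    if (b == std::string_view::npos) return false;
    return s.find(end, b + begin.size()) != std::string_view::npos;
}
```

If such a check fails on data coming out of the database, that would be a fairly strong hint that something other than plain PEM ended up in there.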



> I probably should include a
> testcase where the path has non-ascii characters.

Yup, it probably is a very good idea to
have a program (and its config location) rooted in a filesystem path which
contains the most egregiously weird mix of Unicode codepoints (clapping-hands smileys, multiple languages) that one can imagine.
Then, if there is non-Unicode-compliant handling somewhere,
one hopefully *will* notice soon enough (Fail-Fast / Shift-Left).
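
A rough sketch of such a test case, assuming C++17 std::filesystem is available. The directory and file names are invented for illustration, and the function only proves that the C++ runtime's own file APIs survive such a path; whether libcurl/OpenSSL do would need the same path fed into CURLOPT_SSLCERT:

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Creates a directory with a deliberately awkward name below the temp
// directory, writes a file into it and reads it back.
// Returns true if the round trip succeeded.
inline bool nonascii_path_roundtrip() {
    // char32_t source: std::filesystem converts it to the native path
    // encoding (UTF-16 on Windows, typically UTF-8 elsewhere).
    const fs::path dir = fs::temp_directory_path() /
        fs::path(U"h\u00E4lsning_\u6F22\u5B57_\U0001F44F");
    const fs::path file = dir / fs::path(U"cert_\u00E5land.pem");

    std::error_code ec;
    fs::create_directories(dir, ec);
    if (ec) return false;

    {
        std::ofstream out(file, std::ios::binary);
        if (!out) return false;
        out << "-----BEGIN CERTIFICATE-----\n";
    }

    std::string line;
    {
        std::ifstream in(file, std::ios::binary);
        if (!in || !std::getline(in, line)) return false;
    }

    fs::remove_all(dir, ec);  // best-effort cleanup
    return line == "-----BEGIN CERTIFICATE-----";
}
```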



> The string passed to CURLOPT_SSL_CTX_DATA is a CStringA so it is OK to convert to char, especially assuming
> that the data is just PEM

"convert to char"
Well, not a conversion I'd think, since
the container's payload already *is*
char-[byte-]typed [[- which of course says nothing at all about its actual encoding...]]
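
Since CURLOPT_SSL_CTX_DATA takes a void*, the compiler will not complain if a wide-typed buffer ever ends up there. One way to get that complaint back is a tiny typed gate in front of the option. This is only a sketch: as_ctx_data and ctx_data_accepts are hypothetical names of mine, not libcurl API.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical wrapper: the only sanctioned way to produce the void* that is
// handed to an untyped option such as CURLOPT_SSL_CTX_DATA.
inline void* as_ctx_data(const char* pem) {
    return const_cast<char*>(pem);
}
// Wide-typed buffers are rejected at compile time instead of being
// reinterpreted as char data at run time.
void* as_ctx_data(const wchar_t*) = delete;
void* as_ctx_data(const char16_t*) = delete;

// Detection helper so that the rejection can be checked without a hard error.
template <class T, class = void>
struct ctx_data_accepts : std::false_type {};
template <class T>
struct ctx_data_accepts<T, std::void_t<decltype(as_ctx_data(std::declval<T>()))>>
    : std::true_type {};

static_assert(ctx_data_accepts<const char*>::value, "narrow PEM data is fine");
static_assert(!ctx_data_accepts<const wchar_t*>::value, "wide data must not slip through");
```

With that in place, curl_easy_setopt(curl, CURLOPT_SSL_CTX_DATA, as_ctx_data(m_Certificate.GetString())) would stop compiling the moment m_Certificate turns into a CStringW, instead of silently passing UTF-16 bytes to the callback.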


> Yes. String handling in Windows is weird.

"Q: Are you a linuxer? Is this a concealed religious fight against Windows?"
  https://utf8everywhere.org/#faq.linuxer




[[[yup, me is a Linuxer :-)]]]


> I sort of understand the performance aspect of using UTF-16, but it
> makes other things harder. I guess another option would be to use std::string all the way.

Though note that being std::string-typed is
not in any way, shape or form related to encoding.
(utf8everywhere.org says that std::string is [to always be] UTF8-encoded **),
which is a commendable recommendation, but
of course in itself the std::string type doesn't really have
such a fixed meaning - unfortunately? cf. std::u8string though).
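
To illustrate that the container type says nothing about the encoding, a small sketch: the same character (U+00E4) stored in two std::string objects, once as ISO-8859-1 (my stand-in here for "some legacy code page", not something from this thread) and once as UTF8, plus a hand-rolled well-formedness check. is_valid_utf8 is a hypothetical helper, not a standard library function.

```cpp
#include <cstddef>
#include <string>

// Minimal UTF-8 well-formedness check (rejects overlong forms, surrogates
// and code points above U+10FFFF).
inline bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_cp[len]) return false;              // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;
        i += len;
    }
    return true;
}

// The same character, U+00E4, in two encodings inside the same container type.
inline std::string a_umlaut_latin1() { return "\xE4"; }
inline std::string a_umlaut_utf8()   { return "\xC3\xA4"; }
```

Both values are perfectly legal std::string objects; only one of them is what a UTF8-expecting API wants to see.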

IMHO the main point is
being aligned to usual protocol [comfort zone].
In implementation areas very closely related to Win32 APIs, do UTF16 (...W()).
In most other areas, don't (thus,
UTF8 of course - to uphold the fundamental requirement of Unicode-compliant encodings...).

That way one will be able to
directly [conveniently] consume various provided APIs (e.g.:
- CStringW::MakeUpper() etc.
- window SetWindowText[W]() etc.
- INI / registry-mapped profile WritePrivateProfile*[W]() etc.
- perhaps other woefully Win32-specific stuff such as shlwapi Path*[W]()
)
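
What has to happen at such a boundary, for text coming back from a ...W() API into UTF8 land, can be sketched portably as below. On Windows one would rather call WideCharToMultiByte() with CP_UTF8 (or CW2A(..., CP_UTF8)); the hand-written utf16_to_utf8 here is merely a stand-in of mine so that the sketch compiles anywhere, and it assumes the wide data is available as std::u16string.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Converts UTF-16 code units to UTF-8 bytes; throws on unpaired surrogates.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: needs a partner
            if (i + 1 >= in.size())
                throw std::invalid_argument("truncated surrogate pair");
            const char32_t lo = in[i + 1];
            if (lo < 0xDC00 || lo > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The point is less the bit fiddling than the placement: conversions like this live in a thin layer right next to the Win32 calls, and everything behind that layer only ever sees UTF8.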

"How to do text on Windows"
  https://utf8everywhere.org/#windows



> Which is pretty much what I do. All certificate data is assumed to be PEM and coded as ASCII with one byte
> per character.

Yup, encoding (thus: transcoding) probably {c|sh}ould be disregarded here since
this PEM stuff ought to be
ASCII subset only *) (never mind certain rather exotic encodings which
actually are *NOT* ASCII-compatible, such as EBCDIC - where
one *would* need proper full transcoding to establish the correct encoding representation!).


"one byte per character"
Don't.
IMHO that's some unhelpful thinking / unrelated noise (if I am planning a nice BBQ meal then
I certainly won't be analyzing each single piece of charcoal with
a scanning electron microscope for its carbon content either... :-)).
Things are a certain specific encoding [spec] [name], always. Whether that's *realized* via
some SBCS or MBCS (/DBCS) mechanism or UTF8 ***) or UTF16 or UTF32 or whichever other "bits fumbling" mechanics DOES NOT MATTER AT ALL ****).
Thus simply "either correct or wrong"
(there's one encoding which is the correct one and 9999 others which are wrong :-)).
Thus, transcoding. Unless I definitely know that
a certain encoding already is
compatible (e.g. because it contains the ASCII subset, where ASCII compat is all I need) and thus
I can [afford to] take the shortcut of skipping it (in such cases).
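
A minimal sketch of "transcode, unless known compatible". ISO-8859-1 is used as the source encoding purely as an assumption for illustration (it is trivial to transcode by hand); for real CP_ACP content on Windows one would go through MultiByteToWideChar()/WideCharToMultiByte() instead. The function names are hypothetical.

```cpp
#include <algorithm>
#include <string>

inline bool is_ascii_only(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](char c) { return static_cast<unsigned char>(c) < 0x80; });
}

// Transcodes ISO-8859-1 bytes to UTF-8. ASCII-only input is returned
// unchanged, since ASCII is a common subset of both encodings.
inline std::string latin1_to_utf8(const std::string& in) {
    if (is_ascii_only(in)) return in;  // the safe shortcut
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            // U+0080..U+00FF always become a two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

PEM payload takes the shortcut branch and comes out byte-identical; anything else gets properly transcoded rather than passed along on hope.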

*) which it most certainly is (simple base64-based patterns):
https://serverfault.com/questions/9708/what-is-a-pem-file-and-how-does-it-differ-from-other-openssl-generated-key-file
  "base64 translation of the x509 ASN.1 keys."
**) most certainly because that's [almost] the only way to have a byte-typed container provide a Unicode-compliant encoding...
***) also an "MBCS", of course...
  ((just a VERY INCOMPATIBLE one in case of
  Win32 CP_ACP protocol affected areas...))
****) well, except where working on string data (doing active _string processing_) -
  with corresponding properly compatible APIs, of course...

Greetings

Andreas Mohr

> --
> Kind regards
>
> Anders Gustafsson, engineer
> anders.gustafsson_at_pedago.fi | Support +358 18 12060 | Direct +358 9 315 45 121 | Mobile +358 40506 7099
>
> Pedago interaktiv ab, Nygatan 7 B , AX-22100 MARIEHAMN, ÅLAND, FINLAND
>
>
>
> >>> Andreas Mohr <andi_at_lisas.de> 2025-08-20 17:44 >>>
> Hi,
>
> disclaimer:
> quite experienced in certain areas, yet not too much in others (CURL).
>
> TL;DR:
> thus discussing potential string encoding issues only.
>
> On Wed, Aug 20, 2025 at 03:58:14PM +0300, Anders Gustafsson via curl-library wrote:
> > So, yes this is windows, libcurl/8.15.0-DEV OpenSSL/3.5.2 zlib/1.3.1
> >
> > I had some issues and I just want to check whether I am going about this the right way. The function calls an
> > API where the client certificate is used to authenticate the caller so in the original version I used the
> > sslctx_function(). To complicate matters does my app support PEM certificates and keys in two different ways:
> > 1. As files (Say on a removable secure media) and 2. As strings in the database for ease of use.
> >
> > The first way (filenames) worked right away, ie:
> >
> > m_Certificate.Trim();
> > if (m_Certificate.IsEmpty())
> > curl_easy_setopt(curl, CURLOPT_SSLCERT, m_CertificateFile.GetString());
>
> Bleeep - ATLMFC CStringT::GetString() encountered.
> This might thus be
> dirty "encoding shortcutting" here
> (simply invoking .GetString() to
> "quickly" "get at" some "char"-compatibly-typed - hah! - input, rather than
> doing active transcoding to
> the actually *correct* encoding spec of
> some char-typed handling).
>
> Thus, consulting this one:
>
> > Where m_Certificate and m_Key are regular (char) strings with the PEM coded data.
>
> What would "regular" mean?
>
> Considering that
> CString errfilename;
> with
> CT2A(errfilename),
> one would think that this is
> a CString[T] with UNICODE config setting (put differently, CStringW)
> environment, however
> since you said "regular (char) strings", I am assuming that
> you are on !UNICODE config (i.e., CStringA).
>
>
> https://manpages.ubuntu.com/manpages/kinetic/man3/CURLOPT_SSLCERT.3.html
> Yeah nice - that page does not specify at all which
> encoding char *cert (a filesystem item argument!! - which could be
> containing all the smileys available in this universe, with
> only a bit of luck...) is expected to have.
> Thus on Windows one would tend to
> assume ACP crap - which of course would mean that it is
> Unicode-compliance-broken (since: neither UTF16 nor UTF8 nor UTF7 nor
> UTF-EBCDIC or whatever ;-)).
>
>
>
>
>
> > errno_t fileerr = fopen_s(&errfile, CT2A(errfilename), "w+, ccs=UTF-8");
>
> Unicode-compliance-broken filesystem item handling!
>
> CT2A(errfilename) will be
> wide-typed to CP_ACP transition (in UNICODE config setting), and
> "nothing" *) (in non-UNICODE).
>
> *) BTW: *HORRIBLE* atlconv.h comment "// Code page doesn't matter" - that is transcoding protocol breakage!!!
> (Yeah, as if that would be the case for
> e.g. CP_ACP to CP_UTF8 transcoding, which *is* a valid transcoding use case...)
> (think of
> CT2CA(..., CP_UTF8)
> protocol behaviour **DIFFERENCE**)
>
>
> Thus, your errfilename possibly is ACP (CP_ACP, GetACP()) content,
> which would be
> "compatible" since
> fopen_s() API is equally ACP-specced on Windows (yuck).
> ...but: then it would be
> Unicode-compliance-broken (due to
> being ACP crap, rather than
> e.g. UTF8 as usually on Linux).
>
> Since fopen_s() has a wide-typed sibling (_wfopen_s(), not an overload),
> the way to go would at least be
> _wfopen_s() with CT2W(errfilename) - thereby
> properly preserving Unicode-compliant (since wide-typed!) encoding (when
> on UNICODE config setting - and ACP crap on !UNICODE).
>
> Or, better do utf8everywhere.org (i.e., std::string[-means-utf8] -
> to have ensured that
> **every** string traffic anywhere is Unicode-compliant), and thus do
> std::string errfilename = "myUtf8InputStuffStringFromSomewhere"; // e.g. std::filesystem **) API
> _wfopen_s(... CA2W(errfilename.c_str(), CP_UTF8) ...);
>
>
> **) rather *horribly* Unicode-compliance-broken (on Windows!) - I digress...
> "<filesystem>: prevent filesystem::path dangerous conversions to/from default code page encoding"
> https://github.com/microsoft/STL/issues/909
>
>
> > In the second scenario, PEM in database, I had some problems and I just wanted to check that the code I came
> > up with is sane. Ie the authentication will not happen unless I have both certificate and key, so:
> >
> > if (!m_Certificate.IsEmpty())
> > {
> > curl_easy_setopt(curl, CURLOPT_SSL_CTX_FUNCTION, sslctx_function);
> > curl_easy_setopt(curl, CURLOPT_SSL_CTX_DATA, m_Certificate.GetString());
>
> WARNING CORRUPTION: CURLOPT_SSL_CTX_DATA has a void*-typed ptr arg, thus
> .GetString() of *either* CString[T] flavour (CStringA or CStringW) is accepted, *silently*.
> IOW, once on UNICODE config setting it would be
> *broken*.
>
> So, questionable encoding stuff again.
> According to
> https://curl.se/libcurl/c/CURLOPT_SSL_CTX_DATA.html
> "char *mypem = /* CA cert in PEM format"
> it seems to *appear* that
> "PEM format" means some plain ASCII-only payload.
>
> Now to be maximally precise one could do
> const UINT nCP_PEM = 20127 /*CP_ASCII*/ /* these clearly are [to be!] all ASCII-only, right!!!? */;
> std::string strCertificate = CW2A(CA2W(m_Certificate), nCP_PEM);
> (this transition expects that m_Certificate has system ACP content, of course)
>
>
>
> (OTOH one could just assume that
> all [relevant] ACP encodings are ASCII subset, thus
> simply NOT do transcoding since it then ought to be
> ASCII-compliant content already anyway).
>
>
> > Then, below, which seems to work OK. I first used the example here:
> > https://curl.se/libcurl/c/CURLOPT_SSL_CTX_FUNCTION.html
> > but that one did not fix my key for me. Yes, this code still leaves allocated memory in case of errors.
>
> "fix my key" - that wording might hint at
> encoding issues.
> But perhaps we're talking about
> a plain CURL certificate config issue only after all.
>
> I could not precisely identify (thus: discuss) particular ***) issues in
> your handling, but I'd hope that
> this will give you some ideas (*if* it is an encoding issue).
>
> ***) well, except for the broken fopen_s() filesystem item handling...
>
> Greetings
>
> Andreas Mohr

-- 
Epidämliche Plage rationaler Schlagseite
Received on 2025-09-03