Closed
Description
I want to upload a file with curl, but the file name that arrives is truncated and missing its suffix.
curl/libcurl version 7.87.0
curl 'http://10.128.6.26:3030/package/publish' --form 'projectName="electron-test"' --form 'packageFile=@"C:\Users\Administrator\Desktop\test-upload\测试中文-0.1.3-win.exe"'
With this version I get the file name '测试中文-0.1.3-w'.
With version 7.83.1 I get the file name '测试中文-0.1.3-win.exe'.
System: Windows 10
Activity
jay commented on Jan 9, 2023
Please give us the version information
curl -V
from both versions.
UnicornZhang commented on Jan 9, 2023
@jay Failing version:
curl 7.87.0 (x86_64-w64-mingw32) libcurl/7.87.0 OpenSSL/1.1.1s (Schannel) zlib/1.2.13 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.3 libpsl/0.21.1 (+libidn2/2.3.1) libssh2/1.10.0 nghttp2/1.51.0
Release-Date: 2022-12-21
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM PSL SPNEGO SSL SSPI threadsafe TLS-SRP UnixSockets zstd
Working version:
curl 7.83.1 (Windows) libcurl/7.83.1 Schannel
Release-Date: 2022-05-13
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS HSTS IPv6 Kerberos Largefile NTLM SPNEGO SSL SSPI UnixSockets
jay commented on Jan 9, 2023
I can reproduce this in a Windows 7 VM when I change the region and locale (including non-Unicode locale) to traditional Chinese and use the mingw-w64 official curl builds after 7.83.1_2. With the same locale settings I have Visual Studio builds of 7.87.0, both no character set and multibyte character set, that will not reproduce in that VM with 7.87.0. I notice the Chinese characters are multibyte in the ansi codepage so maybe that has something to do with it, even though I can't reproduce with VS builds.
This is the last good version:
This is the first bad version:
The ¦ characters are multibyte Chinese characters that socat can't print because the server is running in English. The Chinese characters take up the same number of bytes both times.
@vszakats any idea?
vszakats commented on Jan 9, 2023
I cannot spot anything between 7.83.1_1 and 7.83.1_2 that might affect character handling (skimming through the 100 commits, which are mainly HTTP/3 / LibreSSL prep work, switching libssh2 builds to autotools, "unified" packaging, and some other unrelated things):
curl/curl-for-win@ce495c3...70eb533
7.83.1_2 changed packaging to the single-zip style (curl/curl-for-win@3182733).
It also enabled -O3 with curl, though this later turned out to be a no-op and had to be redone because the upstream Makefile.m32 overrode it to -O2 anyway.
So I cannot see anything in this revision that might affect Chinese text.
7.83.1_3 replaced OpenSSL with openssl-quic, adding HTTP/3 support. That seems even less related:
curl/curl-for-win@70eb533...1a0412d
Though rereading the thread makes it unclear if this became an issue between 7.84.0 and 7.83.1, or between two distinct curl-for-win builds of 7.83.1. If the former, we need to turn to the curl source code. It's also not very likely that this is curl-for-win-specific, though if we have the regression between two revisions, I can look it up again.
vszakats commented on Jan 9, 2023
My generic guess would be this:
curl on Windows doesn't have Unicode support by default. In these builds all strings are considered raw bytes and passed around as-is (*). This makes things appear to work in certain practical cases. For some this may be perceived as curl having special support for certain codepages/encodings. The reality, though, is that it is just a happy coincidence.
(*) But curl has certain places (and/or dependencies) which may steer off the track of the optimistic "as-is" string handling and do string conversion/manipulation, which instantly breaks the happy coincidences by assuming certain input formats and spitting out certain output formats. It's enough to call any Win32 API function with a W ending (e.g. WriteConsoleW() in src/tool_cb_wrt.c).
This is made even more complex in curl due to using both CRT functions and the Win32 API directly, each with potentially different encoding requirements or Unicode support. (And even more complexity comes when interfacing with the dependencies curl supports.)
Speaking of curl with Unicode support enabled: This is the correct path, but as of today, the level of support is just not enough to cover all cases. (And before we could cover all cases, we'd need to clear codepage requirements for each string curl is accepting or returning). The downside is that this mode breaks all the use-cases which work "correctly" by happy coincidence in non-Unicode mode, because of the added conversions and heavier use of Unicode-enabled functions needing them.
Till these fundamental issues are solved, iterations of these non-ASCII issues will keep popping up.
The solution is complex. Even if fully solved, with Unicode mode finished and enabled by default, it will inherently result in fallouts, because the old "happy coincidence" cases will start to break and will require correction to work as expected in a Unicode-enabled environment. (even more so for libcurl API users)
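To make the "happy coincidence" point concrete, here is a minimal Python sketch (not curl code). It assumes the file name from the report reaches the program either as UTF-8 or as GBK bytes; the helper names passthrough and convert_assuming are hypothetical and exist only for this illustration. Passing the bytes through untouched preserves the name, while any step that decodes them under a different encoding assumption does not.

```python
# Illustration only: the file name is taken from the report in this thread,
# the two encodings are assumptions about what the console/codepage supplies.
name = "测试中文-0.1.3-win.exe"

utf8_bytes = name.encode("utf-8")  # 3 bytes per Chinese character here
gbk_bytes = name.encode("gbk")     # 2 bytes per Chinese character here


def passthrough(raw: bytes) -> bytes:
    """Model of 'as-is' handling: the bytes come out exactly as they went in."""
    return raw


def convert_assuming(raw: bytes, assumed: str) -> str:
    """Model of a step that decodes its input under a fixed encoding assumption."""
    return raw.decode(assumed, errors="replace")


# Same text, three different lengths depending on how it is measured.
print(len(name), len(utf8_bytes), len(gbk_bytes))

# As-is handling round-trips as long as both ends agree on the encoding.
print(passthrough(gbk_bytes).decode("gbk") == name)

# A conversion step with the wrong assumption no longer yields the original name.
print(convert_assuming(gbk_bytes, "utf-8") == name)
```

The character count and the byte count only agree for pure ASCII input, which is why such problems tend to surface only with non-ASCII names.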
Cherish98 commented on Feb 11, 2023
@vszakats I believe it is caused by commit 68fa9bf, in which HAVE_BASENAME is set to 1. Before that, curl used its own version (Curl_basename()), which treated strings as raw bytes and passed them around as-is. The function is called by strippath() inside curl_mime_filedata() to set the filename:
curl/lib/mime.c
Lines 1437 to 1445 in 38262c9
With mingw-w64's basename(), you'll get the truncated name 测试中文-0.1.3-win from:
vszakats commented on Feb 11, 2023
@Cherish98: Nice catch! That commit syncs the HAVE_BASENAME build setting for GNU Make/.m32/.mk (MinGW-w64) builds with CMake/autotools logic, which autodetected and enabled this setting prior to that patch. In practice, it means all CMake/autotools builds were broken before, official curl Windows builds got broken with curl/curl-for-win@1dc206c (went live with curl/curl-for-win@cb966af as binary release 7.83.1_4) and all Makefile.mk/Makefile.m32 builds broke with that patch.
If we can confirm this as the root cause, the fix would be to ignore HAVE_BASENAME for all build methods for WIN32 and always stick to the local implementation (or force-disable this setting for all build methods).
Can you make a curl build without HAVE_BASENAME and confirm that it fixes the problem?
Correction: Native MSVC builds were never affected by that patch.
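For readers following the basename discussion, here is a rough Python sketch of the difference between a raw-byte basename and one that mixes up character and byte counts. It is not curl's Curl_basename() nor mingw-w64's basename(). The functions raw_basename and truncating_basename are hypothetical, the GBK encoding of the path is an assumption, and the thread does not confirm that this is the actual truncation mechanism.

```python
def raw_basename(path: bytes) -> bytes:
    """Return everything after the last '/' or '\\' byte, without decoding."""
    cut = max(path.rfind(b"/"), path.rfind(b"\\"))
    return path[cut + 1:]  # cut is -1 when there is no separator


def truncating_basename(path: bytes, encoding: str) -> bytes:
    """Hypothetical faulty variant: the length is measured in characters,
    but that number is then used to cut the byte string."""
    base = raw_basename(path)
    char_count = len(base.decode(encoding))
    return base[:char_count]


# Path from the report, assumed here to arrive as GBK-encoded bytes.
path = r"C:\Users\Administrator\Desktop\test-upload\测试中文-0.1.3-win.exe".encode("gbk")

good = raw_basename(path)
bad = truncating_basename(path, "gbk")

# The raw-byte version keeps every byte; the faulty one is 4 bytes short,
# one byte for each two-byte Chinese character in the name.
print(len(good), len(bad))
```

Under these assumptions the faulty variant yields 测试中文-0.1.3-win, the same truncated name quoted above, while the raw-byte variant returns the full 测试中文-0.1.3-win.exe. A purely byte-oriented scan is only safe as long as no multibyte character contains a byte equal to a path separator, which holds for this file name.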
vszakats commented on Feb 11, 2023
Untested patch:
disable HAVE_BASENAME test 1
vszakats commented on Feb 11, 2023
HAVE_BASENAME is enabled in official builds since 7.83.1_4, but Jay reproduced the issue with 7.83.1_2. This suggests a different cause or something in addition to this one.
vszakats commented on Feb 11, 2023
Test binaries with disabled HAVE_BASENAME here:
https://ci.appveyor.com/project/curlorg/curl-for-win/build/artifacts
Cherish98 commented on Feb 11, 2023
I tested the binaries with HAVE_BASENAME disabled, and it indeed has fixed the issue.
vszakats commented on Feb 11, 2023
Thanks for your test @Cherish98! I'll make a PR of that patch soon.