windows:curl -F filename error when i use chinese #10261

Closed
@UnicornZhang

Description

@UnicornZhang

I want to upload a file with curl, but the server receives the wrong file name: the end of the name, including the file suffix, is missing.

curl/libcurl version 7.87.0

curl  'http://10.128.6.26:3030/package/publish' --form 'projectName="electron-test"'  --form 'packageFile=@"C:\Users\Administrator\Desktop\test-upload\测试中文-0.1.3-win.exe"'

I get the file name '测试中文-0.1.3-w'

In version 7.83.1

I get file name '测试中文-0.1.3-win.exe'

System: Windows 10

Activity

jay commented on Jan 9, 2023 (Member)

Please give us the version information curl -V from both versions.

UnicornZhang commented on Jan 9, 2023 (Author)

@jay Failing version: curl 7.87.0 (x86_64-w64-mingw32) libcurl/7.87.0 OpenSSL/1.1.1s (Schannel) zlib/1.2.13 brotli/1.0.9 zstd/1.5.2 libidn2/2.3.3 libpsl/0.21.1 (+libidn2/2.3.1) libssh2/1.10.0 nghttp2/1.51.0
Release-Date: 2022-12-21
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM PSL SPNEGO SSL SSPI threadsafe TLS-SRP UnixSockets zstd

Working version: curl 7.83.1 (Windows) libcurl/7.83.1 Schannel
Release-Date: 2022-05-13
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS HSTS IPv6 Kerberos Largefile NTLM SPNEGO SSL SSPI UnixSockets

jay commented on Jan 9, 2023 (Member)

I can reproduce this in a Windows 7 VM when I change the region and locale (including the non-Unicode locale) to Traditional Chinese and use the official mingw-w64 curl builds after 7.83.1_2. With the same locale settings, my Visual Studio builds of 7.87.0 (both "no character set" and "multibyte character set") do not reproduce it in that VM. I notice the Chinese characters are multibyte in the ANSI code page, so maybe that has something to do with it, even though I can't reproduce it with the VS builds.

This is the last good version:

curl 7.83.1 (x86_64-pc-win32) libcurl/7.83.1 OpenSSL/3.0.2 (Schannel) zlib/1.2.12 brotli/1.0.9 libidn2/2.3.2 libssh2/1.10.0 nghttp2/1.47.0 libgsasl/1.10.0
Release-Date: 2022-05-11
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli gsasl HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL SSPI TLS-SRP UnixSockets


POST / HTTP/1.1
Host: 192.168.1.80:8000
User-Agent: curl/7.83.1
Accept: */*
Content-Length: 338
Content-Type: multipart/form-data; boundary=------------------------dc8b9491458271bb

--------------------------dc8b9491458271bb
Content-Disposition: form-data; name="projectName"

electron-test
--------------------------dc8b9491458271bb
Content-Disposition: form-data; name="packageFile"; filename="¦¦¦¦¦¦¦¦-0.1.3-win.exe"
Content-Type: application/octet-stream

test
--------------------------dc8b9491458271bb--

This is the first bad version:

curl 7.84.0 (x86_64-pc-win32) libcurl/7.84.0 OpenSSL/3.0.4 (Schannel) zlib/1.2.12 brotli/1.0.9 libidn2/2.3.2 libssh2/1.10.0 nghttp2/1.48.0 ngtcp2/0.6.0 nghttp3/0.5.0 libgsasl/1.10.0
Release-Date: 2022-06-27
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli gsasl HSTS HTTP2 HTTP3 HTTPS-proxy IDN IPv6 Kerberos Largefile libz MultiSSL NTLM SPNEGO SSL SSPI threadsafe TLS-SRP UnixSockets


POST / HTTP/1.1
Host: 192.168.1.80:8000
User-Agent: curl/7.84.0
Accept: */*
Content-Length: 334
Content-Type: multipart/form-data; boundary=------------------------c33fdc604f32fdb5

--------------------------c33fdc604f32fdb5
Content-Disposition: form-data; name="projectName"

electron-test
--------------------------c33fdc604f32fdb5
Content-Disposition: form-data; name="packageFile"; filename="¦¦¦¦¦¦¦¦-0.1.3-win"
Content-Type: application/octet-stream

test
--------------------------c33fdc604f32fdb5--

The ¦ symbols are multibyte Chinese characters that socat can't print because the server is running in English. The Chinese characters take up the same number of bytes in both requests.

@vszakats any idea?

vszakats commented on Jan 9, 2023 (Member)

I cannot spot anything between 7.83.1_1 and 7.83.1_2 that might affect character handling (skimming through the 100 commits, which are mainly HTTP/3 / LibreSSL prep work, switching libssh2 builds to autotools, "unified" packaging, and some other unrelated things):
curl/curl-for-win@ce495c3...70eb533

7.83.1_2 changed packaging to the single-zip style (curl/curl-for-win@3182733).

It also enabled -O3 with curl, though this later turned out to be a no-op and had to be redone because the upstream Makefile.m32 overrode it to -O2 anyway.

So I cannot see anything in this revision that might affect Chinese text.

7.83.1_3 replaced OpenSSL with openssl-quic, adding HTTP/3 support. That seems even less related:
curl/curl-for-win@70eb533...1a0412d

Though rereading the thread makes it unclear if this became an issue between 7.84.0 and 7.83.1, or between two distinct curl-for-win builds of 7.83.1. If the former, we need to turn to the curl source code. It's also not very likely that this is curl-for-win-specific, though if we have the regression between two revisions, I can look it up again.

vszakats commented on Jan 9, 2023 (Member)

My generic guess would be this:

curl on Windows doesn't have Unicode support by default. In these builds all strings are considered raw bytes and passed around as-is (*). This makes things appear to work in certain practical cases. Some may perceive this as curl having special support for certain code pages/encodings. The reality, though, is that it is just a happy coincidence.

(*) But curl has certain places (and/or dependencies) that may stray from the optimistic "as-is" string handling and do string conversion/manipulation, which instantly breaks the happy coincidences by assuming certain input formats and producing certain output formats. It's enough to call any Win32 API function with a W ending (e.g. WriteConsoleW() in src/tool_cb_wrt.c).

This is made even more complex in curl because it uses both CRT functions and the Win32 API directly, each with potentially different encoding requirements or Unicode support. (And even more complexity comes when interfacing with the dependencies curl supports.)

Speaking of curl with Unicode support enabled: This is the correct path, but as of today, the level of support is just not enough to cover all cases. (And before we could cover all cases, we'd need to clear codepage requirements for each string curl is accepting or returning). The downside is that this mode breaks all the use-cases which work "correctly" by happy coincidence in non-Unicode mode, because of the added conversions and heavier use of Unicode-enabled functions needing them.

Till these fundamental issues are solved, iterations of these non-ASCII issues will keep popping up.

The solution is complex. Even if fully solved, with Unicode mode finished and enabled by default, it will inherently result in fallouts, because the old "happy coincidence" cases will start to break and will require correction to work as expected in a Unicode-enabled environment. (even more so for libcurl API users)

Cherish98 commented on Feb 11, 2023 (Contributor)

@vszakats I believe it is caused by commit 68fa9bf, in which HAVE_BASENAME is set to 1. Before that, curl used its own version (Curl_basename()), which treated strings as raw bytes and passed them around as-is. The function is called by strippath() inside curl_mime_filedata() to set the filename:

curl/lib/mime.c

Lines 1437 to 1445 in 38262c9

    /* As a side effect, set the filename to the current file's base name.
       It is possible to withdraw this by explicitly calling
       curl_mime_filename() with a NULL filename argument after the current
       call. */
    base = strippath(filename);
    if(!base)
      result = CURLE_OUT_OF_MEMORY;
    else {
      CURLcode res = curl_mime_filename(part, base);
With mingw-w64's basename(), you'll get the truncated name 测试中文-0.1.3-win from:

#include <libgen.h>
#include <stdio.h>
int main(void)
{
    /* basename() may modify its argument, so pass a writable array */
    char s[] = "测试中文-0.1.3-win.exe";
    puts(basename(s));
    return 0;
}

vszakats commented on Feb 11, 2023 (Member)

@Cherish98: Nice catch! That commit syncs the HAVE_BASENAME build setting for GNU Make/.m32/.mk (MinGW-w64) builds with CMake/autotools logic, which autodetected and enabled this setting prior to that patch. In practice, it means all CMake/autotools builds were broken before, official curl Windows builds got broken with curl/curl-for-win@1dc206c (went live with curl/curl-for-win@cb966af as binary release 7.83.1_4) and all Makefile.mk/Makefile.m32 builds broke with that patch.

If we can confirm this as the root cause, the fix would be to ignore HAVE_BASENAME for all build methods for WIN32 and always stick to the local implementation (or force-disable this setting for all build methods).

Can you make a curl build without HAVE_BASENAME and confirm that it fixes the problem?

Correction: Native MSVC builds were never affected by that patch.

vszakats commented on Feb 11, 2023 (Member)

Untested patch:

diff --git a/lib/curl_setup.h b/lib/curl_setup.h
index 2eb9697fd..fcdfe3ca2 100644
--- a/lib/curl_setup.h
+++ b/lib/curl_setup.h
@@ -838,6 +838,10 @@ int getpwuid_r(uid_t uid, struct passwd *pwd, char *buf,
 #define USE_HTTP3
 #endif
 
+#if defined(HAVE_BASENAME) && defined(WIN32)
+#undef HAVE_BASENAME
+#endif
+
 #if defined(USE_UNIX_SOCKETS) && defined(WIN32)
 #  if defined(__MINGW32__) && !defined(LUP_SECURE)
      typedef u_short ADDRESS_FAMILY; /* Classic mingw, 11y+ old mingw-w64 */
added a commit that references this issue on Feb 11, 2023

vszakats commented on Feb 11, 2023 (Member)

HAVE_BASENAME has been enabled in official builds since 7.83.1_4, but Jay reproduced the issue with 7.83.1_2. This suggests a different cause, or something in addition to this one.

vszakats commented on Feb 11, 2023 (Member)

Test binaries with disabled HAVE_BASENAME here:
https://ci.appveyor.com/project/curlorg/curl-for-win/build/artifacts

Cherish98 commented on Feb 11, 2023 (Contributor)

I tested the binaries with HAVE_BASENAME disabled, and that did indeed fix the issue.

vszakats commented on Feb 11, 2023 (Member)

Thanks for your test @Cherish98! I'll make a PR of that patch soon.

Metadata

Labels: Unicode (Unicode, code page, character encoding), Windows (Windows-specific), cmdline tool
Participants: @jay, @vszakats, @UnicornZhang, @Cherish98