Should CURLOPT_READFUNCTION need EOF twice?
From: Jiří Hruška via curl-library <curl-library_at_lists.haxx.se>
Date: Wed, 15 Nov 2023 05:48:20 +0100
Hi all, I found a bug that has been causing my `git fetch` to hang often, and
I'm fixing it on the Git side. But the root cause is ultimately an integration
issue with libcurl, so I thought I would discuss it here with you too.
If the outgoing request body is larger than some buffer size, git-remote-curl
registers a `CURLOPT_READFUNCTION` callback and streams the data out in chunks.
When everything is sent, the callback returns 0 (EOF), the request is completed
and a response comes back.
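For illustration, a minimal sketch of what such a callback looks like (my own
simplified version, not Git's actual code; `body_state` and all the names are
hypothetical):

    #include <stdio.h>
    #include <curl/curl.h>

    struct body_state {
      FILE *src;                /* e.g. the pipe carrying the request body */
    };

    static size_t read_cb(char *buffer, size_t size, size_t nitems,
                          void *userdata)
    {
      struct body_state *st = userdata;
      /* works like fread(): return up to size*nitems bytes, 0 at EOF */
      return fread(buffer, 1, size * nitems, st->src);
    }

    /* wired up with:
     *   curl_easy_setopt(curl, CURLOPT_READFUNCTION, read_cb);
     *   curl_easy_setopt(curl, CURLOPT_READDATA, &st);
     *   curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);
     */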
The problem is that for some specific combination of remote host, network, OS,
HTTP/2 library, libcurl version, timing etc., libcurl sometimes happens to call
the `CURLOPT_READFUNCTION` callback again, even after already getting a zero
return value back once before. This causes Git to try to read more data to
send from a pipe that is already silent, and everything deadlocks.
(More details are provided in the patch thread at [1].)
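One defensive shape for such a fix is to latch the EOF in the callback state,
so that a surprise second call returns 0 again instead of reading the drained
pipe. This is only a sketch, not the actual Git patch; `seen_eof` is my own
hypothetical field:

    struct body_state {
      FILE *src;
      int seen_eof;             /* set once the source reported EOF */
    };

    static size_t read_cb(char *buffer, size_t size, size_t nitems,
                          void *userdata)
    {
      struct body_state *st = userdata;
      size_t n;

      if(st->seen_eof)
        return 0;               /* we already said EOF; never read again */

      n = fread(buffer, 1, size * nitems, st->src);
      if(n == 0)
        st->seen_eof = 1;       /* remember it in case we get called again */
      return n;
    }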
I think I can see how this happens in transfer.c / `readwrite_upload()` [2].
If the user callback is called with the previous buffer only partially sent so
far (i.e. `k->upload_present` > 0, hence `offset` > 0), then even if it returns
0, the result will be `nread = offset + fillcount` > 0 and the upload will not
be finished yet. Only after the whole send buffer has been exhausted will some
subsequent zero return value make `if(nread <= 0)` true and trigger
`Curl_done_sending()`.
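To make the arithmetic concrete, here is a toy, self-contained restatement
(my own variable names and values, not the actual curl code; see [2] for the
real thing):

    #include <stdio.h>

    int main(void)
    {
      long offset = 512;   /* bytes still unsent from the previous round */
      long fillcount = 0;  /* the callback just returned 0 (EOF) */
      long nread = offset + fillcount;

      if(nread <= 0)
        puts("nread <= 0: Curl_done_sending() would run, upload ends");
      else
        printf("nread = %ld > 0: upload continues and the callback "
               "will be asked for data again later\n", nread);
      return 0;
    }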
And it seems fine and intentional, just slightly weird from the library user's
perspective to get called again. Yet it is in line with how fread() would
behave, and that is ultimately the API being modeled here, right?
As far as I can see, this specific aspect of the behavior is undocumented [3],
although one might argue that the sentence "Returning 0 signals end-of-file to
the library and causes it to stop the current transfer" suggests the callback
would not be called again. Relying on that could even cause a crash, if someone
frees their callback state right after returning 0. Hence a note about this in
the docs might be useful.
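For example, a callback written against that reading of the docs (hypothetical
code, but a plausible pattern) is one unexpected extra call away from a crash:

    static size_t read_cb(char *buffer, size_t size, size_t nitems,
                          void *userdata)
    {
      struct body_state *st = userdata;
      size_t n = fread(buffer, 1, size * nitems, st->src);
      if(n == 0) {
        fclose(st->src);  /* "the transfer stops now, clean up" ... */
        st->src = NULL;   /* ... but a second call now passes NULL
                             to fread() and crashes */
      }
      return n;
    }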
But what's your opinion, dear cURL hackers?
1) Bug, should be fixed (to never call again after 0/EOF)
2) Quirk, should be documented
3) Intentional, should be documented
4) ...? (my whole analysis is flawed etc.)
Kind regards,
Jiri
[1]: https://public-inbox.org/git/CAGE_+C6DJMAO0bj5QHoKBBV3gMEMtZ-ajJ9ZnDGYq6eorr-oig_at_mail.gmail.com/
[2]: https://github.com/curl/curl/blob/444f64b3e93bbb0b6a65ffa10c5bb3d1becfc140/lib/transfer.c#L810
[3]: https://curl.se/libcurl/c/CURLOPT_READFUNCTION.html