Re: libcurl read-like interface
From: XSLT2.0 via curl-library <curl-library_at_cool.haxx.se>
Date: Sat, 26 Dec 2020 17:16:35 +0100
I beg your pardon if I made you angry, Daniel, by understating the
"simplicity" of HTTP!
You are not responding to the point about "control inversion", though.
Let me try to better explain what happens to a "read" with the current
behaviour. You know it since you have coded it yourself in "fcurl".
Let's assume we have done all the bits to start the transfer, and now do
fcurl_read().
At some point you arrive in the transfer() function, which calls
curl_multi_perform() (at line 123).
curl_multi_perform() will not return as long as there is data available
(the "transfer loop"), and will repeatedly call the write callback,
which then has to store the excess data.
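In rough C (this is not the actual fcurl source; the names are invented
just to illustrate the pattern), it boils down to something like:

#include <curl/curl.h>
#include <stdlib.h>
#include <string.h>

/* Invented per-transfer state, only to illustrate the pattern above. */
struct stream {
  CURLM *multi;
  char  *spill;       /* everything received but not yet read by the caller */
  size_t spill_len;
};

/* The write callback cannot say "that is enough for now": it must accept
   and store whatever curl_multi_perform() decides to deliver. */
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userp)
{
  struct stream *s = userp;
  size_t n = size * nmemb;
  s->spill = realloc(s->spill, s->spill_len + n);  /* grows without bound */
  memcpy(s->spill + s->spill_len, ptr, n);
  s->spill_len += n;
  return n;                    /* returning less would abort the transfer */
}

/* A read-like call has to keep driving libcurl until enough has piled up
   (real code would also wait on the socket, e.g. with curl_multi_wait()). */
static size_t sketch_read(struct stream *s, char *buf, size_t len)
{
  int running = 1;
  while (s->spill_len < len && running)
    curl_multi_perform(s->multi, &running); /* may deliver far more than len */
  if (len > s->spill_len)
    len = s->spill_len;
  memcpy(buf, s->spill, len);
  memmove(s->spill, s->spill + len, s->spill_len - len);
  s->spill_len -= len;
  return len;
}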
How much data are we talking about here?
Well, I made the test on my (slow) laptop. Using curl_easy_recv(), in
plain HTTP I get a lot of EAGAIN; that is when curl_multi_perform()
returns because the socket is empty, so there the amount to store stays
"reasonable", around a few kilobytes.
But in HTTPS I get something like 500 EAGAIN in total on the 10GB file,
because my PC is about half as fast at doing TLS as the network, hence
there is almost always "data in the pipe" for curl_multi_perform() to
consume.
That would mean allocating 20MB or more per transfer (10GB spread over
roughly 500 drain points is about 20MB of excess data between two EAGAIN).
In theory, on an even slower machine, or with some CPU workload, the
current fcurl_read() could end up putting the whole file in memory and
then serving all subsequent fcurl_read() calls from that buffer!
Performance aside, that is not an acceptable "risk" when writing a
filesystem: you don't know how many concurrent files/streams can be
open at the same time. And in the case of "random reads", you would have
buffered megabytes of data you just don't need.
To avoid that, the current solution I am using is to run the curl
transfers in separate threads and to block the callback with a
semaphore once I have all the data needed to satisfy the reads. That
makes for complex code with thread communication, locking, atomic
counters, etc., which is very prone to errors and bugs.
That is because the "transfer loop", in the current architecture, is the
property of libcurl.
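Schematically (again invented names, not my real code), that workaround
is a bounded producer/consumer where libcurl's transfer loop is the
producer that gets parked:

#include <curl/curl.h>
#include <pthread.h>
#include <semaphore.h>

/* Invented state shared between the reader and the transfer thread:
   'slots' counts free chunk slots, 'chunks' counts stored chunks. */
struct shared {
  sem_t slots;               /* initialised to a small fixed slot count */
  sem_t chunks;              /* initialised to 0 */
  pthread_mutex_t lock;
  /* ...plus a small fixed-size queue of copied chunks (not shown) */
};

/* Invented helper: copy one chunk into the queue (not shown). */
static void queue_push_copy(struct shared *sh, const char *data, size_t n);

/* Producer side: libcurl's transfer loop is parked here whenever the
   reader already has all the data it asked for. */
static size_t write_cb(char *ptr, size_t size, size_t nmemb, void *userp)
{
  struct shared *sh = userp;
  size_t n = size * nmemb;
  sem_wait(&sh->slots);            /* block until the reader frees a slot */
  pthread_mutex_lock(&sh->lock);
  queue_push_copy(sh, ptr, n);
  pthread_mutex_unlock(&sh->lock);
  sem_post(&sh->chunks);           /* wake the reader */
  return n;
}

/* The transfer thread: libcurl still owns the loop, we only throttle it. */
static void *transfer_thread(void *arg)
{
  curl_easy_perform((CURL *)arg);
  return NULL;
}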
If control of that loop could be reversed, the "transfer loop" would
become the caller's property, and there would be no need for separate
threads to avoid the drawback explained above. How many bytes we want to
pull out would then be in the hands of the caller's transfer loop (plus
a few more kilobytes in intermediate layers like TLS). Excess bytes
would stay in the socket's buffer (or in the intermediate layers).
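To make "control inversion" concrete, here is the kind of loop I would
like to own in my caller code, written against a purely hypothetical
curl_easy_read() that does not exist in libcurl today; the point is only
who drives the loop and decides how much to pull:

#include <curl/curl.h>
#include <sys/types.h>

/* wait_on_socket() is also hypothetical: poll the transfer's socket
   until it becomes readable again. */
ssize_t my_fs_read(CURL *easy, void *buf, size_t len)
{
  size_t got = 0;
  while (got < len) {
    size_t n = 0;
    /* hypothetical API: read at most len - got body bytes, no callback */
    CURLcode rc = curl_easy_read(easy, (char *)buf + got, len - got, &n);
    if (rc == CURLE_AGAIN) {
      wait_on_socket(easy);
      continue;
    }
    if (rc != CURLE_OK || n == 0)
      break;                       /* error or end of body */
    got += n;
  }
  return got;                      /* excess bytes stay in the socket/TLS layer */
}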
Even leaving "performance" aside, because indeed a "cache miss" in a
network filesystem is much worse than some additional
copy/alloc/semaphore context switch, I am looking at a major
simplification of my code.
As of today, the solutions I could think of for that simplification are:
- using curl_easy_recv()
- writing my own transfer/socket code, possibly taking inspiration from
other projects such as GNU wget, which is very far from doing everything
curl does (and does not provide a library) but can also be considered
solid in what it does (HTTP apparently limited to 1.1).
The right balance for getting control of the "transfer loop" back into
my caller code, with all the simplifications it brings, seems to me to
be the first option: curl_easy_recv().
Yes, I am aware I will then need to replicate some existing code
(HTTP/1.1 is a well-known protocol), either from libcurl or inspired by
other code, but I hope the balance will still lean in the right direction.
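A bare-bones sketch of that first option, just to show its shape (a real
implementation needs proper CURLE_AGAIN handling by waiting on
CURLINFO_ACTIVESOCKET, a partial-send loop, and actual HTTP/1.1 response
parsing, none of which is shown here):

#include <curl/curl.h>
#include <string.h>

int pull_some_bytes(void)
{
  CURL *h = curl_easy_init();
  if (!h)
    return 1;
  curl_easy_setopt(h, CURLOPT_URL, "https://example.com/bigfile");
  curl_easy_setopt(h, CURLOPT_CONNECT_ONLY, 1L);
  if (curl_easy_perform(h) != CURLE_OK) { /* CONNECT_ONLY: connect (+ TLS), no transfer */
    curl_easy_cleanup(h);
    return 1;
  }

  /* my code speaks HTTP/1.1 from here on */
  const char *req = "GET /bigfile HTTP/1.1\r\nHost: example.com\r\n"
                    "Connection: close\r\n\r\n";
  size_t sent = 0;
  curl_easy_send(h, req, strlen(req), &sent); /* should loop until all is sent */

  char buf[16384];
  size_t n = 0;
  CURLcode rc;
  for (;;) {
    rc = curl_easy_recv(h, buf, sizeof(buf), &n); /* pull only what I ask for */
    if (rc == CURLE_AGAIN)
      continue;          /* busy-waits here; real code polls the socket instead */
    if (rc != CURLE_OK || n == 0)
      break;             /* error, or the server closed the connection */
    /* ...parse the status line/headers once, then hand body bytes over... */
  }

  curl_easy_cleanup(h);
  return 0;
}

The caller is the one holding the loop: it decides when to call
curl_easy_recv() and how big buf is; whatever it does not ask for simply
stays in the kernel socket buffer (or in the TLS layer).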
Summary keywords, if you want to react to this, could be: control
inversion of the transfer loop.
Sorry again.
Cheers
Alain
-------------------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette: https://curl.se/mail/etiquette.html
Received on 2020-12-26