Subject: Total http/2 concurrency for multiplexed multi-handle
From: Jeroen Ooms via curl-library <curl-library_at_lists.haxx.se>
Date: Wed, 8 Feb 2023 18:02:40 +0100
I have a cron job scraping some content from GitHub every night (about
20k small files). It worked well for a year, but recently something
changed such that after a few minutes GitHub starts giving a lot of
403s, and then after another minute I start getting thousands of these:
HTTP/2 stream 20135 was not closed cleanly before end of the underlying stream
HTTP/2 stream 20137 was not closed cleanly before end of the underlying stream
HTTP/2 stream 20139 was not closed cleanly before end of the underlying stream
So either they introduced a server bug, or perhaps GitHub is
deliberately blocking abusive behavior due to high concurrency.
I am using a multi handle with CURLPIPE_MULTIPLEX and otherwise
default settings. Am I correct that this means libcurl starts 100
concurrent streams (CURLMOPT_MAX_CONCURRENT_STREAMS) and still makes 6
concurrent connections (CURLMOPT_MAX_HOST_CONNECTIONS) per host, i.e.
downloads 600 files in parallel? I can imagine that could be
considered abusive.
Should I set CURLMOPT_MAX_HOST_CONNECTIONS to 1 when using HTTP/2
multiplexing? Or is CURLMOPT_MAX_HOST_CONNECTIONS ignored when
multiplexing?
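For concreteness, here is a stripped-down sketch of the setup I have
in mind, with both limits set explicitly rather than left at their
defaults (the URL and the limit values are just placeholders, not a
recommendation):

#include <curl/curl.h>

int main(void)
{
  curl_global_init(CURL_GLOBAL_DEFAULT);

  CURLM *multi = curl_multi_init();

  /* enable HTTP/2 multiplexing on the multi handle */
  curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);

  /* cap streams per connection (the documented default is 100) */
  curl_multi_setopt(multi, CURLMOPT_MAX_CONCURRENT_STREAMS, 100L);

  /* force everything onto one connection per host, so concurrency
     is bounded by the per-connection stream limit alone */
  curl_multi_setopt(multi, CURLMOPT_MAX_HOST_CONNECTIONS, 1L);

  CURL *easy = curl_easy_init();
  curl_easy_setopt(easy, CURLOPT_URL,
    "https://raw.githubusercontent.com/curl/curl/master/README");
  /* prefer waiting to multiplex on an existing connection over
     opening a new one */
  curl_easy_setopt(easy, CURLOPT_PIPEWAIT, 1L);
  curl_multi_add_handle(multi, easy);

  int running = 0;
  do {
    curl_multi_perform(multi, &running);
    curl_multi_poll(multi, NULL, 0, 1000, NULL);
  } while(running);

  curl_multi_remove_handle(multi, easy);
  curl_easy_cleanup(easy);
  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}

If I understand the options correctly, with
CURLMOPT_MAX_HOST_CONNECTIONS set to 1 the total concurrency against
one host should then be bounded by the stream limit alone.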
One other thing I noticed is that GitHub does not seem to set any
MAX_CONCURRENT_STREAMS, or at least I am not seeing any. For example
on httpbin I see this:
curl -v 'https://httpbin.org/get' --http2
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
However for GitHub I don't see such a thing:
curl -v 'https://raw.githubusercontent.com/curl/curl/master/README' --http2
So does this mean libcurl will assume 100 streams is OK?
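To check what a libcurl application actually sees, the same
informational lines can be captured with CURLOPT_DEBUGFUNCTION instead
of running the CLI with -v; a rough sketch (the httpbin URL is just an
example):

#include <stdio.h>
#include <curl/curl.h>

/* print the informational lines, where the
   "Connection state changed (MAX_CONCURRENT_STREAMS == N)!"
   message shows up when the server sends its SETTINGS frame */
static int debug_cb(CURL *handle, curl_infotype type, char *data,
                    size_t size, void *userp)
{
  (void)handle; (void)userp;
  if(type == CURLINFO_TEXT)
    fprintf(stderr, "* %.*s", (int)size, data);
  return 0;
}

int main(void)
{
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL *easy = curl_easy_init();
  curl_easy_setopt(easy, CURLOPT_URL, "https://httpbin.org/get");
  curl_easy_setopt(easy, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_2TLS);
  curl_easy_setopt(easy, CURLOPT_DEBUGFUNCTION, debug_cb);
  curl_easy_setopt(easy, CURLOPT_VERBOSE, 1L); /* required for the
                                                  debug callback */
  curl_easy_perform(easy);
  curl_easy_cleanup(easy);
  curl_global_cleanup();
  return 0;
}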
Is there a way to debug this, and monitor how many active downloads a
multi-handle is making in total (summed over all connections)? AFAIK
the 'running_handles' value from curl_multi_perform() gives the total
number of uncompleted transfers, including those that have not started
yet, so it does not tell me how many are actually in progress.
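The closest workaround I could come up with is to count this myself:
flag each transfer as started the first time its progress callback
fires, and count completions via curl_multi_info_read(). A
self-contained sketch of that idea (URLs are placeholders, and the
started count is only an approximation, since the callback can also
fire while a transfer is still setting up its connection):

#include <stdio.h>
#include <curl/curl.h>

#define NURLS 3

static int started; /* transfers whose first progress callback fired */
static int done;    /* transfers reported CURLMSG_DONE */

static int xferinfo(void *clientp, curl_off_t dltotal, curl_off_t dlnow,
                    curl_off_t ultotal, curl_off_t ulnow)
{
  int *seen = clientp;
  (void)dltotal; (void)dlnow; (void)ultotal; (void)ulnow;
  if(!*seen) {   /* first callback for this handle: it is in flight */
    *seen = 1;
    started++;
  }
  return 0;      /* returning non-zero would abort the transfer */
}

int main(void)
{
  static const char *urls[NURLS] = { /* illustrative URLs only */
    "https://httpbin.org/get",
    "https://httpbin.org/ip",
    "https://httpbin.org/headers"
  };
  int seen[NURLS] = {0};
  int running;

  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURLM *multi = curl_multi_init();
  curl_multi_setopt(multi, CURLMOPT_PIPELINING, CURLPIPE_MULTIPLEX);

  for(int i = 0; i < NURLS; i++) {
    CURL *easy = curl_easy_init();
    curl_easy_setopt(easy, CURLOPT_URL, urls[i]);
    curl_easy_setopt(easy, CURLOPT_XFERINFOFUNCTION, xferinfo);
    curl_easy_setopt(easy, CURLOPT_XFERINFODATA, &seen[i]);
    curl_easy_setopt(easy, CURLOPT_NOPROGRESS, 0L); /* enable callback */
    curl_multi_add_handle(multi, easy);
  }

  do {
    CURLMsg *msg;
    int queued;
    curl_multi_perform(multi, &running);
    while((msg = curl_multi_info_read(multi, &queued))) {
      if(msg->msg == CURLMSG_DONE) {
        done++;
        curl_multi_remove_handle(multi, msg->easy_handle);
        curl_easy_cleanup(msg->easy_handle);
      }
    }
    fprintf(stderr, "in flight: %d (started %d, done %d, running %d)\n",
            started - done, started, done, running);
    curl_multi_poll(multi, NULL, 0, 1000, NULL);
  } while(running);

  curl_multi_cleanup(multi);
  curl_global_cleanup();
  return 0;
}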