
curl: support parallel transfers #3804

Closed
bagder wants to merge 2 commits

Conversation

bagder (Member) commented Apr 24, 2019

This is done by first adding each individual transfer to a linked list, so that they can then be performed serially or, at will, in parallel.

Status

  • creates a unique curl handle for each transfer and adds them to a linked list
  • iterates over each transfer in the list and runs them one by one
  • existing code and test cases adjusted
  • minor behavior change documented
  • ability to run transfers in parallel
  • implement a basic progress meter for multiple concurrent transfers
  • documented the new parallel option (-Z / --parallel)
  • supports maximum N transfers done in parallel (--parallel-max)
  • make sure --retry still works for serial transfers

Still TODO post first merge

  • make --retry work for parallel
  • make sure it handles torture tests fine
  • test cases for parallel downloads
  • work out how to report success/failures for individual transfers

bagder (Member Author) commented Apr 24, 2019

The codacy warnings are just silly.

MarcelRaad (Member)

> The codacy warnings are just silly.

You can turn that one off via Code patterns -> Flawfinder.

bagder (Member Author) commented Apr 24, 2019

I can? I can't find anywhere to do that!

MarcelRaad (Member)

After "Login with GitHub", "Code patterns" appears at the menu on the left.

bagder added the feature-window (a merge of this requires an open feature window) label on Apr 25, 2019
jay (Member) commented Apr 26, 2019

I would just use --parallel and nix the short option, since usage will almost always be in scripts. I don't see anyone at the command line running curl -Z <fifty urls>.

confusing:
ckmopsuvwxz
CKMOPSUVWXZ

jay (Member) commented Apr 27, 2019

I had one unreproducible hang while playing around with this. I'm not sure what caused it, but during the investigation I found an off-by-one bug, so I'm going to attribute it to that unless it happens again.

> curld --parallel -o first http://mirrors.rit.edu/ubuntu-releases/16.04/ubuntu-16.04.2-server-amd64.template
DL% UL%  Dled  Uled  Xfers  Live   Qd Total     Current  Left    Speed
--  --      0     0     1     1     0 --:--:-- --:--:-- --:--:-- -9223

That resource no longer exists, so I expect the tool to terminate almost immediately. The first time it didn't but every time after that it did (regardless of the fix).

bagder (Member Author) commented Apr 27, 2019

thanks for the fixes @jay!

jumoog commented Apr 29, 2019

I saw your YouTube video... you could show an ETA if you request the file size of all URLs first.

bagder (Member Author) commented Apr 29, 2019

No, I can't.

Primarily, many of the transfers libcurl can do simply do not know the size beforehand, no matter what we do.
Secondly, for those that do: if we are to make 10,000 transfers, it is not feasible to "get the size" of those 10K first before any transfers start, partly because of how libcurl works and partly because of how those protocols work.

bagder force-pushed the bagder/parallel-transfers branch from 1084610 to bce4060 on May 11, 2019
bagder changed the title from "[WIP] curl: support parallel transfers" to "curl: support parallel transfers" on May 11, 2019
bagder (Member Author) commented May 11, 2019

I've now moved a few of the remaining action items to be done after the first merge, so that we can develop those things more in parallel and cooperatively. As long as the non-parallel use cases still work exactly like before and are stable, it shouldn't be a problem to proceed like this.

jay (Member) commented May 14, 2019

My --write-outs are being overwritten by the progress meter. Simple example:

> curld --parallel --write-out "test %{http_code}" -o foo http://httpbin.org/get

DL% UL%  Dled  Uled  Xfers  Live   Qd Total     Current  Left    Speed
--  --    204     0     1     0     0 --:--:-- --:--:-- --:--:--  3290

In parallel_transfers, after a transfer finishes, post_transfer is called for that transfer, which does the write-out. However, after that the progress meter is called again, which overwrites the line.

curl/src/tool_operate.c

Lines 2011 to 2040 in 80f044f

if(!mcode) {
  int rc;
  CURLMsg *msg;
  bool removed = FALSE;
  do {
    msg = curl_multi_info_read(multi, &rc);
    if(msg) {
      bool retry;
      struct per_transfer *ended;
      CURL *easy = msg->easy_handle;
      result = msg->data.result;
      curl_easy_getinfo(easy, CURLINFO_PRIVATE, (void *)&ended);
      curl_multi_remove_handle(multi, easy);
      result = post_transfer(global, share, ended, result, &retry);
      if(retry)
        continue;
      progress_finalize(ended); /* before it goes away */
      all_added--; /* one fewer added */
      removed = TRUE;
      (void)del_transfer(ended);
    }
  } while(msg);
  if(removed)
    /* one or more transfers completed, add more! */
    (void)add_parallel_transfers(global, multi);
  }
}
(void)progress_meter(global, &start, TRUE);

"DL% UL%  Dled  Uled  Xfers  Live   Qd Total     Current  Left    Speed\n"
"test 200"  <---  result = post_transfer(global, share, ended, result, &retry); 
"\r--  --    204     0     1     0     0 --:--:-- --:--:-- --:--:--  3290\n"  <---  (void)progress_meter(global, &start, TRUE); 

bagder (Member Author) commented May 14, 2019

Any suggestion on what to do with --write-out in a parallel scenario?

jay (Member) commented May 14, 2019

> Any suggestion on what to do with --write-out in a parallel scenario?

I'm not sure. Delay until finished?

bagder (Member Author) commented May 14, 2019

I suppose it goes into the whole "how to show what happened for each transfer" discussion too. If you do 20 transfers, how are those --write-out going to make sense at the end...

jay (Member) commented May 14, 2019

> I suppose it goes into the whole "how to show what happened for each transfer" discussion too. If you do 20 transfers, how are those --write-out going to make sense at the end...

If they're done in the order they were given, they could make some sense. Or there could be a transfer id, like "%{transfer_id}".

bagder force-pushed the bagder/parallel-transfers branch 2 times, most recently from 834a55a to 4c082ef on May 27, 2019
bagder force-pushed the bagder/parallel-transfers branch from 4c082ef to 81956d4 on June 11, 2019
bagder force-pushed the bagder/parallel-transfers branch from 81956d4 to 85264f9 on June 25, 2019
bagder force-pushed the bagder/parallel-transfers branch from 85264f9 to fa129a6 on July 15, 2019
This is done by making sure each individual transfer is first added to a
linked list as then they can be performed serially, or at will, in
parallel.
bagder force-pushed the bagder/parallel-transfers branch from fa129a6 to e281e99 on July 19, 2019
bagder (Member Author) commented Jul 19, 2019

I intend to merge this shortly.

bagder closed this in b889408 on Jul 20, 2019
bagder deleted the bagder/parallel-transfers branch on July 20, 2019
MarcelRaad (Member)

This broke the autobuilds. Unfortunately, it's not easy to see why in this huge diff:
https://curl.haxx.se/dev/log.cgi?id=20190722042000-16028#prob1

src/tool_operate.c:2290:11: error: implicit declaration of function 'easysrc_cleanup'; did you mean 'glob_cleanup'? [-Werror=implicit-function-declaration]

jay added a commit to jay/curl that referenced this pull request Jul 22, 2019
easysrc_cleanup is only defined when CURL_DISABLE_LIBCURL_OPTION is not
defined, and prior to this change would be called regardless.

Bug: curl#3804 (comment)
Reported-by: Marcel Raad

Closes #xxxx
jay (Member) commented Jul 22, 2019

> This broke the autobuilds.

#4142

jay added a commit that referenced this pull request Jul 23, 2019
easysrc_cleanup is only defined when CURL_DISABLE_LIBCURL_OPTION is not
defined, and prior to this change would be called regardless.

Bug: #3804 (comment)
Reported-by: Marcel Raad

Closes #4142
caraitto pushed a commit to caraitto/curl that referenced this pull request Jul 23, 2019
This is done by making sure each individual transfer is first added to a
linked list as then they can be performed serially, or at will, in
parallel.

Closes curl#3804
caraitto pushed a commit to caraitto/curl that referenced this pull request Jul 23, 2019
easysrc_cleanup is only defined when CURL_DISABLE_LIBCURL_OPTION is not
defined, and prior to this change would be called regardless.

Bug: curl#3804 (comment)
Reported-by: Marcel Raad

Closes curl#4142
The lock bot locked this as resolved and limited the conversation to collaborators on Oct 20, 2019
Labels: cmdline tool, feature-window (a merge of this requires an open feature window)

4 participants