Re: More debugging for pausing/resuming connections
Date: Sat, 12 May 2018 00:30:00 +0200 (CEST)
On Fri, 11 May 2018, Philip Prindeville wrote:
> I’m trying to confirm that there’s a bug in curl_easy_pause(). What I
> typically see is the following:
> and… that’s all she wrote. Things appear to lock-up. Even though a lot
> more data should have arrived, I only see a single WRITEFUNCTION callback.
> I’m not seeing the session being put back into a paused state.
The example code I wrote previously when we discussed pausing did roughly this,
but it did not get stuck, so I think there's something more to it than just
these steps that's required to make it trigger. Possibly timing, possibly
> 1. can we add more debugging/tracing around the unpausing function so that
> sufficient state is dumped that we can tell if it’s operating correctly or
> not? For instance, looking at data->state.tempcount and
> data->state.tempwrite instead in curl_easy_pause().
I think you should add all the debugging/tracing you need to get you more
details to figure out what's going on! That's what I tend to do when debugging
issues like this.
> 2. can we make curl_easy_pause() callable from another thread so that we can
> eliminate the periodic timer in #5 above?
I'm not against the idea, but that would require mutex locking of some sort,
so it would either require that libcurl gets that ability built-in (it has
been discussed numerous times in the past for the TLS mutex callbacks) or that
it uses for example mutex callbacks like in the share API.
This is one of those tasks that requires someone to roll up their sleeves,
write up a proposal and be prepared to write some code to make it happen.
> 3. similarly, can we make curl_multi_remove_handle() callable from another
> thread so we can abort a transfer without the periodic timer in #5 above?
It has pretty much the same caveats as (2).
> I have some code that seems to indicate there’s a bug in the pausing code,
> but I can’t share it with the public.
I would urge you to try to write new stand-alone application code that you can
share with us that can reproduce the bug. Then you can keep it as simple as
possible, which will only help when debugging the case. I think that's time
well invested. If you can't reproduce it with such a simple example, that also
tells us something. If you can repro the problem like this, it is way better
to have a simple example to use when debugging and we can discuss the details
openly on this list and we can get more eyes on the issue... And then we can
also possibly turn that code into a test case to make sure this doesn't
regress in the future.
-- / daniel.haxx.se
Received on 2018-05-12