cURL / Mailing Lists / curl-library / Single Mail

curl-library

Re: debugging a crash in Curl_pgrsTime/checkPendPipeline?

From: <johansen_at_sun.com>
Date: Tue, 21 Jul 2009 20:38:27 -0700

On Mon, Jul 20, 2009 at 01:35:27PM -0700, johansen_at_sun.com wrote:
> I'm still trying to reproduce the problem, but no luck yet.

A colleague's machine seems adept at reproducing this problem, but I
haven't yet discovered how to induce the conditions myself. Regardless,
I was able to throw together a more instrumented version of libcurl and
get better debugging information out of the core.

Here's what I have so far. I'd appreciate some thoughts on the
potential root cause:

I've still been unable to reproduce the problem; however, a co-worker
with a machine that is from the twilight zone seems able to reproduce
this problem every time he runs our application. I got him to run
against a version of libcurl with CTF, CURLDEBUG, and libumem.

Lucky for us, we got a coredump:

> ::status
file: /usr/bin/i86/isapython2.4
initial argv: /usr/bin/python2.4 /usr/bin/pkg install --no-index redistributable
threading model: native threads
status: process terminated by SIGSEGV (Segmentation Fault), addr=deadbeef

In this case, the segfault is on 0xdeadbeef, which means we got caught
trying to access memory that was already free'd. Very naughty.

The stack is different with libumem, presumably because it stops us from
carrying on with the bad behavior once it catches us doing things that
we shouldn't.

> $C
08046094 libcurl.so.3.0.0`Curl_removeHandleFromPipeline+0x16(cf3b00c, deadbeef,
2ec2298c, fd50b332)
08046144 libcurl.so.3.0.0`multi_runsingle+0xe5b(cb9524c, 2ec22b0c, cf79370,
fd50c39a)
08046194 libcurl.so.3.0.0`curl_multi_perform+0x62(cb9524c, 80461bc)
080461c8 pycurl.so`do_multi_perform+0xb3(8805414, 0, 8454e94, fd5716a0)
08046218 libpython2.4.so.1.0`call_function+0x315(804628c, 0, da8b7165, 0)

The final stack frame shows the address of the SessionHandle, but not
the connecdata or easy handle that caused us the trouble. Fortunately,
the previous frame contains the Curl_one_easy:

> 2ec22b0c::print struct Curl_one_easy
{
    next = 0x2ec22e6c
    prev = 0x2ec226ec
    easy_handle = 0xcf3b00c
    easy_conn = 0x2f98c88c
    state = 0t14 (CURLM_STATE_DONE)
    result = 0 (CURLE_OK)
    msg = 0
    msg_num = 0
    sockets = [ 0, 0, 0, 0, 0 ]
    numsocks = 0
}

This looks legit. We're in CURLM_STATE_DONE and just called
removeHandleFromPipeline. Without memory debugging, we would have died
in the next call, which is to checkPendPipeline.

Looking at our struct connectdata, it has been freed just as we had suspected.

> 0x2f98c88c::whatis
2f98c88c is 2f98c880+c, bufctl 2f9976e0 freed from umem_alloc_896

Fortunately for us, we still have the bufctl available:

> 2f9976e0::bufctl -v
            ADDR BUFADDR TIMESTAMP THREAD
                            CACHE LASTLOG CONTENTS
        2f9976e0 2f98c880 a14aa73b08306 1
                          80d7710 8073140 80905a0
                 libumem.so.1`umem_cache_free_debug+0x13c
                 libumem.so.1`umem_cache_free+0x43
                 libumem.so.1`umem_free+0xe2
                 libumem.so.1`process_free+0x55
                 libumem.so.1`free+0x1a
                 libcurl.so.3.0.0`curl_dofree+0x69
                 libcurl.so.3.0.0`conn_free+0x243
                 libcurl.so.3.0.0`Curl_disconnect+0x219
                 libcurl.so.3.0.0`Curl_done+0x214
                 libcurl.so.3.0.0`multi_runsingle+0x728
                 libcurl.so.3.0.0`curl_multi_perform+0x62

Disassembling multi_runsingle gives us some clues about what's going on
at the time that runsingle called Curl_done:

> multi_runsingle+0x728::dis
libcurl.so.3.0.0`multi_runsingle+0x709:
call -0x2f503 <PLT=libcurl.so.3.0.0`Curl_posttransfer>
libcurl.so.3.0.0`multi_runsingle+0x70e: addl $0x10,%esp
libcurl.so.3.0.0`multi_runsingle+0x711: subl $0x4,%esp
libcurl.so.3.0.0`multi_runsingle+0x714: pushl $0x0
libcurl.so.3.0.0`multi_runsingle+0x716: movl 0xc(%ebp),%eax
libcurl.so.3.0.0`multi_runsingle+0x719: pushl 0x14(%eax)
libcurl.so.3.0.0`multi_runsingle+0x71c: movl 0xc(%ebp),%eax
libcurl.so.3.0.0`multi_runsingle+0x71f: addl $0xc,%eax
libcurl.so.3.0.0`multi_runsingle+0x722: pushl %eax
libcurl.so.3.0.0`multi_runsingle+0x723:
call -0x2ff9d <PLT=libcurl.so.3.0.0`Curl_done>
libcurl.so.3.0.0`multi_runsingle+0x728: addl $0x10,%esp
libcurl.so.3.0.0`multi_runsingle+0x72b: movb $0x1,-0x25(%ebp)
libcurl.so.3.0.0`multi_runsingle+0x72f:
jmp +0x797 <libcurl.so.3.0.0`multi_runsingle+0xecb>

This code looks like it calls Curl_posttransfer, immediately followed by
Curl_done. There are a few places in multi_runsingle that do this.
More investigation is required:

bcurl.so.3.0.0`multi_runsingle+0x630: movl 0xc(%ebp),%eax
libcurl.so.3.0.0`multi_runsingle+0x633: movl 0x8(%eax),%eax
libcurl.so.3.0.0`multi_runsingle+0x636: cmpb $0x0,0x29d(%eax)

This is a bit before runsingle+0x728, but it's comparing offset 0x29d
from a particular pointer. Looking at the plausible candidates yields
the following:

> ::print -at struct SessionHandle set.connect_only
29d _Bool set.connect_only

So this code is the CURLM_STATE_DO portion of multi_runsingle.

libcurl.so.3.0.0`multi_runsingle+0x683:
call -0x2f4bd <PLT=libcurl.so.3.0.0`Curl_do>
libcurl.so.3.0.0`multi_runsingle+0x688: addl $0x10,%esp
libcurl.so.3.0.0`multi_runsingle+0x68b: movl %eax,0x14(%esi)
libcurl.so.3.0.0`multi_runsingle+0x68e: movl 0xc(%ebp),%eax
libcurl.so.3.0.0`multi_runsingle+0x691: cmpl $0x0,0x14(%eax)
libcurl.so.3.0.0`multi_runsingle+0x695:
jne +0x69 <libcurl.so.3.0.0`multi_runsingle+0x700>

Here we are calling Curl_do, and then checking easy->result (offset
0x14) against 0. In this case we must have taken the jump, since we're
down near 0x728.

libcurl.so.3.0.0`multi_runsingle+0x700: subl $0xc,%esp
libcurl.so.3.0.0`multi_runsingle+0x703: movl 0xc(%ebp),%eax
libcurl.so.3.0.0`multi_runsingle+0x706: pushl 0x8(%eax)
libcurl.so.3.0.0`multi_runsingle+0x709:
call -0x2f503 <PLT=libcurl.so.3.0.0`Curl_posttransfer>

Offset 0x700 is the preamble to Curl_posstransfer, which comes
immediately before our call to Curl_done, so this is where we hit the
error.

Unfortunately, finding the location of the error doesn't give us a lot
of insight into what went wrong here.

Looking at the handles helps, though:

One interesting discovery involves looking at all of the Curl_one_easy
structures that are associated with this multi handle:

cb95250 {
    cb95250 next = 0x2ec22cec
    cb95254 prev = 0x2ec2298c
    cb95258 easy_handle = 0
    cb9525c easy_conn = 0
    cb95260 state = 0 (CURLM_STATE_INIT)
    cb95264 result = 0 (CURLE_OK)
    cb95268 msg = 0
    cb9526c msg_num = 0
    cb95270 sockets = [ 0, 0, 0, 0, 0 ]
    cb95284 numsocks = 0
}
2ec22cec {
    2ec22cec next = 0x2ec229ec
    2ec22cf0 prev = 0xcb95250
    2ec22cf4 easy_handle = 0xceeb00c
    2ec22cf8 easy_conn = 0
    2ec22cfc state = 0t15 (CURLM_STATE_COMPLETED)
    2ec22d00 result = 0 (CURLE_OK)
    2ec22d04 msg = 0xceda69c
    2ec22d08 msg_num = 0x1
    2ec22d0c sockets = [ 0, 0, 0, 0, 0 ]
    2ec22d20 numsocks = 0
}
2ec229ec {
    2ec229ec next = 0x2ec226ec
    2ec229f0 prev = 0x2ec22cec
    2ec229f4 easy_handle = 0xcf4600c
    2ec229f8 easy_conn = 0
    2ec229fc state = 0t15 (CURLM_STATE_COMPLETED)
    2ec22a00 result = 0 (CURLE_OK)
    2ec22a04 msg = 0xceda54c
    2ec22a08 msg_num = 0x1
    2ec22a0c sockets = [ 0, 0, 0, 0, 0 ]
    2ec22a20 numsocks = 0
}
2ec226ec {
    2ec226ec next = 0x2ec22b0c
    2ec226f0 prev = 0x2ec229ec
    2ec226f4 easy_handle = 0xcf2900c
    2ec226f8 easy_conn = 0
    2ec226fc state = 0t15 (CURLM_STATE_COMPLETED)
    2ec22700 result = 0 (CURLE_OK)
    2ec22704 msg = 0xceda7b4
    2ec22708 msg_num = 0x1
    2ec2270c sockets = [ 0, 0, 0, 0, 0 ]
    2ec22720 numsocks = 0
}
2ec22b0c {
    2ec22b0c next = 0x2ec22e6c
    2ec22b10 prev = 0x2ec226ec
    2ec22b14 easy_handle = 0xcf3b00c
    2ec22b18 easy_conn = 0x2f98c88c
    2ec22b1c state = 0t14 (CURLM_STATE_DONE)
    2ec22b20 result = 0 (CURLE_OK)
    2ec22b24 msg = 0
    2ec22b28 msg_num = 0
    2ec22b2c sockets = [ 0, 0, 0, 0, 0 ]
    2ec22b40 numsocks = 0
}
2ec22e6c {
    2ec22e6c next = 0x2ec2286c
    2ec22e70 prev = 0x2ec22b0c
    2ec22e74 easy_handle = 0xcf0a00c
    2ec22e78 easy_conn = 0
    2ec22e7c state = 0t12 (CURLM_STATE_PERFORM)
    2ec22e80 result = 0 (CURLE_OK)
    2ec22e84 msg = 0
    2ec22e88 msg_num = 0
    2ec22e8c sockets = [ 0, 0, 0, 0, 0 ]
    2ec22ea0 numsocks = 0
}
2ec2286c {
    2ec2286c next = 0x2ec22a4c
    2ec22870 prev = 0x2ec22e6c
    2ec22874 easy_handle = 0xceff00c
    2ec22878 easy_conn = 0
    2ec2287c state = 0t11 (CURLM_STATE_WAITPERFORM)
    2ec22880 result = 0 (CURLE_OK)
    2ec22884 msg = 0
    2ec22888 msg_num = 0
    2ec2288c sockets = [ 0, 0, 0, 0, 0 ]
    2ec228a0 numsocks = 0
}
2ec22a4c {
    2ec22a4c next = 0x2ec22bcc
    2ec22a50 prev = 0x2ec2286c
    2ec22a54 easy_handle = 0xcef600c
    2ec22a58 easy_conn = 0
    2ec22a5c state = 0t15 (CURLM_STATE_COMPLETED)
    2ec22a60 result = 0t55 (CURLE_SEND_ERROR)
    2ec22a64 msg = 0xceda824
    2ec22a68 msg_num = 0x1
    2ec22a6c sockets = [ 0, 0, 0, 0, 0 ]
    2ec22a80 numsocks = 0
}
2ec22bcc {
    2ec22bcc next = 0x2ec22b6c
    2ec22bd0 prev = 0x2ec22a4c
    2ec22bd4 easy_handle = 0xcf3200c
    2ec22bd8 easy_conn = 0
    2ec22bdc state = 1 (CURLM_STATE_CONNECT)
    2ec22be0 result = 0 (CURLE_OK)
    2ec22be4 msg = 0
    2ec22be8 msg_num = 0
    2ec22bec sockets = [ 0, 0, 0, 0, 0 ]
    2ec22c00 numsocks = 0
}
2ec22b6c {
    2ec22b6c next = 0x2ec2298c
    2ec22b70 prev = 0x2ec22bcc
    2ec22b74 easy_handle = 0xcf1600c
    2ec22b78 easy_conn = 0
    2ec22b7c state = 1 (CURLM_STATE_CONNECT)
    2ec22b80 result = 0 (CURLE_OK)
    2ec22b84 msg = 0
    2ec22b88 msg_num = 0
    2ec22b8c sockets = [ 0, 0, 0, 0, 0 ]
    2ec22ba0 numsocks = 0
}
2ec2298c {
    2ec2298c next = 0xcb95250
    2ec22990 prev = 0x2ec22b6c
    2ec22994 easy_handle = 0xcf1f00c
    2ec22998 easy_conn = 0
    2ec2299c state = 1 (CURLM_STATE_CONNECT)
    2ec229a0 result = 0 (CURLE_OK)
    2ec229a4 msg = 0
    2ec229a8 msg_num = 0
    2ec229ac sockets = [ 0, 0, 0, 0, 0 ]
    2ec229c0 numsocks = 0
}

Notice that every Curl_one_easy except for 0x2ec22b0c has it's easy_conn
set to NULL. It doesn't seem like a coincidence that 0x2ec22b0c is also
the handle that had the use-after-free problem. The multi handle has
pipelining enabled, so we should have called signalPipeClose() on all of
the outstanding sessions:

> 0xcb9524c::print struct Curl_multi pipelining_enabled
pipelining_enabled = 0x1
> 0xcf3b00c::print struct SessionHandle multi
multi = 0xcb9524c

Only the 2ec22e6c and 2ec2286c Curl_one_easy structures have
SessionHandles where pipe_broke is True. This means that
multi_runsingle hasn't yet been re-invoked on these handles. They're in
the PERFORM and WAITPERFORM states, respectively.

It looks as though one handle had a send error, went to kick other
Curl_one_easies off it's pipeline, but somehow failed to clear the
easy_conn for one last handle.

As far as I can tell, this is how we got into trouble:

The Curl_one_easy with address 0x2ec22b0c completes in
CURLM_STATE_PERFORM successfully, this results in its SessionHandle
being removed from the connectdata's receive pipeline in this piece of
code:

[starting at multi.c:1318]

      else if(TRUE == done) {
        char *newurl;
        bool retry = Curl_retry_request(easy->easy_conn, &newurl);
        followtype follow=FOLLOW_NONE;

        /* call this even if the readwrite function returned error */
        Curl_posttransfer(easy->easy_handle);

        /* we're no longer receving */
        Curl_removeHandleFromPipeline(easy->easy_handle,
                                      easy->easy_conn->recv_pipe);

        /* expire the new receiving pipeline head */
        if(easy->easy_conn->recv_pipe->head)
          Curl_expire(easy->easy_conn->recv_pipe->head->ptr, 1);

        /* Check if we can move pending requests to send pipe */
        checkPendPipeline(easy->easy_conn);

        /* When we follow redirects or is set to retry the connection, we must
           to go back to the CONNECT state */
        if(easy->easy_handle->req.newurl || retry) {
          if(!retry) {
            /* if the URL is a follow-location and not just a retried request
               then figure out the URL here */
            newurl = easy->easy_handle->req.newurl;
            easy->easy_handle->req.newurl = NULL;
            follow = FOLLOW_REDIR;
          }
          else
            follow = FOLLOW_RETRY;
          easy->result = Curl_done(&easy->easy_conn, CURLE_OK, FALSE);
          if(easy->result == CURLE_OK)
            easy->result = Curl_follow(easy->easy_handle, newurl, follow);
          if(CURLE_OK == easy->result) {
            multistate(easy, CURLM_STATE_CONNECT);
            result = CURLM_CALL_MULTI_PERFORM;
          }
          else
            /* Since we "took it", we are in charge of freeing this on
               failure */
            free(newurl);
        }
        else {
          /* after the transfer is done, go DONE */

          /* but first check to see if we got a location info even though we're
             not following redirects */
          if (easy->easy_handle->req.location) {
            newurl = easy->easy_handle->req.location;
            easy->easy_handle->req.location = NULL;
            easy->result = Curl_follow(easy->easy_handle, newurl, FOLLOW_FAKE);
            if (easy->result)
              free(newurl);
          }

          multistate(easy, CURLM_STATE_DONE);
          result = CURLM_CALL_MULTI_PERFORM;
        }
      }

Above we can see that after posttransfer, the removeHandleFromPipeline
call takes our easy_handle (SessionHandle) off of the recv_pipe. If
everything else goes accoriding to plan, we transition to
CURLM_STATE_DONE and go on with life.

The list of Curl_one_easy's show that nobody else has an easy_conn, and
that 0x2ec22a4c encountered a SEND_ERROR. That send error would have
resulted in a Curl_done() and we have the bufctl showing the connect
data being free'd. A call to Curl_disconnect() calls signalPipeClose(),
followed by Curl_multi_handlePipeBreak(), which finally sets the
easy_conn to NULL. If 0x2ec22b0c removed its SessionHandle from the
pipelines as part of moving from CURLM_STATE_PERFORM ->
CURLM_STATE_DONE, and it was sharing a connection with 0x2ec22a4c, then
it seems plausible that an error on the shared connection would cause
the easy_conn to get free'd, but never set to NULL in 0x2ec22b0c since
its SessionHandle was removed from the connection's recv pipeline.

The question here is then twofold:

1. Does this sound like a reasonable analysis of the problem?
2. How do we fix this?

-j
Received on 2009-07-22