
curl-library

RE: multi-threads SFTP in libcurl

From: SU Xing A <Xing.a.Su_at_alcatel-lucent.com>
Date: Fri, 12 Nov 2010 10:59:12 +0800

 
Hi, Daniel,
   
     It is not that libssh2 suddenly does something wrong once we go beyond 20 threads. The crash also happens with fewer than 20 threads, just less often.
     It happens when we use libcurl/libssh2/OpenSSL from multiple threads. It does not appear with a single thread.
   
     The following pstack output always appears when the core dump happens:

     ----------------- lwp# 22 / thread# 22 --------------------
 00000000 ???????? (1d8310, ff0f9fd8, 0, 19a278, 20f1a0, 10)
 ff0db23c crypt_encrypt (199f48, 19a278, 199fd4, 0, 6c, 1d8308) + 24
 ff0f4dc4 decrypt (199f48, 19a078, 20f1ab, 1c70, 0, 10) + d4
 ff0f58e8 _libssh2_transport_read (199f48, 1c84, ff0fb9d0, 10, 0, 19a068) + 6c0
 ff0d8ed4 _libssh2_channel_read (1dcdb0, 0, fd67b8f8, 4, 183e10, 0) + 8c
 ff0eb2d0 sftp_packet_read (1aa348, 65, 8, fd67b9f8, fd67ba0c, 199f48) + a8
 ff0eb840 sftp_packet_requirev (1aa348, 2, ff0faf8a, 8, fd67b9f8, fd67ba0c) + c0
 ff0ed290 sftp_read (1d4df0, 9564c, 4000, 0, 0, 0) + 2b0
 ff0ed4e8 libssh2_sftp_read (1d4df0, 9564c, 4000, 0, 0, 0) + 30
 ff177640 sftp_recv (1913a0, 0, 9564c, 4000, fd67bb80, 0) + 30
 ff138064 Curl_read (1913a0, 43, 9564c, 4000, fd67bc44, 0) + 27c
 ff153a70 readwrite_data (950f0, 1913a0, 95108, fd67bce4, fd67bd7c, 0) + 188
 ff154d60 Curl_readwrite (1913a0, fd67bd7c, 3e8, fd67bd80, 183e10, d) + 180
 ff155fe4 Transfer (1913a0, fd67be04, ff161258, ff161330, ff1347b0, fec35578) + 64c
 ff156974 Curl_do_perform (950f0, ff161258, ff161330, ff1347b0, febc1f74, fec303a8) + 174
 ff156d78 Curl_perform (950f0, 2711, fd67bf44, 29, fec34ef0, fec35578) + 20
 ff157adc curl_easy_perform (950f0, 2711, fd67bf68, 90d68, ff15791c, 75628) + 184
 00011af8 void*test_muti(void*) (42418, fd67c000, 0, 0, 11898, 1) + 260
 febc8a20 _lwp_start (0, 0, 0, 0, 0, 0)
.....
----------------- lwp# 27 / thread# 27 --------------------
 ff241688 OPENSSL_cleanse (1ca030, 400, 8, ff3702f4, 5fc, 400) + 38
 ff287ad4 BN_clear_free (1a399c, 0, 100, 0, 0, 1a399c) + 24
 ff288fc0 BN_CTX_free (20, 1a399c, 1ca030, 5e4, 6, 1dc668) + 48
 ff0dd9f0 diffie_hellman_sha1 (20a860, 1ab1b8, 1ab198, 100, 20, 21) + 1a98
 ff0ddf90 kex_method_diffie_hellman_group_exchange_sha1_key_exchange (20a860, 20eb08, 0, 1803c1, 0, 0) + 268
 ff0dfeb4 _libssh2_kex_exchange (20a860, 0, 20eafc, 0, 13, 1) + 35c
 ff0e8e38 session_startup (20a860, 3e, 1, 1, 0, 0) + 1e0
 ff0e90f0 libssh2_session_startup (20a860, 3e, fd17b2d7, 3a, 0, 80808080) + 10
 ff170f94 ssh_statemach_act (1c64d0, fd17bb34, 1c6800, ff1832b8, ff1832c4, 20f100) + a4
 ff176630 ssh_easy_statemach (1c64d0, 1, ff170970, 1c64d0, 2a33bf, 8) + 70
 ff176b08 ssh_connect (1c64d0, fd17bd80, 1c6598, fd17bc2c, fd17bca8, 0) + 258
 ff149200 Curl_protocol_connect (1c64d0, fd17bd80, fd17bca8, fd17baa4, ff0000, 80808080) + 130
 ff14c958 setup_conn (1c64d0, fd17bd80, fd17bd84, d, ffbffeff, fec35e80) + 168
 ff14cb8c Curl_connect (15c830, fd17be10, fd17bd84, fd17bd80, 35, d) + bc
 ff1563ec connect_host (15c830, fd17be10, ff17efb0, 4, fea8ca00, fec35578) + 44
 ff156870 Curl_do_perform (15c830, 12035, fec34ac0, 7cf9c, febc1f74, fec303a8) + 70
 ff156d78 Curl_perform (15c830, 2711, fd17bf44, 2c, fec34ef0, fec35578) + 20
 ff157adc curl_easy_perform (15c830, 2711, fd17bf68, 90d68, febac35c, 183268) + 184
 00011af8 void*test_muti(void*) (42440, fd17c000, 0, 0, 11898, 1) + 260
 febc8a20 _lwp_start (0, 0, 0, 0, 0, 0)
....

From the pstack output, my guess is that when libssh2 calls diffie_hellman_sha1/BN_CTX_free (thread 27), it releases some resource that thread 22 still needs, and that causes the core dump.
Is this a useful clue? Thanks a lot.

My environment is as follows:
1. SunOS 5.10 Generic_141444-09 sun4v sparc SUNW,T5440 (Solaris 10 for SPARC, 2x8 cores, 16GB memory)
2. curl-7.21.0
3. libssh2-1.2.6
4. openssl-0.9.8o

I have also tried curl-7.21.2 and libssh2-1.2.7, but the problem still happens.

We have already built libssh2 and libcurl with debug symbols. Thanks.

Best Regards,
Xing

-----Original Message-----
From: owner-oambase-wtkframework_at_LIST.ALCATEL-LUCENT.COM [mailto:owner-oambase-wtkframework_at_LIST.ALCATEL-LUCENT.COM] On Behalf Of CHEN Xiaolei A
Sent: November 12, 2010 10:20
To: oambase-wtkframework_at_list.alcatel-lucent.com
Subject: FW: multi-threads SFTP in libcurl

Daniel replied.

-----Original Message-----
From: curl-library-bounces_at_cool.haxx.se [mailto:curl-library-bounces_at_cool.haxx.se] On Behalf Of Daniel Stenberg
Sent: November 11, 2010 20:26
To: libcurl development
Subject: RE: multi-threads SFTP in libcurl

On Thu, 11 Nov 2010, CHEN Xiaolei A wrote:

Please do not top-post; people tend to read from top to bottom, not the other way around!

> It cored in the following thread. It seems that libssh2 accesses already
> freed memory when calling crypt_encrypt.

... but why would libssh2 suddenly do something wrong when you go beyond 20 threads and work fine until then?

> ----------------- lwp# 2 / thread# 2 --------------------
>  00000000 ???????? (1e5638, ff0ffd78, 0, 196310, 0, 0)
>  ff0dce7c crypt_encrypt (1956a8, 196310, 195734, 0, 0, 1e5630) + 24
>  ff0fb6ac decrypt (1956a8, 1957d0, 1d12fb, 23a0, 3ffc, 1957d0) + d4
>  ff0fc2d0 _libssh2_transport_read (1956a8, 23b0, ff1011e8, 10, 0, 1957d0) + 6e0
>  ff0da5a0 _libssh2_channel_read (1a4460, 0, fea7b8f8, 4, 1809f8, 0) + 88
>  ff0ef870 sftp_packet_read (1a7898, 65, 3, fea7b9f8, fea7ba0c, 1956a8) + a8
>  ff0eff30 sftp_packet_requirev (1a7898, 2, ff1007e2, 3, fea7b9f8, fea7ba0c) + c0
>  ff0f1f3c sftp_read (1dbcb0, b4efc, 4000, 0, 0, 0) + 34c
>  ff0f2298 libssh2_sftp_read
>
> Is there anything more we can do?

Yes. First, build libssh2 and libcurl with debug symbols so that the backtraces make sense.

Then you need to start narrowing things down. For example, do all 20 threads need to transfer data for the bug to happen? Do all 20 threads succeed at what they set out to do? Many SSH/SFTP servers limit the number of simultaneous connections they allow.

An additional experiment would be to build libssh2 with libgcrypt instead and see whether it fails in a similar way (libgcrypt requires its own set of mutex callbacks). If it does, we can seriously suspect that the problem lies outside the crypto layer.

--
  / daniel.haxx.se
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-library
Etiquette:  http://curl.haxx.se/mail/etiquette.html
Received on 2010-11-12