Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random memory leak while calling curl_multi_cleanup #7683

Closed
slydder101 opened this issue Sep 7, 2021 · 24 comments
Closed

Random memory leak while calling curl_multi_cleanup #7683

slydder101 opened this issue Sep 7, 2021 · 24 comments

Comments

@slydder101
Copy link

slydder101 commented Sep 7, 2021

Random memory leak is detected when calling curl_multi_cleanup before the associated easy handles are closed.

Steps to reproduce.

Build the following sample multi_app application:

#include <stdio.h>
#include <string.h>

/* somewhat unix-specific */
#include <sys/time.h>
#include <unistd.h>

/* curl stuff */
#include <curl/curl.h>

#define HANDLECOUNT 2   /* Number of simultaneous transfers */

/* The maximum number of curl_multi_perform iterations to perform before
 * calling curl_multi_cleanup.
 */
#define MAX_ITERATIONS 14

int main(void)
{
    CURL *handles[HANDLECOUNT];
    CURLM *multi_handle;

    int still_running = 1; /* keep number of running handles */
    int i;
    int iteration = 0;

    CURLMsg *msg; /* for picking up messages with the transfer status */
    int msgs_left; /* how many messages are left */

    /* Allocate one CURL handle per transfer */
    for (i = 0; i < HANDLECOUNT; i++) {
        handles[i] = curl_easy_init();
        curl_easy_setopt(handles[i], CURLOPT_URL, "https://example.com");
    }

    /* init a multi stack */
    multi_handle = curl_multi_init();

    /* add the individual transfers */
    for(i = 0; i < HANDLECOUNT; i++)
        curl_multi_add_handle(multi_handle, handles[i]);

    while (still_running) {
        CURLMcode mc = curl_multi_perform(multi_handle, &still_running);

        if (iteration++ >= MAX_ITERATIONS)
            goto out;

        if (still_running)
            /* wait for activity, timeout or "nothing" */
            mc = curl_multi_poll(multi_handle, NULL, 0, 1000, NULL);

        if (mc)
            break;
    }

    /* Free the CURL handles */
    for(i = 0; i < HANDLECOUNT; i++) {
        curl_multi_remove_handle(multi_handle, handles[i]);
        curl_easy_cleanup(handles[i]);
    }

    curl_multi_cleanup(multi_handle);

    return 0;
}

Run the application with valgrind using the following shell script:

#!/bin/sh

while valgrind -q --error-exitcode=1 --leak-check=full ./multi_app > /dev/null; do
    echo "Retrying..."
done

Depending on your computer's speed the leak will appear after a few tries:

Retrying...
==146245== 20,162 (1,280 direct, 18,882 indirect) bytes in 2 blocks are definitely lost in loss record 81 of 81
==146245==    at 0x483E7C5: malloc (vg_replace_malloc.c:380)
==146245==    by 0x495877A: ssl_session_dup (ssl_sess.c:110)
==146245==    by 0x496ABD3: tls_process_new_session_ticket (statem_clnt.c:2620)
==146245==    by 0x4965D8C: read_state_machine (statem.c:636)
==146245==    by 0x4965D8C: state_machine.part.0 (statem.c:434)
==146245==    by 0x493C729: ssl3_read_bytes (rec_layer_s3.c:1658)
==146245==    by 0x4943765: ssl3_read_internal (s3_lib.c:4465)
==146245==    by 0x494EE17: SSL_read (ssl_lib.c:1783)
==146245==    by 0x48E5D5B: ossl_closeone.isra.0 (openssl.c:1407)
==146245==    by 0x48E5DE5: ossl_close (openssl.c:1427)
==146245==    by 0x48ED11E: Curl_ssl_close (vtls.c:676)
==146245==    by 0x48DA583: conn_shutdown (url.c:752)
==146245==    by 0x48DA583: Curl_disconnect (url.c:870)
==146245==    by 0x4889D92: Curl_conncache_close_all_connections (conncache.c:555)

Environment

  • curl 7.78
  • openSSL 1.1.1.k and openSSL 1.1.1.l

Notes

The memory leak appeared starting from curl 7.78 with the following commit
b249592d29ae0a2b3e8e07fdbc01f33b5a5b8420.
For know I have managed to reproduce it only with openSSL backend and not with
mbedTLS.

To reproduce the issue there needs to be at least two simultaneous connections
and the cleanup operations need to be close in time to exit.

@bagder
Copy link
Member

bagder commented Sep 7, 2021

I fixed the example and increased parallelism (see below) but it doesn't seem to reproduce for me with git master and OpenSSL/1.1.1l. It does reproduce pretty easily with 7.78.0.

I thus conclude that this bug doesn't exist anymore.

--- issue7683.c~        2021-09-07 15:26:57.346871588 +0200
+++ issue7683.c 2021-09-07 15:29:29.956228837 +0200
@@ -8,7 +8,7 @@
 /* curl stuff */
 #include <curl/curl.h>
 
-#define HANDLECOUNT 2   /* Number of simultaneous transfers */
+#define HANDLECOUNT 20   /* Number of simultaneous transfers */
 
 /* The maximum number of curl_multi_perform iterations to perform before
  * calling curl_multi_cleanup.
@@ -44,7 +44,7 @@
         CURLMcode mc = curl_multi_perform(multi_handle, &still_running);
 
         if (iteration++ >= MAX_ITERATIONS)
-            goto out;
+            break;
 
         if (still_running)
             /* wait for activity, timeout or "nothing" */

@slydder101
Copy link
Author

Sorry for the goto typo.

I have also tested the leak using the master branch and run the test with 20 parallel connections.
The leak reproduced the same as with 7.78.0.
I have done the following steps:

$ git clone https://github.com/curl/curl.git
$ cd curl
$ autoreconf -fi
$ ./configure --enable-shared --disable-static --with-openssl --prefix=/home/demo/test/curl_memleak
$ make -j9
$ make install

Then from the test folder I have run the following script:

#!/bin/sh

while LD_LIBRARY_PATH=lib valgrind -q --error-exitcode=1 --leak-check=full ./multi_app > /dev/null; do
    echo "Retrying..."
done

The output was:

$ ./memleak.sh
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
==225478== 80,648 (5,120 direct, 75,528 indirect) bytes in 8 blocks are definitely lost in loss record 81 of 81
==225478==    at 0x483E7C5: malloc (vg_replace_malloc.c:380)
==225478==    by 0x494C77A: ssl_session_dup (ssl_sess.c:110)
==225478==    by 0x495EBD3: tls_process_new_session_ticket (statem_clnt.c:2620)
==225478==    by 0x4959D8C: read_state_machine (statem.c:636)
==225478==    by 0x4959D8C: state_machine.part.0 (statem.c:434)
==225478==    by 0x4930729: ssl3_read_bytes (rec_layer_s3.c:1658)
==225478==    by 0x4937765: ssl3_read_internal (s3_lib.c:4465)
==225478==    by 0x4942E17: SSL_read (ssl_lib.c:1783)
==225478==    by 0x48B909A: ossl_closeone.isra.0 (in /home/demo/test/curl_memleak/lib/libcurl.so.4.7.0)
==225478==    by 0x48B9111: ossl_close (in /home/demo/test/curl_memleak/lib/libcurl.so.4.7.0)
==225478==    by 0x48C035A: Curl_ssl_close (in /home/demo/test/curl_memleak/lib/libcurl.so.4.7.0)
==225478==    by 0x48AE900: Curl_disconnect (in /home/demo/test/curl_memleak/lib/libcurl.so.4.7.0)
==225478==    by 0x48627A8: Curl_conncache_close_all_connections (in /home/demo/test/curl_memleak/lib/libcurl.so.4.7.0)
==225478==

@bagder
Copy link
Member

bagder commented Sep 8, 2021

I just can't seem to make master leak.

Your new valgrind output lacks debug symbols in libcurl but I presume it is the same call. This one:

(void)SSL_read(backend->handle, buf, (int)sizeof(buf));

The leak is also clearly visible to be an allocation done within OpenSSL and to me it looks like that memory should be freed just a few lines further down in the libcurl code:

SSL_free(backend->handle);

@slydder101
Copy link
Author

I investigated further more and it is very strange what is happening. I
performed tests on various independent computers with different distros.

I attached the archive curl-memleak.zip with the source code and small shell scripts used for the
reports below (just execute make.sh and run.sh afterwards).

Conclusions:

  1. On Gentoo the issue does appear with master:

     Gentoo Base System release 2.7
     Linux 5.10.61-gentoo #1 SMP Mon Aug 30 15:26:16 EEST 2021 x86_64 Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz GenuineIntel GNU/Linux
     ldd (Gentoo 2.33-r1 p4) 2.33
     gcc (Gentoo 10.3.0-r2 p3) 10.3.0
     binutils (GNU ld (Gentoo 2.36.1 p5) 2.36.1)
     OpenSSL 1.1.1l  24 Aug 2021
    
     ldd ./multi_app
             linux-vdso.so.1 (0x00007ffd811ca000)
             libcurl.so.4 => /home/demo/test/curl-memleak/lib/libcurl.so.4 (0x00007fafed67a000)
             libc.so.6 => /lib64/libc.so.6 (0x00007fafed4b0000)
             libnghttp2.so.14 => /usr/lib64/libnghttp2.so.14 (0x00007fafed483000)
             libidn2.so.0 => /usr/lib64/libidn2.so.0 (0x00007fafed460000)
             libpsl.so.5 => /usr/lib64/libpsl.so.5 (0x00007fafed44d000)
             libssl.so.1.1 => /usr/lib64/libssl.so.1.1 (0x00007fafed3ba000)
             libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00007fafed0f9000)
             libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x00007fafecfed000)
             libbrotlidec.so.1 => /usr/lib64/libbrotlidec.so.1 (0x00007fafecfdf000)
             libz.so.1 => /lib64/libz.so.1 (0x00007fafecfc5000)
             libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fafecfa5000)
             /lib64/ld-linux-x86-64.so.2 (0x00007fafed781000)
             libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007fafece21000)
             libdl.so.2 => /lib64/libdl.so.2 (0x00007fafece19000)
             libbrotlicommon.so.1 => /usr/lib64/libbrotlicommon.so.1 (0x00007fafecdf6000)
    
     ==27399== 120,684 (7,680 direct, 113,004 indirect) bytes in 12 blocks are definitely lost in loss record 72 of 72
     ==27399==    at 0x483F7E5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
     ==27399==    by 0x4BC0F94: ssl_session_dup (ssl_sess.c:110)
     ==27399==    by 0x4BD2B56: tls_process_new_session_ticket (statem_clnt.c:2620)
     ==27399==    by 0x4BCE028: read_state_machine (statem.c:636)
     ==27399==    by 0x4BCE028: state_machine.part.0 (statem.c:434)
     ==27399==    by 0x4BA6004: ssl3_read_bytes (rec_layer_s3.c:1658)
     ==27399==    by 0x4BACD64: ssl3_read_internal (s3_lib.c:4465)
     ==27399==    by 0x4BB7DE2: SSL_read (ssl_lib.c:1783)
     ==27399==    by 0x48F1766: ossl_closeone (openssl.c:1407)
     ==27399==    by 0x48F1829: ossl_close (openssl.c:1427)
     ==27399==    by 0x48FB3B1: Curl_ssl_close (vtls.c:676)
     ==27399==    by 0x48DE164: conn_shutdown (url.c:752)
     ==27399==    by 0x48DE801: Curl_disconnect (url.c:870)
     ==27399==
    
  2. On Manjaro the issue does appear with master:

     Manjaro Linux
     Linux 4.19.205-1-MANJARO #1 SMP Thu Aug 26 20:42:05 UTC 2021 x86_64 unknown unknown GNU/Linux
     ldd (GNU libc) 2.33
     gcc (GCC) 11.1.0
     binutils (GNU ld (GNU Binutils) 2.36.1)
     OpenSSL 1.1.1l  24 Aug 2021
    
     ldd ./multi_app
         linux-vdso.so.1 (0x00007ffdb7bd8000)
         libcurl.so.4 => /home/demo/test/curl-memleak/lib/libcurl.so.4 (0x00007f5544f98000)
         libc.so.6 => /usr/lib/libc.so.6 (0x00007f5544da4000)
         libnghttp2.so.14 => /usr/lib/libnghttp2.so.14 (0x00007f5544d78000)
         libidn2.so.0 => /usr/lib/libidn2.so.0 (0x00007f5544d56000)
         librtmp.so.1 => /usr/lib/librtmp.so.1 (0x00007f5544d37000)
         libpsl.so.5 => /usr/lib/libpsl.so.5 (0x00007f5544d24000)
         libssl.so.1.1 => /usr/lib/libssl.so.1.1 (0x00007f5544c8f000)
         libcrypto.so.1.1 => /usr/lib/libcrypto.so.1.1 (0x00007f55449a9000)
         libldap-2.4.so.2 => /usr/lib/libldap-2.4.so.2 (0x00007f554495a000)
         liblber-2.4.so.2 => /usr/lib/liblber-2.4.so.2 (0x00007f5544948000)
         libzstd.so.1 => /usr/lib/libzstd.so.1 (0x00007f5544839000)
         libbrotlidec.so.1 => /usr/lib/libbrotlidec.so.1 (0x00007f554482b000)
         libz.so.1 => /usr/lib/libz.so.1 (0x00007f554480f000)
         libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f55447ee000)
         /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f55450a4000)
         libunistring.so.2 => /usr/lib/libunistring.so.2 (0x00007f554466c000)
         libgnutls.so.30 => /usr/lib/libgnutls.so.30 (0x00007f554446b000)
         libhogweed.so.6 => /usr/lib/libhogweed.so.6 (0x00007f5544422000)
         libnettle.so.8 => /usr/lib/libnettle.so.8 (0x00007f55443db000)
         libgmp.so.10 => /usr/lib/libgmp.so.10 (0x00007f5544339000)
         libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f5544332000)
         libresolv.so.2 => /usr/lib/libresolv.so.2 (0x00007f5544318000)
         libsasl2.so.3 => /usr/lib/libsasl2.so.3 (0x00007f55442fa000)
         libbrotlicommon.so.1 => /usr/lib/libbrotlicommon.so.1 (0x00007f55442d7000)
         libp11-kit.so.0 => /usr/lib/libp11-kit.so.0 (0x00007f55441a0000)
         libtasn1.so.6 => /usr/lib/libtasn1.so.6 (0x00007f554418a000)
         libffi.so.7 => /usr/lib/libffi.so.7 (0x00007f554417e000)
    
     Retrying...
     ==28242== 60,486 (3,840 direct, 56,646 indirect) bytes in 6 blocks are definitely lost in loss record 81 of 81
     ==28242==    at 0x483E7C5: malloc (vg_replace_malloc.c:380)
     ==28242==    by 0x4C0B77A: ssl_session_dup (ssl_sess.c:110)
     ==28242==    by 0x4C1DBD3: tls_process_new_session_ticket (statem_clnt.c:2620)
     ==28242==    by 0x4C18D8C: read_state_machine (statem.c:636)
     ==28242==    by 0x4C18D8C: state_machine.part.0 (statem.c:434)
     ==28242==    by 0x4BEF729: ssl3_read_bytes (rec_layer_s3.c:1658)
     ==28242==    by 0x4BF6765: ssl3_read_internal (s3_lib.c:4465)
     ==28242==    by 0x4C01E17: SSL_read (ssl_lib.c:1783)
     ==28242==    by 0x48F5482: ossl_closeone (openssl.c:1407)
     ==28242==    by 0x48F5545: ossl_close (openssl.c:1427)
     ==28242==    by 0x48FF284: Curl_ssl_close (vtls.c:676)
     ==28242==    by 0x48E18B9: conn_shutdown (url.c:752)
     ==28242==    by 0x48E1F86: Curl_disconnect (url.c:870)
     ==28242==
    

    Important! I had cases on a laptop with the same Manjaro, that I had to
    leave ~10-15 minutes the run.sh script to run, in order to reproduce the
    problem.

  3. On the other hand under Fedora 34, Ubuntu 20.04 LTS and Debian 10 the issue
    cannot be reproduced nor with master nor with tag curl-7_78_0. On all
    systems the run.sh was left to run ~2 hours.

    Here are the system information that were tested:

     Fedora release 34 (Thirty Four)
     Linux 5.13.13-200.fc34.x86_64 #1 SMP Thu Aug 26 17:06:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
     ldd (GNU libc) 2.33
     gcc (GCC) 11.2.1 20210728 (Red Hat 11.2.1-1)
     binutils (GNU ld version 2.35.2-5.fc34)
     OpenSSL 1.1.1l  FIPS 24 Aug 2021
    
     Ubuntu 20.04.2 LTS \n \l
     Linux 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
     ldd (Ubuntu GLIBC 2.31-0ubuntu9.2) 2.31
     gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
     binutils (GNU ld (GNU Binutils for Ubuntu) 2.34)
     OpenSSL 1.1.1f  31 Mar 2020
    
     Debian GNU/Linux 10 \n \l
     Linux 4.19.0-17-amd64 #1 SMP Debian 4.19.194-3 (2021-07-18) x86_64 unknown unknown GNU/Linux
     ldd (Debian GLIBC 2.28-10) 2.28
     gcc (Debian 8.3.0-6) 8.3.0
     binutils (GNU ld (GNU Binutils for Debian) 2.31.1)
     OpenSSL 1.1.1d  10 Sep 2019
    
  4. Can you please try the attached make.sh & run.sh script and let it run
    for ~1-2 hours please? If the issue reproduced for you with 7.78 than I
    believe it should reproduce also with master.

  5. I wander if could it be possible to be some default compile/linking option
    difference between these distros that may trigger the memory leak?

Thank you for your time.

@bszente
Copy link

bszente commented Sep 16, 2021

Hello @bagder ! After some investigation, I found the "key" configure option that makes the memory leak to appear. It is --with-nghttp2.

In the previous post, on my Gentoo system (that my colleague used) the memory leak did appear. However on the very same machine building libcurl --without-nghttp2 makes the memory leak to disappear.

Likewise, installing on Debian and Fedora and Ubuntu the libnghttp2-dev package and building curl --with-nghttp2 (or leaving just the autodetect to enable it) makes the memory leak to appear with master.

As a conclusion, the memory leak does appear on any Linux machine (in a few iterations using the run.sh script) with:

  • using master and
  • using OpenSSL (--with-openssl) and
  • using nghttp2 (--with-nghttp2)

I hope this information sheds some light on the issue.

@bagder
Copy link
Member

bagder commented Sep 16, 2021

Not really. I always use openssl and nghttp2 in my builds and this is what it says for me right now:

Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...
Retrying...

build details

curl 7.79.1-DEV (x86_64-pc-linux-gnu) libcurl/7.79.1-DEV OpenSSL/1.1.1l zlib/1.2.11 brotli/1.0.9 zstd/1.4.8 c-ares/1.17.0 libidn2/2.3.2 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.9.0_DEV nghttp2/1.45.0-DEV librtmp/2.3 libgsasl/1.10.0 OpenLDAP/2.4.59
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps ldap ldaps mqtt pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli Debug gsasl GSS-API HSTS HTTP2 HTTPS-proxy IDN IPv6 Kerberos Largefile libz NTLM NTLM_WB PSL SPNEGO SSL TLS-SRP TrackMemory UnixSockets zstd

@bszente
Copy link

bszente commented Sep 16, 2021

This is weird. I made a build with nghttp2 master as well, and the leak does appear:

Gentoo Base System release 2.7
Linux 5.10.61-gentoo #1 SMP Mon Aug 30 15:26:16 EEST 2021 x86_64 Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz GenuineIntel GNU/Linux
ldd (Gentoo 2.33-r1 p4) 2.33
gcc (Gentoo 10.3.0-r2 p3) 10.3.0
binutils (GNU ld (Gentoo 2.36.1 p5) 2.36.1)
OpenSSL 1.1.1l  24 Aug 2021

ldd ./multi_app
        linux-vdso.so.1 (0x00007ffeb7b2a000)
        libcurl.so.4 => /home/user/git/curl-memleak/lib/libcurl.so.4 (0x00007f6b3465a000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f6b34490000)
        libnghttp2.so.14 => /home/user/git/curl-memleak/lib/libnghttp2.so.14 (0x00007f6b34464000)
        libidn2.so.0 => /usr/lib64/libidn2.so.0 (0x00007f6b34441000)
        libpsl.so.5 => /usr/lib64/libpsl.so.5 (0x00007f6b3442e000)
        libssl.so.1.1 => /usr/lib64/libssl.so.1.1 (0x00007f6b3439b000)
        libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00007f6b340da000)
        libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x00007f6b33fce000)
        libbrotlidec.so.1 => /usr/lib64/libbrotlidec.so.1 (0x00007f6b33fc0000)
        libz.so.1 => /lib64/libz.so.1 (0x00007f6b33fa6000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f6b33f86000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f6b34761000)
        libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00007f6b33e02000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f6b33dfa000)
        libbrotlicommon.so.1 => /usr/lib64/libbrotlicommon.so.1 (0x00007f6b33dd7000)

Retrying...
==13863== 221,254 (14,080 direct, 207,174 indirect) bytes in 22 blocks are definitely lost in loss record 72 of 72
==13863==    at 0x483F7E5: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==13863==    by 0x4BBFF94: ssl_session_dup (ssl_sess.c:110)
==13863==    by 0x4BD1B56: tls_process_new_session_ticket (statem_clnt.c:2620)
==13863==    by 0x4BCD028: read_state_machine (statem.c:636)
==13863==    by 0x4BCD028: state_machine.part.0 (statem.c:434)
==13863==    by 0x4BA5004: ssl3_read_bytes (rec_layer_s3.c:1658)
==13863==    by 0x4BABD64: ssl3_read_internal (s3_lib.c:4465)
==13863==    by 0x4BB6DE2: SSL_read (ssl_lib.c:1783)
==13863==    by 0x48F18E4: ossl_closeone (openssl.c:1407)
==13863==    by 0x48F19A7: ossl_close (openssl.c:1427)
==13863==    by 0x48FB52F: Curl_ssl_close (vtls.c:676)
==13863==    by 0x48DE288: conn_shutdown (url.c:752)
==13863==    by 0x48DE925: Curl_disconnect (url.c:870)
==13863==

Almost identical dependencies as yours:

curl 7.79.1-DEV (x86_64-pc-linux-gnu) libcurl/7.79.1-DEV OpenSSL/1.1.1l zlib/1.2.11 brotli/1.0.9 zstd/1.5.0 libidn2/2.3.2 libpsl/0.21.0 (+libidn2/2.3.0) nghttp2/1.45.0-DEV
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher gophers http https imap imaps mqtt pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: alt-svc AsynchDNS brotli Debug HSTS HTTP2 HTTPS-proxy IDN IPv6 Largefile libz NTLM NTLM_WB PSL SSL TLS-SRP TrackMemory UnixSockets zstd

@bagder would you mind testing a vanilla Debian 11 QEMU image or a vanilla Debian Docker image (script), if I create one for you, so you would not waste your time by creating it? Which one would you prefer? Thanks.

@bagder
Copy link
Member

bagder commented Sep 16, 2021

I'm not sure I need to be able to reproduce this. As mentioned already I think this rather looks like an OpenSSL memory-leak. I don't see that libcurl does anything wrong but openssl leaks that memory. Don't you agree?

@bszente
Copy link

bszente commented Sep 16, 2021

I'm not sure I need to be able to reproduce this. As mentioned already I think this rather looks like an OpenSSL memory-leak. I don't see that libcurl does anything wrong but openssl leaks that memory. Don't you agree?

The memory leak is definitely inside OpenSSL (session cache), however the cause is not clear. The fact is, the issue appeared with curl-7.78. Of course, this can mean any of the following:

  1. curl 7.78+ changes triggered the manifestation of an underlying bug in OpenSSL, which was already present in OpenSSL
  2. curl 7.78+ with OpenSSL manifests this memory leak for a total different technical reason (e.g. glibc, or any other toolchain or userspace differences)
  3. curl 7.78+ introduced a bug by not using OpenSSL properly

What bothers me is that we managed to reproduce on several machines with several different stable distributions. It can happen on the field. We have no machines where the issue does not reproduce and that is confusing me.

To answer your question: I agree with you. case 3 is only more a theoretical case, in practice it can be excluded as you wrote in your linked comment.

Let's close this issue with the following conclusions:

  1. libcurl does free all OpenSSL handles/buffers properly
  2. the changes mentioned in the issue description introduced in 7.78 applies to all SSL backends, and so far only OpenSSL is confirmed to have memory leaks
  3. the example program should be reduced only to OpenSSL, i.e. to reproduce the issue without libcurl, using only pure OpenSSL API and reporting this possible issue in that project. This would give the definitive answer, but due to the fact that our main case is with mBedTLS, we will not investigate this further for now.

Thank you very much for your input.

@bagder
Copy link
Member

bagder commented Sep 16, 2021

This problem is likely triggered by this change that was introduced in 7.78.0: b249592 which is why it isn't visible in earlier libcurl versions.

@bszente
Copy link

bszente commented Sep 16, 2021

One last note about investigation under QEMU 6, using vanilla Debian 11.

  • Binary built --with-nghttp2:

    • dual core: memory leak
    • single core: no leak
  • Binary built --without-nghttp2:

    • dual core: no leak
    • single core: no leak

Probably it is some kind of timing/threading/synchronization issue. Interesting is the --with-nghttp2 factor, though.

@bszente
Copy link

bszente commented Sep 16, 2021

@bagder I found the "missing component" of the reproduction: CPU speed. My computer is "too fast". If I run stress -c 32 to keep the CPU busy, I can avoid the memory leak even using --with-nghttp2. Your system most likely is "slow enough", hence the memory leak did not appear. In the moment when I killed the stress command, the memory leak instantly appeared on my system.

So the conditions for memory leak to appear:

curl master + OpenSSL (--with-openssl) +nghttp2 (--with-nghttp2) + and a "fast enough" CPU!

My desktop system is an i7-8700 CPU @ 3.20GHz for reference. Single core, or a slower CPU will make the memory leak to disappear.

@bagder
Copy link
Member

bagder commented Sep 16, 2021

dual core: memory leak
single core: no leak

All TLS handling done by (lib)curl is in the same single thread no matter if there are threads available...

@bszente
Copy link

bszente commented Sep 16, 2021

All TLS handling done by (lib)curl is in the same single thread no matter if there are threads available...

Sure, I was just documenting all my investigation results here, just in case when someone will encounter the same issue and wants to dig further (outside the libcurl scope). Soon curl 7.78 will be widespread on common distros so the exposure on "fast CPUs" will be higher, and the issue may reappear in future. I just can't get why nghttp2 is also required beside OpenSSL for the leak to manifest. Apparently it should not have anything to do with the issue, but it clearly affects the symptoms as I wrote in my previous comments.

From our point of view, you may close the issue. Thank you for your time.

@bagder
Copy link
Member

bagder commented Sep 16, 2021

the exposure on "fast CPUs" will be higher

That correlation is very weak and mostly a guesswork though. I ran my tests on a i7-3770K CPU @ 3.50GHz. I think there's another trigger.

@jay
Copy link
Member

jay commented Sep 17, 2021

@mkauf any ideas

@mkauf
Copy link
Contributor

mkauf commented Sep 17, 2021

Thank you for the test program. Valgrind reports these leaks on my system too. It has something to do with the callback function ossl_new_session_cb, which is set by SSL_CTX_sess_set_new_cb(). The leaks disappear if I replace the callbacks's code with "return 0". Maybe OpenSSL session information is leaked.

@bagder
Copy link
Member

bagder commented Sep 17, 2021

Maybe OpenSSL session information is leaked.

All the stack traces showing the leak that I've seen here have showed it involving tls_process_new_session_ticket, so it seems related to session tickets (and now with the added info, probably even related to having a callback set for when a new ticket arrives).

I don't see how curl is responsible for such memory or even can free such memory! If the memory leak would've been done by libcurl code in the callback, then valgrind would've pinpointed that as the culprit. I still think this smells like an OpenSSL memory-leak.

@bagder bagder added the TLS label Sep 17, 2021
@mkauf
Copy link
Contributor

mkauf commented Sep 21, 2021

The problem is this check in Curl_ssl_addsessionid():

  if(!data->state.session)
    return CURLE_OK;

So if the SSL session cache does not exist anymore, Curl_ssl_addsessionid() returns CURLE_OK. But this return code (also) indicates that the SSL session has been saved in curl's SSL session cache. Therefore ossl_new_session_cb() returns 1 in this case. With this return value, OpenSSL increments the reference counter for the SSL session, but the SSL session has not been saved in curl's SSL session cache -> memory leak.

Probably Curl_ssl_addsessionid() needs another output parameter to inform the caller whether the SSL session has actually been saved in the cache.

Maybe this low-risk patch can be considered for tomorrow's curl release:

diff --git a/lib/vtls/openssl.c b/lib/vtls/openssl.c
index 87f4b02b7..7e01c1e98 100644
--- a/lib/vtls/openssl.c
+++ b/lib/vtls/openssl.c
@@ -2509,7 +2509,7 @@ static int ossl_new_session_cb(SSL *ssl, SSL_SESSION *ssl_sessionid)
       }
     }
 
-    if(!incache) {
+    if(!incache && data->state.session) {
       if(!Curl_ssl_addsessionid(data, conn, isproxy, ssl_sessionid,
                                 0 /* unknown size */, sockindex)) {
         /* the session has been put into the session cache */
diff --git a/lib/vtls/schannel.c b/lib/vtls/schannel.c
index 722a937c4..a90f2e85e 100644
--- a/lib/vtls/schannel.c
+++ b/lib/vtls/schannel.c
@@ -1450,7 +1450,7 @@ schannel_connect_step3(struct Curl_easy *data, struct connectdata *conn,
         incache = FALSE;
       }
     }
-    if(!incache) {
+    if(!incache && data->state.session)) {
       result = Curl_ssl_addsessionid(data, conn, isproxy, BACKEND->cred,
                                      sizeof(struct Curl_schannel_cred),
                                      sockindex);

@bagder
Copy link
Member

bagder commented Sep 21, 2021

Maybe this low-risk patch can be considered for tomorrow's curl release

Thanks for the patch!

It's too late for tomorrow though I think. It's not even a PR yet and it seems like we might want to make that change to cover more/all TLS backends.

@slydder101 can you try it out and see if it fixes the leak for you?

@bszente
Copy link

bszente commented Sep 21, 2021

Thank you very much for the fix!

I confirm the patch fixes the leak on all the systems that we have with Gentoo, Fedora, Debian (those from the previous comments, where the leak appeared). Manjaro has to be confirmed by @slydder101 on his system.

we might want to make that change to cover more/all TLS backends.

@bagder do you think this leak may happen also with mBedTLS as well? I mean, is there any theoretical chance for it, but in practice we did not encountered it?

@bagder
Copy link
Member

bagder commented Sep 21, 2021

do you think this leak may happen also with mBedTLS as well

Now that @mkauf has done his awesome research and figured out why it happened, I think its a good time to make sure that the fix makes sure that there's no leak left in any of the TLS backends!

@slydder101
Copy link
Author

I can also confirm that the patch fixes the leak on Manjaro.

Thank you for your effort.

mkauf added a commit to mkauf/curl that referenced this issue Sep 22, 2021
…cache

On connection shutdown, a new TLS session ticket may arrive after the SSL
session cache has already been destructed. In this case, the new SSL session
cannot be added to the SSL session cache.

The callers of Curl_ssl_addsessionid() need to know whether the SSL session
has been added to the cache. If it has not been added, the reference counter
of the SSL session must not be incremented, or memory used by the SSL session
must be freed. This is now possible with the new output parameter "added" of
Curl_ssl_addsessionid().

Fixes curl#7683
@mkauf
Copy link
Contributor

mkauf commented Sep 22, 2021

Thank you for the feedback! I have created a pull request: #7752

@mkauf mkauf closed this as completed in 60738f3 Sep 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants