multi_wait blocks on sread (recv)
From: William Smith via curl-library <curl-library_at_lists.haxx.se>
Date: Wed, 8 Sep 2021 18:15:33 -0400
Hello,
I recently upgraded curl from version 7.50 to 7.78 on RHEL6 using openldap
and openssl. After the upgrade, a long-standing in-house library of ours
that uses libcurl to access a credential server and an ldap server started
blocking in the sread call in multi_wait.
Context: A multithreaded java webservice calls this in-house library which
returns access credentials for each user accessing the service. The
webservice handles thousands of users and up to 15 or so calls per second
at peak. The curl calls to both the credential server and the ldap server
are protected by their own individual mutexes so that only one credential
call and only one ldap call can happen at a time.
I worked around the block in multi_wait by calling a nonblocking
pselect just before the sread. This is not a complete solution in
that it isn't adequate for all operating systems. It may also not be
the "curl" way of doing things, but I confess that I don't know
the codebase so this was the quickest option. The blocking situation seems
only to occur for the secure ldap calls. It doesn't seem like it's the
same problem about openldap listed in the "known bugs" in the release notes
for 7.78. Rolling back the curl version to 7.50, which uses the same
openldap, fixes the problem so I'm inclined to think the issue has nothing
to do with openldap.
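
A minimal sketch of the kind of zero-timeout pselect check described above, assuming a POSIX system. This is not the actual patch to multi_wait; the helper names fd_readable_now and drain_if_readable are hypothetical and exist only for illustration.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: returns 1 if fd has data to read right now,
   0 if it does not, and -1 on error (errno is set, e.g. EBADF). */
static int fd_readable_now(int fd)
{
  fd_set rfds;
  struct timespec zero = {0, 0};  /* zero timeout: poll, never block */
  int rc;

  if(fd < 0 || fd >= FD_SETSIZE) {
    errno = EBADF;
    return -1;
  }
  FD_ZERO(&rfds);
  FD_SET(fd, &rfds);
  rc = pselect(fd + 1, &rfds, NULL, NULL, &zero, NULL);
  if(rc < 0)
    return -1;
  return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}

/* Hypothetical helper: read whatever is pending, but only call recv()
   after the check above says the socket is readable, so a socket that
   was left in blocking mode cannot stall the loop. */
static ssize_t drain_if_readable(int fd, char *buf, size_t len)
{
  ssize_t total = 0;

  while(fd_readable_now(fd) == 1) {
    ssize_t n = recv(fd, buf, len, 0);
    if(n <= 0)
      break;  /* peer closed the connection or recv failed */
    total += n;
  }
  return total;
}
```

The readiness check costs one extra system call per read, and pselect is limited to descriptors below FD_SETSIZE, which is part of why this is a workaround rather than a portable fix.
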
The infinite while loop around the sread call depends on an external
condition, which seems pretty weak to me. The external condition is that
the socket is nonblocking. I added an fcntl call to get the flags to test
for the O_NONBLOCK flag. It turns out that it's not always set. In the
case when O_NONBLOCK is not set, the sread (recv) blocks indefinitely. In
addition, I'm seeing EBADF errors from the fcntl and pselect calls, and
ENOTSOCK from the sread (recv) call. Since I'm not familiar with the code
base, I don't understand how it's possible to get EBADF, ENOTSOCK, or a
blocking socket in multi_wait. Maybe there's an async connect going on and
it's not fully configured before calling multi_wait. I have no idea.
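
The O_NONBLOCK test described above could look roughly like the following, assuming a POSIX fcntl. The function names are hypothetical and this is only a sketch, not code taken from libcurl.

```c
#include <fcntl.h>

/* Hypothetical helper: returns 1 if fd is in nonblocking mode,
   0 if it is blocking, and -1 if fcntl itself fails (errno is then
   set, for example to EBADF for a descriptor that is not open). */
static int fd_is_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);

  if(flags == -1)
    return -1;
  return (flags & O_NONBLOCK) ? 1 : 0;
}

/* Hypothetical helper: switch fd to nonblocking mode while keeping
   its other status flags. Returns 0 on success, -1 on error. */
static int fd_set_nonblocking(int fd)
{
  int flags = fcntl(fd, F_GETFL, 0);

  if(flags == -1)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

A -1 result with errno set to EBADF means the descriptor was not open at the time of the call, which is a different condition from a valid socket that was simply left in blocking mode.
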
In addition, the overall timeout I have for the curl call (CURLOPT_TIMEOUT)
is ignored. The partial exception to this statement is if I set up a
progress function. If I set up a progress function that always returns the
"continue" flag, I'll see a trace from libcurl that says "Operation timed
out after 10000ms". I have a 10s timeout. But, instead of breaking out of
the blocking sread, I get a SEGV in ldapsb_tls_write (openldap.c), at this
line:
ret = (li->send)(data, FIRSTSOCKET, buf, len, &err);
li->send is NULL, resulting in the SEGV.
Adding "if (li && li->send)" in front of this line fixed the problem.
BTW, the same problem occurs if the progress function returns something
other than the "continue" flag. I get a trace saying "Operation was
aborted by an application callback", followed by the SEGV.
Again, the SEGV only happens if I set up a progress function. Adding the
check for li->send fixes the SEGV.
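
The li->send check can be illustrated in isolation as below. Everything here is a hypothetical stand-in: struct conn_state, send_fn, guarded_write and count_bytes are invented for the sketch and do not match the real types in openldap.c, and returning EPIPE when the callback is gone is an assumption, not what libcurl does.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical send callback type, much simpler than the real one. */
typedef long (*send_fn)(const void *buf, size_t len, int *err);

/* Hypothetical stand-in for the per-connection state. */
struct conn_state {
  send_fn send;  /* may be NULL once the transfer has been torn down */
};

/* Write through the callback only if both the state and the callback
   are still present; otherwise fail cleanly instead of calling NULL. */
static long guarded_write(struct conn_state *li, const void *buf, size_t len)
{
  int err = 0;
  long ret;

  if(!li || !li->send) {
    errno = EPIPE;  /* assumption: report the connection as unusable */
    return -1;
  }
  ret = li->send(buf, len, &err);
  if(ret < 0)
    errno = err ? err : EIO;
  return ret;
}

/* Demo callback for trying the guard: pretends every byte was sent. */
static long count_bytes(const void *buf, size_t len, int *err)
{
  (void)buf;
  *err = 0;
  return (long)len;
}
```

A guard like this prevents the crash but does not explain why the callback is NULL when a write is attempted after the timeout or abort.
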
Both of these problems are fixed for me. I'm happy to discuss in more
detail if necessary.
Regards,
Bill Smith
--
Unsubscribe: https://lists.haxx.se/listinfo/curl-library
Etiquette: https://curl.haxx.se/mail/etiquette.html
Received on 2021-09-09