Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

socketpair: fix potential hangs on Windows #7144

Conversation

pgroke-dt
Copy link
Contributor

Fixes potential hang in accept by using select + non-blocking accept.
Fixes potential hang in peer check by replacing the send/recv check with a getsockname/getpeername check.
Adds length check for returned sockaddr data.


We have, very rarely, seen Curl_socketpair hang in accept for more than 20 minutes (at which time our watchdog would write a minidump and restart the application). The callstacks were not always 100% identical but they all had in common that Curl_socketpair hangs in accept. Unfortunately we were not able to reproduce those hangs locally. After inspection of the code I found two weaknesses:

  1. AFAIU accept could block forever, for example if the OS drops the connection from the backlog for whatever reason. Also from my tests it looks like loopback connections really go through at least large parts of the TCP stack on Windows, so I think that other scenarios could also be possible. Maybe Windows drops loopback packets in certain situations? Or some kind of DoS guard in the OS kicks in even for loopback connections? IMO the easiest (and only) fix for this is to use a timeout with accept, so that's what I decided to do (select followed by non-blocking accept). Re. the use of select: Using select for a single socket in Windows-only code should not be an issue. The dreaded select issue with high FDs on POSIX systems doesn't affect Windows: the only limitation there is the number of sockets in an fd_set, the value of the socket handles doesn't matter.

  2. The send+recv based peer validation could also lead to hangs. If a "foreign" socket connects first, trying to recv from the accepted connection could block forever. I decided to not use non-blocking IO here but replace the check with a getsockname + getpeername check instead -- because it's both easier and more secure. We have never seen hangs in recv but I thought if I'm already working on a patch for Curl_socketpair I should probably address that too.

I also thought it could make sense to add a check for the returned addrlen so I did that as well. (Might be a little paranoid, but I think it can't hurt.)

Fixes potential hang in accept by using select + non-blocking accept.
Fixes potential hang in peer check by replacing the send/recv check with a getsockname/getpeername check.
Adds length check for returned sockaddr data.
@bagder bagder added the Windows Windows-specific label May 28, 2021

/* use non-blocking accept to make sure we don't block forever */
if(ioctlsocket(listener, FIONBIO, &nonblock) == -1)
goto error;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably use curlx_nonblock() instead for greater portability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see a point in using curlx_nonblock in Windows-only code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry, just realized my mistake, see #7144 (comment)

lib/socketpair.c Outdated
FD_SET(listener, &fds);
/* this is Windows-only code -> using select is fine regardless
of the _value_ of the socket descriptor, also nfds is ignored */
select(0, &fds, NULL, NULL, &timeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then shouldn't we #ifdef WIN32 this section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I just realized my mistake: I assumed the #ifdef WIN32 in line 27 would span the entire file. But it doesn't.
The only part that keeps the function from being compiled on (our) other platforms is the #if !defined(HAVE_SOCKETPAIR) && !defined(CURL_DISABLE_SOCKETPAIR). But there might be other platforms - that we don't use but are supported by libcurl - which don't HAVE_SOCKETPAIR.

Hm. That makes things more difficult. Is there an easy, portable way to wait for "ready to accept" without using select? Do you know if poll is available on all supported non-Windows systems?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ps: I tried skipping the select call entirely on Windows, and this causes sporadic failures. Sub 1% IIRC but still easily reproducible. So just not calling select is probably a bad idea. What I could do though is keep using blocking accept on other systems. It would be good enough for us. If you think that would be acceptable, that would of course be the easiest solution.

@pgroke-dt
Copy link
Contributor Author

Sorry for asking stupid questions without looking into the lincurl code myself. I fell into panic mode after I realized that I made that stupid mistake with the #ifdef WIN32.
I'll update the PR to use curlx_nonblock and Curl_poll with POLLIN. I think that should cover it.

@mback2k
Copy link
Member

mback2k commented May 31, 2021

I am wondering if this PR is still necessary with the latest curl release which avoids using socketpairs in the multi interface (on Win32).

@pgroke-dt
Copy link
Contributor Author

@mback2k I don't know :)
But I think if the function is there, it should not have the potential to block forever. So unless the function is removed completely, I would prefer merging it.

@pgroke-dt
Copy link
Contributor Author

ps: What about the threaded DNS resolver, does that still use socketpairs?

@bagder
Copy link
Member

bagder commented Jun 1, 2021

ps: What about the threaded DNS resolver, does that still use socketpairs?

Yes it does, also in the Windows build, right @mback2k ?

@pgroke-dt
Copy link
Contributor Author

pgroke-dt commented Jun 2, 2021

@bagder I have updated the PR to use curlx_nonblock and Curl_poll. Would you mind having a look at it again? Or should I squash it before people review it?

@mback2k
Copy link
Member

mback2k commented Jun 2, 2021

ps: What about the threaded DNS resolver, does that still use socketpairs?

Yes it does, also in the Windows build, right @mback2k ?

Yes, that seems to be the case. Also NTLM makes use of Curl_socketpair.

@bagder bagder closed this in c769d1e Jun 3, 2021
@bagder
Copy link
Member

bagder commented Jun 3, 2021

Thanks!

Copy link

@piru piru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to not use non-blocking IO here but replace the check with a getsockname + getpeername check instead -- because it's both easier and more secure.

Excuse me but how does the new code verify the identity of the connecting party?

getpeername of a accepted socket is always the getsockname of the listen socket, so now there is no verification of any kind.

@bagder
Copy link
Member

bagder commented Jun 3, 2021

ping @pgroke-dt

@bagder bagder reopened this Jun 3, 2021
@bagder
Copy link
Member

bagder commented Jun 3, 2021

Reopened for now so that we don't lose track of this.

@bagder
Copy link
Member

bagder commented Jun 3, 2021

To simplify, I'm reverting this PR

bagder added a commit that referenced this pull request Jun 3, 2021
This reverts commit c769d1e.

See #7144 for details
@piru
Copy link

piru commented Jun 3, 2021

The old code is insecure indeed in a way that address of the socks array isn't guaranteed to be totally random (it is mostly random with systems with ASLR, but if there is no ASLR it could be constant for multiple app invocations with identical parameters). Since we can't make such assumption about addresses being random what I suggest is to generate 16 bytes of cryptographically strong random, and then check for that same value can be read back. This is similar to old code, but the potential security issue fixed.

@bagder
Copy link
Member

bagder commented Jun 3, 2021

Curl_rand() should be cryptographically strong when possible.

@bagder bagder self-requested a review June 4, 2021 21:27
Copy link
Member

@bagder bagder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The anti-hang changes are good, but it needs adjustments in the way it proofs the other end.

@pgroke-dt
Copy link
Contributor Author

I decided to not use non-blocking IO here but replace the check with a getsockname + getpeername check instead -- because it's both easier and more secure.

Excuse me but how does the new code verify the identity of the connecting party?

getpeername of a accepted socket is always the getsockname of the listen socket, so now there is no verification of any kind.

Actually, no. getpeername always gives you the address that a socket is connected to. The accepted socket is not connected to the listening socket, it's connected to whoever made the incoming connection to the listening socket.

In the code, socks[0] is the connecting socket, see

if(connect(socks[0], &a.addr, sizeof(a.inaddr)) == -1)

socks[1] is the accepted socket, see
socks[1] = accept(listener, NULL, NULL);

Then the code compares getsockname(socks[0]) = the address of our connecting socket to getpeername(socks[1]) = the peer address of the accepted socket. If they are equal, it means the peer of the accepted connection was our connecting socket = what we want.

@pgroke-dt
Copy link
Contributor Author

So, as I can not see anything wrong with the way the connection is verified and it's a lot simpler and also more secure than sending random numbers, I'd suggest to merge the PR as is. Unless there are other issues of course.

@piru
Copy link

piru commented Jun 5, 2021

@pgroke-dt

You're absolutely correct! The verification code is working correctly, as you say.

@pgroke-dt @bagder

Sorry for the trouble. I agree that it can be merged.

@bagder
Copy link
Member

bagder commented Jun 5, 2021

ok, thanks a lot for the extra round of verification and sorry for the trouble!

@bagder bagder closed this in 0a51355 Jun 5, 2021
@pgroke-dt
Copy link
Contributor Author

@piru @bagder
Better to err on the side of caution than to overlook a mistake!
Thanks for your help & reviews and for accepting the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Windows Windows-specific
Development

Successfully merging this pull request may close these issues.

None yet

4 participants