Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ws: curl_ws_recv does not return from PING auto-responder #10615

Closed
marski opened this issue Feb 26, 2023 · 5 comments
Closed

ws: curl_ws_recv does not return from PING auto-responder #10615

marski opened this issue Feb 26, 2023 · 5 comments
Assignees

Comments

@marski
Copy link

marski commented Feb 26, 2023

I use libcurl 7.88.1 with multi interface and epoll to run a WebSocket client. When EPOLLIN is signalled for an associated fd, I call the curl_ws_recv function. It works correctly in typical cases, however after auto-responding to a PING frame, it does not return. Looking into the branch at lib/ws.c:485, the issue seems to be that the outer loop does not terminate. Returning with CURLE_AGAIN seems to solve the problem. Subsequent frames are also received correctly.

/* we handled the data part of the PING, advance over that */
wsp->stillb += nsent;
wsp->stillblen -= nsent;
return CURLE_AGAIN;
@bagder bagder self-assigned this Feb 26, 2023
@bagder
Copy link
Member

bagder commented Feb 27, 2023

Thanks for your report!

I am not convinced your proposed fix is the best one here. If libcurl has received more data into its buffer than just PING, returning CURLE_AGAIN might then make the application go back and wait for socket activity rather than also acting on the additional websocket frames already drained from the socket.

You say "the outer loop does not terminate" without this, which of course is a bug but I'm curious on what makes it possible to keep looping and I was hoping maybe you can help me understand this is if you can reproduce this problem?

When the autoping is handled, and it goes back to the outer loop, what makes it stick in the loop?

Line 432 if(!wsp->stillblen) { - presumably this does not match

Line 449: if(!wsp->frame.bytesleft) { - this should match, there was nothing left of the old frame

what next? how does it get stuck?

@marski
Copy link
Author

marski commented Feb 27, 2023

Thank you for your swift reply!

Being focused on the application code, I didn't step through the debug build of libcurl yet, relying so far on the verbose logs to try to resolve the problem. Therefore I'm just speculating and returning CURLE_AGAIN might not be correct indeed. However, the expected behavior would be that after responding to a PING frame and any other buffered frames, the control yields to the event loop, waiting for another EPOLLIN. Currently, this only happens if I get another non-PING frame from the server, which gets the curl_ws_recv unblocked.

After connecting to wss://socketsbay.com/wss/v2/1/demo/ public test server and waiting for a while, I get one or more PING frames which get logged as:

* reached /home/marski/Sources/curl/lib/easy.c:1231
* WS: 2 bytes left to decode
* WS:235 received FIN bit 1
* WS: received OPCODE PING
* WS: received 0 bytes payload (0 left, buflen was 2)
* WS: auto-respond to PING with a PONG, 0 bytes payload
* WS: send OPCODE PONG
* WS: set FIN
* WS: send FIN bit 1 (byte 8a)
* WS: send payload len 0
* WS: wanted to send 6 bytes, sent 6 bytes
* WS: bytesleft 0 datalen 0
* reached /home/marski/Sources/curl/lib/easy.c:1231
* WS: 2 bytes left to decode
* WS:235 received FIN bit 1
* WS: received OPCODE PING
* WS: received 0 bytes payload (0 left, buflen was 2)
* WS: auto-respond to PING with a PONG, 0 bytes payload
* WS: send OPCODE PONG
* WS: set FIN
* WS: send FIN bit 1 (byte 8a)
* WS: send payload len 0
* WS: wanted to send 6 bytes, sent 6 bytes
* WS: bytesleft 0 datalen 0
* reached /home/marski/Sources/curl/lib/easy.c:1231
* WS: 2 bytes left to decode
* WS:235 received FIN bit 1
* WS: received OPCODE PING
* WS: received 0 bytes payload (0 left, buflen was 2)
* WS: auto-respond to PING with a PONG, 0 bytes payload
* WS: send OPCODE PONG
* WS: set FIN
* WS: send FIN bit 1 (byte 8a)
* WS: send payload len 0
* WS: wanted to send 6 bytes, sent 6 bytes
* WS: bytesleft 0 datalen 0

When the autoping is handled, and it goes back to the outer loop, what makes it stick in the loop?

Whenever the client gets stuck in curl_ws_recv after receiving a PING frame and I break into the debugger, it shows the following call stack:

1  __libc_recv                            recv.c                   28  0x7ffff7ae435c 
2  __libc_recv                            recv.c                   23  0x7ffff7ae435c 
3  cf_socket_recv                                                      0x7ffff7f36055 
4  bio_cf_in_read                                                      0x7ffff7f97da7 
5  ??                                                                  0x7ffff76de6ae 
6  ??                                                                  0x7ffff76dd504 
7  BIO_read                                                            0x7ffff76ddad7 
8  ??                                                                  0x7ffff7930b91 
9  ??                                                                  0x7ffff7934e1e 
10 ??                                                                  0x7ffff79326d0 
11 ??                                                                  0x7ffff7939c45 
12 ??                                                                  0x7ffff7944a3f 
13 SSL_read                                                            0x7ffff7944b47 
14 ossl_recv                                                           0x7ffff7f94d60 
15 ssl_cf_recv                                                         0x7ffff7f99ec7 
16 Curl_read                                                           0x7ffff7f6f060 
17 curl_easy_recv                                                      0x7ffff7f45dd9 
18 curl_ws_recv                                                        0x7ffff7f8bca4 
19 (anonymous namespace)::on_socket_event websocket.cpp            130 0x4060e7       
20 EventLoop_wait                         eventloop.cpp            108 0x403525       

Perhaps the underlying BIO, that SSL_read calls into, is blocking?

Let me know how else I could help you to resolve this problem.

bagder added a commit that referenced this issue Feb 27, 2023
Reported-by: marski on github
Fixes #10615
@bagder
Copy link
Member

bagder commented Feb 27, 2023

You are right. I made it blocking for the connect_only-mode but I forget why... I need to get rid of that and I'm sure your case will work differently! Try #10625 if you can.

@marski
Copy link
Author

marski commented Feb 27, 2023

I confirm that applying your patch from #10625 fixes the problem, and curl_ws_recv does not block after handling PONG frame. Thank you for looking into this!

@bagder
Copy link
Member

bagder commented Feb 27, 2023

Excellent, thanks for confirming!

@bagder bagder closed this as completed in 3b23dbe Feb 27, 2023
bch pushed a commit to bch/curl that referenced this issue Jul 19, 2023
Reported-by: marski on github
Fixes curl#10615
Closes curl#10625
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants