Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/2.0 multiplexing: broken POST request when re-adding GET request #2520

Closed
steini2000 opened this issue Apr 23, 2018 · 17 comments
Closed
Labels

Comments

@steini2000
Copy link
Contributor

I did this

  • enable HTTP/2.0 multiplexing
  • send GET request that is answered after 5s
  • also send POST request, send data infinitely in read_callback()
  • if GET request completes, re-add GET handle to multi handle immediately
  • read_callback() is maybe called one more time after re-adding the GET handle, but never again after that

The problem appears when adding the GET handle.
When adding the GET handle again is delayed, the read_callback() is called until shortly after the curl_multi_add_handle().

I expected the following

read_callback() is still called continuously, data can be send continuously.
With HTTP/2.0 multiplexing disabled, everything works as expected.

curl/libcurl version

build from source a3f3853
curl 7.60.0-DEV (x86_64-pc-linux-gnu) libcurl/7.60.0-DEV OpenSSL/1.0.2g zlib/1.2.8 nghttp2/1.31.0
Release-Date: [unreleased]
Protocols: dict file ftp ftps gopher http https imap imaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS Debug TrackMemory IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy

Also observed with curl 7.56 on ARM

operating system

Ubuntu 16.04

Servlet source and pcap on request.

http2-mpx.c.txt
http2-mpx.log.zip

@bagder bagder added the HTTP/2 label Apr 23, 2018
@geekodour
Copy link

@bagder is this a beginner friendly issue? I am exploring the codebase now, if it's a low complexity issue then would like to work on it.

@bagder
Copy link
Member

bagder commented Apr 25, 2018

Thanks for considering this @geekodour, but while I haven't yet investigated this issue very closely I think I can say that this is not a simple issue. It involves multiple HTTP/2 streams and something not being right in the data handling in (one of) them.

To give you a better picture, let me describe roughly how I imagine I'd proceed to work on this bug:

  1. Make use of @steini2000's test source and make it reproducible using a local HTTP server. I'd probably use my local apache server and just make sure h2c is enabled in it.
  2. Make sure that I can keep sending data to a URL on the server to do the POST stream as described.
  3. Run the app and verify that it shows the described symptoms
  4. Now, assuming that step (3) is successful, put some logging/printf code around the logic that calls the read callback for the POST stream to figure out what conditions that make it not call the read callback
  5. If the wrong conditions or failed check can be figured out, the next step is to understand why it ends up in that state. It would probably involve making sure we figure out exactly when the problem appears (as in when the other stream stops or when it is started again or something else).
  6. Single-step through the functions involved in triggering the bad condition and see what makes it go wrong
  7. log/printf more
  8. try changing the wrong code with something that makes sense
  9. rerun the test app and make sure it works, or go back to step 5 or 6 or 7.
  10. when it works in (9), run the entire test suite and make sure nothing else broke
  11. clean up the code change, leave only the relevant fixes. Maybe add a comment?
  12. commit the change in a new local branch
  13. create a PR on github and have the CI tests verify the code even further

@bagder
Copy link
Member

bagder commented Apr 26, 2018

My first attempt to reproduce this issue has failed. My test application (that only changed the two URLs in the source posted above) that downloads a 4KB file while doing the POST, managed to get it 5000 times without problems by the time I stopped it...

I'm using libcurl from current git master.

@steini2000
Copy link
Contributor Author

I'm also using git master now. And I've updated to Tomcat 9.0.7.

What I've found out so far:
read_callback() is not called because in Curl_readwrite(), select_res does not have the out bit sit.
The out bit is not set because for Curl_socket_check(), fd_write is -1.
fd_write is -1 because conn->writesockfd is -1.
conn->writesockfd is -1 because curl_setup_transfer() is called from Curl_http() in default case (GET request) with writesockindex -1 (no postdata).

I also changed the size of data for the GET request from a few bytes to ~4k , but I can still reproduce the issue.

I'll investigate further and report back next week.

@steini2000
Copy link
Contributor Author

The servlet I'm using for testing:
https://github.com/steini2000/curl-2520

@steini2000
Copy link
Contributor Author

I hope someone else can reproduce the issue.
This really doesn't look like a simple issue.

The problem is somewhere here:

curl/lib/http.c

Lines 2793 to 2808 in f84139f

default:
result = Curl_add_buffer(req_buffer, "\r\n", 2);
if(result)
return result;
/* issue the request */
result = Curl_add_buffer_send(req_buffer, conn,
&data->info.request_size, 0, FIRSTSOCKET);
if(result)
failf(data, "Failed sending HTTP request");
else
/* HTTP GET/HEAD download: */
Curl_setup_transfer(conn, FIRSTSOCKET, -1, TRUE, &http->readbytecount,
http->postdata?FIRSTSOCKET:-1,
http->postdata?&http->writebytecount:NULL);

Curl_setup_transfer() is called for the GET request. As there is no postdata, the connections writesockfd is nuked. But the connection is still used for the POST request. But the writesockfd is -1 now, the read_callback() is not called.

Please be patient, I won't be able to work on this for some days.

@bagder
Copy link
Member

bagder commented Apr 27, 2018

I'm not convinced the problem is there. That part of the code sets up the request for this transfer. A transfer is pretty much a 1-1 mapping with a Curl_easy struct. Several (or even many) transfers may share a single connection (struct connectdata) for transfers, like when doing HTTP/2 over it.

The function that controls what gets waited for, per transfer, is the multi_getsock function. Your application calls curl_multi_fdset which iterates over all current transfers and calls multi_getsock for each.

@steini2000
Copy link
Contributor Author

Did a little bit more research.

The problem might be solved somewhere else, but the problem can be seen there.
This part sets up exactly one GET request:

curl/lib/http.c

Lines 2792 to 2809 in 1621aed

default:
result = Curl_add_buffer(req_buffer, "\r\n", 2);
if(result)
return result;
/* issue the request */
result = Curl_add_buffer_send(req_buffer, conn,
&data->info.request_size, 0, FIRSTSOCKET);
if(result)
failf(data, "Failed sending HTTP request");
else
/* HTTP GET/HEAD download: */
Curl_setup_transfer(conn, FIRSTSOCKET, -1, TRUE, &http->readbytecount,
http->postdata?FIRSTSOCKET:-1,
http->postdata?&http->writebytecount:NULL);
}

It calls this function to setup the transfer:

curl/lib/transfer.c

Lines 1989 to 2017 in 1621aed

void
Curl_setup_transfer(
struct connectdata *conn, /* connection data */
int sockindex, /* socket index to read from or -1 */
curl_off_t size, /* -1 if unknown at this point */
bool getheader, /* TRUE if header parsing is wanted */
curl_off_t *bytecountp, /* return number of bytes read or NULL */
int writesockindex, /* socket index to write to, it may very well be
the same we read from. -1 disables */
curl_off_t *writecountp /* return number of bytes written or NULL */
)
{
struct Curl_easy *data;
struct SingleRequest *k;
DEBUGASSERT(conn != NULL);
data = conn->data;
k = &data->req;
DEBUGASSERT((sockindex <= 1) && (sockindex >= -1));
/* now copy all input parameters */
conn->sockfd = sockindex == -1 ?
CURL_SOCKET_BAD : conn->sock[sockindex];
conn->writesockfd = writesockindex == -1 ?
CURL_SOCKET_BAD:conn->sock[writesockindex];
k->getheader = getheader;

In setup_transfer() it sets conn->writesockfd (writesockfd of the connection !!!) to CURL_BAD_SOCK if writesockindex parameter is -1.
And writesockindex is -1 for a GET request (no postdata for this request).

So a GET requests always sets writesockfd of the connection (!!!) to CURL_BAD_SOCK.

For testing, I changed in setup_transfer():
conn->sockfd = sockindex == -1 ?
CURL_SOCKET_BAD : conn->sock[sockindex];
conn->writesockfd = writesockindex == -1 ?
CURL_SOCKET_BAD:conn->sock[writesockindex];
to
if (sockindex != -1) {
conn->sockfd = conn->sock[sockindex];
}
if (writesockindex != -1) {
conn->writesockfd = conn->sock[writesockindex];
}

and
if(conn->writesockfd != CURL_SOCKET_BAD) {
to
if((conn->writesockfd != CURL_SOCKET_BAD) && (writesockindex != -1)) {

With this changes, the read_callback() of the application is called.
But there are sometimes gaps of about 1 sec where the read_callback() is not called, so there are more problems...

@bagder
Copy link
Member

bagder commented May 4, 2018

Your observation seems correct and it's curious this hasn't caused more problems. I think that's because conn->writesockfd isn't used everywhere. http2.c for example always accesses the socket with conn->sock[FIRSTSOCKET].

So, while you've identified a problem correctly I unfortunately don't think that's the major one that causes your problems.

My proposed fix for this issue (and I'll file a separate PR in a sec) looks likes this:

diff --git a/lib/transfer.c b/lib/transfer.c
index 5c8eb31d3..db08a6b4d 100644
--- a/lib/transfer.c
+++ b/lib/transfer.c
@@ -2016,15 +2016,22 @@ Curl_setup_transfer(
   data = conn->data;
   k = &data->req;
 
   DEBUGASSERT((sockindex <= 1) && (sockindex >= -1));
 
-  /* now copy all input parameters */
-  conn->sockfd = sockindex == -1 ?
+  if(conn->bits.multiplex) {
+    /* when multiplexing, the read/write sockets need to be the same! */
+    conn->sockfd = sockindex == -1 ?
+      conn->sock[writesockindex] : conn->sock[sockindex];
+    conn->writesockfd = conn->sockfd;
+  }
+  else {
+    conn->sockfd = sockindex == -1 ?
       CURL_SOCKET_BAD : conn->sock[sockindex];
-  conn->writesockfd = writesockindex == -1 ?
+    conn->writesockfd = writesockindex == -1 ?
       CURL_SOCKET_BAD:conn->sock[writesockindex];
+  }
   k->getheader = getheader;
 
   k->size = size;
   k->bytecountp = bytecountp;
   k->writebytecountp = writecountp;

@bagder
Copy link
Member

bagder commented May 4, 2018

Hm, no that was a bit too quick. Doesn't work properly...

bagder added a commit that referenced this issue May 4, 2018
Curl_setup_transfer() can be called to setup a new individual transfer
over a multiplexed connection so it shouldn't unset writesockfd.

Bug: #2520
@bagder
Copy link
Member

bagder commented May 4, 2018

I polished it slightly and created #2549. But again: while I think this fixes a bug, I'm not convinced this is the fix for you. Let me know what you find out!

bagder added a commit that referenced this issue May 4, 2018
Curl_setup_transfer() can be called to setup a new individual transfer
over a multiplexed connection so it shouldn't unset writesockfd.

Bug: #2520
Closes #2549
@steini2000
Copy link
Contributor Author

Thank you. With #2549 I could investigate further.

The FD sets for select() are created here:

curl/lib/http2.c

Lines 103 to 126 in d63bada

static int http2_perform_getsock(const struct connectdata *conn,
curl_socket_t *sock, /* points to
numsocks
number of
sockets */
int numsocks)
{
const struct http_conn *c = &conn->proto.httpc;
int bitmap = GETSOCK_BLANK;
(void)numsocks;
/* TODO We should check underlying socket state if it is SSL socket
because of renegotiation. */
sock[0] = conn->sock[FIRSTSOCKET];
/* in a HTTP/2 connection we can basically always get a frame so we should
always be ready for one */
bitmap |= GETSOCK_READSOCK(FIRSTSOCKET);
if(nghttp2_session_want_write(c->h2))
bitmap |= GETSOCK_WRITESOCK(FIRSTSOCKET);
return bitmap;
}

But nghttp2_session_want_write() returns FALSE, so we are not listening to the write socket. It returns FALSE because pq is empty.
When listening to the write write socket (for testing, ignore !want_write()), everything works as expected.

But why is pq empty? Does the new GET request do that?
Any ideas?

@steini2000
Copy link
Contributor Author

After startup:
curl_multi_perform():

  1. perform GET
  2. perform POST

After the GET has finished and is re-added:
curl_multi_perform():

  1. perform POST
  2. perform GET

After "perform GET", nghttp2_session_want_write() is always FALSE.

Perform GET:
In http2_send(), nghttp2_session_send() is called. It sends the headers for the new GET request. After that, want_write() is false. Calling nghtt2_session_resume_data() afterwards doesn't help.

@bagder
Copy link
Member

bagder commented May 11, 2018

Thanks for the added details. I think http2_perform_getsock is wrong and doesn't consider the upload state correctly! Please try something like this:

diff --git a/lib/http2.c b/lib/http2.c
index 770ebdab5..58f4c9920 100644
--- a/lib/http2.c
+++ b/lib/http2.c
@@ -106,10 +106,11 @@ static int http2_perform_getsock(const struct connectdata *conn,
                                                          number of
                                                          sockets */
                                  int numsocks)
 {
   const struct http_conn *c = &conn->proto.httpc;
+  struct SingleRequest *k = &conn->data->req;
   int bitmap = GETSOCK_BLANK;
   (void)numsocks;
 
   /* TODO We should check underlying socket state if it is SSL socket
      because of renegotiation. */
@@ -117,11 +118,13 @@ static int http2_perform_getsock(const struct connectdata *conn,
 
   /* in a HTTP/2 connection we can basically always get a frame so we should
      always be ready for one */
   bitmap |= GETSOCK_READSOCK(FIRSTSOCKET);
 
-  if(nghttp2_session_want_write(c->h2))
+  /* we're still uploading or the HTTP/2 layer wants to send data */
+  if((k->keepon & KEEP_SEND|KEEP_SEND_PAUSE) == KEEP_SEND) ||
+    nghttp2_session_want_write(c->h2))
     bitmap |= GETSOCK_WRITESOCK(FIRSTSOCKET);
 
   return bitmap;
 }
 

bagder added a commit that referenced this issue May 12, 2018
When there's an upload in progress, make sure to wait for the socket to
become writable.

Detected-by: steini2000 on github
Bug: #2520
bagder added a commit that referenced this issue May 14, 2018
When there's an upload in progress, make sure to wait for the socket to
become writable.

Detected-by: steini2000 on github
Bug: #2520
Closes #2567
@bagder
Copy link
Member

bagder commented May 14, 2018

That patch now landed. Take it for a spin!

@steini2000
Copy link
Contributor Author

With the 2 patches, the test case works now.
Thank you!

@bagder
Copy link
Member

bagder commented May 14, 2018

Lovely, thanks for confirming!

@lock lock bot locked as resolved and limited conversation to collaborators Aug 12, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Development

No branches or pull requests

3 participants