

Re: WebSocket feature request: is it possible to call write function when full frame is loaded only?


From: Dmitry Karpov via curl-library <curl-library_at_lists.haxx.se>
Date: Fri, 3 Feb 2023 19:52:41 +0000

“I think the reason is that websocket frame is divided between different TCP packets.”

Just a small reminder: WebSocket is a message-based protocol, and frames are its underlying transport unit, which in turn is carried over TCP packets.
A WS message can be delivered using different frame partitioning. For example, the text message “Hello beautiful world!” can be delivered as:

Frame(“Hello beautiful world!”) - 1 frame, or
Frame(“He”) + Frame(“llo beauti”) + Frame(“ful wor”) + Frame(“ld!”) - 4 frames

It is up to the WebSocket server implementation how it partitions its messages. In the second case, each complete WS frame contains only a piece of the whole message.

And if a text/binary WS message is very large, the WS server can split it into multiple parts delivered as separate WS frames, which may not be aligned with internal boundaries such as the ends of words or JSON objects.
In other words, the “full frame” mechanism in libcurl covers only one specific case: when a WS message uses exactly one WS frame (no matter how large it is). That is not the case for all WS server implementations.
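
To illustrate the point, here is a minimal sketch in plain C (no libcurl involved) of why frame boundaries and message boundaries differ. The `ws_fragment` struct and `ws_join_fragments()` are hypothetical names invented for this illustration; `fin` models the FIN bit that marks the last frame of a message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one received data frame. */
struct ws_fragment {
    const char *payload; /* payload bytes carried by this frame */
    size_t len;          /* payload length in bytes */
    int fin;             /* 1 if this frame ends the message (FIN bit) */
};

/*
 * Concatenate fragments into `out` until a frame with fin=1 is seen.
 * Returns the message length, or (size_t)-1 if the message does not
 * fit into `cap` bytes or no final frame is present.
 */
static size_t ws_join_fragments(const struct ws_fragment *frags, size_t n,
                                char *out, size_t cap)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        if (frags[i].len > cap - used)
            return (size_t)-1;          /* would overflow the buffer */
        if (frags[i].len)
            memcpy(out + used, frags[i].payload, frags[i].len);
        used += frags[i].len;
        if (frags[i].fin)
            return used;                /* message complete */
    }
    return (size_t)-1;                  /* ran out of frames before FIN */
}
```

Fed either the single frame or the four frames from the example above, the function yields the same 22-byte message. A consumer that acts on each full frame separately would see “He”, “llo beauti” and so on instead.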

When we discussed future WebSocket support in libcurl here in the past, I mentioned that after implementing the “raw” and “frame” WS levels, libcurl should eventually provide a “message” level as well.

The WS message level should assemble incoming WS frames into messages on the fly (and handle cases such as control messages received in the middle of large text/binary messages). It should provide both a “streaming” interface with some kind of “write functions”, which would make it possible to handle very large WS messages without exhausting memory, and a “buffer” mode, in which the incoming message is stored in a message buffer (with an option to specify its size) and delivered to the client in one piece.

And in “buffer” mode, if the message is too large to keep in the buffer, it should trigger a “Too Large” WS error, as the WebSocket protocol prescribes (RFC 6455 defines close code 1009, “Message Too Big”, for this).
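
As a rough sketch of the “buffer” mode described above, assuming frames have already been parsed into opcode, FIN flag and payload: the names `ws_msg_buffer`, `ws_msg_feed()` and the result enum are hypothetical and not part of libcurl. The opcode values and close code 1009 are taken from RFC 6455. Control frames (opcodes with the high bit set) are reported separately and do not disturb a partially assembled message.

```c
#include <stddef.h>
#include <string.h>

/* Opcodes as defined by RFC 6455. */
enum { WS_OP_CONT = 0x0, WS_OP_TEXT = 0x1, WS_OP_BINARY = 0x2,
       WS_OP_CLOSE = 0x8, WS_OP_PING = 0x9, WS_OP_PONG = 0xA };

/* Close status code "Message Too Big" from RFC 6455. */
#define WS_CLOSE_MESSAGE_TOO_BIG 1009

/* Hypothetical result codes for this sketch. */
enum ws_feed_result {
    WS_NEED_MORE,     /* message not finished yet */
    WS_MESSAGE_READY, /* buf[0..used) holds one complete message */
    WS_CONTROL,       /* control frame seen; pending message untouched */
    WS_TOO_BIG        /* limit exceeded; caller should close with 1009 */
};

struct ws_msg_buffer {
    char *buf;   /* caller-provided storage */
    size_t cap;  /* maximum message size the caller accepts */
    size_t used; /* bytes of the current message collected so far */
};

/* Feed one parsed frame into the message buffer. */
static enum ws_feed_result ws_msg_feed(struct ws_msg_buffer *m, int opcode,
                                       int fin, const void *payload,
                                       size_t len)
{
    if (opcode & 0x8)
        return WS_CONTROL;      /* ping/pong/close may arrive mid-message */

    if (opcode != WS_OP_CONT)
        m->used = 0;            /* text/binary frame starts a new message */

    if (len > m->cap - m->used) {
        m->used = 0;            /* drop the partial message */
        return WS_TOO_BIG;
    }
    if (len)
        memcpy(m->buf + m->used, payload, len);
    m->used += len;

    return fin ? WS_MESSAGE_READY : WS_NEED_MORE;
}
```

A “streaming” variant would replace the `memcpy` with a call to a user-supplied write function and pass `fin` along, so that no limit on the total message size is needed.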

So, if we are talking about a roadmap for future WebSocket features, I think that “message” level support should be the next step.
I use this approach in my C++ implementation of the WebSocket protocol on top of libcurl. It works well with different WS server implementations and can handle huge WS messages even on embedded devices.

Thanks,
Dmitry Karpov

From: curl-library <curl-library-bounces_at_lists.haxx.se> On Behalf Of Timothe Litt via curl-library
Sent: Friday, February 3, 2023 6:36 AM
To: curl-library_at_lists.haxx.se
Cc: Timothe Litt <litt_at_acm.org>
Subject: Re: WebSocket feature request: is it possible to call write function when full frame is loaded only?

On 03-Feb-23 03:27, Daniel Stenberg via curl-library wrote:
On Fri, 3 Feb 2023, Vitalii B. Avramenko via curl-library wrote:

Such partial data may be OK for the HTTP protocol, where we know for sure that we have a "request/response" pattern and can detect the end of the data from the HTTP protocol itself, for example with the `Content-Length` header. But with WebSocket, generally speaking, we don't have any way to know where the end of a frame is with `CURLOPT_WRITEFUNCTION`.

Yes we do: curl_ws_meta() is provided to give you exactly that information!
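
For readers following along, the check this refers to can be sketched roughly as follows. In a real write callback, `curl_ws_meta()` called on the easy handle returns a `struct curl_ws_frame` whose `offset`, `bytesleft` and `flags` fields describe the chunk being delivered. To keep the sketch compilable without libcurl, `struct frame_meta` and `META_CONT` below are hypothetical stand-ins for that struct and for the `CURLWS_CONT` flag, and the helper names are made up for this illustration.

```c
#include <stddef.h>

/* Stand-in for CURLWS_CONT ("more fragments of this message follow").
 * The numeric value here is arbitrary, not libcurl's. */
#define META_CONT 0x1

/* Stand-in for the fields of struct curl_ws_frame used here. */
struct frame_meta {
    int flags;           /* frame type and continuation bits */
    long long offset;    /* where this chunk starts within the frame payload */
    long long bytesleft; /* payload bytes of this frame still to come */
};

/* The chunk is the first piece of its frame. */
static int chunk_starts_frame(const struct frame_meta *m)
{
    return m->offset == 0;
}

/* The chunk is the last piece of its frame. */
static int chunk_ends_frame(const struct frame_meta *m)
{
    return m->bytesleft == 0;
}

/* The chunk completes a whole message: the frame has ended and no
 * further fragments are announced. */
static int chunk_ends_message(const struct frame_meta *m)
{
    return m->bytesleft == 0 && !(m->flags & META_CONT);
}
```

An application that wants full frames today can therefore buffer chunks itself until `bytesleft` reaches zero. The open question in this thread is whether libcurl should do that buffering on the application's behalf.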

we need a guarantee that `CURLOPT_WRITEFUNCTION` will call our callback only when a full frame has been downloaded, or at least we need an option that allows us to request such behavior (something like `CURLOPT_WEBSOCKET_FULL_FRAMES_ONLY`).

I have been thinking about adding a mode for the websocket API that delivers full frames only, but I have hesitated a bit: since frames can be up to 2^63 bytes big, we need to decide how to handle (too) big frames in such a mode.

What do you think is a reasonable behavior for a full-frame mode when it receives (ridiculously) large frames?

There's always an upper bound - no one has 2^63 bytes of swap space, memory, or disk space to store an extremely large frame. And it's not likely in the foreseeable future.

I think it's up to the application to decide what it's willing to handle. I don't think there's a universal answer of how. Maybe it calls getrlimit (RLIMIT_DATA) - or RLIMIT_FSIZE. Or it looks at free space on its output disk. Or bases it on estimated processing time. Or ...

For full frames, if you can't set an upper bound, your protocol user needs to rethink its usage. If your application really can deal with huge (beyond practical VM sized) data, it pretty much has to handle it in a stream - so FULL_FRAMES would be inappropriate.

So, here's a simple answer: Provide a setting for the maximum acceptable full frame size. On a FULL_FRAMES_ONLY connection, curl buffers any frame up to that size and provides it in the callback. For anything bigger (or if curl can't allocate the buffer memory, times out waiting for it, etc.), curl returns an error (FRAME_TOO_BIG), aborts the connection and calls writefunction with NULL in the *ptr argument and the actual size in 'size'.

This provides the application with sufficient information to log the failure or even retry the request.

And to simplify the API, perhaps the setting should be "CURLOPT_WEBSOCKET_FULL_FRAMES_UPTO, <size>", and let zero be the current incremental delivery mode.
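
Note that `CURLOPT_WEBSOCKET_FULL_FRAMES_UPTO` and `FRAME_TOO_BIG` are proposals in this thread, not existing libcurl options. As a sketch of what the proposed behavior amounts to, the buffering can be written on the application side in plain C. The names `frame_buffer` and `frame_buffer_feed()` are hypothetical; the `offset` and `bytesleft` arguments are assumed to carry the values libcurl reports per chunk via `curl_ws_meta()`, cast to `size_t`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum frame_feed {
    FRAME_PARTIAL,      /* chunk stored, frame not complete yet */
    FRAME_COMPLETE,     /* data[0..len) holds the full frame payload */
    FRAME_TOO_BIG,      /* frame exceeds max_size; *frame_size says by how much */
    FRAME_NO_MEMORY,    /* allocation failed */
    FRAME_BAD_SEQUENCE  /* chunk does not continue the buffered data */
};

struct frame_buffer {
    size_t max_size; /* largest frame the application accepts */
    char *data;      /* heap buffer for the frame being collected */
    size_t len;      /* bytes collected so far */
    size_t cap;      /* allocated size of data */
};

static void frame_buffer_reset(struct frame_buffer *fb)
{
    free(fb->data);
    fb->data = NULL;
    fb->len = 0;
    fb->cap = 0;
}

/* Feed one chunk; *frame_size receives the total frame payload size. */
static enum frame_feed frame_buffer_feed(struct frame_buffer *fb,
                                         const char *chunk, size_t chunk_len,
                                         size_t offset, size_t bytesleft,
                                         size_t *frame_size)
{
    /* total = offset + chunk_len + bytesleft, guarded against overflow */
    if (offset > SIZE_MAX - chunk_len ||
        bytesleft > SIZE_MAX - offset - chunk_len) {
        *frame_size = SIZE_MAX;
        frame_buffer_reset(fb);
        return FRAME_TOO_BIG;
    }
    *frame_size = offset + chunk_len + bytesleft;
    if (*frame_size > fb->max_size) {
        frame_buffer_reset(fb);
        return FRAME_TOO_BIG;   /* rejected before any large allocation */
    }
    if (offset == 0) {          /* first chunk: size is known up front */
        frame_buffer_reset(fb);
        fb->cap = *frame_size ? *frame_size : 1;
        fb->data = malloc(fb->cap);
        if (!fb->data) {
            fb->cap = 0;
            return FRAME_NO_MEMORY;
        }
    }
    if (!fb->data || offset != fb->len || chunk_len > fb->cap - fb->len)
        return FRAME_BAD_SEQUENCE;
    if (chunk_len)
        memcpy(fb->data + fb->len, chunk, chunk_len);
    fb->len += chunk_len;
    return bytesleft == 0 ? FRAME_COMPLETE : FRAME_PARTIAL;
}
```

Because the first chunk already reveals the total frame size, an oversized frame can be rejected, and its actual size reported, before any buffer is allocated. That matches the suggestion of passing the actual size back to the application.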

Vitalii can set <size> to a few GB if he can handle it. Or if he is willing to go until the OOM killer hits him, he can set size to 2^63-1 and see where fate takes him. Having lived through "32 bits is so big that limits aren't necessary", I don't think that's a wise approach...
Timothe Litt

ACM Distinguished Engineer

--------------------------

This communication may not represent the ACM or my employer's views,

if any, on the matters discussed.


Received on 2023-02-03
