curl / Mailing Lists / curl-library / Single Mail
Buy commercial curl support from WolfSSL. We help you work out your issues, debug your libcurl applications, use the API, port to new platforms, add new features and more. With a team lead by the curl founder himself.

Re: [PATCH] New protocol: gemini

From: Daniel Stenberg via curl-library <>
Date: Sat, 28 Nov 2020 12:08:02 +0100 (CET)

On Fri, 27 Nov 2020, Dmitry Bogatov via curl-library wrote:

> This patch implements gemini protocol as described in specification[^1],
> with following pieces missing:

Thanks for your contribution!

I have sooo many questions on this but I'll try to focus on the most important
ones here.

To me, Gemini looks like a mix of Gopher and HTTP/0.9 and it's a mystery to me
why you would rather write a new protocol so similar to those rather than just
stick to what already exists. But okay, everyone is free to work on whatever
they please.

The forced closing of the connection after every transfer will also make this
protocol impossible to ever perform well for multiple or small transfers. I
would imagine you'll also forbid using SSL session-id and TLS session tickets
too so repeated handshake times will be siginificant.

As a step 1 for this to get considered for inclusion in curl you need to
submit this patch as a PR on github (so that all tests and CI jobs get to
check it out), and you need to make sure that you add tests for at least the
major features and components of the protocol - and of course a necesary
amount of docs. Doing that will not *guarantee* that we accept it, but without
that step we certainly cannot.

> * Redirects. 3x status codes are not handled any specially from any
> other status code that has no body, -L option has no effect on gemini
> URLs.

The lack of "permanent" redirects and the restricted length on the URLs to be
shorter than 1024 would make it hard to work as web replacement protocol.

But that's just my thoughts.

'gemini' is not a registered URI scheme:

> * Certificate verification. Gemini servers rarely use certificates with
> trust chain from certificates in /etc/ssl/certs; self-signed
> certificates are the norm. Option -k should be the default for gemini
> protocol.

I will not accept that.

Doing -k is almost like not having TLS at all and I will not subject users of
curl to that - as default. If you use self-signed certificates you need to put
those certs in your local CA store and deal with them that way. Or use -k
against all recommendations and suffer the consquences. But not by default.

> /* XXX: This conditional can probably be eliminated by fixing
> * doing_get_proto function, but I do not know how.
> */
> if(err == CURLE_AGAIN)
> return CURLE_OK;

It seems very strange to equal AGAIN to OK. How do you know when a blocking
operation was stopped then and needs to get continued again?

> Also, about redirects, I tried to call Curl_follow if (status == '3'),
> but it seems to be specific to HTTP(s) and did not result in expected
> behaviour with -L flag.

The only protocol curl supports that has redirects is HTTP so yes it is
certainly HTTP specific!

> +static char *gemini_request(const struct urlpieces *up)
> +{
> + if(up->query)
> + return aprintf("gemini://%s%s?%s\r\n", up->hostname, up->path, up->query);
> + else
> + return aprintf("gemini://%s%s\r\n", up->hostname, up->path);
> +}

Isn't data->change.url already good enough?

> +static int gemini_doing_getsock(struct connectdata *conn, curl_socket_t *socks)
> +{
> + socks[0] = conn->sock[FIRSTSOCKET];

Does it really need reading when issuing the request?

> +/*
> + * According to specification, response has following format:
> + *
> + *
> + * and <META> is UTF-8 string up to 1024 bytes long, so buffer of
> + * size >= (2 + 1 + 1024 + 1 + 1) = 1029 is enough to read whole
> + * response header into memory. It is more efficient than reading
> + * byte-after-byte until \n is found.
> + */

I didn't spot the code that rejects the server response if a larger response
than this comes?

> +struct GEMINI {
> + struct {
> + size_t amount; /* Count of bytes read */
> + bool done;
> + char *lf; /* Pointer to linefeed character in {data} */
> + } block;
> + struct {
> + char *data; /* Allocated string */

I'd ask you do not call this 'data', as we usually save that name for the
pointer to the easy handle. (Also for the other 'data[]' above.)

It just makes it easier when glancing over code.

  | Commercial curl support up to 24x7 is available!
  | Private help, bug fixes, support, ports, new features
Received on 2020-11-28