
Adding IPFS Trustless Gateway Protocol Questions

From: Hugo Valtier via curl-library <curl-library_at_lists.haxx.se>
Date: Thu, 26 Oct 2023 00:33:23 +0200

Hello, I work at Protocol Labs on the IPFS Stewards Team, working on the Go
implementations, and I have done some spec rewriting.
(GH Profile https://github.com/Jorropo/)

Mark recently added ipfs:// support in the curl CLI
<https://github.com/curl/curl/commit/65b563a96a226649ba12cb1ec7b5c4c538ec1c08>,
but sadly it does not perform validation on the data received. I am
interested in fixing that, as well as moving the support into libcurl.
I have already written a PoC in Go: https://github.com/Jorropo/go-featheripfs
which I am productionising in our Boxo Go library:
https://github.com/ipfs/boxo/pull/347

I have questions about how I should go about this on two points.

# How should a libcurl protocol go about reusing another libcurl protocol?
What I am implementing is the trustless gateway protocol
<https://specs.ipfs.tech/http-gateways/trustless-gateway/>: a
request/streaming-response protocol over HTTP. I am doing a point-to-point
implementation, with no tricky P2P or concurrent-download logic.
To consume HTTP from the IPFS protocol I have created an IPFS state-machine
struct which I added to the SingleRequest.p union.
This object itself contains a CURL* field from the user-facing libcurl API
(I will probably have to move to CURLM*, but I have not gotten to that
problem yet).
Is it acceptable for libcurl internals to use the public libcurl API?
Given that I have to implement a Curl_handler myself, I could see a world
where targeting Curl_handler_http makes sense, since I could forward my
implementations of the Curl_handler methods to Curl_handler_http with the
required wrapping added.
However, url.c (and more?) does preprocessing that is convenient for me, and
having two code paths to reach HTTP would lead to duplicated code and be a
breeding ground for bugs. It is also not as simple as just forwarding to
http, since sometimes I need HTTPS, H2, H3, ...

This could become tricky depending on the semantics we want to attach: for
example, parameters like proxying, headers (if the user wants to add an
auth token; but not Range, as that is handled by IPFS), --insecure, ...
should really be passed as-is to the existing http* stack.

Is there a third, better option I have overlooked?

Also, it would be nice if whatever solution we pick allowed more than one
HTTP request: the state machine can use its DFS merkle-tree stack to resume
downloads, so you can imagine the user supplying multiple IPFS gateways and
the code transparently restarting the stream with another server, exactly
where it left off with the previous one, if an IO error happens.

# Where should the true "IPFS" IPFS code live?
Concretely the implementation could be split in 3 parts:
1. Decoders
  - Multibase <http://github.com/multiformats/multibase> (a 1-character
prefix, followed by hex, base32, base64, ...)
  - Unixfs <https://github.com/ipfs/specs/pull/331>, the protobuf-based
merkle-tree we use to encode unix-like objects.
    - I have already been asked to also support CBOR encoding.
  - Multihash <https://github.com/multiformats/multihash> (a prefix which
maps to some hash function); currently it is not really a thing in my curl
code because I only support sha256, to dodge code-size debates about adding
more hash functions. If this grows, it is a switch statement from an id to
a hash function.
2. Car <https://ipld.io/specs/transport/car/carv1/> decoding and unixfs
validation state machine: this reads the incoming stream from the server,
parses blocks, and maintains a DFS stack of the blocks expected next. It
reads one block (currently a maximum of 2MiB in the IPFS world) and copies
the decoded data to the consumer once validated, repeating until the stack
is empty.
3. Interfacing with the underlying HTTP library.

I am not thinking about reusing this code in other projects which don't use
libcurl right now, so that is not a concern.
It is a maintenance question: is having this code in curl acceptable? I can
pledge time to maintain the IPFS parts in curl as needed (fixing bugs and
helping with reviews).
If not, how should I go about it? Write, maintain and distribute my own .h
and .so which curl can target? I am scared by the absence of a C module
management story (I am used to the exceptional Go modules), and I have heard
more than once: "we just use libc, openssl, zlib, libcurl, ... (stable
libraries a huge share of Linux systems already ship with) because adding
anything else makes devops too hard".
Note: I say .so, not .c, because if I were to maintain my own separate
greenfield library I would like to use Zig instead of C.

I don't need an answer on this one right now; it is fair to make a decision
once I have a pull request up.

Thx


-- 
Unsubscribe: https://lists.haxx.se/mailman/listinfo/curl-library
Etiquette:   https://curl.se/mail/etiquette.html
Received on 2023-10-26