Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IPFS/IPNS URLs ignore the path #12148

Closed
Stebalien opened this issue Oct 17, 2023 · 9 comments
Closed

IPFS/IPNS URLs ignore the path #12148

Stebalien opened this issue Oct 17, 2023 · 9 comments

Comments

@Stebalien
Copy link

I did this

I run:

curl -v ipns://ipfs.tech/community

I get:

*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET /ipns/ipfs.tech HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Access-Control-Allow-Headers: Content-Type
< Access-Control-Allow-Headers: Range
< Access-Control-Allow-Headers: User-Agent
< Access-Control-Allow-Headers: X-Requested-With
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Methods: HEAD
< Access-Control-Allow-Methods: OPTIONS
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Length
< Access-Control-Expose-Headers: Content-Range
< Access-Control-Expose-Headers: X-Chunked-Output
< Access-Control-Expose-Headers: X-Ipfs-Path
< Access-Control-Expose-Headers: X-Ipfs-Roots
< Access-Control-Expose-Headers: X-Stream-Output
< Content-Type: text/html; charset=utf-8
< Location: /ipns/ipfs.tech/
< X-Ipfs-Path: /ipns/ipfs.tech
< X-Ipfs-Roots: QmbMTqnawMAbD4Fzgd5NM9GpKT67FdNZ1EsH8D4r52PTeT
< Date: Tue, 17 Oct 2023 18:26:48 GMT
< Content-Length: 51
< 
<a href="/ipns/ipfs.tech/">Moved Permanently</a>.

The same goes for all ipfs:// urls:

curl -v ipfs://bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/cURL

Output:

*   Trying [::1]:8080...
* Connected to localhost (::1) port 8080
> GET /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.4.0
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Access-Control-Allow-Headers: Content-Type
< Access-Control-Allow-Headers: Range
< Access-Control-Allow-Headers: User-Agent
< Access-Control-Allow-Headers: X-Requested-With
< Access-Control-Allow-Methods: GET
< Access-Control-Allow-Methods: HEAD
< Access-Control-Allow-Methods: OPTIONS
< Access-Control-Allow-Origin: *
< Access-Control-Expose-Headers: Content-Length
< Access-Control-Expose-Headers: Content-Range
< Access-Control-Expose-Headers: X-Chunked-Output
< Access-Control-Expose-Headers: X-Ipfs-Path
< Access-Control-Expose-Headers: X-Ipfs-Roots
< Access-Control-Expose-Headers: X-Stream-Output
< Content-Type: text/html; charset=utf-8
< Location: /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/
< X-Ipfs-Path: /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze
< X-Ipfs-Roots: bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze
< Date: Tue, 17 Oct 2023 18:28:37 GMT
< Content-Length: 101
< 
<a href="/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/">Moved Permanently</a>.

I expected the following

I'd expect curl to fetch the requested pages. Instead, it ignores the entire path and tries to fetch the root CID/domain.

E.g., in the first request, I should see:

> GET /ipns/ipfs.tech/community HTTP/1.1

But instead I see:

> GET /ipns/ipfs.tech HTTP/1.1

curl/libcurl version

curl 8.4.0

operating system

Arch Linux

@jay
Copy link
Member

jay commented Oct 17, 2023

@markg85

@markg85
Copy link
Contributor

markg85 commented Oct 17, 2023

On it. I know why it happens. Just finding a neat way to fix it at the moment.

@markg85
Copy link
Contributor

markg85 commented Oct 17, 2023

cc @lidel
To summarize, there's actually a couple bugs here (will add testcases for each):

  • curl ipfs://<cid>/a/b/c would be handled as curl ipfs://<cid>
  • curl ipfs://<cid>?arg=val would be handled as curl ipfs://<cid>
  • curl ipfs://<cid>/a/b/c?arg=val would be handled as curl ipfs://<cid>

This is because we only use the HOST and SCHEME part from the user input, anything else is ignored.
I'm now rewriting this to take the input url as base and "add in" the gateway, which is the complete opposite approach of what was done before.

Tricky part, what to do with gateways containing a PATH and/or QUERY. I'm going to maintain the path but throw a malformed URL if the gateway contains a query. As mixing the query from 2 url's is just asking for needless complexity. This also means that magic gateway query arguments won't be possible anymore. Imho, that's fine because it should - and can! - be done through headers instead.

The issue is the same for IPNS, same code path. A fix for one fixes it for the other.

@dfandrich
Copy link
Contributor

dfandrich commented Oct 17, 2023 via email

@markg85
Copy link
Contributor

markg85 commented Oct 17, 2023

The new(-ish) URL API should make this kind of rewriting easy to do.

Feel free to share a link.
Unless you mean the curl_url_get and curl_url_set functions, i am using those.

@dfandrich
Copy link
Contributor

dfandrich commented Oct 17, 2023 via email

@markg85
Copy link
Contributor

markg85 commented Oct 18, 2023

@dfandrich It works quite well for the case where you want to replace the host from one url with another.
Although, that still requires you to:

  • get the host in a temp variable from url X + error handling
  • set that host in url Y + error handling
  • clean up

Even simpler would be a setter function that instead of a const char *part would accept a source CURLU *source and would do this effectively in a one-liner with no cleanup required. An example of how that would look:

CURLU *dest = ...
CURLU *otherUrl = ...

/* init, etc... */

if(curl_url_set(dest, CURLUPART_HOST, otherUrl, CURLU_URLENCODE) != CURLUE_OK) {
  /* error handling */
}

The intent in the above example is to get the CURLUPART_HOST from the otherUrl and put it in the dest url object.

But this is just replacing.
Things get hairy if you want to append/prepend with for example combining paths. That - as it is now - requires you to get the path part from both url's, do checks on it (like to prevent double slashes). That quickly gets quite a bit. It would be super if the curl API would instead allow the above with a prepend/append flag too. Example:

CURLU *dest = ... /* say http://localhost/a/b */
CURLU *otherUrl = ... /* say https://curl.se/foo/bar */

/* init, etc... */

if(curl_url_set(dest, CURLUPART_PATH | CURLUPART_PREPEND, otherUrl, CURLU_URLENCODE) != CURLUE_OK) {
  /* error handling */
}

Note the CURLUPART_PREPEND. What this should do is prepend the path part from otherUrl to dest.
So dest would become: http://localhost/foo/bar/a/b
In an hypothetical CURLUPART_APPEND the result would become http://localhost/a/b/foo/bar

Then again, what i'm doing with an url here is probably as much of an edge case as it can be.

@dfandrich
Copy link
Contributor

dfandrich commented Oct 18, 2023 via email

@bagder
Copy link
Member

bagder commented Oct 19, 2023

I'm not sold on the arguments here for extending the URL API, but I am certainly open for discussing extending and improving it.

I think the examples in this thread are a little too specific for this exact use case and shows the problem: sure, if the API provided these functions with exact these properties that would help this particular use case, but when adding combined actions, decisions have to be made and it is not certain those decisions will be fine for everyone and then people will end up having to write it themselves anyway, just leaving us with code that can rarely actually be used.

@bagder bagder closed this as completed in 859e88f Nov 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants