Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Octets after ? in gopher:// URL are interpreted as a query #3369

Closed
iamleot opened this issue Dec 12, 2018 · 7 comments
Closed

Octets after ? in gopher:// URL are interpreted as a query #3369

iamleot opened this issue Dec 12, 2018 · 7 comments

Comments

@iamleot
Copy link
Contributor

iamleot commented Dec 12, 2018

According RFC 4266 a gopher:// URL is:

[...]
A Gopher URL takes the form:

  gopher://<host>:<port>/<gopher-path>

where is one of:

  <gophertype><selector>
  <gophertype><selector>%09<search>
  <gophertype><selector>%09<search>%09<gopher+_string>

[...]

is the Gopher selector string. In the Gopher protocol,
Gopher selector strings are a sequence of octets that may contain any
octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
(US-ASCII character LF), and 0D (US-ASCII character CR).
[...]

However, everything after a `?' is ignored.

I did this

I will denote with a S> the server, and C> the client:

S> % nc -l 7070
C> % curl 'gopher://localhost:7070/1/foo?bar'
S> /foo
S> % nc -l 7070
C> % curl 'gopher://localhost:7070/1/foo?'
S> /foo

I expected the following

S> % nc -l 7070
C> % curl 'gopher://localhost:7070/1/foo?bar'
S> /foo?bar
S> % nc -l 7070
C> % curl 'gopher://localhost:7070/1/foo?'
S> /foo?

No ? and (if any) no octets after the ? should be omitted.

curl/libcurl version

% curl -V
curl 7.63.0 (x86_64--netbsd) libcurl/7.63.0 OpenSSL/1.1.0i zlib/1.2.10 libidn2/2.0.5
Release-Date: 2018-12-12
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP UnixSockets HTTPS-proxy

operating system

NetBSD

PS: Please note that AFAICS we have no way to distunguish between a:
gopher://host:port/path and gopher://host:port/path? and so I think
that gopher_do() in lib/gopher.c could not be adjusted to just take
the concatenation of data->state.up.path and data->state.up.query
(in both cases data->state.up.query seems NULL).
Of course, if you have any idea how to better implement that,
feel free to share it and I'll try to write a possible patch!

Thank you very much!

@iamleot
Copy link
Contributor Author

iamleot commented Dec 12, 2018

And, sorry, forgot to share!

ATM as a possible workaround instead of directly using ? in the URL all ? should be manually encoded to %3f.

@bagder
Copy link
Member

bagder commented Dec 12, 2018

Probably a regression since 46e1640, but not detected due lack of proper tests.

Do you feel like taking on writing a fix?

@iamleot
Copy link
Contributor Author

iamleot commented Dec 12, 2018 via email

@bagder
Copy link
Member

bagder commented Dec 12, 2018

If we indeed need to know if '?' was present in the URL then we need to store that information when parsing it, which isn't done now. Probably by adding a boolean to the Curl_URL struct and getting the info for it here:

curl/lib/urlapi.c

Lines 812 to 814 in d8607da

query = strchr(path, '?');
if(query)
*query++ = 0;

Possibly a smoother approach is that we work around it and avoid extending that struct, so that if the URL is a gopher one, we store a zero length query.

@infinnovation-dev
Copy link

Surely URLs with no query and with an empty query are semantically different? If so, then it may be as simple as changing

if(query && query[0]) {
    u->query = strdup(query);

to

if(query) {
    u->query = strdup(query);

Similarly for fragment.

@iamleot
Copy link
Contributor Author

iamleot commented Dec 12, 2018 via email

@iamleot
Copy link
Contributor Author

iamleot commented Dec 12, 2018

Please note that I have not adjusted fragments handling similarly.

In my experience it is less common (and probably pretty rare!) to
appear in the selector and, unlike ?, I am not sure if it violates
RFC 3986 (that is quoted by RFC 4266) where:

[...]
3.5. Fragment
[...]
[...] Fragment identifier semantics are independent of the
URI scheme and thus cannot be redefined by scheme specifications.

[...]

Fragment identifiers have a special role in information retrieval
systems as the primary form of client-side indirect referencing,
allowing an author to specifically identify aspects of an existing
resource that are only indirectly provided by the resource owner. As
such, the fragment identifier is not used in the scheme-specific
processing of a URI; instead, the fragment identifier is separated
from the rest of the URI prior to a dereference, and thus the
identifying information within the fragment itself is dereferenced
solely by the user agent, regardless of the URI scheme. Although
this separate handling is often perceived to be a loss of
information, particularly for accurate redirection of references as
resources move over time, it also serves to prevent information
providers from denying reference authors the right to refer to
information within a resource selectively. Indirect referencing also
provides additional flexibility and extensibility to systems that use
URIs, as new media types are easier to define and deploy than new
schemes of identification.
[...]

IIUC the fragment part - unlike the query - should be not sent to
the server and only consumed solely by the user agent.

@bagder bagder closed this as completed in 305d25e Dec 13, 2018
bagder pushed a commit that referenced this issue Dec 13, 2018
After the migration to URL API all octets in the selector after the
first `?' were interpreted as query and accidentally discarded and not
passed to the server.

Add a gopherpath to always concatenate possible path and query URL
pieces.

Fixes #3369
Closes #3370
@lock lock bot locked as resolved and limited conversation to collaborators Mar 13, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Development

No branches or pull requests

3 participants