Re: a URL API ?

From: Daniel Stenberg via curl-library
Date: Mon, 13 Aug 2018 09:44:53 +0200 (CEST)

On Mon, 13 Aug 2018, Dan Fandrich via curl-library wrote:

> I think you're right, it should work. Documenting
> (CURLU_URLDECODE|CURLU_URLENCODE) as performing canonicalization is probably
> all you'd need, besides ensuring decode and encode happen in the correct
> order.

We could perhaps even make it a separate flag to make it more obvious to the
user: CURLU_CANONICAL. Only recognized when getting the URL.

> Actually, does CURLU_URLDECODE do anything on the curl_url_get call? It
> sounds like something that should only do something on the curl_url_set
> call.

The code keeps the strings URL-encoded in the struct, pretty much as they were
in the original URL, so if you want the "raw" version of them you ask for URL
decoding on *get().

On *set() you're expected to pass in the URL-encoded version or ask libcurl to
encode it for you.

> This means that the preferred form of a URI differs depending on the scheme.
> Do we want to build in knowledge of the preferred encoding sets for all the
> different URI schemes out there today, or even just the ones curl supports?

The URI syntax already is, or can be, subtly different depending on the scheme
even without canonicalization (like the options part of the authority
section). My approach so far is to only recognize libcurl-supported schemes by
default, allowing that to be overridden with a flag. For unsupported schemes,
it will of course just be "best effort", generic handling.

I *suspect* libcurl users will most often only care about schemes that
libcurl supports.

> I think there should be a new option for this kind of encoding so the
> canonical form stays canonical for every URI scheme, but programs that would
> prefer merely a fairly consistent human-readable form using an encoding set
> optimized for the scheme in use could use the other
> CURLU_URLENCODE_OPTIMIZED (or whatever it's called) option instead.

I'm not sure I see the difference between these two approaches. Can you show
them with some example URLs?

Received on 2018-08-13