Re: URL validation
Date: Mon, 13 Jul 2020 23:32:00 +0200 (CEST)
On Mon, 13 Jul 2020, Stephan Mühlstrasser via curl-library wrote:
> when using curl's URL functions, is it possible to validate the URL?
Let's start with this: what does it mean to "validate" a URL? By which criteria
is the URL correct? URLs are unfortunately not defined very strictly (these
days, if they ever were).
> What exactly does "RFC 3986+" mean?
It means that libcurl parses URLs as RFC 3986 dictates they work, with a few
extra extensions that we've deemed necessary to make curl slightly more
"browser and real world"-compatible.
For example, you can write the URL with one, two or three slashes after the
scheme...
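To make the slash leniency concrete, here is a minimal C sketch. It is not
libcurl's actual code, and the helper names are hypothetical. It counts the
slashes that follow the scheme's colon and accepts one to three of them, the
way the lenient rule above is described:

```c
#include <string.h>

/* Hypothetical helper, NOT libcurl code: count the slashes that directly
   follow the first ':' in the string. Returns -1 if there is no ':'. */
static int scheme_slash_count(const char *url)
{
  const char *colon = strchr(url, ':');
  int n = 0;
  if(!colon)
    return -1;
  for(const char *p = colon + 1; *p == '/'; p++)
    n++;
  return n;
}

/* Returns 1 if the slash count is one that the lenient rule accepts
   (one, two or three), 0 otherwise. */
static int lenient_slashes_ok(const char *url)
{
  int n = scheme_slash_count(url);
  return n >= 1 && n <= 3;
}
```

With this rule "https:/curl.haxx.se/", "https://curl.haxx.se/" and
"https:///curl.haxx.se/" all pass, while a strict RFC 3986 reading would give
the one- and three-slash forms a different structure (no authority, or an
empty one).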
But also, as Jakub already said, libcurl focuses on extracting the right parts
from the URL. It will accept a little more than a strictly RFC 3986-adhering
parser would (if there ever was one).
> And what happens if the URL string is not a correct "RFC 3986+" URL?
The libcurl URL parser returns an error *when it detects a problem*. Which of
course isn't the same thing as "validating" a URL.
> validate_url(url_handle, "https://curl.haxx.se/<invalid>");
> validate_url(url_handle, "https://curl.haxx.se/%XY");
> But curl_url_set() returns CURLUE_OK for them. Is this expected?
Yes, more or less expected anyway.
The first one is probably pointless to refuse, since the angle brackets have no
other meaning in URLs and people already use those characters in URLs with
browsers.
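Since curl_url_set() accepts such characters, an application that wants a
stricter notion of "valid" has to add its own check. A minimal sketch, assuming
the application simply wants every byte to be one of the characters RFC 3986
allows literally (unreserved, reserved, and '%'); the function names are
hypothetical and not part of libcurl:

```c
#include <string.h>

/* Returns 1 if byte c may appear literally in an RFC 3986 URI:
   ALPHA / DIGIT, the unreserved marks "-._~", the gen-delims, the
   sub-delims, and '%' (which introduces a percent-encoded octet). */
static int rfc3986_char_ok(unsigned char c)
{
  if((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
     (c >= '0' && c <= '9'))
    return 1;
  /* the c != '\0' guard matters: strchr() would match the terminator */
  return c != '\0' && strchr("-._~:/?#[]@!$&'()*+,;=%", c) != NULL;
}

/* Returns 1 if every byte of url is allowed by rfc3986_char_ok(). */
static int rfc3986_chars_ok(const char *url)
{
  for(const unsigned char *p = (const unsigned char *)url; *p; p++)
    if(!rfc3986_char_ok(*p))
      return 0;
  return 1;
}
```

This is stricter than what browsers accept (it rejects spaces and any
non-ASCII byte, for instance), and it says nothing about whether the
percent-encoded triplets themselves are well formed, so "%XY" still passes.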
The second one is accepted by the parser since it doesn't verify
percent-encoded octets. Maybe it should.
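Checking the percent-encoded octets is easy to do on the application side.
A sketch, assuming "correct" means every '%' is followed by two hexadecimal
digits as RFC 3986 requires; percent_encoding_ok() is a hypothetical helper,
not a libcurl function, and could be run on the string before passing it to
curl_url_set():

```c
/* Returns 1 if c is an ASCII hexadecimal digit (locale-independent). */
static int is_hex(unsigned char c)
{
  return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F') ||
         (c >= 'a' && c <= 'f');
}

/* Returns 1 if every '%' in url starts a complete "%HH" triplet.
   Short-circuit evaluation stops at the terminating NUL, so a '%' at
   the end of the string never causes a read past it. */
static int percent_encoding_ok(const char *url)
{
  const unsigned char *p = (const unsigned char *)url;
  while(*p) {
    if(*p == '%') {
      if(!is_hex(p[1]) || !is_hex(p[2]))
        return 0;
      p += 3;
    }
    else
      p++;
  }
  return 1;
}
```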
-- / daniel.haxx.se | Commercial curl support up to 24x7 is available! | Private help, bug fixes, support, ports, new features | https://www.wolfssl.com/contact/
Received on 2020-07-13