curl-users

Re: Curl fails to get some URLs that browsers get - here's why

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Thu, 8 Feb 2001 09:04:43 +0100 (MET)

On Thu, 8 Feb 2001, Chris wrote:

> This might be a bug due to some HTML specs not being adhered to?

It is rather a URL definition that is a bit vague here. Let's pick this thing
apart.

> This page:-
>
> http://invest.foxmarketwire.com/news/../news/news_t_st4.sht
>
> loads under normal browsers, but not curl, because the browsers are
> replacing the "../" section, while curl is not.

curl is written to accept a URL as input (these days it takes several). A URL
is a sequence of characters forming an entity. RFC 2396 is king.

curl is designed to accept and pass on the URL you give it. curl is not a
browser. I don't think of it as curl's job to clean up and correct mistakes
in the input. On the contrary, what if you wanted to test your web server for
security leaks in the "../" aspect? curl is a somewhat unforgiving tool at
times, but that makes it even better and gives its user more raw power.

If you want the ../ section removed from a URL such as the one above, as a
browser would do, you do it before you pass the URL to curl. That is what
curl assumes.
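
A minimal sketch of such a pre-processing step, in Python, assuming you only
want "." and ".." path segments collapsed roughly the way a browser does it.
The function name collapse_dot_segments is made up for this illustration and
is not part of curl. Leading ".." segments that would climb above the root
are silently dropped here; RFC 2396 leaves that case to the implementation,
so treat this choice as an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_dot_segments(url):
    """Return url with "." and ".." path segments resolved (hypothetical helper)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            # "." refers to the current level; keep a trailing slash if it ends the path
            if is_last:
                out.append("")
            continue
        if seg == "..":
            # Step up one level, but never pop the empty root segment
            if len(out) > 1:
                out.pop()
            if is_last:
                out.append("")
            continue
        out.append(seg)
    # Scheme, host, query and fragment are passed through untouched
    return urlunsplit(parts._replace(path="/".join(out)))


if __name__ == "__main__":
    print(collapse_dot_segments(
        "http://invest.foxmarketwire.com/news/../news/news_t_st4.sht"))
    # prints http://invest.foxmarketwire.com/news/news_t_st4.sht
```

The cleaned-up URL is then what you hand to curl, which still sends exactly
what it is given.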

I would agree that it could be useful to have a separate curl option that
performs this kind of "Resolving Relative References to Absolute Form" (as it
is specified in RFC 2396 section 5.2). I would most likely accept a patch if
anyone would code this functionality and send it to me.

Thanks a lot for taking the time to report this. Are you up to providing me
with code that does this?

-- 
  Daniel Stenberg -- curl project maintainer -- http://curl.haxx.se/
Received on 2001-02-08