Re: URL parsing for the command line tool? (Daniel Stenberg)

From: Timothe Litt <litt_at_acm.org>
Date: Fri, 17 Aug 2018 07:11:38 -0400

On 17-Aug-18 06:00, curl-users-request_at_cool.haxx.se wrote:
>
> Message: 1
> Date: Thu, 16 Aug 2018 15:45:46 +0200 (CEST)
> From: Daniel Stenberg <daniel_at_haxx.se>
> To: the curl tool <curl-users_at_cool.haxx.se>
> Subject: Re: URL parsing for the command line tool?
> Message-ID: <alpine.DEB.2.20.1808161145320.9053_at_tvnag.unkk.fr>
> Content-Type: text/plain; format=flowed; charset=US-ASCII
>
> On Wed, 15 Aug 2018, David Niklas wrote:
>
>> Please try something more real world (even contrived). Why would you want to
>> change hosts? Your example makes no sense to me.
> I was just trying to provoke thoughts and ideas, I didn't have any particular
> use case in mind.
>
> Features we can consider:
>
> 1. host-specific sections in config files. So .curlrc can specify for example
> a specific user-agent to use only when connecting to example.com and a
> different user-agent for example.org.
>
> 2. command-line variables based on the most recently used URL. If you want to
> save the output from a download in a directory named as the host name with the
> file name part also from the URL:
>     curl https://example.com/file -o "%{url_host}/%{url_file}"
>
>> export in1="../download.html"
>>
>> for i in DragonFlyBSD FreeBSD NetBSD; do
>>     curl --base-url $base --output-url - $in1 "#" $i | ./download_curl.sh
>> done
> Or just a way to apply a relative URL on the absolute one before it is used:
>
> curl http://example.org/foo --rel-url "../here/it/is" -O
>
> That could be fun for those who download a HTML page and want to download
> something that is pointed to with a relative URL within that.
>
> curl $url > raw.html
> extract_hrefs
> for i in $all_hrefs; do
>     curl $url --rel-url "$i" -O
> done
>
Most of this seems like feature bloat. 

Host-specific sections in config files could be useful.
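Purely to illustrate the idea (this is a hypothetical sketch - curl's .curlrc has no such section syntax today), host-specific sections might look something like:

```
# Hypothetical .curlrc sketch -- this syntax does NOT exist in curl;
# it only illustrates the "host-specific sections" proposal.
[host example.com]
user-agent = "agent-for-example-com"

[host example.org]
user-agent = "agent-for-example-org"
```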

The rest can easily be done (and, unless you put a lot of work in, more
flexibly) in the scripting language of your choice.
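For instance, the relative-URL download loop quoted above can be approximated today with a short pipeline of existing tools. This is only a sketch: the href extraction is naive grep, and the literal HTML snippet stands in for a page that would normally come from `curl -s "$url" > raw.html`.

```shell
# Sketch: extract hrefs from a page and resolve each against the base
# URL. The grep-based extraction is naive; a real HTML parser
# (HTML::TreeBuilder, etc.) is more robust.
url="https://example.org/foo/index.html"
cat > raw.html <<'EOF'
<a href="bar/nil.jpg">one</a>
<a href="../other.html">two</a>
EOF
grep -o 'href="[^"]*"' raw.html | sed 's/^href="//; s/"$//' |
while read -r ref; do
    # Resolve the relative reference; each result could then be
    # fetched with: curl -O "$abs"
    abs=$(python3 -c 'import sys
from urllib.parse import urljoin
print(urljoin(sys.argv[1], sys.argv[2]))' "$url" "$ref")
    echo "$abs"
done
```

With the snippet above this prints https://example.org/foo/bar/nil.jpg and https://example.org/other.html.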

For Perl, see URI for parsing/dissecting URIs of all sorts, and
HTML::TreeBuilder for parsing HTML (including finding hrefs, <img> srcs,
etc.).  There are also other Perl modules that provide a direct
interface to Curl.  So it's quite easy to handle your examples - and the
more complex usages that they presage.  E.g.:

    $uri = URI->new("http://example.org");
    print $uri->host, $uri->path, $uri->fragment;
    $abs = URI->new_abs("bar/nil.jpg", "https://example.org/foo");
    print $abs->path;
    $rel = $abs->rel("https://example.org/");
    ...

Python has similar URI parsing & Curl access modules.
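For example, the standard library's urllib.parse covers the same ground as Perl's URI module (the URLs below are just illustrative):

```python
# Rough Python equivalent of the Perl URI calls, using only the
# standard library.
from urllib.parse import urlparse, urljoin

uri = urlparse("http://example.org/foo/file.html#top")
print(uri.netloc)    # host: example.org
print(uri.path)      # /foo/file.html
print(uri.fragment)  # top

# Resolve a relative reference against an absolute base URL.
abs_url = urljoin("https://example.org/foo/", "bar/nil.jpg")
print(abs_url)       # https://example.org/foo/bar/nil.jpg
```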

I don't think you want Curl to become a scripting language.  I'd stick
with the Unix philosophy of small tools, each of which does one thing
well, that can be composed for more complex tasks.

But if you go this way, be prepared to see a long list of "enhancement"
requests that will add development & maintenance effort to your plate.

Timothe Litt
ACM Distinguished Engineer
--------------------------
This communication may not represent the ACM or my employer's views,
if any, on the matters discussed.

-----------------------------------------------------------
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-users
Etiquette: https://curl.haxx.se/mail/etiquette.html

Received on 2018-08-17