curl-library
Re: Finding library regex
Date: Fri, 18 Oct 2002 17:59:05 -0700
On Friday, Oct 18, 2002, at 17:48 US/Pacific, $B2&(B $BgK(B wrote:
> Since curl does not extract links from HTML files, I want to use a
> regex to parse them. Does anyone know where I can find one, or a
> better way to extract links? Thanks in advance.
Here is a shell function I used to extract URLs from Apache's default
directory listing (the page it generates when there is no index.html or
equivalent). You can probably adapt it a bit to extract any link.
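listurls() {
    # Print the absolute URL of each entry in an Apache directory listing.
    local baseurl="$1"
    # Ensure the base URL ends in "/" so relative HREFs append cleanly.
    [[ "$baseurl" != */ ]] && baseurl="${baseurl}/"
    #echo baseurl: $baseurl >&2
    # sed -n with the p flag prints only lines where a substitution matched:
    #   "[ ]" lines (files):          base URL + relative HREF
    #   "[DIR]" with absolute HREF:   the base URL itself (e.g. parent link)
    #   "[DIR]" with relative HREF:   base URL + subdirectory
    $WGET $WGETCATOPTS "$baseurl"|sed -n '
/\[ \]/s!.*HREF="\([^"]*\).*$!'"$baseurl"'\1!p
/\[DIR\]/s!.*HREF="/\([^"]*\).*$!'"$baseurl"'!p
/\[DIR\]/s!.*HREF="\([^"]*\).*$!'"$baseurl"'\1!p'
}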
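To adapt the idea to arbitrary pages rather than Apache listings, one option is to split the HTML before every href=" and keep the quoted value. What follows is a minimal sketch. extract_links is a hypothetical helper, not part of the function above. It is regex-based and handles only double-quoted href attributes. It ignores single-quoted or unquoted hrefs, "../" paths, and <base> tags, so it is not a real HTML parser.

```shell
# extract_links BASEURL: read HTML on stdin and print every double-quoted
# href value, made absolute against BASEURL. Hypothetical helper, a regex
# sketch only: single-quoted/unquoted hrefs, "../" and <base> are ignored.
extract_links() {
    base=$1
    # Ensure the base ends in "/" so relative links append cleanly.
    case $base in
        */) ;;
        *) base="$base/" ;;
    esac
    # Scheme and host only (e.g. http://host), for links starting with "/".
    root=$(printf '%s\n' "$base" | sed 's!^\([^:/]*://[^/]*\).*!\1!')
    # Break the input before every href=" (any letter case), then keep the
    # quoted value from each line that now begins with href=".
    sed 's/[Hh][Rr][Ee][Ff]="/\
&/g' |
    sed -n 's/^[Hh][Rr][Ee][Ff]="\([^"]*\)".*/\1/p' |
    while IFS= read -r link; do
        case $link in
            '#'*) ;;                                  # in-page anchor: skip
            *://*|mailto:*) printf '%s\n' "$link" ;;  # already absolute
            /*) printf '%s%s\n' "$root" "$link" ;;    # host-relative
            *) printf '%s%s\n' "$base" "$link" ;;     # relative to BASEURL
        esac
    done
}
```

For example, `curl -s http://example.com/pub/ | extract_links http://example.com/pub/` would print one absolute URL per line. Splitting on href=" first means several links on one HTML line are all found, which a single greedy `s/.*HREF=.../` substitution would miss.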
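$WGET is usually curl, but on older versions of OS X (where the function
was first written) it was GNU wget. Set

    WGETCATOPTS="-L"          for curl (-L follows redirects; curl writes
                              the page to stdout by default), or
    WGETCATOPTS="-nv -O -"    for wget (-nv trims the log; -O - writes
                              the page to stdout).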
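If you do not want to set these two variables by hand, a small helper can pick whichever tool is installed. This is a sketch under assumptions. pick_fetcher is a hypothetical name and is not from the original post. It prefers curl and falls back to GNU wget, using the option strings given above.

```shell
# pick_fetcher: set WGET and WGETCATOPTS for listurls. Hypothetical helper,
# not from the original post; prefers curl, falls back to GNU wget.
pick_fetcher() {
    if command -v curl >/dev/null 2>&1; then
        WGET=curl
        WGETCATOPTS="-L"          # follow redirects; page goes to stdout
    elif command -v wget >/dev/null 2>&1; then
        WGET=wget
        WGETCATOPTS="-nv -O -"    # terse log; write the page to stdout
    else
        echo "pick_fetcher: neither curl nor wget found in PATH" >&2
        return 1
    fi
}
```

A typical call would be `pick_fetcher && listurls http://example.com/pub/`. listurls expands `$WGETCATOPTS` unquoted, so the shell splits the option string into separate words. That is what lets the wget variant pass `-nv`, `-O` and `-` as three arguments. When curl's output is piped, it prints a progress meter on stderr. Adding `-s` to its options hides the meter.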
--
Soren Spies
Apple Computer, Inc.

Received on 2002-10-19