curl-users
URL parsing enhancement
Date: Wed, 09 Apr 2003 16:23:46 -0700
Hi,
I have been fixing a hotmail scraper script that uses curl, and noticed
that hotmail now uses URLs like
http://example.com?param=blah...
Notice the missing slash after 'example.com' and before the question mark.
I do not know if this is a valid URL, but IE, Mozilla, and even Lynx
accepted it, while curl did not: it tried to append the query string to
the hostname part and resolve that ugliness.
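
To make the failure concrete, here is a minimal standalone sketch of the scan that the unpatched code performs. It reuses the format string from url.c as shown in the patch below, with one change that is an assumption of this sketch: a width on the final path field so the example cannot overflow its buffer. The function name scan_url_old and the buffer sizes are hypothetical and are not part of curl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of curl: splits a URL the way the
 * unpatched sscanf() format does. Returns the number of fields
 * sscanf() assigned. */
int scan_url_old(const char *url, char proto[65], char host[513],
                 char path[1025])
{
  assert(url != NULL && proto != NULL && host != NULL && path != NULL);

  proto[0] = '\0';
  host[0] = '\0';
  strcpy(path, "/"); /* default path, as in url.c */

  /* The host scan set stops only at '/' or newline, so a '?' that
   * directly follows the host name is swallowed into the host.
   * The 1024 width on the last field is an addition for this sketch. */
  return sscanf(url, "%64[^\n:]://%512[^\n/]%1024[^\n]",
                proto, host, path);
}
```

Called with "http://example.com?param=blah", this leaves the host buffer holding "example.com?param=blah" and the path at its default "/", which is the string curl then tries to resolve.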
I have created a small patch to allow parsing of URLs like that. I am
not an expert in curl or C, so I did it to the best of my understanding.
It works for me. If there is a better way to do that, please correct
the patch - it shouldn't be too hard.
If it looks alright, can someone apply the patch? The patch was made
against version 7.10.4. Thanks!
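
For readers who want to try the idea outside of curl, the following is a rough standalone sketch of what the patch does: stop the host scan at '?' as well as '/', then put the missing slash in front of the query string. The struct url_parts, the function split_url, and the size macros are hypothetical names made up for this sketch. It also bounds the path field and shifts the path with memmove(), since source and destination overlap; both are choices of the sketch, not something taken from curl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PROTO_LEN 64
#define HOST_LEN  512
#define PATH_LEN  1024

/* Hypothetical container for the three pieces of a URL. */
struct url_parts {
  char proto[PROTO_LEN + 1];
  char host[HOST_LEN + 1];
  char path[PATH_LEN + 2]; /* one extra byte for an inserted '/' */
};

/* Returns 0 on success, -1 if no protocol and host could be read. */
int split_url(const char *url, struct url_parts *out)
{
  int n;

  assert(url != NULL && out != NULL);

  out->proto[0] = '\0';
  out->host[0] = '\0';
  strcpy(out->path, "/"); /* default path */

  /* Host ends at '/' OR '?', whichever comes first. */
  n = sscanf(url, "%64[^\n:]://%512[^\n/?]%1024[^\n]",
             out->proto, out->host, out->path);
  if(n < 2)
    return -1;

  /* URL had no '/' between host and query: insert one. The regions
   * overlap, so memmove() is used and the terminating NUL is copied. */
  if(out->path[0] == '?') {
    memmove(&out->path[1], out->path, strlen(out->path) + 1);
    out->path[0] = '/';
  }
  return 0;
}
```

With "http://example.com?param=blah" as input, the host comes out as "example.com" and the path as "/?param=blah". URLs that already have the slash pass through unchanged.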
P.S. I am not subscribed to the list. Please cc me when replying.
Vlad
-- Vlad Krupin Software Engineer echospace.com
--- url.c-old Wed Apr 9 16:04:33 2003
+++ url.c Wed Apr 9 16:14:19 2003
@@ -1921,9 +1921,13 @@
/* Set default host and default path */
strcpy(conn->gname, "curl.haxx.se");
strcpy(conn->path, "/");
-
+ /* we need to search for '/' OR '?' - whichever comes first after host
+ * name but before the path. We need to change that to handle things
+ * like http://example.com?param= (notice the missing '/'). Later we'll
+ * insert that missing slash at the beginning of the path.
+ */
if (2 > sscanf(data->change.url,
- "%64[^\n:]://%512[^\n/]%[^\n]",
+ "%64[^\n:]://%512[^\n/?]%[^\n]",
conn->protostr, conn->gname, conn->path)) {
/*
@@ -1974,6 +1978,14 @@
buf = data->state.buffer; /* this is our buffer */
+ /* If URL is malformed (missing a '/' after hostname before path)
+ * we insert a slash here
+ */
+ if(conn->path[0] == '?'){
+ memmove(&conn->path[1], conn->path, strlen(conn->path) + 1);
+ conn->path[0] = '/';
+ }
+
/*
* So if the URL was A://B/C,
* conn->protostr is A
Received on 2003-04-10