Re: libcurl question, Range support for http

From: Guillaume Arluison <ga_at_lacde.net>
Date: Fri, 26 Nov 2004 13:51:39 -0000

OK, it wasn't clear enough before.
I must admit I don't really understand why you are using a web server for
that (the example is exaggerated of course, so I don't know when you'll
need this, but it makes more sense); maybe because it isn't a choice!
I don't know if you control the targeted web server, but the simplest
solution would be a CGI on it that cuts/splits/adds the bits from your
file(s) before sending them...
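
Just to illustrate (a rough, untested sketch; the script name, the
'ranges' query parameter, and the file path are all invented for the
example), such a CGI could look like this in Python:

    #!/usr/bin/env python
    # ranges.py - hypothetical CGI: glue together the byte ranges the
    # client asks for and send them back as a single response body.
    # Example: GET /cgi-bin/ranges.py?ranges=0-2047,4096-6143
    import cgi
    import sys

    FILEPATH = '/data/big.iso'   # made-up path, point it at your file

    form = cgi.FieldStorage()
    ranges = form.getfirst('ranges', '0-0')

    sys.stdout.write('Content-Type: application/octet-stream\r\n\r\n')
    f = open(FILEPATH, 'rb')
    for r in ranges.split(','):
        start, end = map(int, r.split('-'))
        f.seek(start)
        sys.stdout.write(f.read(end - start + 1))   # 'end' is inclusive
    f.close()

That way one request fetches all the pieces, and the client doesn't need
Range support at all.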

If you don't really need all the clever features of curl and only do
simple GETs, a simple program using sockets may be sufficient instead of
the curl library?
It depends on what you need, of course, and I haven't tested this
pipelining feature before in a simple open-socket program (beware that
your web server must handle it as well; IIS v5 actually has some bugs
around that), but coding a basic version like:

    import socket

    # 'domain' and 'filePathName' come from wherever you get your URLs
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # now connect to the web server on port 80
    s.connect((domain, 80))
    # HTTP header lines must end with \r\n, not just \n
    s.sendall('GET ' + filePathName + ' HTTP/1.1\r\n')
    s.sendall('Host: ' + domain + '\r\n')  # Host: is mandatory with HTTP/1.1
    s.sendall('Connection: close\r\n')     # otherwise recv() never sees EOF
    s.sendall('\r\n')                      # a blank line ends the request
    while 1:
        data = s.recv(1024)
        if len(data) == 0:
            break
        # parse the headers here and use Content-Length to split the body
    s.close()

is not too difficult. The problem is of course the number of features you
want to add (handling errors, timeouts with alarms, etc.), but it is still
not too complex, is it?
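
As a taste of the pipelining discussed below (a rough, untested sketch:
it assumes 'domain' and 'paths' are already defined, that the server
really supports HTTP/1.1 pipelining, and that the Range values are
invented for the example), you would queue several requests before
reading any reply:

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((domain, 80))
    # send all the requests on the one connection before reading anything
    for i in range(len(paths)):
        s.sendall('GET ' + paths[i] + ' HTTP/1.1\r\n')
        s.sendall('Host: ' + domain + '\r\n')
        s.sendall('Range: bytes=0-2047,4096-6143\r\n')  # invented ranges
        if i == len(paths) - 1:
            s.sendall('Connection: close\r\n')  # so recv() sees EOF at the end
        s.sendall('\r\n')
    # replies come back in request order; a real version must split them
    # on the Content-Length (or multipart/byteranges boundary) of each one
    reply = ''
    while 1:
        data = s.recv(1024)
        if not data:
            break
        reply += data
    s.close()
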
____________________
<Gu>[W]ill[i]a<u>m<e>
http://www.lacde.net
----- Original Message -----
From: "Goswin von Brederlow" <brederlo_at_informatik.uni-tuebingen.de>
To: "libcurl development" <curl-library_at_cool.haxx.se>
Sent: Friday, November 26, 2004 1:28 PM
Subject: Re: libcurl question, Range support for http

> "Guillaume Arluison" <ga_at_lacde.net> writes:
>
> >> > for url in urls
> >> > do
> >> > curlObj->set(url)
> >> > curlObj->perform()
> >> > do whatever you want with the data retrieved
> >> > done
> >>
> >> Which sends one header, waits for the round robin, reads all the
> >> data, sends a second header, waits for the round robin, reads, sends,
> >> waits, reads, sends, waits, reads.
> >>
> >> Notice all that waiting? If I have to do 10000+ requests, all that
> >> waiting accumulates to a severe performance penalty.
> > No, actually I don't notice it, sorry, because it doesn't happen.
> > It depends on what you call 'round robin', but nevertheless with any
> > method of load balancing you use (DNS/cookie, whatever): once the TCP
> > connection is made to the chosen server, it is still open for the next
> > transactions (what you call send header/read data).
> > That is the purpose of the keep-alive thing, and the difference between
> > curl and lots of other similar GET programs is that curl doesn't cut the
> > connection between performs!
> >
> > So unless you have a weird and inefficient load balancer, the pseudocode
> > above will do:
> > make a TCP connection / wait for your round robin
> > send header
>
> 300ms for the header to travel through the net
> a few ms to parse and send the data
> 300ms for the data to travel back
>
> > read data
> > send header
>
> another 600+ms wait
>
> > read data
> > ...
> >
> > until either your server closes the connection because of internal
> > configuration / waiting too long / client disconnect / network problems
> > (which may happen if you have 10000+ requests), but curl will still be
> > able to recreate the connection when needed or r-e-u-s-e the previous
> > one if it is available.
>
> Let's do a little (exaggerated) example. Requests are for blocks, which
> might be as little as 2K, and Range only allows a limited number of
> ranges in one request; say the server only accepts 10. So I can
> request 20K of discontinuous blocks per header.
>
> Now say I have the worst case of a 5GB DVD ISO image and I need every
> other block. The image is 5*1024*1024 KB, I need half of it, and each
> request covers 20K, so I need to send 5*1024*1024/2/20 = 131072 headers
> to the server.
>
> With a 0.6s round-trip delay for each header, that is nearly 22 hours
> of just waiting.
>
>
> I know a 2K block size for DVD ISOs is small, but I hope this shows my
> point.
>
> >> Which sends one header, waits for the round robin, reads all the
> >> data, sends a second header, waits for the round robin, reads, sends,
> >> waits,
> > Actually, with curl or any other program that internally does what you
> > say (only one perform with 10000+ files), please show us how HTTP will
> > handle this in only ONE header/read data, data, data, data.
>
> It doesn't.
>
> You just send the next header before the first one is finished.
>
> send header 1
> send header 2
> send header 3
> read reply 1
> send header 4
> read reply 2
> ...
>
> As stated in RFC 2068: http://www.freesoft.org/CIE/RFC/2068/52.htm
>
> | 8.1.1 Purpose [of persistent connections]
> |
> | # HTTP requests and responses can be pipelined on a connection.
> | Pipelining allows a client to make multiple requests without
> | waiting for each response, allowing a single TCP connection to be
> | used much more efficiently, with much lower elapsed time.
>
> > Don't mix up the HTTP layer and the TCP layer.
>
> MfG
> Goswin
>
