RE: Commonalities between cURL instances

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Fri, 8 Mar 2002 08:31:36 +0100 (MET)

On Thu, 7 Mar 2002, David Burley wrote:

> I am connecting to a web site that detects connections and will block you
> if you try to connect more than once. I have found that if I run cURL in a
> loop using a shell script it runs forever without them caring, as you only
> have one connection.
>
> If you start multiple instances, they detect you and display that you have
> connected more than once. However, if you start up another browser
> (Internet Explorer, as I am running it in Cygwin), they only detect the
> instances of cURL as having more than one on their site and treat the
> instance of Internet Explorer independently of cURL. If I was to open
> multiple instances of IE they detect it and display that you are connected
> more than once.
>
> I know from looking at what I am posting to them (with "curl -v") that the
> post data, the unique values anyway, are unique for each and every instance.
> The cookie files are also being stored independently of one another and are
> being deleted after each round of requests.
>
> Thus, I come to the conclusion that something must be common between the
> instances of cURL. Else, I would think that they would detect multiple
> instances with one copy of cURL and one copy of Internet Explorer open and
> they do not. Could the commonalities be coming from the SSL implementation
> (in other words... not cURL or libcURL's problem)?

Actually, I fail to see how this is a problem at all. I think this is just a
matter of the site having a stupid check that doesn't work for generic
user-agents.

Of course, I can't rule out the possibility that OpenSSL somehow produces
something that looks similar between the requests; I don't know enough about
OpenSSL internals to tell.

It might also be just this: when you invoke curl to get single pages, it
closes the connection very quickly after it has done its job, while IE keeps
the connection open for a while after it gets a page. That makes the curl
fetches very short, so when two different shell loops make repeated
single-shot requests, it is likely that the requests never occur at exactly
the same time.

How does the site detect multiple simultaneous connections anyway? It
probably checks for more than one TCP connection from the same remote IP
address, and if so I can't see how curl could circumvent that!

> Is there an easy hack to force it to create and kill connections with each
> request.

Each invocation of curl, the command line tool, does just that. It can't keep
connections around or alive after it exits. Only if you specify multiple URLs
on the same command line will it try to re-use already opened connections.

When using libcurl, you can force single requests to not re-use an existing
connection and you can force connections to not be re-used afterwards too.

> I'll try to hack on that tomorrow and see if I can't code it... I haven't
> dealt much with this kind of programming but I can probably figure it out..
> maybe you can tell me if it is possible and then I'll try to implement it
> for myself. Would be good practice either way.

Well, I hope this mail will give you some further input and understanding. I
hope I didn't just completely misunderstand the whole issue! ;-)

-- 
    Daniel Stenberg -- curl groks URLs -- http://curl.haxx.se/
Received on 2002-03-08