cURL / Mailing Lists / curl-library / Single Mail

curl-library

RE: FTP third party transfer (proxy) support.

From: Daniel Stenberg <daniel-curl_at_haxx.se>
Date: Thu, 27 May 2004 17:25:06 +0200 (CEST)

On Thu, 27 May 2004, Alexander Krasnostavsky wrote:

> When you are planning to release 7.12.0?

If nothing else comes up, early next week. Your patch is too big a change for
me to add this late before a release. I'd rather have us smoothen out the
quirks over a slightly longer period.

> > * You always set the "primary" (is that the source?) connection to use
> > PASV and the second to use PORT. Won't people want that
> > configurable?
>
> As far as I know, both remote hosts should support third party transfer,
> meaning both of them support passive mode.

Well, both would probably support PASV at ftp-level, true. But... What if -
for example - one of those hosts are behind a firewall that makes it harder to
use PASV on? Then you'd want to use PASV on the _other_ host and use PORT on
the firewalled host...

> So I don't think it is important which one is PASV and which one is PORT.

But the network architecture might dictate rules we cannot control.
>
> > * At almost the same spot as the above mentioned code, you clone a
> full
> > 'SessionHandle' with memcpy(). That is major badness
>
> But this is exactly what I mean. I want all the data from the primary
> connection to be copied to the secondary, and only three members should be
> changed: source host, source port and source user:password. It means that
> except these three members all the data should be the same, like file data
> type. I did it because I am not sure what data from 'SessionHandle' is
> relevant for FTP. Indeed the relevant data can be copied field by field, but
> using memcpy(), the data will be cloned always, even when new fields will be
> added in future. You can change it in future if you decide that another way
> is better.

I understand why you did it this way and I'm not saying I can cough up an easy
way to avoid doing it this way.

But, allow me to take another hypothetical case:

1 we duplicate a sessionhandle and
2. we call Curl_connect_host()
3. stuff is allocated and setup within Curl_connect_host()
4. something fails, we need to bail out

How do we know what datain the duplicated struct that we must clean up after
this failure? We must only free() data that was allocated after the cloning...

> > * Code-wise, I think it would make sense to call the second
> connection something else than 'ftp_source_host'
>
> There's something in what you say, but the reason I called it "source" is
> because it is not just a connection to a host, but because it is an
> interaction between two remote hosts, where one of them source and another
> target. I can't see how other protocols can use two connections without
> interaction between the hosts; otherwise, still one of them will be the
> source.

I'm not arguing about the 'source' part but about the 'ftp' part in
protocol-agnostic code. If another protocol would add this support, you'd have
source and target, yes, but the host wouldn't be FTP.

> Another solution is to go back to the original word (that used in FTP CLI)
> "proxy" ('ftp_proxy_host').

I'm strongly against the use of the word "proxy" in this context as I believe
that in modern network terminology (at least where I live) proxy means
something else.

> > * sec_conn is a struct field within the connectdata struct. I find
> that
> > hard to make sense. Besides, how does this all affect connection
> re-use?
>
> In third party transfer mode I made the "connectdata struct" one entity,
> because this is the only way to pass sec_conn with the original conn to all
> functions without changing the interface.

It isn't the _only_ way, but it is one way. And since the second connection is
related to that connection it could make sense to have this pointer like this.

> It is not affect connection reuse because connection itself created by the
> same function. I test it by performing regular transfer (using the same host
> and user) after the third party transfer, and I saw that the connection was
> reused.

Nice!

> As I mentioned in the previous email, all the options have the same
> definitions as the original one (without "SOURCE") except
> CURLOPT_FTP_SOURCE_HOST, which is the same as a host name in url option.

I know you did, but I want them documented in the curl_easy_setopt.3 man page!
;-)

> > Have you given any thoughts on the ability to write/create test cases
>
> Actually I thought that it will be easy to somebody, who knows the curl test
> suit well, to write test cases using the example I send (which is actually
> includes test cases).

Well, as long all existing test cases run fine I won't stress this point, but
there is no "somebody" like that in this project, there is only we who are
here and we all already struggle hard to find time to do whatever we do.

[command line options]

Yeps. As soon as we get everything together and merged in, I'll try to get
time off to do my part and add the command line options for this.

> > * I don't see where/how the sec_conn connection is closed down after
> use.
> > Have you tried the README.memoryleak approach to verify that you
> are
> > doing all the cleanups nicely?
>
> The sec_conn connection is closed as any other connection when
> disconnecting.

Ok, I simply must've missed that when I read through the code.

> Until then the connection can be reused. Mixing third party transfer and the
> regular one is possible. I don't familiar with README.memoryleak approach,
> but I will try it.

It is very easy, just run configure with --enable-debug, set the necessary
environment variable and you're set to verify that your additions don't leak
memory.

> > * Regarding your changes to Curl_debug() to include the host name as
>
> Regarding the debug callback: it was already different from the internal
> function by the following line:
> fwrite(s_infotype[type], 2, 1, data->set.err);

Not quite true. That same info is provided to the callback so it can in fact
do the exact same output as the library does by default.

> So one solution, as you said, is to change the debug callback, but then it
> won't be backward compatible.

I ruled out that option! ;-)

> The second solution is to prepare a temporary string that will contain
> final output for both of them. And therefore it will be compatible in
> future when we will need to add some more information.
> I think that the second one is better.

I don't follow you here, can you elaborate?

What about if we (in the 3rd party case) just provide CURLINFO_TEXT chunks
with "host %s sends this:" or "host %s receivies this:" before the actual
data/header callbacks are called?

-- 
     Daniel Stenberg -- http://curl.haxx.se -- http://daniel.haxx.se
      Dedicated custom curl help for hire: http://haxx.se/curl.html
Received on 2004-05-27