curl-library

RE: libcurl + https + multi = lost information

From: REISS Pierre <Pierre.REISS_at_3ds.com>
Date: Thu, 27 Mar 2008 14:58:03 +0100

Dear developers,

On Wed, 26 Mar 2008, Daniel Stenberg wrote:

> Here's your write callback (...)
> What is this for loop and if condition doing in there and why?

    (See the callback routine "CallBaPR".)
    You are absolutely right: the for loop and the if are remnants of the
complete program, where I accessed the FILE structure internals in an
attempt to go faster.
(I know that the pages at my URLs are about 80 to 100 Kbytes, so a setvbuf
 with 256 Kbytes was big enough to hold them without a lot of memcpy calls
 between memory, buffer and files. This handling has been removed here.)
Now only the fwrite and the return remain.
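  For reference, a callback reduced to just the fwrite and the return could
  look like the sketch below. It is an assumption, not the code of the
  attached sample: it relies only on the documented CURLOPT_WRITEFUNCTION
  signature (char *ptr, size_t size, size_t nmemb, void *userdata) and on
  the stream pointer being the FILE* given with CURLOPT_WRITEDATA. libcurl
  documents that size is always 1; writing size * nmemb bytes of size 1
  makes the return value a byte count, and any value other than
  size * nmemb signals an error to libcurl, which then aborts the transfer.

```c
#include <stdio.h>
#include <stddef.h>

/* Write callback with the CURLOPT_WRITEFUNCTION signature.
 * 'stream' is the pointer registered with CURLOPT_WRITEDATA; in this
 * program it is assumed to be one of the FILE * in Door[]. */
size_t CallBaPR(char *ptr, size_t size, size_t nmemb, void *stream)
{
    /* Write size * nmemb single bytes so that fwrite's return value is
     * already a byte count. A short write returns less than
     * size * nmemb, which libcurl treats as an error. */
    return fwrite(ptr, 1, size * nmemb, (FILE *)stream);
}
```

  It would be registered on each handle with
  curl_easy_setopt(handle, CURLOPT_WRITEFUNCTION, CallBaPR) and
  curl_easy_setopt(handle, CURLOPT_WRITEDATA, Door[i]).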

> What is the Door[] array's purpose? It seems (ab)used in strange ways
> in the app.

  1 _ Each CURL* in TabCurl[2] is linked through setopt(WRITEDATA) to a
      specific FILE* in the array Door[2].
      (See the function "CurlInit".)
      Through setopt(URL), each CURL* is also linked to the corresponding
      URL in the array CurUrl[2].

  2 _ At some point while curl_multi_perform is doing its job,
      the callback routine is invoked several times, but each time
      with the appropriate void* Stream, which is in fact Door[0] or
      Door[1].
      (I checked the values of Door[0], Door[1] and Stream.)

      My fwrite then fills the file Door[0] with the data coming
      from the URL CurUrl[0], and the same correspondence holds between
      Door[1] and CurUrl[1].
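  The routing described in items 1 and 2 can be exercised without a
  network. The sketch below is a hypothetical simulation, not libcurl
  code: the function deliver and the struct chunk are made up to stand in
  for curl_multi_perform, which may call the write callback for either
  transfer in any order but always passes that handle's own
  CURLOPT_WRITEDATA pointer. Under that assumption, and as long as
  Door[0] and Door[1] are distinct FILE*, interleaved chunks cannot reach
  the wrong file through the callback alone.

```c
#include <stdio.h>
#include <string.h>

#define NB_URL 2  /* two transfers, as with TabCurl[2] and Door[2] */

/* Same shape as a CURLOPT_WRITEFUNCTION callback. */
static size_t CallBaPR(char *ptr, size_t size, size_t nmemb, void *stream)
{
    return fwrite(ptr, 1, size * nmemb, (FILE *)stream);
}

/* Hypothetical piece of received data, tagged with its transfer. */
struct chunk {
    int transfer;       /* 0 or 1, i.e. the index into Door[] */
    const char *data;
};

/* Hypothetical stand-in for the multi interface: hands each chunk to the
 * callback together with the FILE * registered for its transfer, in
 * whatever (interleaved) order the chunks are given. */
void deliver(const struct chunk *chunks, size_t n, FILE *door[NB_URL])
{
    for (size_t i = 0; i < n; i++) {
        const char *d = chunks[i].data;
        CallBaPR((char *)d, 1, strlen(d), door[chunks[i].transfer]);
    }
}
```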

> Now what is that if() expression for and what do you really expect
> the break instruction will add to this?

  3 _ When curl_multi_info_read returns a non-NULL pointer, it points to
      a CURLMsg structure describing the transfer of one easy handle
      that has completed.
      I then compare its "easy_handle" member to each item in TabCurl[]
      in order to identify which URL GET has completed.

      Logically, after the break in the small for loop, Idx can only be
      0 or 1.
      This value is passed to "Extrait" and "FillBuffer": essentially,
      fflush(Door[Idx]) writes out the file Temp000 or Temp001.

      One could stop there, but checking Temp000 and Temp001 by hand
      would be painful. So, in the rest of "Extrait" and "FillBuffer",
      I added a full read into Buffer and a search for the pattern
      ")/20" on lines that begin with 8 blanks.
      ("ReadString" scans Buffer and returns one line per call:
       NbRead is the number of characters still to be used in Buffer,
       Utile is the number of useful characters in the current line,
       Champ points to the next line, and
       '\n' is temporarily replaced by '\0' so that the line can be
       handled as a C string.)

      I tried to provide condensed code by removing much of the error
      handling. Perhaps too much?
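  The two steps of item 3 can be sketched as follows, under stated
  assumptions. find_index, read_line and is_title_line are hypothetical
  names, not the functions of the sample program. In the real code the
  array would be TabCurl[] (CURL *) and the handle would be
  msg->easy_handle from a CURLMsg whose msg field is CURLMSG_DONE; plain
  void * stands in here so the sketch compiles without libcurl. Returning
  -1 when no handle matches keeps a message for an unknown handle from
  being used to index Door[]. read_line mimics what ReadString is
  described as doing, assuming Buffer is '\0'-terminated after its last
  character.

```c
#include <stddef.h>
#include <string.h>

/* Returns the index i such that tab[i] == handle, or -1 if none matches. */
int find_index(void *const tab[], int n, const void *handle)
{
    for (int i = 0; i < n; i++)
        if (tab[i] == handle)
            return i;
    return -1;  /* caller must not use Door[-1] */
}

/* Hypothetical ReadString-like scanner. Returns the next line of the
 * buffer, '\0'-terminated in place (its '\n' is overwritten), or NULL
 * when nothing is left. *champ is moved to the start of the next line
 * and *nbread holds the number of characters still unread. */
char *read_line(char **champ, size_t *nbread)
{
    char *line, *nl;
    size_t utile;   /* characters in the current line */

    if (*nbread == 0)
        return NULL;
    line = *champ;
    nl = memchr(line, '\n', *nbread);
    utile = nl != NULL ? (size_t)(nl - line) : *nbread;
    if (nl != NULL) {
        *nl = '\0';
        *champ = nl + 1;
        *nbread -= utile + 1;
    } else {
        /* last line: relies on the buffer's trailing '\0' */
        *champ = line + utile;
        *nbread = 0;
    }
    return line;
}

/* A title line starts with (at least) 8 blanks and contains ")/20". */
int is_title_line(const char *line)
{
    return strncmp(line, "        ", 8) == 0 && strstr(line, ")/20") != NULL;
}
```

  As an alternative to the search, libcurl lets each easy handle carry its
  own data through CURLOPT_PRIVATE, which can be read back with
  curl_easy_getinfo(msg->easy_handle, CURLINFO_PRIVATE, &ptr), where ptr
  is a char *.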

  Is there a mistake in my understanding?
  If so, please disregard what follows, and I apologize to you in
  advance.

  As additional information, I will briefly describe the beginning of
  the program.
  A curl_easy_perform on the initial URL retrieves some information.
  From it, the "VeriSign" function then builds 2 other URLs, to be
  accessed over https.
  This part of the code cannot be left out.

> The example code is still very complicated and I can't even understand
> half of all magic going on in there so it is hard to see if or why or when
> anything goes wrong - and if libcurl really is to blame.

  When viewed in a web browser, the initial URL essentially displays
  a gray table of 8 lines (2 for "Paris", 4 for "USA", 2 for "Europe").
  With "VeriSign", I want to target the 2nd and the 3rd of those 8 lines.

  My 2 URLs are exactly what you get (today) when you click on:
        "CAC 40 INDEX FUTURE (FCE)/200805" (2nd line)
  and
        "NASDAQ-100 (ND) - CME (FLOOR)/200806" (3rd line)

  For instance, the URL corresponding to
        "CAC 40 INDEX FUTURE (FCE)/200805"
  displays an orange-colored area containing
        "Actions Warrants Certificats (...)",
  and the next line repeats the title
        "CAC 40 INDEX FUTURE (FCE)/200805".

  Normally, my sample program associates Temp000 with the contents of
  this page, and this title appears on line 1257 of Temp000.
  Likewise, "NASDAQ-100 (ND) - CME (FLOOR)/200806" should appear on
  line 1257 of Temp001.
  But sometimes (in about 50% of the runs), the SAME title appears on
  line 1257 of both files! This is what goes wrong.
  I checked it again just a minute ago.
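  The check described above (is the title on line 1257 the same in both
  files?) can be automated. This is a minimal sketch with hypothetical
  helper names get_line and same_line; it assumes the lines of the saved
  pages fit in the 512-byte buffers used here (a longer line is
  truncated), and the number 1257 only holds for the pages as they look
  today.

```c
#include <stdio.h>
#include <string.h>

#define TITLE_BUF 512  /* assumption: lines of the saved pages are shorter */

/* Copies line number n (1-based) of fp into out, without its '\n'.
 * Returns 0 on success, -1 if the file has fewer than n lines.
 * A line longer than outsize - 1 characters is truncated. */
int get_line(FILE *fp, long n, char *out, size_t outsize)
{
    long current = 1;

    rewind(fp);
    while (fgets(out, (int)outsize, fp) != NULL) {
        size_t len = strlen(out);
        int complete = len > 0 && out[len - 1] == '\n';

        if (current == n) {
            if (complete)
                out[len - 1] = '\0';   /* drop the newline */
            return 0;
        }
        if (complete)
            current++;                 /* count a line only once it ends */
    }
    return -1;
}

/* Returns 1 if line n is identical in both files (the symptom above),
 * 0 if it differs, -1 if either file has fewer than n lines. */
int same_line(FILE *a, FILE *b, long n)
{
    char la[TITLE_BUF], lb[TITLE_BUF];

    if (get_line(a, n, la, sizeof la) != 0 ||
        get_line(b, n, lb, sizeof lb) != 0)
        return -1;
    return strcmp(la, lb) == 0;
}
```

  For example, same_line(fa, fb, 1257), with fa and fb opened in "r" mode
  on Temp000 and Temp001, should return 1 in the failing runs and 0 in
  the good ones.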

  Apart from the callback routine, it seems difficult to shorten the
  sample code any further.

> As long as you keep sending us very complicated examples stuffed with
> *application* errors I really can't see any libcurl errors.

  Ouch! Then I have to face the question: why does it work here and
  not elsewhere?
  I feel really disappointed.

  Thanks for everything,
    Pierre Reiss
Received on 2008-03-27