curl-library

POP3 returns line data and CRLF separately, drops final CRLF

From: Rich Gray <rgray_at_plustechnologies.com>
Date: Wed, 8 Feb 2012 13:10:10 -0500

As noted in a previous e-mail (Jan 31, "State of POP3 in curl?"), I'm
working on a prototype POP3 download program utilizing libcurl. I've got
my part of that prototype pretty much completed, but have noticed a couple
of anomalies:

1. Libcurl is returning message data line-by-line, with two callbacks per
line: one for the line data and the other for the CRLF. This seems like
strange behavior. I'd coded as if I were getting the data off a TCP
connection: might get one byte, might get the whole message in one shot,
might get anything in between. So I'm curious as to what the intent is
here. If it's going to return a line at a time, it would be nice to get
the line with the CRLF in one callback. If, as a function of the dot
de-stuffing, libcurl returns whole chunks of message data on CRLF
boundaries, that would be fine too. I can also deal with full, unaligned
chunks of data. For the moment, I'm not going to count on any sort of
alignment guarantee.

2. Libcurl is dropping the final CRLF from the data. Although this can be
coped with, it seems wrong. E-mail messages always consist of
CRLF-terminated lines. Not getting the final CRLF leaves a hanging,
incomplete line. I think this might be a misinterpretation of RFC 1939,
section 3 - Basic Operation:

   Responses to certain commands are multi-line. In these cases, which
   are clearly indicated below, after sending the first line of the
   response and a CRLF, any additional lines are sent, each terminated
   by a CRLF pair. When all lines of the response have been sent, a
   final line is sent, consisting of a termination octet (decimal code
   046, ".") and a CRLF pair. If any line of the multi-line response
   begins with the termination octet, the line is "byte-stuffed" by
   pre-pending the termination octet to that line of the response.
   Hence a multi-line response is terminated with the five octets
   "CRLF.CRLF". When examining a multi-line response, the client checks
   to see if the line begins with the termination octet. If so and if
   octets other than CRLF follow, the first octet of the line (the
   termination octet) is stripped away. If so and if CRLF immediately
   follows the termination character, then the response from the POP
   server is ended and the line containing ".CRLF" is not considered
   part of the multi-line response.

I think the libcurl implementation has keyed off the "CRLF.CRLF" sentence
in the middle of this paragraph, whereas the final sentence clearly states
that the final ".CRLF" is not part of the data. By implication, the
immediately preceding CRLF of the last line is part of the data. Or, it's
just a bug! ;P
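
To make that reading of the RFC concrete, here is a minimal sketch of the
de-stuffing as I understand it. It is not libcurl's code; the function
name pop3_unstuff and its buffer-in, buffer-out interface are made up for
illustration, and it assumes the whole raw response body (everything after
the +OK line) is already in memory. The point is only that every data
line, including the last one, keeps its CRLF, and that just the ".CRLF"
line is discarded.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decode a raw POP3 multi-line response body.
 * 'out' must have room for at least in_len bytes.
 * Returns the number of decoded bytes written to 'out'. */
size_t pop3_unstuff(const char *in, size_t in_len, char *out)
{
    size_t i = 0;   /* start of the current line in 'in' */
    size_t o = 0;   /* bytes written to 'out' so far */

    while (i < in_len) {
        size_t end = i;

        /* Look for the CR of this line's CRLF. */
        while (end + 1 < in_len && !(in[end] == '\r' && in[end + 1] == '\n'))
            end++;
        if (end + 1 >= in_len)
            break;          /* no CRLF found: incomplete line, stop here */
        end += 2;           /* 'end' is now one past the LF */

        if (in[i] == '.') {
            if (end - i == 3)
                break;      /* ".CRLF": terminator, not part of the data */
            i++;            /* byte-stuffed line: strip the leading dot */
        }

        /* Copy the line together with its CRLF. */
        memcpy(out + o, in + i, end - i);
        o += end - i;
        i = end;
    }
    return o;
}
```

Fed the raw body "1 16050 CRLF 2 15704 CRLF . CRLF", this yields both
listing lines with their CRLFs intact, which is what I would expect the
write callback to receive in total.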

Using this write callback routine for a LIST command,

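 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>

 size_t pop_list_data(char *ptr, size_t size, size_t nmemb, void *userdata)
 {
    size_t bytes = size * nmemb;
    int *num_msgs = userdata;
    char num[16];
    size_t len = bytes < sizeof(num) - 1 ? bytes : sizeof(num) - 1;
    int n;

    printf("list >%.*s<\n", (int)bytes, ptr);

    /* ptr is not NUL-terminated, so copy the start before calling atoi() */
    memcpy(num, ptr, len);
    num[len] = '\0';
    if (num[0] >= '0' && num[0] <= '9')
       if ((n = atoi(num)) > 0)
          *num_msgs = n;
    return bytes;
 }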

with two messages in the mailbox, I get:

> LIST
< +OK 2 messages (31754 octets)
list >1 16050<
list >
<
list >2 15704<
* Connection #0 to host XXXXXX left intact

which shows both issues 1 & 2. (Yes, I did shamelessly take advantage of
the line-by-line data return for the prototype. ;) This will be redone in
a final version anyway, using STAT or UIDL.)
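
For what it's worth, a callback that does not depend on how the data is
chunked could look roughly like the sketch below. It buffers whatever
arrives and counts a line only once its CRLF has been seen, so it behaves
the same whether the data comes a byte at a time, as line-then-CRLF pairs,
or in one shot. The struct line_buf and the name collect_lines are
hypothetical; only the callback signature matches what
CURLOPT_WRITEFUNCTION expects.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical state for the sketch; none of these names come from libcurl. */
struct line_buf {
    char  *data;   /* bytes received that do not yet form a complete line */
    size_t len;    /* number of pending bytes in data */
    size_t lines;  /* complete CRLF-terminated lines seen so far */
};

/* Has the same signature as a CURLOPT_WRITEFUNCTION callback. */
size_t collect_lines(char *ptr, size_t size, size_t nmemb, void *userdata)
{
    struct line_buf *lb = userdata;
    size_t bytes = size * nmemb;
    size_t start = 0;   /* start of the first not-yet-consumed line */
    size_t i;
    char *grown;

    if (bytes == 0)
        return 0;

    /* Append the new chunk to whatever was left over from earlier calls. */
    grown = realloc(lb->data, lb->len + bytes);
    if (!grown)
        return 0;   /* a return value != bytes makes libcurl abort the transfer */
    memcpy(grown + lb->len, ptr, bytes);
    lb->data = grown;
    lb->len += bytes;

    /* Consume every complete line now in the buffer. */
    for (i = 1; i < lb->len; i++) {
        if (lb->data[i - 1] == '\r' && lb->data[i] == '\n') {
            /* lb->data[start .. i] is one full line, CRLF included;
               a real program would process it here */
            lb->lines++;
            start = i + 1;
        }
    }

    /* Keep only the trailing partial line for the next call. */
    memmove(lb->data, lb->data + start, lb->len - start);
    lb->len -= start;
    return bytes;
}
```

With the behavior shown above, anything still pending in lb->len after the
transfer finishes would be the last line that arrived without its CRLF, so
the caller would have to flush it by hand and then free lb->data.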

Cheers!
Rich
-------------------------------------------------------------------
Received on 2012-02-08