
#758 VMS-Alpha, http-upload not working

closed-fixed
5
2013-07-29
2008-08-18
No

On VMS-Alpha:
When using an HTTP file upload, the file is not sent to the server with the correct Content-Length.
Sending a file of 511 bytes or fewer, a Content-Length of 512 is used.
Sending a file of 513 to 1023 bytes, a Content-Length of 1024 is used.
Only files whose length is a multiple of 512 bytes get the correct Content-Length, and only these files upload successfully.

Test commands on VMS-Alpha, Linux (SUSE) and Windows:
curl --form myfile=@test.ras --trace-ascii testras.log http://my_web_test_server/cgi-bin/upload.pl
curl --form myfile=@test.doc --trace-ascii testdoc.log http://my_web_test_server/cgi-bin/upload.pl

File test.doc has a size of 21,504 bytes (a multiple of 512).
File test.ras has a size of 5,686 bytes (not a multiple of 512).

With test.doc, everything works.
With test.ras, the Content-Length in the log file differs from the one on Linux/Windows.
The receiving program (upload.pl in my test case) reports a post-data mismatch.

VMS shows:
002c: Content-Disposition: form-data; name="myfile"; filename="test.ra
006c: s"
0070: Content-Type: application/octet-stream
0098:
=> Send data, 6144 bytes (0x1800)

Linux + Windows show:
002c: Content-Disposition: form-data; name="myfile"; filename="test.ra
006c: s"
0070: Content-Type: application/octet-stream
0098:
=> Send data, 5686 bytes (0x1636)
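The byte counts above fit a simple pattern: on VMS the amount sent equals the file size rounded up to a whole number of 512-byte blocks (5686 becomes 6144, while 21504 is already a multiple of 512). A minimal C sketch of that arithmetic; round_up_to_blocks() is a hypothetical helper that only models the symptom, not code taken from curl:

```c
#include <assert.h>

/* Hypothetical helper: round a byte count up to the next multiple of
   block_size. It only models the symptom in the logs (size rounded up
   to whole 512-byte disk blocks); it is not code from curl. */
static long long round_up_to_blocks(long long size, long long block_size)
{
  assert(size >= 0);
  assert(block_size > 0);
  return ((size + block_size - 1) / block_size) * block_size;
}
```

For example, round_up_to_blocks(5686, 512) gives 6144 and round_up_to_blocks(511, 512) gives 512, matching the 5686-byte trace and the 511-byte case described in the report.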

My workaround: I wrote a Perl program that pads the file to a multiple of 512 bytes. The CGI program (upload.pl) then truncates the file back to its original size.

This workaround works for files of all sizes, but the bug in curl remains.

I should mention that I'm not a C programmer, so I kindly ask for help.

If you need a VMS-Alpha system for testing (including a compiler, etc.), please have a look at www.polarhome.com.
On VMS-Alpha, before you can run "curl" as a command, you have to define a symbol that points to CURL.EXE:
curl = "$...[...]CURL.EXE"
Then "curl" can be started as described above.

curl version used on all systems: 7.18.2.

Discussion

  • Daniel Stenberg

    Daniel Stenberg - 2008-08-19

    Logged In: YES
    user_id=1110
    Originator: NO

    As mentioned on the mailing list, we depend on people who have VMS and a compiler to debug this problem and work on a fix. Since nobody has stepped up yet, I'll most likely add this bug to the KNOWN_BUGS list soon and close this report.

     
  • Daniel Stenberg

    Daniel Stenberg - 2008-08-19
    • labels: --> http
    • milestone: --> portability_problem
    • assigned_to: nobody --> bagder
    • status: open --> open-later
     
  • Daniel Stenberg

    Daniel Stenberg - 2008-08-21
    • status: open-later --> closed-later
     
  • Daniel Stenberg

    Daniel Stenberg - 2008-08-21

    Logged In: YES
    user_id=1110
    Originator: NO

    Added to KNOWN_BUGS as item #57. If anyone has further details or considers working on a fix, I'm ready to reopen this issue. Closed for now.

     
  • Harald Schwarz

    Harald Schwarz - 2008-09-05

    Logged In: YES
    user_id=2185003
    Originator: YES

    There's additional information for this problem at
    http://forums.itrc.hp.com/service/forums/questionanswer.do?threadId=1265138.

    Short summary of the thread:

    VMS has several ways to store a file, using different "record formats".
    With the record format "Fixed length 512 byte records", the file size used to calculate the Content-Length is detected correctly, but during the read-and-send step curl seems to ignore the FFP (EOF marker).
    A workaround for binary files is to change the record format before uploading the file, with the command:
    SET FILE/ATTRIB=RFM=STMLF filename

    However, not all applications on VMS support the Stream_LF record format.
    I hope that someone with VMS skills can fix the bug.

     
  • Marty Kuhrt

    Marty Kuhrt - 2008-09-11

    This is pretty much the same problem as Known Bug #22 (http://curl.haxx.se/bug/view.cgi?id=1156287). VMS, more often than not, does not treat files as a stream of bytes, and this seems to force some of the C RTL routines to follow the underlying record layout. Sometimes that is a good thing, sometimes not so much.

    I'm looking into it right now. The bug does not depend on the hardware platform, so it will be present in the Alpha, IA64 and VAX versions as well.

    Since this was on the curl-user mailing list and not the curl-library one, I didn't see this problem until I tripped over it on ITRC. I don't follow the curl-user list.

    Marty

     
  • John Malmberg

    John Malmberg - 2013-07-15

    I was just made aware of this bug.

    On VMS, the size returned by the fstat() call is the size of the file on disk, not the number of bytes a program gets when it reads the file.

    To get the exact size of the file for all VMS file formats, the file must be read in with read() until end of file.

    A search of the code shows that the only fstat() call likely to be used for a file transfer is in the lib/file.c routine file_upload().

    The problem seems to be that the information from fstat() is used for the Content-Length field, which is sent to the server before the entire file has been read() and, on VMS, before we know the true file size.

    We can easily look at some VMS-specific fields in the structure returned by fstat() on VMS and detect the conditions under which st_size may not be correct.

    That leaves the question of what to do when it may not be correct. We have to read the entire file to get the actual size. The quick-and-dirty solution is to simply read the file and throw the data away, in a wrapper around stat().

    Otherwise we have to buffer the file data somewhere so that the later read() can pull it in.
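    The quick-and-dirty "read and discard" approach can be sketched in portable C. This assumes the size that matters is the number of bytes the later upload read will return, so the sketch takes the fopen() mode as a parameter and the caller must pass the same mode the upload uses; count_bytes_by_reading() is a hypothetical name, not a curl function. The loop keeps calling fread() until it returns 0, so short reads are handled:

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes a C program actually receives when
   reading `name` with the given fopen() mode, discarding the data.
   Pass the same mode the later upload read will use, since on VMS the
   mode affects record translation and therefore the byte count.
   Returns -1 if the file cannot be opened or a read error occurs. */
static long long count_bytes_by_reading(const char *name, const char *mode)
{
  char buf[4096];
  long long total = 0;
  size_t n;
  FILE *fp = fopen(name, mode);

  if(!fp)
    return -1;
  /* fread() may return short counts (e.g. one record at a time), so keep
     reading until it returns 0, then check whether that was EOF or error. */
  while((n = fread(buf, 1, sizeof(buf), fp)) > 0)
    total += (long long)n;
  if(ferror(fp))
    total = -1;
  fclose(fp);
  return total;
}
```

    The cost is a full extra pass over the file, which is why the thread later looks at limiting this to record formats where st_size cannot be trusted.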

    GNU tar has a similar issue: if it is used to create a tar archive containing a VMS binary file, the resulting archive will be corrupted and unreadable.

    Only the CPAN Perl Tar module has the correct code to save VMS binary files.

     
  • John Malmberg

    John Malmberg - 2013-07-15

    LION> newcurl --version
    curl 7.31.0 (IA64-HP-VMS) libcurl/7.31.0 OpenSSL/0.9.8w zlib/1.2.8
    Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smtp smtps telnet tftp
    Features: GSS-Negotiate IPv6 Largefile NTLM SSL libz

    POST uploads of small files in STREAM_LF format succeed.
    Record format: Stream_LF, maximum 0 bytes, longest 0 bytes
    Record attributes: Carriage return carriage control

    STREAM_LF tells native VMS tools to treat line feeds in the file as record delimiters.

    A POST upload of a small file in the format below fails: the curl program hangs.

    Record format: Variable length, maximum 0 bytes, longest 5 bytes
    Record attributes: Carriage return carriage control

    Variable length means that each record of the file is preceded by a count that indicates how long the record is. The file contains no line feeds or carriage returns; the C runtime appends a line feed to each record when the program reads the file. For these files, fstat() st_size is 1 byte larger per line than the data that is read in.
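    To make the size mismatch concrete, here is a simplified model of a variable-length record file, assuming the usual RMS on-disk layout: a 2-byte little-endian length word before each record's data, plus one pad byte after odd-length records. It ignores block-level details, and var_records_text_length() is a hypothetical helper, not a VMS or curl API. It computes how many bytes a text-mode read would deliver, counting the appended line feed once per record:

```c
#include <stddef.h>

/* Simplified, assumed model of an RMS variable-length record file:
   each record = 2-byte little-endian length, data, pad byte if length odd.
   Returns the byte count a text-mode read would deliver (record data plus
   one appended '\n' per record), or -1 if the buffer is malformed. */
static long var_records_text_length(const unsigned char *disk, size_t disk_len)
{
  size_t pos = 0;
  long text_len = 0;

  while(pos + 2 <= disk_len) {
    size_t rec_len = (size_t)disk[pos] | ((size_t)disk[pos + 1] << 8);
    pos += 2;
    if(pos + rec_len > disk_len)
      return -1;                     /* record runs past end of buffer */
    text_len += (long)rec_len + 1;   /* record data plus appended LF */
    pos += rec_len + (rec_len & 1);  /* skip data and any pad byte */
  }
  return pos == disk_len ? text_len : -1;
}
```

    Under this model, a 4-byte record occupies 6 bytes on disk but reads as 5, and a 5-byte record such as the one in vms_text.test later in this thread occupies 8 bytes and reads as 6, which matches the size and normal-read figures reported there.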

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-07-15
    • labels: http --> http, VMS
    • status: closed-later --> open-confirmed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-07-15

    Thanks John, I'm opening this again then to see if we can work it out this time!

    Is there really NO other way than reading in the entire thing with read()? Couldn't we use fseek() and ftell(), for example?

    Either way, I suggest we introduce this fix in lib/formdata.c as a wrapper function that returns the size of the file in bytes, which in the VMS case will need to jump through some extra hoops. Do you agree?
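    For comparison, the fseek()/ftell() idea looks like this in standard C. It is only a sketch: the C standard does not require binary streams to support SEEK_END meaningfully, ftell() returns a long, and on VMS record-format files the seek may itself have to read through the file. size_by_seek() is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: file size via fseek()/ftell() on a binary stream.
   Returns -1 on failure. Standard C does not guarantee that SEEK_END is
   meaningful for binary streams, and the result is limited to long. */
static long size_by_seek(const char *name)
{
  long size;
  FILE *fp = fopen(name, "rb");

  if(!fp)
    return -1;
  if(fseek(fp, 0L, SEEK_END) != 0) {
    fclose(fp);
    return -1;
  }
  size = ftell(fp); /* ftell() itself returns -1L on failure */
  fclose(fp);
  return size;
}
```

    On byte-stream file systems this avoids reading the data; whether it helps on VMS depends on how the C RTL implements the seek for each record format.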

     
  • John Malmberg

    John Malmberg - 2013-07-15

    In this case fseek() has to work by reading the file, because the internal format is translated on the fly, so calling fseek() is as much work as simply reading the file.

    I just ran the hanging case in the VMS debugger, and a breakpoint at file_upload() in lib/file.c was not hit.

    Interrupting the hang gets me to:

     r = select(3, empty struct/array, empty , empty, tv_sec=1)
    
     select        Curl_poll
     multi         curl_multi_wait
     easy          curl_easy_perform
     tool_operate  operate
     tool_main     main
    

    So the hang may be a different bug. I will have to look at this more later: curl is clearly in a wait loop, so I have missed where I need to set the breakpoint.

    The server in this case is Apache 2.0 running on OpenVMS Alpha 8.4.

     
  • John Malmberg

    John Malmberg - 2013-07-16

    I found the stat() call used in this case, in the AddFormData code.

    The stat structure from VMS has some additional fields. The fields that are of most interest in this case are:

       st_fab_rfm:  Record format - see fabdef.h fragment below.
       st_fab_rat:  Record attributes.
    
    From <fabdef.h>
    #define FAB$C_UDF 0               /* undefined (also stream binary)   */
    #define FAB$C_FIX 1               /* fixed length records             */
    #define FAB$C_VAR 2               /* variable length records          */
    #define FAB$C_VFC 3               /* variable fixed control           */
    #define FAB$C_STM 4               /* RMS-11 stream (valid only for sequential org) */
    #define FAB$C_STMLF 5             /* LF stream (valid only for sequential org) */
    #define FAB$C_STMCR 6             /* CR stream (valid only for sequential org) */
    

    So basically, if st_fab_rfm is one of 0, 5 or 6, st_size can be used.

    For the other values, st_size cannot be trusted for most file types that would be uploaded: it will be larger than the actual size.

    While FAB$C_FIX might seem safe, it is not, because st_size is apparently rounded up to whole blocks and not adjusted for any partial block at the end.
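    The decision described above can be written as a small portable predicate. The enum names are hypothetical local stand-ins for the FAB$C_* macros, with values taken from the <fabdef.h> excerpt; on VMS the argument would be the st_fab_rfm field from stat():

```c
#include <stdbool.h>

/* Record format codes from the <fabdef.h> excerpt. These enum names are
   hypothetical stand-ins for the FAB$C_* macros. */
enum rms_rfm {
  RFM_UDF   = 0, /* undefined (also stream binary) */
  RFM_FIX   = 1, /* fixed length records */
  RFM_VAR   = 2, /* variable length records */
  RFM_VFC   = 3, /* variable fixed control */
  RFM_STM   = 4, /* RMS-11 stream */
  RFM_STMLF = 5, /* LF stream */
  RFM_STMCR = 6  /* CR stream */
};

/* True if st_size can be used directly for this record format, per the
   analysis in this thread: only UDF, STMLF and STMCR qualify. Everything
   else, including FIX, needs the file to be read to learn its size. */
static bool st_size_is_trustworthy(int rfm)
{
  switch(rfm) {
  case RFM_UDF:
  case RFM_STMLF:
  case RFM_STMCR:
    return true;
  default:
    return false;
  }
}
```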

    The only way to get the exact size as seen from a C program is to read in the file. A quick fix would be to read in the file and discard the data. While that seems wasteful, it actually warms the caches used by the read operation that follows.

    So would you prefer a wrapper to stat() or something like:

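    curl_off_t get_vms_file_size(const char *file) {
        /* Read and discard the data, returning the number of bytes read */
    }
    
    ....
    
    #ifdef __VMS
    
        switch(stat.st_fab_rfm)
        {
            case 0:  /* FAB$C_UDF */
            case 5:  /* FAB$C_STMLF */
            case 6:  /* FAB$C_STMCR */
                  break;  /* st_size can be used */
            default:
                  *size += get_vms_file_size(file);
                  break;
        }
    #endif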
    

    I also saw fstat() calls in a few other modules; do they need similar treatment?

    The implications of this workaround also need to be added to the VMS readme file.

     
  • John Malmberg

    John Malmberg - 2013-07-16

    I am looking through the source to see the impact of working around this and if there are related bugs.

    Generically wrapping stat() is not a good idea, because there are several places where stat() is called but the exact file size is not needed, and the workaround has a performance impact.

    In file.c, the call to fstat(fd, ...) is affected by this bug. Because on VMS the open() done before the call to fstat() locks the file, a second open() to read the actual size of the file will fail.

    On VMS, we would have to test state.resume_from before the fd = open(file->...), then calculate the true path, and then use that value instead of the later fstat() call.

    I do not have a reproducer for this behavior.

    In tool_operate.c, for the stat(outfile, ...) section on VMS, additional arguments can be passed to fopen() to force the file format to be the needed STREAM_LF format.

    By default if the output file does not exist, it will be created in STREAM_LF format and all will be well.

    If the file already exists, a new file with the same name will be created in the same format as the original, and there is only a problem if that format is not STREAM_LF or STREAM_CR.

    For this, a note in the VMS readme file about what the behavior should be is the only thing needed; how often will someone use curl to replace a VMS-format file with a download? I do not have a reproducer for this issue.

    Later in tool_operate.c, O_BINARY is a no-op on VMS, so the file is opened in "cooked" mode, with the record formats translated for text files. The size from fstat() will have the same issue as in this bug. I do not have a reproducer to verify a fix for this issue.

    In tool_getparam.c, the st_size member is not referenced, so no VMS specific issues are present.

    So it looks like somewhere in lib we need a:

    curl_off_t get_vms_file_size(const char *, struct stat *) that can return both the true file size and, optionally, a stat structure, and that can be called as needed instead of stat(), or before the open() as described above.

    So the questions are what name for the function, what module to put it in, and how to patch in the calls to it.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-07-16

    For formdata, I was thinking of a specific stat() wrapper solely for getting the file size for multipart formposts, not replacing any other stat() use that may exist in the code.

    Like

    static curl_off_t filesize(const char *file) {
    #ifdef VMS
     ...
    #else
     ...
    #endif
    }
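    Outside VMS, the #else branch of a wrapper like this can remain a plain stat() call. A minimal POSIX sketch, with a hypothetical name and -1 as an assumed error convention; it also rejects directories, mirroring the S_ISDIR() check in the formdata code quoted later in this thread:

```c
#include <sys/stat.h>

/* Hypothetical non-VMS body for a filesize() wrapper: returns the size
   reported by stat(), or -1 if the path cannot be stat()ed or is a
   directory. On POSIX byte-stream files st_size is the exact byte count. */
static long long filesize_posix(const char *name)
{
  struct stat st;

  if(stat(name, &st) != 0 || S_ISDIR(st.st_mode))
    return -1;
  return (long long)st.st_size;
}
```

    Keeping the VMS logic behind its own branch means other platforms pay nothing for the extra read.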
    
     
  • John Malmberg

    John Malmberg - 2013-07-18

    I am planning on trying to fix the formdata issue first and then looking at the other cases.

    What I am looking at is writing a VmsFileSize() routine and then using a macro, so as not to disrupt the existing code logic that checks whether the path is a directory rather than a file.

    I need the filename, as I cannot easily reconstruct it from the stat buffer on VMS.

    #ifndef __VMS
    #define filesize(name, stat_data) ((stat_data).st_size)
    #else
    #define filesize(name, stat_data) VmsFileSize(name, &(stat_data))
    #endif
    
       . . .
    
          if(!strequal("-", newform->line)) {
            struct_stat file;
            if(!stat(newform->line, &file) && !S_ISDIR(file.st_mode))
    /**/      *size += filesize(newform->line, file);
            else
              return CURLE_BAD_FUNCTION_ARGUMENT;
          }
    

    I should also be able to get the size of FAB$C_FIX files (st_fab_rfm == FAB$C_FIX) without reading in the file, if my tests show that stat() on VMS is returning the wrong value.

    Large file uploads should typically be either FAB$C_FIX or FAB$C_STMLF, so there should be no penalty for them. ZIP archives created on VMS are in FAB$C_FIX format, and disk images should be too.

    The variable-length formats will typically be text files and should not be too large, so reading them twice should not be a significant slowdown.

     
  • John Malmberg

    John Malmberg - 2013-07-19

    I did some tests with some typical VMS text files and a simple ZIP file.

    The binary read is what the formdata module would read from the file.

    This indicates that just fixing the file size will not solve the problems on VMS.

    VMS_ROOT:[curl.packages.vms]dcl_test.test;1
    hello
    
    VMS_ROOT:[curl.packages.vms]vms_text.test;1
    
    5char
    
    VMS_ROOT:[curl.packages.vms]hello.test;1
    
    hello
    
    EAGLE> stat_test test.zip
    size:       160
    st_fab_rfm: 1
    st_fab_rat: 0
    st_fab_fsz: 0
    st_fab_mrs: 512
    binary read:   512
    normal read:   512
    

    Binary file - fixed record size.
    stat appears to be correct, but the read returns the entire last block instead of stopping at the EOF count.

    EAGLE> stat_test dcl_text.test
    size:       10
    st_fab_rfm: 3
    st_fab_rat: 4
    st_fab_fsz: 2
    st_fab_mrs: 0
    binary read:   7
    normal read:   6
    

    The "normal" read is what the recipient is probably expecting, not a binary read. The normal read is "hello\n". I'm not sure what the last two characters of the binary read are.

    EAGLE> stat_test vms_text.test
    size:       8
    st_fab_rfm: 2
    st_fab_rat: 2
    st_fab_fsz: 0
    st_fab_mrs: 0
    binary read:   5
    normal read:   6
    

    Again, the "normal" read is what the recipient is probably expecting. The binary read omits the "\n" from the record.

    EAGLE> stat_test hello.test
    size:       6
    st_fab_rfm: 5
    st_fab_rat: 2
    st_fab_fsz: 0
    st_fab_mrs: 0
    binary read:   6
    normal read:   6
    

    STM_LF is the one case that always works.

     
  • John Malmberg

    John Malmberg - 2013-07-19

    I updated the test program to show the results of forcing the fopen() call to be:

       file = fopen(argv[1], "r", "rfm = stmlf", "ctx = stm");
    

    This makes the STM_LF and FIXED record cases read the files as expected. STM_CR should also work, but the resulting file will have \r as the record delimiter instead of \n, and when the file is read back on VMS, it will be created as a STM_LF file, so it will become unreadable.

    The above results in an exact binary copy of the file in all cases, but the VMS-specific record attributes are lost, making the file unusable on VMS except in the STM_LF case. For the FIXED format, it is usually not too hard to repair the attributes.

    So for STM_CR, the fopen() below works in a "do what I mean" mode, without the "b", to allow the record translation.

       file = fopen(argv[1], "r");
    

    It does not work for the variable-length formats used for text files. For those we need the fopen() above, but we need to read each file that way once to get its correct size, instead of relying on st_size from stat().

    This brings up a limitation of fread() on VMS: with the above fopen() call, fread() reads at most 65536 bytes or the size of one record. With "ctx=stm", only the 65536-byte limit applies.

    On VMS, fread() therefore has to be called in a loop to read the desired amount of data.
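    The looping pattern described above can be sketched as a small helper. It keeps calling fread() until the requested amount has arrived, end of file is reached, or an error occurs, so a short read (one record, or at most 65536 bytes on VMS) does not end the transfer early. fread_fully() is a hypothetical name, not a curl or C runtime function:

```c
#include <stdio.h>

/* Hypothetical helper: read up to len bytes into buf, calling fread()
   repeatedly because a single call may return fewer bytes than requested
   (on VMS, at most one record or 65536 bytes). Returns the number of bytes
   actually read; a result below len means EOF or an error, which the
   caller can tell apart with feof()/ferror(). */
static size_t fread_fully(void *buf, size_t len, FILE *fp)
{
  unsigned char *p = buf;
  size_t total = 0;

  while(total < len) {
    size_t n = fread(p + total, 1, len - total, fp);
    if(n == 0)
      break; /* EOF or error */
    total += n;
  }
  return total;
}
```

    On platforms where fread() already fills the whole request, the loop simply runs once.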

     
  • John Malmberg

    John Malmberg - 2013-07-23

    A preliminary patch that fixes this issue has been posted to the curl-library mailing list.

     
  • Daniel Stenberg

    Daniel Stenberg - 2013-07-29
    • status: open-confirmed --> closed-fixed
     
  • Daniel Stenberg

    Daniel Stenberg - 2013-07-29

    Thanks, merged and pushed as commit db2deba6b4b. Case fixed and closed!