Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tool_paramhlp: use feof(3) with fread(3) to identify EOF #8701

Closed
wants to merge 3 commits into from

Conversation

emanuele6
Copy link
Contributor

@emanuele6 emanuele6 commented Apr 12, 2022

This loop was using the number of bytes read from the file as condition to keep reading.

From Linux's fread(3) man page:

On success, fread() and fwrite() return the number of items read or
written. This number equals the number of bytes transferred only when
size is 1. If an error occurs, or the end of the file is reached, the
return value is a short item count (or zero).

The file position indicator for the stream is advanced by the number
of bytes successfully read or written.

fread() does not distinguish between end-of-file and error, and
callers must use feof(3) and ferror(3) to determine which occurred.

This means that nread!=0 doesn't make much sense as an end condition for the loop: nread==0 doesn't necessarily mean that EOF has been reached or an error has occured (but that is usually the case) and nread!=0 doesn't necessarily mean that EOF has not been reached or that no read errors have occured. feof(3) and ferror(3) should be uses when using fread(3).

Currently curl has to performs an extra fread(3) call to get a return value equal to 0 to stop looping.

This usually "works" (even though nread==0 shouldn't be interpreted as EOF) if stdin is a pipe because EOF usually marks the "real" end of the stream, so the extra fread(3) call will return immediately and the extra read syscall won't be noticeable:

bash-5.1$ strace -e read curl -s -F file=@- 0x0.st <<< a 2>&1 |
> tail -n 5
read(0, "a\n", 4096)                    = 2
read(0, "", 4096)                       = 0
read(0, "", 4096)                       = 0
http://0x0.st/oRs.txt
+++ exited with 0 +++
bash-5.1$

But this doesn't work if curl is reading from stdin, stdin is a terminal, and the EOF is being emulated using a shell with ^D. Two consecutive ^D will be required in this case to actually make curl stop reading:

bash-5.1$ curl -F file=@- 0x0.st
a
^D^D
http://0x0.st/oRs.txt
bash-5.1$

A possible workaround to this issue is to use a program that handles EOF correctly to indirectly send data to curl's stdin:

bash-5.1$ cat - | curl -F file=@- 0x0.st
a
^D
http://0x0.st/oRs.txt
bash-5.1$

This patch makes curl handle EOF properly when using fread(3) in file2memory() so that the workaround is not necessary.

Since curl was previously ignoring read errors caused by this fread(3), ferror(3) is also used in the condition of the loop: read errors and EOF will have the same meaning; this is done to somewhat preserve the old behaviour instead of making the command fail when a read error occurs.

@emanuele6 emanuele6 force-pushed the fixeof branch 2 times, most recently from 69ea456 to 80ad695 Compare April 12, 2022 20:33
@emanuele6
Copy link
Contributor Author

emanuele6 commented Apr 12, 2022

Note that this patch doesn't handle read errors caused by fread(3), it simply ignores them as it did before.

Should curl fail when a read error occurs here?

To handle read errors we could add an if(ferror(file)) branch in the code and handle the error somehow, but it's unclear how these errors should be handled since curl was previously ignoring them.

One possible way to handle the read errors caused by these reads is to make curl fail and return 26 ("Read error. Various reading problems."), but this may be a breaking change.

One way to implement that is to add a new PARAM_READ_ERROR ParameterError and make file2memory return that when a read error occurs; i.e.:

do {
  char buffer[4096];
  nread = fread(buffer, 1, sizeof(buffer), file);
  if (ferror(file))
    return PARAM_READ_ERROR;
  if(nread)
    if(curlx_dyn_addn(&dyn, buffer, nread))
      return PARAM_NO_MEM;
} while(!feof(file));

@emanuele6 emanuele6 force-pushed the fixeof branch 6 times, most recently from 5d0ffea to 367ceb0 Compare April 12, 2022 21:30
This loop was using the number of bytes read from the file as condition
to keep reading.

From Linux's fread(3) man page:
> On success, fread() and fwrite() return the number of items read or
> written. This number equals the number of bytes transferred only when
> size is 1. If an error occurs, or the end of the file is reached, the
> return value is a short item count (or zero).
>
> The file position indicator for the stream is advanced by the number
> of bytes successfully read or written.
>
> fread() does not distinguish between end-of-file and error, and
> callers must use feof(3) and ferror(3) to determine which occurred.

This means that nread!=0 doesn't make much sense as an end condition for
the loop: nread==0 doesn't necessarily mean that EOF has been reached or
an error has occured (but that is usually the case) and nread!=0 doesn't
necessarily mean that EOF has not been reached or that no read errors
have occured. feof(3) and ferror(3) should be uses when using fread(3).

Currently curl has to performs an extra fread(3) call to get a return
value equal to 0 to stop looping.

This usually "works" (even though nread==0 shouldn't be interpreted as
EOF) if stdin is a pipe because EOF usually marks the "real" end of the
stream, so the extra fread(3) call will return immediately and the extra
read syscall won't be noticeable:

    bash-5.1$ strace -e read curl -s -F file=@- 0x0.st <<< a 2>&1 |
    > tail -n 5
    read(0, "a\n", 4096)                    = 2
    read(0, "", 4096)                       = 0
    read(0, "", 4096)                       = 0
    http://0x0.st/oRs.txt
    +++ exited with 0 +++
    bash-5.1$

But this doesn't work if curl is reading from stdin, stdin is a
terminal, and the EOF is being emulated using a shell with ^D. Two
consecutive ^D will be required in this case to actually make curl stop
reading:

    bash-5.1$ curl -F file=@- 0x0.st
    a
    ^D^D
    http://0x0.st/oRs.txt
    bash-5.1$

A possible workaround to this issue is to use a program that handles EOF
correctly to indirectly send data to curl's stdin:

    bash-5.1$ cat - | curl -F file=@- 0x0.st
    a
    ^D
    http://0x0.st/oRs.txt
    bash-5.1$

This patch makes curl handle EOF properly when using fread(3) in
file2memory() so that the workaround is not necessary.

Since curl was previously ignoring read errors caused by this fread(3),
ferror(3) is also used in the condition of the loop: read errors and EOF
will have the same meaning; this is done to somewhat preserve the old
behaviour instead of making the command fail when a read error occurs.
@@ -97,7 +97,7 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file)
if(nread)
if(curlx_dyn_addn(&dyn, buffer, nread))
return PARAM_NO_MEM;
} while(nread);
} while(!feof(file) && !ferror(file));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we run into ferror() wouldn't it be better to actually handle it? How about something like this?

diff --git a/src/tool_helpers.c b/src/tool_helpers.c
index 9244d1fb5..967943ad8 100644
--- a/src/tool_helpers.c
+++ b/src/tool_helpers.c
@@ -72,6 +72,8 @@ const char *param2text(int res)
     return "showing headers and --remote-header-name cannot be combined";
   case PARAM_CONTDISP_RESUME_FROM:
     return "--continue-at and --remote-header-name cannot be combined";
+  case PARAM_FILE_ERROR:
+    return "error encountered in file operation";
   default:
     return "unknown error";
   }
diff --git a/src/tool_paramhlp.c b/src/tool_paramhlp.c
index 273805ef0..56266a823 100644
--- a/src/tool_paramhlp.c
+++ b/src/tool_paramhlp.c
@@ -97,7 +97,9 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file)
       if(nread)
         if(curlx_dyn_addn(&dyn, buffer, nread))
           return PARAM_NO_MEM;
-    } while(nread);
+      if(ferror(file))
+        return PARAM_FILE_ERROR;
+    } while(!feof(file));
     *size = curlx_dyn_len(&dyn);
     *bufp = curlx_dyn_ptr(&dyn);
   }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielgustafsson yeah, i totally i agree with that, i suggested basically the same thing in my comment here #8701 (comment).

I wasn't sure if we actually wanted to do that, because i was afraid of changing the behaviour to much, but i guess, you never actually want to ignore read errors in this case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think this changes the behavior in any breaking way.

Copy link
Contributor Author

@emanuele6 emanuele6 Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed the other patch i wrote yesterday (renaming the ParamaterError from PARAM_READ_ERROR to PARAM_FILE_ERROR and using your error message).

I placed the if(ferror(file)) check right after the fread() instead of at the end of the loop because I think it makes more sense there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the fread manpage:

RETURN VALUES
The functions fread() and fwrite() advance the file position indicator for the stream by the number of bytes read or written. They return the number of objects read or written.
If an error occurs, or the end-of-file is reached, the return value is a short object count (or zero).

"short object count" to me sounds like fread() can raise an error while still having read data (at least in some cases/implementations). In the end it wont matter since we abort and return with an error, but I wonder if we should still persist any read data to aid debugging forensics.Thougths?

Copy link
Contributor Author

@emanuele6 emanuele6 Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short object count" to me sounds like fread() can raise an error while still having read data (at least in some cases/implementations).

I also interpret the manpage that way.

In the end it wont matter since we abort and return with an error, but I wonder if we should still persist any read data to aid debugging forensics.Thougths?

As fair as i can tell, curl is not doing that in other places where reading fails (for example the snipped mentioned below), and I don't know how that could be useful to be honest.


I spotted some code that is checking ferror manually after calling file2memory in src/tool_formparse.c: If we are going to make file2memory not ignore fread errors, I think that should be refactored to use the new file2memory return values instead of assuming != PARAM_OK means PARAM_NO_MEM (otherwise behaviour would actually change):

curl/src/tool_formparse.c

Lines 128 to 143 in b7c0bd6

if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) {
/* Out of memory. */
return m;
}
if(ferror(stdin)) {
result = CURLE_READ_ERROR;
Curl_safefree(data);
data = NULL;
}
else if(!stdinsize) {
/* Zero-length data has been freed. Re-create it. */
data = strdup("");
if(!data)
return m;
}

I will push another commit to fix that in a bit.

Copy link
Contributor Author

@emanuele6 emanuele6 Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danielgustafsson having seen that code in tool_formparse.c, i think we should maybe consider dropping the last two commits and only keep the first since curl's source code seems to be checking ferror() errors manually after running file2memory(). What do you think?

EDIT: Well, i guess it would also be fine to keep the last commits since that seems to be the only place where the code that handles file2memory()'s return value needs to be changed.

Copy link
Contributor Author

@emanuele6 emanuele6 Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed we didn't need to free dyn when returning PARAM_FILE_ERROR since the file2memory() function didn't explictly do it in case of an OOM error, but I just checked inside the curlx_dyn_addn (Curl_dyn_add) and saw that it is implicitly freeing dyn when it's OOM.

I will change the second commit to handle that.

I think I should also set *bufp to NULL and set *size to 0 in that case, but i will probably need to check and change more places that call file2memory() if i do that.

Copy link
Contributor Author

@emanuele6 emanuele6 Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but i will probably need to check and change more places that call file2memory() if i do that.

It looks like I don't need to change the code around any other file2memory() call even with that change. (I searched for uses of file2memory() using git grep '\bfile2memory\b' '*.[ch]'.)

I will change the second commit to handle that.

I also changed the third commit since now it's not necessary to call free on data and set it to NULL; actually the code I was using before for the third (see note below) was wrong because the assertion in Curl_safefree would always have failed in case of an error since *bufp = curlx_dyn_ptr(&dyn); in file2memory() would never have been run if the function returns PARAM_FILE_ERROR.

previous version of the third commit

diff --git a/src/tool_formparse.c b/src/tool_formparse.c
index 6e9c5d6a0..2657db556 100644
--- a/src/tool_formparse.c
+++ b/src/tool_formparse.c
@@ -125,21 +125,21 @@ static struct tool_mime *tool_mime_new_filedata(struct tool_mime *parent,
     else {  /* Not suitable for direct use, buffer stdin data. */
       size_t stdinsize = 0;
 
-      if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) {
+      switch(file2memory(&data, &stdinsize, stdin)) {
+      case PARAM_NO_MEM:
         /* Out of memory. */
         return m;
-      }
-
-      if(ferror(stdin)) {
+      case PARAM_FILE_ERROR:
         result = CURLE_READ_ERROR;
         Curl_safefree(data);
         data = NULL;
-      }
-      else if(!stdinsize) {
-        /* Zero-length data has been freed. Re-create it. */
-        data = strdup("");
-        if(!data)
-          return m;
+        break;
+      default:
+        if(!stdinsize) {
+          /* Zero-length data has been freed. Re-create it. */
+          data = strdup("");
+          if(!data)
+            return m;
+        }
+        break;
       }
       size = curlx_uztoso(stdinsize);
       origin = 0;

Copy link
Contributor Author

@emanuele6 emanuele6 Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed I can move the result = CURLE_READ_ERROR; line the operate() function in tool_operate.c (added in the second commit) and simplify the code even more (it also makes exit codes for PARAM_FILE_ERROR more consisent). I also renamed the PARAM_FILE_ERROR back to my original PARAM_READ_ERROR and adjusted the error message since now it will always cause curl to exit with the CURLE_READ_ERROR exit code.

previous version of the second commit

commit d5f90352a18bd9e903594ba5d1666610cf9983fc
Author: Emanuele Torre <torreemanuele6@gmail.com>
Date:   Wed Apr 13 14:12:50 2022 +0200

    tool_paramhlp: make file2memory() report when fread(3) fails.
    
    A new ParameterError value has been added for this error:
    PARAM_FILE_ERRROR
    
    Co-authored-by: Daniel Gustafsson <daniel@yesql.se>

diff --git a/src/tool_getparam.h b/src/tool_getparam.h
index 599a0ac88..3ad0a5d20 100644
--- a/src/tool_getparam.h
+++ b/src/tool_getparam.h
@@ -45,6 +45,7 @@ typedef enum {
   PARAM_NO_NOT_BOOLEAN,
   PARAM_CONTDISP_SHOW_HEADER, /* --include and --remote-header-name */
   PARAM_CONTDISP_RESUME_FROM, /* --continue-at and --remote-header-name */
+  PARAM_FILE_ERROR,
   PARAM_LAST
 } ParameterError;
 
diff --git a/src/tool_helpers.c b/src/tool_helpers.c
index 9244d1fb5..967943ad8 100644
--- a/src/tool_helpers.c
+++ b/src/tool_helpers.c
@@ -72,6 +72,8 @@ const char *param2text(int res)
     return "showing headers and --remote-header-name cannot be combined";
   case PARAM_CONTDISP_RESUME_FROM:
     return "--continue-at and --remote-header-name cannot be combined";
+  case PARAM_FILE_ERROR:
+    return "error encountered in file operation";
   default:
     return "unknown error";
   }
diff --git a/src/tool_paramhlp.c b/src/tool_paramhlp.c
index 1d475ea4e..f03b2a46f 100644
--- a/src/tool_paramhlp.c
+++ b/src/tool_paramhlp.c
@@ -94,10 +94,16 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file)
     do {
       char buffer[4096];
       nread = fread(buffer, 1, sizeof(buffer), file);
+      if(ferror(file)) {
+        curlx_dyn_free(&dyn);
+        *size = 0;
+        *bufp = NULL;
+        return PARAM_FILE_ERROR;
+      }
       if(nread)
         if(curlx_dyn_addn(&dyn, buffer, nread))
           return PARAM_NO_MEM;
-    } while(!feof(file) && !ferror(file));
+    } while(!feof(file));
     *size = curlx_dyn_len(&dyn);
     *bufp = curlx_dyn_ptr(&dyn);
   }

previous version of the third commit

commit 5991804592c078ff3fb39d424e3a78ff6ebe0628
Author: Emanuele Torre <torreemanuele6@gmail.com>
Date:   Wed Apr 13 16:27:04 2022 +0200

    tool_formparse: handle file2memory()'s return value correctly
    
    Now file2memory() returns PARAM_FILE_ERROR when ferror(stdin) would be
    true; since `PARAM_FILE_ERROR != PARAM_OK', the `if(ferror(stdin))' code
    would not have run.

diff --git a/src/tool_formparse.c b/src/tool_formparse.c
index 844b18d15..fde264027 100644
--- a/src/tool_formparse.c
+++ b/src/tool_formparse.c
@@ -125,21 +125,21 @@ static struct tool_mime *tool_mime_new_filedata(struct tool_mime *parent,
     else {  /* Not suitable for direct use, buffer stdin data. */
       size_t stdinsize = 0;
 
-      if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) {
+      switch(file2memory(&data, &stdinsize, stdin)) {
+      case PARAM_NO_MEM:
         /* Out of memory. */
         return m;
-      }
-
-      if(ferror(stdin)) {
+      case PARAM_FILE_ERROR:
         result = CURLE_READ_ERROR;
-        Curl_safefree(data);
-        data = NULL;
-      }
-      else if(!stdinsize) {
-        /* Zero-length data has been freed. Re-create it. */
-        data = strdup("");
-        if(!data)
-          return m;
+        break;
+      default:
+        if(!stdinsize) {
+          /* Zero-length data has been freed. Re-create it. */
+          data = strdup("");
+          if(!data)
+            return m;
+        }
+        break;
       }
       size = curlx_uztoso(stdinsize);
       origin = 0;


EDIT: I noticed I can't actually remove that result = CURLE_READ_ERROR line from tool_mime_new_filedata() because of how that function is written, but I still added a branch for PARAM_READ_ERROR in operate() since it probably makes sense to have consistent exit codes for the PARAM_READ_ERROR error.

@emanuele6 emanuele6 force-pushed the fixeof branch 5 times, most recently from 2657db5 to 5991804 Compare April 14, 2022 05:00
A new ParameterError value has been added for this error:
PARAM_READ_ERROR
@emanuele6 emanuele6 force-pushed the fixeof branch 4 times, most recently from c0a8799 to 167f75e Compare April 14, 2022 05:53
Now, file2memory() returns PARAM_FILE_ERROR when ferror(stdin) would be
true; since `PARAM_READ_ERROR != PARAM_OK', the `if(ferror(stdin))' code
became unreachable.

Also, it's no longer necessary to free and set `data' to NULL in this
case.
@bagder bagder closed this in 77a6bf8 Apr 17, 2022
@bagder
Copy link
Member

bagder commented Apr 17, 2022

Thanks!

@emanuele6 emanuele6 deleted the fixeof branch April 17, 2022 10:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

3 participants