tool_paramhlp: use feof(3) with fread(3) to identify EOF #8701

emanuele6 · 2022-04-12T19:22:45Z

This loop was using the number of bytes read from the file as condition to keep reading.

From Linux's fread(3) man page:

On success, fread() and fwrite() return the number of items read or
written. This number equals the number of bytes transferred only when
size is 1. If an error occurs, or the end of the file is reached, the
return value is a short item count (or zero).

The file position indicator for the stream is advanced by the number
of bytes successfully read or written.

fread() does not distinguish between end-of-file and error, and
callers must use feof(3) and ferror(3) to determine which occurred.

This means that nread!=0 doesn't make much sense as an end condition for the loop: nread==0 doesn't necessarily mean that EOF has been reached or an error has occured (but that is usually the case) and nread!=0 doesn't necessarily mean that EOF has not been reached or that no read errors have occured. feof(3) and ferror(3) should be uses when using fread(3).

Currently curl has to performs an extra fread(3) call to get a return value equal to 0 to stop looping.

This usually "works" (even though nread==0 shouldn't be interpreted as EOF) if stdin is a pipe because EOF usually marks the "real" end of the stream, so the extra fread(3) call will return immediately and the extra read syscall won't be noticeable:

bash-5.1$ strace -e read curl -s -F file=@- 0x0.st <<< a 2>&1 |
> tail -n 5
read(0, "a\n", 4096)                    = 2
read(0, "", 4096)                       = 0
read(0, "", 4096)                       = 0
http://0x0.st/oRs.txt
+++ exited with 0 +++
bash-5.1$

But this doesn't work if curl is reading from stdin, stdin is a terminal, and the EOF is being emulated using a shell with ^D. Two consecutive ^D will be required in this case to actually make curl stop reading:

bash-5.1$ curl -F file=@- 0x0.st
a
^D^D
http://0x0.st/oRs.txt
bash-5.1$

A possible workaround to this issue is to use a program that handles EOF correctly to indirectly send data to curl's stdin:

bash-5.1$ cat - | curl -F file=@- 0x0.st
a
^D
http://0x0.st/oRs.txt
bash-5.1$

This patch makes curl handle EOF properly when using fread(3) in file2memory() so that the workaround is not necessary.

Since curl was previously ignoring read errors caused by this fread(3), ferror(3) is also used in the condition of the loop: read errors and EOF will have the same meaning; this is done to somewhat preserve the old behaviour instead of making the command fail when a read error occurs.

emanuele6 · 2022-04-12T20:41:08Z

Note that this patch doesn't handle read errors caused by fread(3), it simply ignores them as it did before.

Should curl fail when a read error occurs here?

To handle read errors we could add an if(ferror(file)) branch in the code and handle the error somehow, but it's unclear how these errors should be handled since curl was previously ignoring them.

One possible way to handle the read errors caused by these reads is to make curl fail and return 26 ("Read error. Various reading problems."), but this may be a breaking change.

One way to implement that is to add a new PARAM_READ_ERROR ParameterError and make file2memory return that when a read error occurs; i.e.:

do {
  char buffer[4096];
  nread = fread(buffer, 1, sizeof(buffer), file);
  if (ferror(file))
    return PARAM_READ_ERROR;
  if(nread)
    if(curlx_dyn_addn(&dyn, buffer, nread))
      return PARAM_NO_MEM;
} while(!feof(file));

This loop was using the number of bytes read from the file as condition to keep reading. From Linux's fread(3) man page: > On success, fread() and fwrite() return the number of items read or > written. This number equals the number of bytes transferred only when > size is 1. If an error occurs, or the end of the file is reached, the > return value is a short item count (or zero). > > The file position indicator for the stream is advanced by the number > of bytes successfully read or written. > > fread() does not distinguish between end-of-file and error, and > callers must use feof(3) and ferror(3) to determine which occurred. This means that nread!=0 doesn't make much sense as an end condition for the loop: nread==0 doesn't necessarily mean that EOF has been reached or an error has occured (but that is usually the case) and nread!=0 doesn't necessarily mean that EOF has not been reached or that no read errors have occured. feof(3) and ferror(3) should be uses when using fread(3). Currently curl has to performs an extra fread(3) call to get a return value equal to 0 to stop looping. This usually "works" (even though nread==0 shouldn't be interpreted as EOF) if stdin is a pipe because EOF usually marks the "real" end of the stream, so the extra fread(3) call will return immediately and the extra read syscall won't be noticeable: bash-5.1$ strace -e read curl -s -F file=@- 0x0.st <<< a 2>&1 | > tail -n 5 read(0, "a\n", 4096) = 2 read(0, "", 4096) = 0 read(0, "", 4096) = 0 http://0x0.st/oRs.txt +++ exited with 0 +++ bash-5.1$ But this doesn't work if curl is reading from stdin, stdin is a terminal, and the EOF is being emulated using a shell with ^D. Two consecutive ^D will be required in this case to actually make curl stop reading: bash-5.1$ curl -F file=@- 0x0.st a ^D^D http://0x0.st/oRs.txt bash-5.1$ A possible workaround to this issue is to use a program that handles EOF correctly to indirectly send data to curl's stdin: bash-5.1$ cat - | curl -F file=@- 0x0.st a ^D http://0x0.st/oRs.txt bash-5.1$ This patch makes curl handle EOF properly when using fread(3) in file2memory() so that the workaround is not necessary. Since curl was previously ignoring read errors caused by this fread(3), ferror(3) is also used in the condition of the loop: read errors and EOF will have the same meaning; this is done to somewhat preserve the old behaviour instead of making the command fail when a read error occurs.

danielgustafsson · 2022-04-13T09:58:55Z

src/tool_paramhlp.c

@@ -97,7 +97,7 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file)
      if(nread)
        if(curlx_dyn_addn(&dyn, buffer, nread))
          return PARAM_NO_MEM;
-    } while(nread);
+    } while(!feof(file) && !ferror(file));


If we run into ferror() wouldn't it be better to actually handle it? How about something like this?

diff --git a/src/tool_helpers.c b/src/tool_helpers.c index 9244d1fb5..967943ad8 100644 --- a/src/tool_helpers.c +++ b/src/tool_helpers.c @@ -72,6 +72,8 @@ const char *param2text(int res) return "showing headers and --remote-header-name cannot be combined"; case PARAM_CONTDISP_RESUME_FROM: return "--continue-at and --remote-header-name cannot be combined"; + case PARAM_FILE_ERROR: + return "error encountered in file operation"; default: return "unknown error"; } diff --git a/src/tool_paramhlp.c b/src/tool_paramhlp.c index 273805ef0..56266a823 100644 --- a/src/tool_paramhlp.c +++ b/src/tool_paramhlp.c @@ -97,7 +97,9 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file) if(nread) if(curlx_dyn_addn(&dyn, buffer, nread)) return PARAM_NO_MEM; - } while(nread); + if(ferror(file)) + return PARAM_FILE_ERROR; + } while(!feof(file)); *size = curlx_dyn_len(&dyn); *bufp = curlx_dyn_ptr(&dyn); }

@danielgustafsson yeah, i totally i agree with that, i suggested basically the same thing in my comment here #8701 (comment).

I wasn't sure if we actually wanted to do that, because i was afraid of changing the behaviour to much, but i guess, you never actually want to ignore read errors in this case.

Yeah, I don't think this changes the behavior in any breaking way.

I just pushed the other patch i wrote yesterday (renaming the ParamaterError from PARAM_READ_ERROR to PARAM_FILE_ERROR and using your error message).

I placed the if(ferror(file)) check right after the fread() instead of at the end of the loop because I think it makes more sense there.

From the fread manpage:

RETURN VALUES
The functions fread() and fwrite() advance the file position indicator for the stream by the number of bytes read or written. They return the number of objects read or written.
If an error occurs, or the end-of-file is reached, the return value is a short object count (or zero).

"short object count" to me sounds like fread() can raise an error while still having read data (at least in some cases/implementations). In the end it wont matter since we abort and return with an error, but I wonder if we should still persist any read data to aid debugging forensics.Thougths?

short object count" to me sounds like fread() can raise an error while still having read data (at least in some cases/implementations).

I also interpret the manpage that way.

In the end it wont matter since we abort and return with an error, but I wonder if we should still persist any read data to aid debugging forensics.Thougths?

As fair as i can tell, curl is not doing that in other places where reading fails (for example the snipped mentioned below), and I don't know how that could be useful to be honest.

I spotted some code that is checking ferror manually after calling file2memory in src/tool_formparse.c: If we are going to make file2memory not ignore fread errors, I think that should be refactored to use the new file2memory return values instead of assuming != PARAM_OK means PARAM_NO_MEM (otherwise behaviour would actually change):

curl/src/tool_formparse.c

Lines 128 to 143 in b7c0bd6

if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) {

/* Out of memory. */

return m;

}

if(ferror(stdin)) {

result = CURLE_READ_ERROR;

Curl_safefree(data);

data = NULL;

}

else if(!stdinsize) {

/* Zero-length data has been freed. Re-create it. */

data = strdup("");

if(!data)

return m;

}

I will push another commit to fix that in a bit.

@danielgustafsson having seen that code in tool_formparse.c, i think we should maybe consider dropping the last two commits and only keep the first since curl's source code seems to be checking ferror() errors manually after running file2memory(). What do you think?

EDIT: Well, i guess it would also be fine to keep the last commits since that seems to be the only place where the code that handles file2memory()'s return value needs to be changed.

I assumed we didn't need to free dyn when returning PARAM_FILE_ERROR since the file2memory() function didn't explictly do it in case of an OOM error, but I just checked inside the curlx_dyn_addn (Curl_dyn_add) and saw that it is implicitly freeing dyn when it's OOM.

I will change the second commit to handle that.

I think I should also set *bufp to NULL and set *size to 0 in that case, but i will probably need to check and change more places that call file2memory() if i do that.

but i will probably need to check and change more places that call file2memory() if i do that.

It looks like I don't need to change the code around any other file2memory() call even with that change. (I searched for uses of file2memory() using git grep '\bfile2memory\b' '*.[ch]'.)

I will change the second commit to handle that.

I also changed the third commit since now it's not necessary to call free on data and set it to NULL; actually the code I was using before for the third (see note below) was wrong because the assertion in Curl_safefree would always have failed in case of an error since *bufp = curlx_dyn_ptr(&dyn); in file2memory() would never have been run if the function returns PARAM_FILE_ERROR.

previous version of the third commit

diff --git a/src/tool_formparse.c b/src/tool_formparse.c index 6e9c5d6a0..2657db556 100644 --- a/src/tool_formparse.c +++ b/src/tool_formparse.c @@ -125,21 +125,21 @@ static struct tool_mime *tool_mime_new_filedata(struct tool_mime *parent, else { /* Not suitable for direct use, buffer stdin data. */ size_t stdinsize = 0; - if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) { + switch(file2memory(&data, &stdinsize, stdin)) { + case PARAM_NO_MEM: /* Out of memory. */ return m; - } - - if(ferror(stdin)) { + case PARAM_FILE_ERROR: result = CURLE_READ_ERROR; Curl_safefree(data); data = NULL; - } - else if(!stdinsize) { - /* Zero-length data has been freed. Re-create it. */ - data = strdup(""); - if(!data) - return m; + break; + default: + if(!stdinsize) { + /* Zero-length data has been freed. Re-create it. */ + data = strdup(""); + if(!data) + return m; + } + break; } size = curlx_uztoso(stdinsize); origin = 0;

I also noticed I can move the result = CURLE_READ_ERROR; line the operate() function in tool_operate.c (added in the second commit) and simplify the code even more (it also makes exit codes for PARAM_FILE_ERROR more consisent). I also renamed the PARAM_FILE_ERROR back to my original PARAM_READ_ERROR and adjusted the error message since now it will always cause curl to exit with the CURLE_READ_ERROR exit code.

previous version of the second commit

commit d5f90352a18bd9e903594ba5d1666610cf9983fc Author: Emanuele Torre <torreemanuele6@gmail.com> Date: Wed Apr 13 14:12:50 2022 +0200 tool_paramhlp: make file2memory() report when fread(3) fails. A new ParameterError value has been added for this error: PARAM_FILE_ERRROR Co-authored-by: Daniel Gustafsson <daniel@yesql.se> diff --git a/src/tool_getparam.h b/src/tool_getparam.h index 599a0ac88..3ad0a5d20 100644 --- a/src/tool_getparam.h +++ b/src/tool_getparam.h @@ -45,6 +45,7 @@ typedef enum { PARAM_NO_NOT_BOOLEAN, PARAM_CONTDISP_SHOW_HEADER, /* --include and --remote-header-name */ PARAM_CONTDISP_RESUME_FROM, /* --continue-at and --remote-header-name */ + PARAM_FILE_ERROR, PARAM_LAST } ParameterError; diff --git a/src/tool_helpers.c b/src/tool_helpers.c index 9244d1fb5..967943ad8 100644 --- a/src/tool_helpers.c +++ b/src/tool_helpers.c @@ -72,6 +72,8 @@ const char *param2text(int res) return "showing headers and --remote-header-name cannot be combined"; case PARAM_CONTDISP_RESUME_FROM: return "--continue-at and --remote-header-name cannot be combined"; + case PARAM_FILE_ERROR: + return "error encountered in file operation"; default: return "unknown error"; } diff --git a/src/tool_paramhlp.c b/src/tool_paramhlp.c index 1d475ea4e..f03b2a46f 100644 --- a/src/tool_paramhlp.c +++ b/src/tool_paramhlp.c @@ -94,10 +94,16 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file) do { char buffer[4096]; nread = fread(buffer, 1, sizeof(buffer), file); + if(ferror(file)) { + curlx_dyn_free(&dyn); + *size = 0; + *bufp = NULL; + return PARAM_FILE_ERROR; + } if(nread) if(curlx_dyn_addn(&dyn, buffer, nread)) return PARAM_NO_MEM; - } while(!feof(file) && !ferror(file)); + } while(!feof(file)); *size = curlx_dyn_len(&dyn); *bufp = curlx_dyn_ptr(&dyn); }

previous version of the third commit

commit 5991804592c078ff3fb39d424e3a78ff6ebe0628 Author: Emanuele Torre <torreemanuele6@gmail.com> Date: Wed Apr 13 16:27:04 2022 +0200 tool_formparse: handle file2memory()'s return value correctly Now file2memory() returns PARAM_FILE_ERROR when ferror(stdin) would be true; since `PARAM_FILE_ERROR != PARAM_OK', the `if(ferror(stdin))' code would not have run. diff --git a/src/tool_formparse.c b/src/tool_formparse.c index 844b18d15..fde264027 100644 --- a/src/tool_formparse.c +++ b/src/tool_formparse.c @@ -125,21 +125,21 @@ static struct tool_mime *tool_mime_new_filedata(struct tool_mime *parent, else { /* Not suitable for direct use, buffer stdin data. */ size_t stdinsize = 0; - if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) { + switch(file2memory(&data, &stdinsize, stdin)) { + case PARAM_NO_MEM: /* Out of memory. */ return m; - } - - if(ferror(stdin)) { + case PARAM_FILE_ERROR: result = CURLE_READ_ERROR; - Curl_safefree(data); - data = NULL; - } - else if(!stdinsize) { - /* Zero-length data has been freed. Re-create it. */ - data = strdup(""); - if(!data) - return m; + break; + default: + if(!stdinsize) { + /* Zero-length data has been freed. Re-create it. */ + data = strdup(""); + if(!data) + return m; + } + break; } size = curlx_uztoso(stdinsize); origin = 0;

EDIT: I noticed I can't actually remove that result = CURLE_READ_ERROR line from tool_mime_new_filedata() because of how that function is written, but I still added a branch for PARAM_READ_ERROR in operate() since it probably makes sense to have consistent exit codes for the PARAM_READ_ERROR error.

A new ParameterError value has been added for this error: PARAM_READ_ERROR

Now, file2memory() returns PARAM_FILE_ERROR when ferror(stdin) would be true; since `PARAM_READ_ERROR != PARAM_OK', the `if(ferror(stdin))' code became unreachable. Also, it's no longer necessary to free and set `data' to NULL in this case.

bagder · 2022-04-17T09:37:10Z

Thanks!

emanuele6 force-pushed the fixeof branch 2 times, most recently from 69ea456 to 80ad695 Compare April 12, 2022 20:33

emanuele6 force-pushed the fixeof branch 6 times, most recently from 5d0ffea to 367ceb0 Compare April 12, 2022 21:30

emanuele6 force-pushed the fixeof branch from 367ceb0 to caa1033 Compare April 12, 2022 21:31

danielgustafsson reviewed Apr 13, 2022

View reviewed changes

emanuele6 force-pushed the fixeof branch 5 times, most recently from 2657db5 to 5991804 Compare April 14, 2022 05:00

tool_paramhlp: make file2memory() handle fread(3) failures.

f8d8332

A new ParameterError value has been added for this error: PARAM_READ_ERROR

emanuele6 force-pushed the fixeof branch 4 times, most recently from c0a8799 to 167f75e Compare April 14, 2022 05:53

emanuele6 force-pushed the fixeof branch from 167f75e to 8d6eb8f Compare April 14, 2022 08:26

bagder added the cmdline tool label Apr 14, 2022

bagder approved these changes Apr 17, 2022

View reviewed changes

bagder closed this in 77a6bf8 Apr 17, 2022

emanuele6 deleted the fixeof branch April 17, 2022 10:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tool_paramhlp: use feof(3) with fread(3) to identify EOF #8701

tool_paramhlp: use feof(3) with fread(3) to identify EOF #8701

emanuele6 commented Apr 12, 2022 •

edited

emanuele6 commented Apr 12, 2022 •

edited

danielgustafsson Apr 13, 2022

emanuele6 Apr 13, 2022

danielgustafsson Apr 13, 2022

emanuele6 Apr 13, 2022 •

edited

danielgustafsson Apr 13, 2022

emanuele6 Apr 13, 2022 •

edited

emanuele6 Apr 13, 2022 •

edited

emanuele6 Apr 14, 2022 •

edited

emanuele6 Apr 14, 2022 •

edited

emanuele6 Apr 14, 2022 •

edited

bagder commented Apr 17, 2022

	if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) {
	/* Out of memory. */
	return m;
	}

	if(ferror(stdin)) {
	result = CURLE_READ_ERROR;
	Curl_safefree(data);
	data = NULL;
	}
	else if(!stdinsize) {
	/* Zero-length data has been freed. Re-create it. */
	data = strdup("");
	if(!data)
	return m;
	}

tool_paramhlp: use feof(3) with fread(3) to identify EOF #8701

tool_paramhlp: use feof(3) with fread(3) to identify EOF #8701

Conversation

emanuele6 commented Apr 12, 2022 • edited

emanuele6 commented Apr 12, 2022 • edited

danielgustafsson Apr 13, 2022

Choose a reason for hiding this comment

emanuele6 Apr 13, 2022

Choose a reason for hiding this comment

danielgustafsson Apr 13, 2022

Choose a reason for hiding this comment

emanuele6 Apr 13, 2022 • edited

Choose a reason for hiding this comment

danielgustafsson Apr 13, 2022

Choose a reason for hiding this comment

emanuele6 Apr 13, 2022 • edited

Choose a reason for hiding this comment

emanuele6 Apr 13, 2022 • edited

Choose a reason for hiding this comment

emanuele6 Apr 14, 2022 • edited

Choose a reason for hiding this comment

emanuele6 Apr 14, 2022 • edited

Choose a reason for hiding this comment

emanuele6 Apr 14, 2022 • edited

Choose a reason for hiding this comment

bagder commented Apr 17, 2022

emanuele6 commented Apr 12, 2022 •

edited

emanuele6 commented Apr 12, 2022 •

edited

emanuele6 Apr 13, 2022 •

edited

emanuele6 Apr 13, 2022 •

edited

emanuele6 Apr 13, 2022 •

edited

emanuele6 Apr 14, 2022 •

edited

emanuele6 Apr 14, 2022 •

edited

emanuele6 Apr 14, 2022 •

edited