New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tool_paramhlp: use feof(3) with fread(3) to identify EOF #8701
Conversation
69ea456
to
80ad695
Compare
Note that this patch doesn't handle read errors caused by Should To handle read errors we could add an One possible way to handle the read errors caused by these reads is to make curl fail and return One way to implement that is to add a new do {
char buffer[4096];
nread = fread(buffer, 1, sizeof(buffer), file);
if (ferror(file))
return PARAM_READ_ERROR;
if(nread)
if(curlx_dyn_addn(&dyn, buffer, nread))
return PARAM_NO_MEM;
} while(!feof(file)); |
5d0ffea
to
367ceb0
Compare
This loop was using the number of bytes read from the file as condition to keep reading. From Linux's fread(3) man page: > On success, fread() and fwrite() return the number of items read or > written. This number equals the number of bytes transferred only when > size is 1. If an error occurs, or the end of the file is reached, the > return value is a short item count (or zero). > > The file position indicator for the stream is advanced by the number > of bytes successfully read or written. > > fread() does not distinguish between end-of-file and error, and > callers must use feof(3) and ferror(3) to determine which occurred. This means that nread!=0 doesn't make much sense as an end condition for the loop: nread==0 doesn't necessarily mean that EOF has been reached or an error has occured (but that is usually the case) and nread!=0 doesn't necessarily mean that EOF has not been reached or that no read errors have occured. feof(3) and ferror(3) should be uses when using fread(3). Currently curl has to performs an extra fread(3) call to get a return value equal to 0 to stop looping. This usually "works" (even though nread==0 shouldn't be interpreted as EOF) if stdin is a pipe because EOF usually marks the "real" end of the stream, so the extra fread(3) call will return immediately and the extra read syscall won't be noticeable: bash-5.1$ strace -e read curl -s -F file=@- 0x0.st <<< a 2>&1 | > tail -n 5 read(0, "a\n", 4096) = 2 read(0, "", 4096) = 0 read(0, "", 4096) = 0 http://0x0.st/oRs.txt +++ exited with 0 +++ bash-5.1$ But this doesn't work if curl is reading from stdin, stdin is a terminal, and the EOF is being emulated using a shell with ^D. Two consecutive ^D will be required in this case to actually make curl stop reading: bash-5.1$ curl -F file=@- 0x0.st a ^D^D http://0x0.st/oRs.txt bash-5.1$ A possible workaround to this issue is to use a program that handles EOF correctly to indirectly send data to curl's stdin: bash-5.1$ cat - | curl -F file=@- 0x0.st a ^D http://0x0.st/oRs.txt bash-5.1$ This patch makes curl handle EOF properly when using fread(3) in file2memory() so that the workaround is not necessary. Since curl was previously ignoring read errors caused by this fread(3), ferror(3) is also used in the condition of the loop: read errors and EOF will have the same meaning; this is done to somewhat preserve the old behaviour instead of making the command fail when a read error occurs.
src/tool_paramhlp.c
Outdated
@@ -97,7 +97,7 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file) | |||
if(nread) | |||
if(curlx_dyn_addn(&dyn, buffer, nread)) | |||
return PARAM_NO_MEM; | |||
} while(nread); | |||
} while(!feof(file) && !ferror(file)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we run into ferror()
wouldn't it be better to actually handle it? How about something like this?
diff --git a/src/tool_helpers.c b/src/tool_helpers.c
index 9244d1fb5..967943ad8 100644
--- a/src/tool_helpers.c
+++ b/src/tool_helpers.c
@@ -72,6 +72,8 @@ const char *param2text(int res)
return "showing headers and --remote-header-name cannot be combined";
case PARAM_CONTDISP_RESUME_FROM:
return "--continue-at and --remote-header-name cannot be combined";
+ case PARAM_FILE_ERROR:
+ return "error encountered in file operation";
default:
return "unknown error";
}
diff --git a/src/tool_paramhlp.c b/src/tool_paramhlp.c
index 273805ef0..56266a823 100644
--- a/src/tool_paramhlp.c
+++ b/src/tool_paramhlp.c
@@ -97,7 +97,9 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file)
if(nread)
if(curlx_dyn_addn(&dyn, buffer, nread))
return PARAM_NO_MEM;
- } while(nread);
+ if(ferror(file))
+ return PARAM_FILE_ERROR;
+ } while(!feof(file));
*size = curlx_dyn_len(&dyn);
*bufp = curlx_dyn_ptr(&dyn);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielgustafsson yeah, i totally i agree with that, i suggested basically the same thing in my comment here #8701 (comment).
I wasn't sure if we actually wanted to do that, because i was afraid of changing the behaviour to much, but i guess, you never actually want to ignore read errors in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think this changes the behavior in any breaking way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed the other patch i wrote yesterday (renaming the ParamaterError from PARAM_READ_ERROR
to PARAM_FILE_ERROR
and using your error message).
I placed the if(ferror(file))
check right after the fread()
instead of at the end of the loop because I think it makes more sense there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the fread manpage:
RETURN VALUES
The functions fread() and fwrite() advance the file position indicator for the stream by the number of bytes read or written. They return the number of objects read or written.
If an error occurs, or the end-of-file is reached, the return value is a short object count (or zero).
"short object count" to me sounds like fread()
can raise an error while still having read data (at least in some cases/implementations). In the end it wont matter since we abort and return with an error, but I wonder if we should still persist any read data to aid debugging forensics.Thougths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
short object count" to me sounds like
fread()
can raise an error while still having read data (at least in some cases/implementations).
I also interpret the manpage that way.
In the end it wont matter since we abort and return with an error, but I wonder if we should still persist any read data to aid debugging forensics.Thougths?
As fair as i can tell, curl
is not doing that in other places where reading fails (for example the snipped mentioned below), and I don't know how that could be useful to be honest.
I spotted some code that is checking ferror
manually after calling file2memory
in src/tool_formparse.c
: If we are going to make file2memory
not ignore fread
errors, I think that should be refactored to use the new file2memory
return values instead of assuming != PARAM_OK
means PARAM_NO_MEM
(otherwise behaviour would actually change):
Lines 128 to 143 in b7c0bd6
if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) { | |
/* Out of memory. */ | |
return m; | |
} | |
if(ferror(stdin)) { | |
result = CURLE_READ_ERROR; | |
Curl_safefree(data); | |
data = NULL; | |
} | |
else if(!stdinsize) { | |
/* Zero-length data has been freed. Re-create it. */ | |
data = strdup(""); | |
if(!data) | |
return m; | |
} |
I will push another commit to fix that in a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danielgustafsson having seen that code in tool_formparse.c
, i think we should maybe consider dropping the last two commits and only keep the first since curl
's source code seems to be checking ferror()
errors manually after running file2memory()
. What do you think?
EDIT: Well, i guess it would also be fine to keep the last commits since that seems to be the only place where the code that handles file2memory()
's return value needs to be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed we didn't need to free dyn
when returning PARAM_FILE_ERROR
since the file2memory()
function didn't explictly do it in case of an OOM error, but I just checked inside the curlx_dyn_addn
(Curl_dyn_add
) and saw that it is implicitly freeing dyn
when it's OOM.
I will change the second commit to handle that.
I think I should also set *bufp
to NULL
and set *size
to 0
in that case, but i will probably need to check and change more places that call file2memory()
if i do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but i will probably need to check and change more places that call file2memory() if i do that.
It looks like I don't need to change the code around any other file2memory()
call even with that change. (I searched for uses of file2memory()
using git grep '\bfile2memory\b' '*.[ch]'
.)
I will change the second commit to handle that.
I also changed the third commit since now it's not necessary to call free on data
and set it to NULL
; actually the code I was using before for the third (see note below) was wrong because the assertion in Curl_safefree
would always have failed in case of an error since *bufp = curlx_dyn_ptr(&dyn);
in file2memory()
would never have been run if the function returns PARAM_FILE_ERROR
.
previous version of the third commit
diff --git a/src/tool_formparse.c b/src/tool_formparse.c
index 6e9c5d6a0..2657db556 100644
--- a/src/tool_formparse.c
+++ b/src/tool_formparse.c
@@ -125,21 +125,21 @@ static struct tool_mime *tool_mime_new_filedata(struct tool_mime *parent,
else { /* Not suitable for direct use, buffer stdin data. */
size_t stdinsize = 0;
- if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) {
+ switch(file2memory(&data, &stdinsize, stdin)) {
+ case PARAM_NO_MEM:
/* Out of memory. */
return m;
- }
-
- if(ferror(stdin)) {
+ case PARAM_FILE_ERROR:
result = CURLE_READ_ERROR;
Curl_safefree(data);
data = NULL;
- }
- else if(!stdinsize) {
- /* Zero-length data has been freed. Re-create it. */
- data = strdup("");
- if(!data)
- return m;
+ break;
+ default:
+ if(!stdinsize) {
+ /* Zero-length data has been freed. Re-create it. */
+ data = strdup("");
+ if(!data)
+ return m;
+ }
+ break;
}
size = curlx_uztoso(stdinsize);
origin = 0;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed I can move the result = CURLE_READ_ERROR;
line the operate()
function in tool_operate.c
(added in the second commit) and simplify the code even more (it also makes exit codes for PARAM_FILE_ERROR
more consisent). I also renamed the PARAM_FILE_ERROR
back to my original PARAM_READ_ERROR
and adjusted the error message since now it will always cause curl to exit with the CURLE_READ_ERROR
exit code.
previous version of the second commit
commit d5f90352a18bd9e903594ba5d1666610cf9983fc
Author: Emanuele Torre <torreemanuele6@gmail.com>
Date: Wed Apr 13 14:12:50 2022 +0200
tool_paramhlp: make file2memory() report when fread(3) fails.
A new ParameterError value has been added for this error:
PARAM_FILE_ERRROR
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
diff --git a/src/tool_getparam.h b/src/tool_getparam.h
index 599a0ac88..3ad0a5d20 100644
--- a/src/tool_getparam.h
+++ b/src/tool_getparam.h
@@ -45,6 +45,7 @@ typedef enum {
PARAM_NO_NOT_BOOLEAN,
PARAM_CONTDISP_SHOW_HEADER, /* --include and --remote-header-name */
PARAM_CONTDISP_RESUME_FROM, /* --continue-at and --remote-header-name */
+ PARAM_FILE_ERROR,
PARAM_LAST
} ParameterError;
diff --git a/src/tool_helpers.c b/src/tool_helpers.c
index 9244d1fb5..967943ad8 100644
--- a/src/tool_helpers.c
+++ b/src/tool_helpers.c
@@ -72,6 +72,8 @@ const char *param2text(int res)
return "showing headers and --remote-header-name cannot be combined";
case PARAM_CONTDISP_RESUME_FROM:
return "--continue-at and --remote-header-name cannot be combined";
+ case PARAM_FILE_ERROR:
+ return "error encountered in file operation";
default:
return "unknown error";
}
diff --git a/src/tool_paramhlp.c b/src/tool_paramhlp.c
index 1d475ea4e..f03b2a46f 100644
--- a/src/tool_paramhlp.c
+++ b/src/tool_paramhlp.c
@@ -94,10 +94,16 @@ ParameterError file2memory(char **bufp, size_t *size, FILE *file)
do {
char buffer[4096];
nread = fread(buffer, 1, sizeof(buffer), file);
+ if(ferror(file)) {
+ curlx_dyn_free(&dyn);
+ *size = 0;
+ *bufp = NULL;
+ return PARAM_FILE_ERROR;
+ }
if(nread)
if(curlx_dyn_addn(&dyn, buffer, nread))
return PARAM_NO_MEM;
- } while(!feof(file) && !ferror(file));
+ } while(!feof(file));
*size = curlx_dyn_len(&dyn);
*bufp = curlx_dyn_ptr(&dyn);
}
previous version of the third commit
commit 5991804592c078ff3fb39d424e3a78ff6ebe0628
Author: Emanuele Torre <torreemanuele6@gmail.com>
Date: Wed Apr 13 16:27:04 2022 +0200
tool_formparse: handle file2memory()'s return value correctly
Now file2memory() returns PARAM_FILE_ERROR when ferror(stdin) would be
true; since `PARAM_FILE_ERROR != PARAM_OK', the `if(ferror(stdin))' code
would not have run.
diff --git a/src/tool_formparse.c b/src/tool_formparse.c
index 844b18d15..fde264027 100644
--- a/src/tool_formparse.c
+++ b/src/tool_formparse.c
@@ -125,21 +125,21 @@ static struct tool_mime *tool_mime_new_filedata(struct tool_mime *parent,
else { /* Not suitable for direct use, buffer stdin data. */
size_t stdinsize = 0;
- if(file2memory(&data, &stdinsize, stdin) != PARAM_OK) {
+ switch(file2memory(&data, &stdinsize, stdin)) {
+ case PARAM_NO_MEM:
/* Out of memory. */
return m;
- }
-
- if(ferror(stdin)) {
+ case PARAM_FILE_ERROR:
result = CURLE_READ_ERROR;
- Curl_safefree(data);
- data = NULL;
- }
- else if(!stdinsize) {
- /* Zero-length data has been freed. Re-create it. */
- data = strdup("");
- if(!data)
- return m;
+ break;
+ default:
+ if(!stdinsize) {
+ /* Zero-length data has been freed. Re-create it. */
+ data = strdup("");
+ if(!data)
+ return m;
+ }
+ break;
}
size = curlx_uztoso(stdinsize);
origin = 0;
EDIT: I noticed I can't actually remove that result = CURLE_READ_ERROR
line from tool_mime_new_filedata()
because of how that function is written, but I still added a branch for PARAM_READ_ERROR
in operate()
since it probably makes sense to have consistent exit codes for the PARAM_READ_ERROR
error.
2657db5
to
5991804
Compare
A new ParameterError value has been added for this error: PARAM_READ_ERROR
c0a8799
to
167f75e
Compare
Now, file2memory() returns PARAM_FILE_ERROR when ferror(stdin) would be true; since `PARAM_READ_ERROR != PARAM_OK', the `if(ferror(stdin))' code became unreachable. Also, it's no longer necessary to free and set `data' to NULL in this case.
Thanks! |
This loop was using the number of bytes read from the file as condition to keep reading.
From Linux's fread(3) man page:
This means that nread!=0 doesn't make much sense as an end condition for the loop: nread==0 doesn't necessarily mean that EOF has been reached or an error has occured (but that is usually the case) and nread!=0 doesn't necessarily mean that EOF has not been reached or that no read errors have occured. feof(3) and ferror(3) should be uses when using fread(3).
Currently curl has to performs an extra fread(3) call to get a return value equal to 0 to stop looping.
This usually "works" (even though nread==0 shouldn't be interpreted as EOF) if stdin is a pipe because EOF usually marks the "real" end of the stream, so the extra fread(3) call will return immediately and the extra read syscall won't be noticeable:
But this doesn't work if curl is reading from stdin, stdin is a terminal, and the EOF is being emulated using a shell with ^D. Two consecutive ^D will be required in this case to actually make curl stop reading:
A possible workaround to this issue is to use a program that handles EOF correctly to indirectly send data to curl's stdin:
bash-5.1$ cat - | curl -F file=@- 0x0.st a ^D http://0x0.st/oRs.txt bash-5.1$
This patch makes curl handle EOF properly when using fread(3) in file2memory() so that the workaround is not necessary.
Since curl was previously ignoring read errors caused by this fread(3), ferror(3) is also used in the condition of the loop: read errors and EOF will have the same meaning; this is done to somewhat preserve the old behaviour instead of making the command fail when a read error occurs.