curl-users

Re: Prevent/confirm before dumping binary data to terminal?

From: Dan Fandrich <dan_at_coneharvesters.com>
Date: Wed, 26 Feb 2014 22:03:09 +0100

On Wed, Feb 26, 2014 at 10:48:03AM +0100, Gisle Vanem wrote:
> "Daniel Stenberg" <daniel_at_haxx.se> wrote:
>
> >No, I've not seen anyone suggest such an idea but I have to admit
> >it sounds rather attractive. Do you have any particular insights in
> >how less or other tools do the detection?
>
> It opens and reads at most 256 bytes in a file. It's a binary file if
> there are more than 5 "binary" characters in the buffer. A binary
> character is checked using a lookup table depending on the UTF-8
> mode. Hence, less is handling a file as a *text* if it has verified
> it's a UTF-8 file (and UTF-8 mode is active).
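
[A rough sketch of the heuristic described in the quote above, not the actual
code from less. It models only the UTF-8 mode. The 256-byte probe and the
"more than 5" threshold come from the description; the function name and the
rule for what counts as a "binary" character (undecodable bytes plus control
characters other than tab, newline and carriage return) are assumptions made
for illustration, and the real lookup table in less may differ.]

```python
PROBE_SIZE = 256          # bytes examined, per the description above
MAX_BINARY_CHARS = 5      # more than this many means "binary"
ALLOWED_CONTROLS = {"\t", "\n", "\r"}  # assumption: harmless controls


def less_like_is_binary(data: bytes) -> bool:
    """Guess whether data is binary, assuming UTF-8 mode is active."""
    sample = data[:PROBE_SIZE]
    # Undecodable bytes become U+FFFD, which is counted as "binary" below.
    text = sample.decode("utf-8", errors="replace")
    binary_chars = 0
    for ch in text:
        if ch in ALLOWED_CONTROLS:
            continue
        code = ord(ch)
        if code < 0x20 or code == 0x7F or ch == "\ufffd":
            binary_chars += 1
    return binary_chars > MAX_BINARY_CHARS


if __name__ == "__main__":
    accented = "àéîõü ÀÉÎÕÜ"
    print(less_like_is_binary(accented.encode("utf-8")))
    print(less_like_is_binary(accented.encode("latin-1")))
```

[Under these assumptions the same accented text is treated as text when it is
encoded as UTF-8 and as binary when it is encoded as ISO 8859-1, because each
stray high byte fails to decode.]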

But this won't be valid on a non-UTF-8 terminal if the text is ISO 8859-1, for
example. A straightforward method I've seen used, and have used myself in the
past, is to simply look for a binary '\0' (NUL) byte in the first X bytes,
where X should be at least a couple of KB to increase the probability that
purely random bytes include at least one NUL.
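
A minimal sketch of that NUL-byte check, written in Python for brevity. The
function name and the 4096-byte default for X are assumptions chosen for
illustration; nothing like this exists in curl as far as this message says.

```python
def looks_binary(data: bytes, probe_size: int = 4096) -> bool:
    """Return True if a NUL byte appears in the first probe_size bytes.

    probe_size plays the role of X above; 4096 is an illustrative default.
    """
    return b"\0" in data[:probe_size]


if __name__ == "__main__":
    print(looks_binary(b"plain ASCII text\n"))
    print(looks_binary("caf\u00e9".encode("latin-1")))   # ISO 8859-1 text
    print(looks_binary(b"\x89PNG\r\n\x1a\n\0\0\0\rIHDR"))  # start of a PNG
```

With X = 4096, the chance that uniformly random bytes contain no NUL at all is
(255/256)^4096, or roughly 1 in 10 million. Text in ISO 8859-1 or UTF-8
normally contains no NUL bytes, so it passes regardless of terminal encoding.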

> So the check for a true binary file gets a bit complicated. And what
> about UTF-16 (little or big endian), BOM etc.?

A UTF-16 terminal is exceedingly unlikely, so UTF-16 can be considered
essentially binary.

>>> Dan
-------------------------------------------------------------------
List admin: http://cool.haxx.se/list/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2014-02-26