
curl-users

Re: curl and Japanese web content

From: Yuhui H <eyecat_at_gmail.com>
Date: Wed, 10 Sep 2008 14:51:20 -0700

On Wed, Sep 10, 2008 at 2:31 PM, denis.papathanasiou_at_gmail.com
<denis.papathanasiou_at_gmail.com> wrote:
> Can anyone explain why this command returns Japanese text properly:
>
> (1) curl "http://rss.asahi.com/f/asahi_international"
>
> While this command returns a series of "????????????" strings where the
> Japanese text should be:
>
> (2) curl "http://www.asahi.com/politics/update/0911/TKY200809100296.html"
>
> From what I can tell from the two URLs, the first is an RSS feed
> (managed by pheedo.com) whose content-type is UTF-8.
>
> The second is an HTML page directly on asahi.com whose content-type is
> EUC-JP (n.b. the HTTP reply header did not specify a content-type, but
> looking at the HTML which came in reply, one of the meta tags identified
> the encoding as EUC-JP).

It's not curl's job to decode or interpret the returned byte stream; it
simply passes along the bytes the server sends. If you save the content
(for example with the "-O" command line option) and then load both pages
in a browser, you should see the content interpreted with the correct
encoding. Most browsers pick the correct one automatically, or allow you
to override it. That functionality is outside the scope of curl.
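
If you want to do the decoding yourself instead of relying on a
browser, something along these lines might work. This is only a rough
Python sketch, not anything curl provides: the helper names
(sniff_charset, decode_page) and the 2048-byte search window are my own
assumptions, and the sample bytes stand in for a page saved with curl.

```python
import re

# Hypothetical helpers, not part of curl. They look for a charset
# declaration in the raw, still-undecoded bytes of an HTML page.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def sniff_charset(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in a meta tag, or `default` if none is found."""
    # Assumption: the declaration appears within the first 2048 bytes.
    match = META_CHARSET.search(raw[:2048])
    return match.group(1).decode("ascii") if match else default


def decode_page(raw: bytes) -> str:
    """Decode the page using the sniffed charset, replacing undecodable bytes."""
    return raw.decode(sniff_charset(raw), errors="replace")


# Stand-in for the bytes curl would save from an EUC-JP page.
sample = (
    '<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
    "<title>日本語</title>"
).encode("euc_jp")

text = decode_page(sample)                               # decoded as EUC-JP
wrong = sample.decode("utf-8", errors="replace")         # decoded with the wrong charset
```

Here "text" holds the readable Japanese title, while "wrong" shows what
happens when the same bytes are treated as UTF-8: the Japanese
characters turn into replacement characters, much like the "????"
output in the original question.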

Yuhui
-------------------------------------------------------------------
List admin: http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-users
FAQ: http://curl.haxx.se/docs/faq.html
Etiquette: http://curl.haxx.se/mail/etiquette.html
Received on 2008-09-10