curl-and-python

Re: How to use pycurl when url contain non-English chars?

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Thu, 4 Oct 2012 09:26:37 +0200 (CEST)

On Thu, 4 Oct 2012, Pan, Chenji wrote:

> Hi, everyone, I just started using pycurl, and I've run into a problem: when
> I call setopt(url, "xxx") and the url contains Chinese chars, it seems that
> curl is not able to fetch the content successfully. Since it does not accept
> unicode, I do some processing like str(unicode_str.encode('utf-8')), but it
> still does not work. Does anyone have any idea?

RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt) defines the URI/URL syntax that
libcurl (and thus pycurl) assumes you feed it. As you can see there, the spec
does not allow "raw" UTF-8 like that; non-ASCII characters have to be
percent-encoded. If you check the request libcurl sends out for a given URL
input, you'll see what actually goes over the wire.

You're talking about an IRI (possibly as per RFC 3987), but libcurl doesn't
deal with IRIs. That spec also details how to convert from an IRI to a URI.
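
As a rough illustration of that conversion, here is a minimal sketch that
percent-encodes the UTF-8 bytes of the path, query and fragment. It uses
Python 3's urllib.parse rather than anything in pycurl, iri_to_uri is a
made-up helper name, and it assumes the host part is already plain ASCII. It
is not a full RFC 3987 implementation.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters that may stay unescaped; "%" is kept so that existing
# percent-escapes are not encoded a second time.
_SAFE_PATH = "/%:@!$&'()*+,;="
_SAFE_QUERY = "/?%:@!$&'()*+,;="


def iri_to_uri(iri):
    """Hypothetical helper: percent-encode non-ASCII chars as UTF-8 bytes."""
    parts = urlsplit(iri)
    path = quote(parts.path, safe=_SAFE_PATH)
    query = quote(parts.query, safe=_SAFE_QUERY)
    fragment = quote(parts.fragment, safe=_SAFE_QUERY)
    # Assumption: the host (netloc) is ASCII already and is left untouched.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://example.com/中文?q=中"))
```

The resulting ASCII-only string is what you would then hand to
setopt(pycurl.URL, ...).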

libcurl does however support IDN (internationalized domain names), if built
with the necessary support enabled.
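
If your libcurl build lacks IDN support, one possible workaround is to convert
the host name to its ASCII form yourself before building the URL. The sketch
below uses Python 3's built-in "idna" codec (which implements the older
IDNA 2003 rules); host_to_ascii is a made-up helper name and the domain is
only an example.

```python
def host_to_ascii(hostname):
    """Hypothetical helper: convert an international host name to ASCII."""
    # The "idna" codec encodes each non-ASCII label as an "xn--" label.
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(host_to_ascii("bücher.example"))
```

Host names that are already ASCII pass through unchanged, so this should be
safe to apply to every host before it goes into the URL.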

-- 
  / daniel.haxx.se
_______________________________________________
http://cool.haxx.se/cgi-bin/mailman/listinfo/curl-and-python
Received on 2012-10-04