This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author Lukasa
Recipients Lukasa
Date 2016-08-09.11:33:15
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1470742396.79.0.719108941731.issue27716@psf.upfronthosting.co.za>
In-reply-to
Content
Originally reported as Requests issue #3485: https://github.com/kennethreitz/requests/issues/3485

On Python 3, http.client uses the email module to parse its HTTP headers. The email module, for better or worse, requires that it parse headers as *text*: that is, that they be decoded from bytes first and then parsed.

This doesn't work for UTF-8 encoded headers. For example, the URL `'http://pl.bab.la/slownik/angielski-polski/'` returns the following Link header, encoded as UTF-8: `Link: <http://www.babla.cn/英语-波兰语/>; rel="alternate"; hreflang="zh-Hans", <http://cs.bab.la/slovnik/anglicky-polsky/>; rel="alternate"; hreflang="cs", <http://da.bab.la/ordbog/engelsk-polsk/>; rel="alternate"; hreflang="da", <http://de.bab.la/woerterbuch/englisch-polnisch/>; rel="alternate"; hreflang="de", <http://www.babla.gr/αγγλικα-πολωνικα/>; rel="alternate"; hreflang="el", <http://en.bab.la/dictionary/english-polish/>; rel="alternate"; hreflang="en", <http://eo.bab.la/vortaro/angla-pola/>; rel="alternate"; hreflang="eo", <http://es.bab.la/diccionario/ingles-polaco/>; rel="alternate"; hreflang="es", <http://fi.bab.la/sanakirja/englanti-puola/>; rel="alternate"; hreflang="fi", <http://fr.bab.la/dictionnaire/anglais-polonais/>; rel="alternate"; hreflang="fr", <http://www.babla.in/अंग्रेज़ी-पोलिश/>; rel="alternate"; hreflang="hi", <http://hu.bab.la/szótár/angol-lengyel/>; rel="alternate"; hreflang="hu", <http://www.babla.co.id/bahasa-inggris-bahasa-polandia/>; rel="alternate"; hreflang="id", <http://it.bab.la/dizionario/inglese-polacco/>; rel="alternate"; hreflang="it", <http://ja.bab.la/辞書/英語-ポーランド語/>; rel="alternate"; hreflang="ja", <http://www.babla.kr/영어-폴란드어/>; rel="alternate"; hreflang="ko", <http://nl.bab.la/woordenboek/engels-pools/>; rel="alternate"; hreflang="nl", <http://www.babla.no/engelsk-polsk/>; rel="alternate"; hreflang="no", <http://pl.bab.la/slownik/angielski-polski/>; rel="alternate"; hreflang="pl", <http://pt.bab.la/dicionario/ingles-polones/>; rel="alternate"; hreflang="pt", <http://ro.bab.la/dictionar/engleza-poloneza/>; rel="alternate"; hreflang="ro", <http://www.babla.ru/английский-польский/>; rel="alternate"; hreflang="ru", <http://sv.bab.la/lexikon/engelsk-polsk/>; rel="alternate"; hreflang="sv", <http://sw.bab.la/kamusi/kiingereza-kipolishi/>; rel="alternate"; hreflang="sw", <http://www.babla.co.th/english-polish/>; rel="alternate"; hreflang="th", <http://tr.bab.la/sozluk/ingilizce-lehce/>; rel="alternate"; hreflang="tr", <http://www.babla.vn/tieng-anh-tieng-ba-lan/>; rel="alternate"; hreflang="vi"`.

When decoded using ISO-8859-1, this header gets truncated and this also causes the header block parsing to stop. This means that we don't see the Content-Length header, causing the HTTP client to wait for connection closure to consider the body terminated.

Really the only correct fix for this is for http.client to stop insisting that the headers be decoded before they are parsed, and instead to decode *after*. That way, at least, user code can recover the headers and handle them more sensibly.
History
Date User Action Args
2016-08-09 11:33:16Lukasasetrecipients: + Lukasa
2016-08-09 11:33:16Lukasasetmessageid: <1470742396.79.0.719108941731.issue27716@psf.upfronthosting.co.za>
2016-08-09 11:33:16Lukasalinkissue27716 messages
2016-08-09 11:33:15Lukasacreate