Author vstinner
Recipients aronacher, georg.brandl, vstinner
Date 2011-01-22.13:04:27
Message-id <1295701470.34.0.613555869801.issue10980@psf.upfronthosting.co.za>
In-reply-to
Content
Extract from PEP 3333: << Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding. >>

What is the best choice for portability (HTTP servers and web browsers): latin1 or MIME encoding? Latin1 is a small subset of Unicode: only U+0000..U+00FF.
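To make the Latin1 limitation concrete, here is a minimal sketch. The helper name fits_latin1 is made up for illustration and is not part of wsgiref or PEP 3333:

```python
def fits_latin1(text):
    """Return True if every character of text lies in U+0000..U+00FF."""
    # Latin1 (ISO-8859-1) maps each of these code points to the byte
    # with the same value, so this is equivalent to "encodable as latin1".
    return all(ord(ch) <= 0xFF for ch in text)
```

For example, 'café' fits, while the euro sign (U+20AC) does not, even though cp1252 has a byte for it.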

Maybe we should let the user choose between Latin1, MIME, or something else (e.g. UTF-8, cp1252, ...). Or at least, you should try something like:

try:
    data = text.encode('latin1')
except UnicodeEncodeError:
    # encodeMIME() is a placeholder for an RFC 2047 encoder
    data = encodeMIME(text, 'utf-8')
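A runnable version of that idea could look like the sketch below. It assumes email.header.Header from the standard library is an acceptable stand-in for the encodeMIME() placeholder. The function name encode_header_value is hypothetical. By default Header.encode() may fold long values across several lines, which this sketch does not handle.

```python
from email.header import Header


def encode_header_value(text):
    """Encode a header value as Latin1, falling back to RFC 2047."""
    try:
        return text.encode('latin-1')
    except UnicodeEncodeError:
        # Header.encode() returns an ASCII-only str made of RFC 2047
        # encoded-words, e.g. '=?utf-8?b?...?='.
        return Header(text, 'utf-8').encode().encode('ascii')
```

Whether servers and browsers actually decode the RFC 2047 form is exactly the portability question raised above. The sketch only shows the encoding side.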

Would it be a good idea to accept raw bytes headers? HTTP is *supposed* to be correctly encoded according to various RFCs, but in practice, anyone is free to do whatever they want.
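One property relevant to raw bytes: decoding with Latin1 never fails and is lossless, so arbitrary header bytes can be carried in a str and recovered later. A small sketch, with hypothetical helper names:

```python
def bytes_to_native(raw):
    """Carry arbitrary header bytes in a str, one code point per byte."""
    # Every byte 0x00..0xFF maps to U+0000..U+00FF, so this cannot fail.
    return raw.decode('latin-1')


def native_to_bytes(value):
    """Recover the original bytes from a str built by bytes_to_native()."""
    return value.encode('latin-1')
```

The resulting str is not necessarily the text the sender meant (UTF-8 bytes come out as mojibake), but no information is lost.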

A sentence picked at random from the web (Dec. 2008): "it seems that neither Tomcat 5.5 or 6 properly decodes HTTP headers as per RFC 2047! The Tomcat code assumes everywhere that header values use ISO-8859-1."

Finally, why do you consider that this issue has to be fixed before Python 3.2?
History
Date                 User      Action  Args
2011-01-22 13:04:30  vstinner  set     recipients: + vstinner, georg.brandl, aronacher
2011-01-22 13:04:30  vstinner  set     messageid: <1295701470.34.0.613555869801.issue10980@psf.upfronthosting.co.za>
2011-01-22 13:04:28  vstinner  link    issue10980 messages
2011-01-22 13:04:27  vstinner  create