Message 126834 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

Author	vstinner
Recipients	aronacher, georg.brandl, vstinner
Date	2011-01-22.13:04:27
SpamBayes Score	4.5681536e-05
Marked as misclassified	No
Message-id	<1295701470.34.0.613555869801.issue10980@psf.upfronthosting.co.za>
In-reply-to

Content

Extract from PEP 3333: << Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding. >>
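
To make the quoted rule concrete, here is a minimal sketch of a WSGI application plus a stand-in for the server side. `serialize_head` is a hypothetical helper, not taken from any real server; it assumes the server turns the response head into bytes by encoding it to ISO-8859-1, which is one straightforward way to honour the rule.

```python
def serialize_head(status, headers):
    """Turn start_response() arguments into the bytes sent on the wire.

    Hypothetical server-side helper: raises UnicodeEncodeError if the status
    line or any header contains a character outside ISO-8859-1.
    """
    lines = ['HTTP/1.1 ' + status]
    lines += ['%s: %s' % (name, value) for name, value in headers]
    return ('\r\n'.join(lines) + '\r\n\r\n').encode('iso-8859-1')


def app(environ, start_response):
    # Status and header values are native str; U+00E9 ('é') is in Latin-1
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('X-Author', 'Jos\u00e9')])
    return [b'hello']


sent = []     # serialized response heads
written = []  # data passed to the write() callable (unused by this app)


def start_response(status, headers, exc_info=None):
    sent.append(serialize_head(status, headers))
    return written.append  # PEP 3333 requires returning a write() callable


body = b''.join(app({}, start_response))
```

Replacing 'Jos\u00e9' with text containing '€' (U+20AC) makes `serialize_head` raise `UnicodeEncodeError`, which is the case this message asks about.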

Which is the better choice for portability across HTTP servers and web browsers: Latin-1 or MIME encoding? Latin-1 covers only a small subset of Unicode: the 256 code points U+0000..U+00FF.
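
The Latin-1 limit is easy to test: a string fits if every code point is at most U+00FF, which is exactly when encoding it with the `latin-1` codec succeeds. The helper names below are made up for this sketch.

```python
def is_latin1(text):
    """True if every code point is in U+0000..U+00FF (ISO-8859-1)."""
    return all(ord(ch) <= 0xFF for ch in text)


def is_latin1_via_codec(text):
    """Equivalent check that relies on the codec itself."""
    try:
        text.encode('latin-1')
    except UnicodeEncodeError:
        return False
    return True


# 'é' (U+00E9) fits; CJK text does not
assert is_latin1('caf\u00e9') and is_latin1_via_codec('caf\u00e9')
assert not is_latin1('\u65e5\u672c') and not is_latin1_via_codec('\u65e5\u672c')

# '€' (U+20AC) is outside Latin-1, although cp1252 maps it to byte 0x80
assert not is_latin1('\u20ac')
assert '\u20ac'.encode('cp1252') == b'\x80'
```

The euro sign is a useful reminder that cp1252 and Latin-1 are not interchangeable, even though they agree on most printable characters.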

Perhaps we should let the user choose between Latin-1, MIME, or something else (e.g. UTF-8, cp1252, ...). Or at least, try something like:

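from email.header import Header

try:
    value = text.encode('latin1')
except UnicodeEncodeError:
    # encodeMIME() fallback: an RFC 2047 encoded-word, e.g. "=?utf-8?b?...?="
    value = Header(text, 'utf-8').encode(maxlinelen=0).encode('ascii')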
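
Letting the user choose could look like the following sketch, which generalizes the try/except fallback into a policy argument. `encode_header_value` and its policy names are hypothetical. Only the 'latin1' and 'mime' policies match what the quoted PEP text allows; 'utf-8' and 'cp1252' put non-standard raw bytes on the wire. `maxlinelen=0` asks `email.header.Header` not to fold the result across lines.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(text, policy='latin1-then-mime'):
    """Encode a header value to bytes according to a caller-chosen policy.

    Hypothetical helper: the policy names are made up for this sketch.
    """
    if policy == 'latin1':
        # Strict: raises UnicodeEncodeError for code points above U+00FF
        return text.encode('latin-1')
    if policy == 'mime':
        # RFC 2047 encoded-word; the result is always pure ASCII
        return Header(text, 'utf-8').encode(maxlinelen=0).encode('ascii')
    if policy in ('utf-8', 'cp1252'):
        # Non-standard: raw bytes the receiver has to guess how to decode
        return text.encode(policy)
    if policy == 'latin1-then-mime':
        try:
            return text.encode('latin-1')
        except UnicodeEncodeError:
            return encode_header_value(text, 'mime')
    raise ValueError('unknown policy: %r' % (policy,))


# Round trip for the MIME fallback: '€' is not Latin-1
wire = encode_header_value('\u20ac 5')
assert str(make_header(decode_header(wire.decode('ascii')))) == '\u20ac 5'
```

A receiver that implements RFC 2047 recovers the original text as shown; one that does not will display the literal `=?utf-8?...?=` string instead.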

Would it be a good idea to accept raw bytes headers? HTTP is *supposed* to be correctly encoded according to the relevant RFCs, but in practice anyone is free to do whatever they want.
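
If raw bytes headers were accepted, a server might handle them roughly as below. `header_line` is a hypothetical helper: it assumes str values follow the Latin-1 rule quoted from PEP 3333, while bytes values are passed through unchanged in whatever encoding the application chose. Because bytes skip any codec check, the sketch rejects CR and LF to avoid header injection.

```python
def header_line(name, value):
    """Build one raw header line from a str or bytes value.

    Hypothetical helper: str values must be ISO-8859-1, bytes values are
    sent as-is, whatever their encoding.
    """
    if isinstance(value, str):
        value = value.encode('latin-1')  # may raise UnicodeEncodeError
    elif not isinstance(value, (bytes, bytearray)):
        raise TypeError('header value must be str or bytes')
    # Raw bytes bypass the codec check, so guard against header injection
    if b'\r' in value or b'\n' in value:
        raise ValueError('header value must not contain CR or LF')
    return name.encode('ascii') + b': ' + bytes(value) + b'\r\n'
```

With this design, `header_line('X-Name', 'caf\u00e9')` yields Latin-1 bytes, while `header_line('X-Name', 'caf\u00e9'.encode('utf-8'))` sends UTF-8 bytes that the receiver must guess how to decode.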

A sentence picked at random from the web (Dec. 2008): "it seems that neither Tomcat 5.5 or 6 properly decodes HTTP headers as per RFC 2047! The Tomcat code assumes everywhere that header values use ISO-8859-1."
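
To see what the ISO-8859-1 assumption in the quote does in practice, here is a sketch of a receiver that decodes every header byte as Latin-1 while the sender wrote raw UTF-8. No Tomcat code is involved; it only reproduces the assumption described in the quote.

```python
original = 'caf\u00e9'

# A sender puts raw UTF-8 bytes into a header (neither RFC 2047 nor Latin-1)
wire = original.encode('utf-8')  # b'caf\xc3\xa9'

# A receiver that assumes ISO-8859-1 everywhere decodes byte by byte
seen = wire.decode('iso-8859-1')
print(seen)  # cafÃ©

# Undoing the damage requires knowing what the sender actually did
recovered = seen.encode('iso-8859-1').decode('utf-8')
assert recovered == original
```

Decoding as ISO-8859-1 never fails, since every byte maps to a code point, so the mismatch shows up as mojibake rather than as an error.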

Finally, why do you think this issue has to be fixed before Python 3.2?

History
Date	User	Action	Args
2011-01-22 13:04:30	vstinner	set	recipients: + vstinner, georg.brandl, aronacher
2011-01-22 13:04:30	vstinner	set	messageid: <1295701470.34.0.613555869801.issue10980@psf.upfronthosting.co.za>
2011-01-22 13:04:28	vstinner	link	issue10980 messages
2011-01-22 13:04:27	vstinner	create