Message 116886 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	pitrou
Recipients	Dmitry.Jemerov, giampaolo.rodola, ncoghlan, pitrou, r.david.murray
Date	2010-09-19.21:09:36
Message-id	<1284930573.3205.13.camel@localhost.localdomain>
In-reply-to	<1284929657.45.0.0473281470654.issue9360@psf.upfronthosting.co.za>

Content

> To make the distinction easier to remember, would it help if the
> methods that are currently set to return bytes instead accepted the
> typical encoding+errors parameters, with parallel *b APIs to get at
> the raw bytes?

Not really, no. For raw messages, which encoding+errors must be used
depends on the returned contents; it's not something the client can know
up front. Moreover, different parts of the returned bytes may need
decoding with different encodings (for example, if there are several
MIME parts to the message). People should use the email package to parse
the raw messages, as I assume they already do in 2.x.
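To illustrate why a single encoding+errors pair cannot work for raw articles, here is a minimal sketch (not nntplib code) that hands raw article bytes to the standard `email` package and lets each MIME part decode itself with its own declared charset. The sample article and the helper name `decode_text_parts` are hypothetical; only `email.message_from_bytes`, `email.policy.default`, and `EmailMessage.get_content()` are real stdlib APIs.

```python
import email
import email.policy


def decode_text_parts(raw: bytes) -> list[str]:
    """Hypothetical helper: return the decoded text of every text/plain part.

    Each part is decoded with the charset declared in its own Content-Type
    header, which is why the caller cannot pick one encoding up front.
    """
    msg = email.message_from_bytes(raw, policy=email.policy.default)
    return [
        part.get_content()  # decodes using the part's own charset parameter
        for part in msg.walk()
        if part.get_content_type() == "text/plain"
    ]


# Hypothetical raw article with two parts in two different charsets.
raw_article = (
    b"From: user@example.com\r\n"
    b"Newsgroups: comp.lang.python\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "caf\u00e9 UTF-8\r\n".encode("utf-8")
    + b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    + "caf\u00e9 Latin-1\r\n".encode("iso-8859-1")
    + b"--XYZ--\r\n"
)

texts = decode_text_parts(raw_article)
```

The same byte value `0xE9` appears in the second part but means something different from the UTF-8 sequence in the first; only the per-part headers tell the parser how to read each one.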

Apart from raw message bodies, NNTP data has well-defined encodings and
that's why I can take and return unicode (although as stated, I also use
surrogateescape to be fault-tolerant in the face of broken servers).
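As a rough illustration of the fault tolerance described above, the sketch below decodes a protocol line with the `surrogateescape` error handler: invalid bytes from a broken server become lone surrogates instead of raising, and encoding with the same handler restores the original bytes exactly. The helper names `decode_line` and `encode_line` are hypothetical and are not taken from nntplib; UTF-8 is assumed as the expected encoding.

```python
def decode_line(raw: bytes, encoding: str = "utf-8") -> str:
    # Undecodable bytes map to lone surrogates U+DC80..U+DCFF instead of raising.
    return raw.decode(encoding, errors="surrogateescape")


def encode_line(text: str, encoding: str = "utf-8") -> bytes:
    # The same handler turns those surrogates back into the original bytes.
    return text.encode(encoding, errors="surrogateescape")


# A Latin-1 byte (0xE9) sent by a misconfigured server where UTF-8 was expected.
broken = b"Subject: caf\xe9 from a misconfigured server"

line = decode_line(broken)        # no UnicodeDecodeError
round_trip = encode_line(line)    # identical to the bytes received
```

A strict `broken.decode("utf-8")` would raise `UnicodeDecodeError` here; `surrogateescape` trades that failure for a lossless, round-trippable `str`.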

> My concern with the current API is that there isn't a clear indicator
> during normal programming as to which APIs return strings and which
> return the raw bytes and hence require further decoding.

That's a documentation issue. I haven't touched the docs yet :)

History
Date	User	Action	Args
2010-09-19 21:09:38	pitrou	set	recipients: + pitrou, ncoghlan, giampaolo.rodola, r.david.murray, Dmitry.Jemerov
2010-09-19 21:09:36	pitrou	link	issue9360 messages
2010-09-19 21:09:36	pitrou	create