Message 117718 - Python tracker

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	ncoghlan
Recipients	eric.araujo, eric.smith, ncoghlan, pitrou, r.david.murray, vstinner
Date	2010-09-30.10:54:30
Message-id	<1285844073.12.0.815861367924.issue9873@psf.upfronthosting.co.za>

Content
From a function *user* perspective, the latter API (bytes->bytes, str->str) is exactly what I'm doing.

Antoine's point is that there are two ways to achieve that:

Option 1 (what my patch currently does):
- provide bytes and str variants of all constants
- choose which set to use at the start of each function
- be careful never to index, only slice (even for single characters)
- a few other traps that I don't remember off the top of my head
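A minimal sketch of what option 1 looks like in practice. The function name `split_scheme_parallel` and the two constants are hypothetical, chosen only for illustration; this is not the code from the patch.

```python
# Hypothetical example; not the code from the patch under discussion.
_STR_SEP = ":"
_BYTES_SEP = b":"


def split_scheme_parallel(url):
    """Split url at the first colon; both results have the same type as url."""
    # Choose the set of constants once, at the start of the function.
    sep = _BYTES_SEP if isinstance(url, bytes) else _STR_SEP
    for i in range(len(url)):
        # url[i] is an int when url is bytes, so it can never equal sep.
        # A one-element slice keeps the type of the input.
        if url[i:i + 1] == sep:
            return url[:i], url[i + 1:]
    # url[:0] is an empty object of the right type (str or bytes).
    return url[:0], url


print(split_scheme_parallel("http://example.com"))
print(split_scheme_parallel(b"http://example.com"))
```

Replacing the slice with `url[i] == sep` would still work for str input but silently never match for bytes input, which is the indexing trap in the list above.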

Option 2 (the alternative Antoine suggested and I'm considering):
- "decode" the ASCII-compatible bytes to str objects by treating them as nominally latin-1
- use the same str-based constants as are used to handle actual str inputs
- be able to index to your heart's content inside the algorithm
- *ensure* that any bytes-as-pseudo-str objects are "encoded" back to actual bytes before they are returned
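A minimal sketch of option 2, assuming a hypothetical function `split_scheme_decoded`. The bytes input is decoded as latin-1 on entry, the algorithm works on str only, and the results are encoded back on the way out.

```python
# Hypothetical example; not the code from the patch under discussion.
def split_scheme_decoded(url):
    """Split url at the first colon; both results have the same type as url."""
    # Entry boundary: remember the input type and work on str from here on.
    is_bytes = isinstance(url, bytes)
    text = url.decode("latin-1") if is_bytes else url

    scheme, rest = "", text
    for i in range(len(text)):
        # Indexing is safe here because text is always a str.
        if text[i] == ":":
            scheme, rest = text[:i], text[i + 1:]
            break

    # Exit boundary: turn the pseudo-str results back into real bytes.
    if is_bytes:
        return scheme.encode("latin-1"), rest.encode("latin-1")
    return scheme, rest


print(split_scheme_decoded("http://example.com"))
print(split_scheme_decoded(b"http://example.com"))
```

Because latin-1 maps every byte value to exactly one code point, the decode and encode steps round-trip any bytes input unchanged, including bytes outside the ASCII range.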

From outside the function, a user shouldn't be able to tell which approach we're using internally.

The nice thing about option 2 is that, to make sure you're doing it correctly, you only need to check three kinds of location:
- the initial parameter handling in each function
- any return or raise statements that allow a value to leave the function
- any yield expressions (both input and output)
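The yield case can be sketched as follows. `iter_fields` is a hypothetical generator used only for illustration, and it covers the output side of `yield` only; a generator that also accepts values through `send()` would need the matching decode step on each value it receives.

```python
# Hypothetical example of the yield boundary under option 2.
def iter_fields(data):
    """Yield the comma-separated fields of data, each of the same type as data."""
    # Entry boundary: decode once, then work on str only.
    is_bytes = isinstance(data, bytes)
    text = data.decode("latin-1") if is_bytes else data
    for field in text.split(","):
        # Each yielded value leaves the function, so it is encoded here.
        yield field.encode("latin-1") if is_bytes else field


print(list(iter_fields("a,b,c")))
print(list(iter_fields(b"a,b,c")))
```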

The effects of option 1 are scattered all over your algorithms, so it's hard to be sure you've caught everything.

The downside of option 2 is that if you make a mistake and let your bytes-as-pseudo-str objects escape from the confines of your function, you're going to see some very strange behaviour.
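As an illustration of that kind of strange behaviour, here is a small sketch of a pseudo-str that was never encoded back. The variable names are hypothetical, and the UTF-8 input is just one example of bytes that are not pure ASCII.

```python
# Hypothetical example of a pseudo-str escaping its function.
original = "caf\u00e9"                # four characters, the last one is e-acute
raw = original.encode("utf-8")        # the UTF-8 bytes a caller might pass in
leaked = raw.decode("latin-1")        # pseudo-str that was never encoded back

# The leaked value is a perfectly valid str, so nothing raises an error,
# but it is not the text the bytes actually represent.
print(len(original), len(leaked))
print(leaked == original)

# Encoding it with anything other than latin-1 produces different bytes.
print(leaked.encode("utf-8") == raw)
print(leaked.encode("latin-1") == raw)
```

The leaked value has five characters instead of four and compares unequal to the real text, yet it looks like ordinary str data to any code that receives it. Only the latin-1 encode recovers the original bytes.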

History
Date	User	Action	Args
2010-09-30 10:54:33	ncoghlan	set	recipients: + ncoghlan, pitrou, vstinner, eric.smith, eric.araujo, r.david.murray
2010-09-30 10:54:33	ncoghlan	set	messageid: <1285844073.12.0.815861367924.issue9873@psf.upfronthosting.co.za>
2010-09-30 10:54:31	ncoghlan	link	issue9873 messages
2010-09-30 10:54:30	ncoghlan	create