Message 127245 - Python tracker

Author	r.david.murray
Recipients	akuchling, georg.brandl, giampaolo.rodola, holdenweb, lregebro, pitrou, r.david.murray, rhettinger, sdaoden, vstinner
Date	2011-01-28.04:26:17
Message-id	<1296188786.27.0.168997258543.issue9124@psf.upfronthosting.co.za>
In-reply-to

Content

Attached is a patch that builds on Victor's patch, but takes the approach I discussed of maintaining backward compatibility (for the most part; see below). The test suite in this version is substantially unchanged. The major changes are added tests for bytes input and for the new method (get_bytes). The changes to existing test methods are to methods that test internal interfaces (because those now handle bytes, not strings).

I've included doc changes, which are mostly adding notes about where bytes are accepted (add, __setitem__), and for the new get_bytes method.

get_string is now implemented by calling get_bytes, passing the result to email.message.Message, and then calling as_string. This defeats the efficiency purpose of using get_string, but in that use case code should really be using get_bytes.
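A minimal sketch of that layering, written against the mailbox API as it exists in current Python 3 rather than copied from the patch. The helper name get_string_via_bytes and the sample message are made up for illustration, and email.message_from_bytes is assumed to be the parsing entry point:

```python
import email
import mailbox
import os
import tempfile


def get_string_via_bytes(box, key):
    """Hypothetical helper: build the str form on top of get_bytes."""
    raw = box.get_bytes(key)                # the bytes stored in the mailbox
    message = email.message_from_bytes(raw)  # parse into a Message object
    return message.as_string()              # serialize back out as str


with tempfile.TemporaryDirectory() as tmp:
    box = mailbox.mbox(os.path.join(tmp, "sample.mbox"))
    try:
        # add() accepts bytes as well as str and Message objects.
        key = box.add(b"From: a@example.com\nSubject: hi\n\nbody\n")
        raw = box.get_bytes(key)
        text = get_string_via_bytes(box, key)
    finally:
        box.close()
```

The parse and re-serialize step is the extra cost mentioned above; code that only needs the stored data can call get_bytes directly and skip it.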

I kept the change to get_file: it returns a binary file, as currently documented. I think this is less likely to cause backward compatibility issues (assuming any 3.x code exists that uses mailbox!) than get_string returning bytes (or disappearing) would.

As with email.message.Message's get_unixfrom method, get_from returns a string, and set_from takes a string. Although there are no real standards for this "header", I believe that it is restricted to ASCII, and have written the code accordingly. This is my answer to Victor's question about maybe_date and the From line: I think we should use ASCII as the encoding. That is certainly true for the date; asctime does not use the locale, and English date fields are definitely the de facto standard for From lines.
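To illustrate the ASCII assumption, here is a rough sketch of building a From line as a string and encoding it for the file. The function build_from_line is a hypothetical name for this example, not something in the patch:

```python
import time


def build_from_line(sender, timestamp):
    """Hypothetical helper: return an mbox-style From line as ASCII bytes."""
    # time.asctime ignores the locale, so the date part is always
    # English and pure ASCII.
    date = time.asctime(time.gmtime(timestamp))
    line = "From %s %s" % (sender, date)
    # A non-ASCII sender would raise UnicodeEncodeError here, which is
    # the restriction described above.
    return line.encode("ascii")


from_line = build_from_line("MAILER-DAEMON", 0)
```

With timestamp 0 this gives b"From MAILER-DAEMON Thu Jan  1 00:00:00 1970".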

I haven't looked at the mh_sequences question yet. I don't think there are any formal restrictions on what characters can be used in a sequence name, but I haven't looked to see if there are any standards documents for mh. I'll test to see if my nmh installation accepts non-ASCII chars for sequence names tomorrow. I'm also going to try to go over Victor's changes section by section, but everything I've looked at other than the mh_sequences issue he raised looks good to me so far.

I note that we still don't have an RM call on whether or not this can go in if it passes review.

Oh, also note that neither Victor's patch nor my patch has any tests for non-ASCII characters. Some should be added :)
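Something along these lines might be a starting point for such a test. It is only a sketch: the class name, the sample message, and the choice of mbox are all assumptions for illustration, and it uses the mailbox API as found in current Python 3:

```python
import mailbox
import os
import tempfile
import unittest

# Sample message with a UTF-8 encoded body; invented for this sketch.
NON_ASCII_MESSAGE = (
    b"From: sender@example.com\n"
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"caf\xc3\xa9\n"
)


class NonASCIIBytesTest(unittest.TestCase):
    def test_get_bytes_keeps_non_ascii_body(self):
        with tempfile.TemporaryDirectory() as tmp:
            box = mailbox.mbox(os.path.join(tmp, "test.mbox"))
            try:
                key = box.add(NON_ASCII_MESSAGE)
                # The raw UTF-8 bytes should come back undecoded.
                self.assertIn(b"caf\xc3\xa9", box.get_bytes(key))
            finally:
                box.close()


suite = unittest.defaultTestLoader.loadTestsFromTestCase(NonASCIIBytesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Similar cases for the other mailbox formats and for non-ASCII header values would presumably be wanted too.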

History
Date	User	Action	Args
2011-01-28 04:26:26	r.david.murray	set	recipients: + r.david.murray, akuchling, georg.brandl, rhettinger, holdenweb, pitrou, vstinner, giampaolo.rodola, lregebro, sdaoden
2011-01-28 04:26:26	r.david.murray	set	messageid: <1296188786.27.0.168997258543.issue9124@psf.upfronthosting.co.za>
2011-01-28 04:26:20	r.david.murray	link	issue9124 messages
2011-01-28 04:26:20	r.david.murray	create