Message 141694 - Python tracker

Author	gdr@garethrees.org
Recipients	daniel.urban, eric.snow, ezio.melotti, gdr@garethrees.org, sandro.tosi
Date	2011-08-05.23:04:03
SpamBayes Score	1.8956503e-12
Marked as misclassified	No
Message-id	<1312585445.43.0.968598121256.issue12691@psf.upfronthosting.co.za>
In-reply-to

Content

Please find attached a patch containing four bug fixes for untokenize():

* untokenize() now always returns a bytes object, defaulting to UTF-8 if no ENCODING token is found (previously it returned a string in this case).
* In compatibility mode, untokenize() successfully processes all tokens from an iterator (previously it discarded the first token).
* In full mode, untokenize() now returns successfully (previously it failed an assertion).
* In full mode, untokenize() successfully processes tokens that were separated by a backslashed newline in the original source (previously it ran these tokens together).
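
To make the two modes concrete, here is a minimal sketch against the `tokenize` module in current CPython 3 releases. This is an assumption: it is not the 2011 code the patch targets. Full mode receives the 5-tuples produced by `tokenize.tokenize()`. Compatibility mode receives bare `(type, string)` pairs. The return type depends on whether an ENCODING token is present. Current releases still return a `str` when there is no ENCODING token, rather than defaulting to UTF-8 bytes as proposed above.

```python
import io
import tokenize

source = b"x = 1 + 2\n"

# Full mode: tokenize.tokenize() reads bytes and emits an ENCODING token first,
# so untokenize() encodes its output back to bytes.
full_tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
full_out = tokenize.untokenize(full_tokens)

# Compatibility mode: only (type, string) pairs are passed. The first pair is
# the ENCODING token, and it must not be dropped for the output to be bytes.
compat_out = tokenize.untokenize((tok.type, tok.string) for tok in full_tokens)

# No ENCODING token: generate_tokens() works on str and never emits one,
# so current releases return a str here.
str_tokens = list(tokenize.generate_tokens(io.StringIO(source.decode()).readline))
str_out = tokenize.untokenize(str_tokens)

print(type(full_out).__name__, type(compat_out).__name__, type(str_out).__name__)
# prints: bytes bytes str
```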

In addition, I've added some unit tests:

* Test case for backslashed newline.
* Test case for missing ENCODING token.
* roundtrip() tests both modes of untokenize() (previously it just tested compatibility mode).
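
A roundtrip check that exercises both modes might look like the sketch below. `check_roundtrip` and `token_pairs` are hypothetical helpers, not the functions in the test suite. The sketch compares only token types and strings, because that is all untokenize() can preserve in compatibility mode. It assumes a current CPython 3 release.

```python
import io
import tokenize


def token_pairs(source: bytes):
    """Hypothetical helper: tokenize bytes and keep only (type, string)."""
    return [(tok.type, tok.string)
            for tok in tokenize.tokenize(io.BytesIO(source).readline)]


def check_roundtrip(source: bytes) -> None:
    """Hypothetical helper: untokenize in both modes, then re-tokenize."""
    tokens = list(tokenize.tokenize(io.BytesIO(source).readline))
    expected = [(tok.type, tok.string) for tok in tokens]

    # Full mode: pass complete 5-tuples, including start/end positions.
    full = tokenize.untokenize(tokens)
    assert token_pairs(full) == expected, "full mode changed the token stream"

    # Compatibility mode: pass only (type, string) pairs.
    compat = tokenize.untokenize(expected)
    assert token_pairs(compat) == expected, "compat mode changed the token stream"


# Two tokens separated by a backslashed newline in the original source.
check_roundtrip(b"x = 1 + \\\n    2\n")
```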

and updated the documentation:

* Update the docstring for untokenize to better describe its actual behaviour, and remove the false claim "Untokenized source will match input source exactly". (We can restore this claim if we ever fix tokenize/untokenize so that it's true.)
* Update the documentation for untokenize in tokenize.rst to match the docstring.
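
Compatibility mode gives a quick way to see why the "match input source exactly" claim does not hold. It only receives `(type, string)` pairs, so the original spacing cannot be recovered, although the token stream survives. This is a minimal sketch assuming a current CPython 3 release.

```python
import io
import tokenize

source = b"x = 1\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

# Only (type, string) pairs: column positions, and hence spacing, are lost.
compat = tokenize.untokenize((tok.type, tok.string) for tok in tokens)

# Re-tokenize the output to compare token streams rather than text.
retokenized = [(t.type, t.string)
               for t in tokenize.tokenize(io.BytesIO(compat).readline)]

print(compat == source)  # False: the whitespace differs
print(retokenized == [(t.type, t.string) for t in tokens])  # True
```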

I welcome review: this is my first proper patch to Python.

History
Date	User	Action	Args
2011-08-05 23:04:05	gdr@garethrees.org	set	recipients: + gdr@garethrees.org, ezio.melotti, daniel.urban, sandro.tosi, eric.snow
2011-08-05 23:04:05	gdr@garethrees.org	set	messageid: <1312585445.43.0.968598121256.issue12691@psf.upfronthosting.co.za>
2011-08-05 23:04:04	gdr@garethrees.org	link	issue12691 messages
2011-08-05 23:04:04	gdr@garethrees.org	create