This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Recipients daniel.urban, eric.snow, ezio.melotti, sandro.tosi
Date 2011-08-05.23:04:03
Please find attached a patch containing four bug fixes for untokenize():

* untokenize() now always returns a bytes object, defaulting to UTF-8 if no ENCODING token is found (previously it returned a string in this case).
* In compatibility mode, untokenize() successfully processes all tokens from an iterator (previously it discarded the first token).
* In full mode, untokenize() now returns successfully (previously it asserted).
* In full mode, untokenize() successfully processes tokens that were separated by a backslashed newline in the original source (previously it ran these tokens together).
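To illustrate the behaviour the first fix targets, here is a short sketch of my own (not part of the patch): when the token stream starts with an ENCODING token, untokenize() encodes its output with that encoding and returns a bytes object, and in full mode the token positions let it reconstruct the spacing of simple input exactly:

```python
import io
import tokenize

# Sketch (not part of the patch): round-trip a small source through
# tokenize()/untokenize() in full mode, passing the 5-tuples through.
source = b"x = 1 + 2\n"
tokens = list(tokenize.tokenize(io.BytesIO(source).readline))

result = tokenize.untokenize(tokens)

# The ENCODING token at the start of the stream makes untokenize()
# encode its output, so the result is a bytes object.
assert isinstance(result, bytes)
# In full mode the start/end positions allow untokenize() to
# reproduce the original spacing exactly for input like this.
assert result == source
```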

In addition, I've added some unit tests:

* Test case for backslashed newline.
* Test case for missing ENCODING token.
* roundtrip() tests both modes of untokenize() (previously it just tested compatibility mode).

and updated the documentation:

* Update the docstring for untokenize to better describe its actual behaviour, and remove the false claim "Untokenized source will match input source exactly". (We can restore this claim if we ever fix tokenize/untokenize so that it's true.)
* Update the documentation for untokenize in tokenize.rst to match the docstring.
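The enlarged roundtrip() test could look roughly like the following sketch (roundtrip_both_modes is a hypothetical helper of mine, not the patch's actual test code): it runs untokenize() in both modes and checks that each output retokenizes to the same (type, string) pairs:

```python
import io
import tokenize

def roundtrip_both_modes(source_bytes):
    # Hypothetical helper (not the patch's test code): check that
    # untokenize() output retokenizes to the same (type, string)
    # pairs in full mode (5-tuples) and compatibility mode (2-tuples).
    tokens = list(tokenize.tokenize(io.BytesIO(source_bytes).readline))
    pairs = [tok[:2] for tok in tokens]

    full = tokenize.untokenize(tokens)   # full mode
    compat = tokenize.untokenize(pairs)  # compatibility mode

    def retokenized(data):
        return [tok[:2] for tok in tokenize.tokenize(io.BytesIO(data).readline)]

    # Full mode preserves spacing; compatibility mode only guarantees
    # that its output tokenizes back to the same tokens.
    return retokenized(full) == pairs and retokenized(compat) == pairs

ok = roundtrip_both_modes(b"x = 1 + 2\n")
```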

I welcome review: this is my first proper patch to Python.
Date User Action Args
2011-08-05 23:04:05  gdr@garethrees.org  set  recipients: +, ezio.melotti, daniel.urban, sandro.tosi, eric.snow
2011-08-05 23:04:05  gdr@garethrees.org  set  messageid: <>
2011-08-05 23:04:04  gdr@garethrees.org  link  issue12691 messages
2011-08-05 23:04:04  gdr@garethrees.org  create