Message117811
No idea if I'm getting the patch format right here, but tally ho!
This is keyed from release27-maint
Index: Lib/tokenize.py
===================================================================
--- Lib/tokenize.py (revision 85136)
+++ Lib/tokenize.py (working copy)
@@ -184,8 +184,13 @@
     def add_whitespace(self, start):
         row, col = start
-        assert row <= self.prev_row
         col_offset = col - self.prev_col
+        # Nearly all newlines are handled by the NL and NEWLINE tokens,
+        # but explicit line continuations are not, so they're handled here.
+        if row > self.prev_row:
+            row_offset = row - self.prev_row
+            self.tokens.append("\\\n" * row_offset)
+            col_offset = col  # Recalculate the column offset from the start of our new line
         if col_offset:
             self.tokens.append(" " * col_offset)
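The core of the patch can be exercised in isolation. Below is a simplified, standalone sketch of the patched logic (a hypothetical helper, not the real Untokenizer method): when the token's row is past the previous row, emit one backslash-newline per skipped row and restart the column offset from the beginning of the new line.

```python
def add_whitespace(tokens, prev_row, prev_col, start):
    """Standalone sketch of the patched add_whitespace logic:
    append backslash-newlines for row jumps (explicit line
    continuations), then spaces for the remaining column offset."""
    row, col = start
    col_offset = col - prev_col
    if row > prev_row:
        # Explicit line continuation(s): one "\<newline>" per skipped row.
        tokens.append("\\\n" * (row - prev_row))
        # The column offset now counts from the start of the new line.
        col_offset = col
    if col_offset:
        tokens.append(" " * col_offset)

# A token starting at (2, 4) after a previous token ending at (1, 7)
# yields a continuation plus four spaces of indentation.
out = []
add_whitespace(out, prev_row=1, prev_col=7, start=(2, 4))
assert "".join(out) == "\\\n    "
```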
Two issues remain with this fix; in both cases the assert is replaced with something functional, but the untokenized output is not exactly the same as the original text:
1) Whitespace leading up to a line continuation is not recreated. The information required to do this is not present in the tokenized data.
2) If EOF happens at the end of a line, the untokenized version will have a line continuation on the end, as the ENDMARKER token is represented on a line which does not exist in the original.
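Both points can be seen with a round trip through the stdlib tokenize module in an interpreter carrying this fix (modern Python 3 includes an equivalent one); the source snippet here is illustrative:

```python
import io
import tokenize

# Source with whitespace before an explicit line continuation.
source = "x = 1 + \\\n    2\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
rebuilt = tokenize.untokenize(tokens)

# Issue 1: the space before the backslash is not recreated, so the
# round trip is not byte-identical to the original source...
assert rebuilt != source

# ...but the rebuilt text is functionally equivalent: it compiles and
# evaluates to the same result.
ns_a, ns_b = {}, {}
exec(source, ns_a)
exec(rebuilt, ns_b)
assert ns_a["x"] == ns_b["x"] == 3
```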
I spent some time trying to get a unit test written that demonstrates the original bug, but it would seem that doctest (which test_tokenize uses) cannot represent a '\' character properly. The existing unit tests involving line continuations pass because the '\' characters are interpreted as ERRORTOKEN, which is not how they are tokenized when read from a file or the interactive prompt.
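A plain unittest-style check sidesteps doctest's backslash-escaping problem. A minimal sketch (the test class, name, and source snippet are hypothetical, not part of the actual test suite):

```python
import io
import tokenize
import unittest

class LineContinuationRoundTrip(unittest.TestCase):
    # Hypothetical test: checks that source lines ending in '\' survive
    # a tokenize/untokenize round trip as compilable, semantically
    # equivalent code, without relying on doctest to escape '\'.
    def test_backslash_continuation(self):
        source = "total = 1 + \\\n    2 + \\\n    3\n"
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
        rebuilt = tokenize.untokenize(tokens)
        namespace = {}
        exec(rebuilt, namespace)
        self.assertEqual(namespace["total"], 6)
```

Run with `python -m unittest` against the module containing the class.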
Date                | User        | Action | Args
2010-10-01 16:09:23 | Brian.Bossé | set    | recipients: + Brian.Bossé, kristjan.jonsson
2010-10-01 16:09:23 | Brian.Bossé | set    | messageid: <1285949363.72.0.308652516013.issue9974@psf.upfronthosting.co.za>
2010-10-01 16:09:20 | Brian.Bossé | link   | issue9974 messages
2010-10-01 16:09:19 | Brian.Bossé | create |