Message 117811 - Python tracker


Author	Brian.Bossé
Recipients	Brian.Bossé, kristjan.jonsson
Date	2010-10-01.16:09:19

Content

No idea if I'm getting the patch format right here, but tally ho!

This is diffed against release27-maint:

Index: Lib/tokenize.py
===================================================================
--- Lib/tokenize.py	(revision 85136)
+++ Lib/tokenize.py	(working copy)
@@ -184,8 +184,13 @@
 
     def add_whitespace(self, start):
         row, col = start
-        assert row <= self.prev_row
         col_offset = col - self.prev_col
+        # Nearly all newlines are handled by the NL and NEWLINE tokens,
+        # but explicit line continuations are not, so they're handled here.
+        if row > self.prev_row:
+            row_offset = row - self.prev_row
+            self.tokens.append("\\\n" * row_offset)
+            col_offset = col  # Recalculate the column offset from the start of our new line
         if col_offset:
             self.tokens.append(" " * col_offset)
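The comment in the patch says that explicit line continuations are not covered by the NL and NEWLINE tokens. A small sketch of how that shows up in the token stream, written for Python 3 (not the release27-maint branch the patch targets); the helper names `positions` and `unexplained_row_jumps` are made up for illustration:

```python
import io
import tokenize


def positions(source):
    """Return (type name, string, start, end) for each token in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string, tok.start, tok.end)
        for tok in tokenize.generate_tokens(readline)
    ]


def unexplained_row_jumps(source):
    """Return (previous end row, next start row) pairs where the row
    advances although the previous token was neither NEWLINE nor NL."""
    jumps = []
    prev = None
    for name, _string, start, end in positions(source):
        if prev is not None:
            prev_name, prev_end = prev
            if start[0] > prev_end[0] and prev_name not in ("NEWLINE", "NL"):
                jumps.append((prev_end[0], start[0]))
        prev = (name, end)
    return jumps


# Same statement twice; only the whitespace before the backslash differs.
tight = "x = 1 + \\\n    2\n"
padded = "x = 1 +      \\\n    2\n"

print(unexplained_row_jumps(tight))
print(positions(tight) == positions(padded))
```

The row jump between `+` and `2` is the case the old assert rejected: the next token starts on a later row and no newline token accounts for it.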

Two issues remain with this fix.  It replaces the assert with something functional, but in both cases the output is not exactly what the original text was:
1)  Whitespace leading up to a line continuation is not recreated.  The information required to do this is not present in the tokenized data.
2)  If EOF happens at the end of a line, the untokenized version will end with a spurious line continuation, because the ENDMARKER token is reported on a line which does not exist in the original.
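Both issues can be reproduced with a standalone sketch of the patched logic. This is not the real Untokenizer class: `add_whitespace` here takes the token list and previous position as arguments, `rebuild` is a hypothetical driver that skips the NEWLINE/NL bookkeeping, and the token positions are written by hand rather than taken from the tokenizer (whose EOF behaviour may differ between Python versions).

```python
def add_whitespace(tokens, prev, start):
    """Patched logic from the diff, with `tokens` and `prev` standing in
    for self.tokens and (self.prev_row, self.prev_col)."""
    prev_row, prev_col = prev
    row, col = start
    col_offset = col - prev_col
    if row > prev_row:
        # The row advanced without a newline token: emit continuations.
        tokens.append("\\\n" * (row - prev_row))
        col_offset = col  # measure from the start of the new line
    if col_offset:
        tokens.append(" " * col_offset)


def rebuild(pieces):
    """Hypothetical driver: pieces is a list of (text, start, end)."""
    out = []
    prev = (1, 0)
    for text, start, end in pieces:
        add_whitespace(out, prev, start)
        out.append(text)
        prev = end
    return "".join(out)


# Hand-written positions for the source "x = 1 + \\\n    2".
continued = [
    ("x", (1, 0), (1, 1)),
    ("=", (1, 2), (1, 3)),
    ("1", (1, 4), (1, 5)),
    ("+", (1, 6), (1, 7)),
    ("2", (2, 4), (2, 5)),
]

# Hand-written positions for "x = 1" followed directly by an ENDMARKER
# on row 2, with no NEWLINE token in between.
eof_at_line_end = [
    ("x", (1, 0), (1, 1)),
    ("=", (1, 2), (1, 3)),
    ("1", (1, 4), (1, 5)),
    ("", (2, 0), (2, 0)),
]

print(repr(rebuild(continued)))
print(repr(rebuild(eof_at_line_end)))
```

In the first case the space between `+` and the backslash is gone (issue 1); in the second a continuation is appended after `1` (issue 2).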

I spent some time trying to write a unit test that demonstrates the original bug, but it would seem that doctest (which test_tokenize uses) cannot represent a '\' character properly.  The existing unit tests involving line continuations pass because the '\' characters are tokenized as ERRORTOKEN, which is not how they are handled when the source is read from a file or the interactive prompt.
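One general reason backslashes are awkward in docstring-based tests is how string literals treat them. The sketch below only shows that language behaviour; whether it is what affects test_tokenize is an assumption, not something checked against the test file.

```python
# Non-raw literal: the backslash-newline pair is consumed by the string
# literal itself, so the resulting text contains no backslash at all.
plain = """x = 1 + \
2"""

# Raw literal: both the backslash and the newline survive in the text.
raw = r"""x = 1 + \
2"""

print(repr(plain))
print(repr(raw))
```

A test that needs a real line continuation in its source text would have to use the raw form, or build the string with an escaped backslash.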
