classification
Title: Remove backslash escapes from tokenize.c.
Type: Stage:
Components: Interpreter Core Versions: Python 3.0
process
Status: closed Resolution: rejected
Dependencies: Superseder:
Assigned To: gvanrossum Nosy List: christian.heimes, gvanrossum, ron_adam
Priority: normal Keywords: patch

Created on 2007-05-16 22:23 by ron_adam, last changed 2008-01-06 22:29 by admin. This issue is now closed.

Files
File name Uploaded Description
norawescape3.diff ron_adam, 2007-06-14 05:10 Removes escape chars from raw strings.
tokenize_cleanup_patch.diff ron_adam, 2007-11-16 00:36
no_raw_escapes_patch.diff ron_adam, 2007-11-16 00:36
Messages (11)
msg52631 - (view) Author: Ron Adam (ron_adam) * Date: 2007-05-16 22:23
This patch modifies tokenizer.c so that, in raw strings only, it no longer skips the character after a backslash when determining the end of a string.

A few strings needed changes in order to compile: two in textwrap.py and one in distutils/util.py.

This does not include changes needed for tests to pass.  I'll include those in a separate patch.
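
For illustration only (this is not taken from the patch), the behaviour being changed can be checked from Python itself. The tokenizer skips the character after a backslash when looking for the closing quote, even in raw strings, so a raw string cannot end in a single backslash. The helper name `compiles` is made up for this sketch.

```python
def compiles(source):
    """Return True if `source` is accepted by the compiler as an expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# Backslash followed by a quote: the quote does not end the raw string,
# and the backslash is kept in the resulting value.
escaped_quote = r"\""

# A raw string holding only a backslash: the closing quote is skipped,
# so the literal is never terminated and compilation fails.
lone_backslash_ok = compiles('r"\\"')
```

Under the proposed change, the second literal would presumably become valid, while the first would end at the quote following the backslash.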
msg52632 - (view) Author: Ron Adam (ron_adam) * Date: 2007-05-16 22:31
Forgot to specify...

This is against the py3k-struni branch, revision 55388.
msg52633 - (view) Author: Ron Adam (ron_adam) * Date: 2007-05-20 02:14
Here's a more complete patch, which modifies the following files (in the py3k_struni branch):

M      Python/ast.c
M      Parser/tokenizer.c
M      Lib/test/tokenize_tests.txt
M      Lib/tokenize.py

The test still doesn't pass, but it fails in the same way as it did before these changes were made.  I'll continue to look into this.  I think it's more a problem with the test itself than a problem with the modules.  Or it may be a bug in the struni branch that is yet to be fixed.

The following changes each alter one or two raw strings, in most cases replacing the outermost quotes with triple quotes.

M      Lib/sgmllib.py
M      Lib/markupbase.py
M      Lib/textwrap.py
M      Lib/distutils/util.py
M      Lib/cookielib.py
M      Lib/pydoc.py
M      Lib/doctest.py
M      Lib/xml/etree/ElementTree.py
M      Lib/HTMLParser.py
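
A sketch of the kind of rewrite described above, using a made-up pattern rather than one from the listed modules: a raw string that relies on a backslash to embed its own quote character is rewritten with triple quotes, so no backslash is needed before the quote.

```python
import re

# Relies on the backslash keeping the double quote from ending the literal.
old_style = re.compile(r"[\"']")

# Triple-quoted form: both quote characters can appear unescaped.
new_style = re.compile(r'''["']''')

sample = "say \"hi\" or 'bye'"
old_matches = old_style.findall(sample)
new_matches = new_style.findall(sample)
```

The two string values differ (the first still contains the backslash), but as regular expressions they match the same characters.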
msg52634 - (view) Author: Ron Adam (ron_adam) * Date: 2007-05-20 02:15
File Added: norawescape2.diff
msg52635 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-05-26 04:27
Just FYI, I have downloaded this and will attempt to apply it some time next week.
msg52636 - (view) Author: Ron Adam (ron_adam) * Date: 2007-06-14 05:10

Updated patch.

The error that I had mentioned before has been fixed.
Added changes to the tokenize_test output comparison file.

The test fails at random because it uses a random sample of other tests as sources for its round-trip tests.  If those files have problems in them, then this test fails.

Added a filename output line to the test so the problem file can be identified.

Patch is against the py3k_struni branch, revision 55970.

File Added: norawescape3.diff
msg57250 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2007-11-08 14:20
Can you create a new patch and verify that the problem still exists?
norawescape3.diff doesn't apply cleanly any more.
msg57262 - (view) Author: Ron Adam (ron_adam) * Date: 2007-11-08 17:28
Yes, I will update it.
msg57290 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-09 00:32
FWIW, I'm +1 on the part of this patch that disables \u in raw strings.
I just had a problem with a doctest that couldn't be run in verbose mode
because \u was being interpreted in raw mode...  But I'm still solidly
-1 on allowing trailing \.
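
To illustrate the \u behaviour discussed here, a minimal check, assuming a Python 3 interpreter in which \u is no longer interpreted in raw strings (the variable names are only for this example):

```python
# In a raw string the escape is left alone: backslash, u, 0, 0, 4, 1.
raw = r"\u0041"

# In a regular string the same escape is interpreted as one character.
cooked = "\u0041"
```

Here `raw` has six characters while `cooked` is the single character "A".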
msg57578 - (view) Author: Ron Adam (ron_adam) * Date: 2007-11-16 00:36
It looks like the disabling of \u and \U in raw strings is done.  Does
tokenize.py need to be fixed to match?

While working on this I was able to clean up the string parsing parts of
tokenizer.c, and have a separate patch with just that.

I have also attached an updated patch with both the cleaned-up tokenizer.c
and the no-escapes-in-raw-strings change, in case it is desired after all.
msg57579 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2007-11-16 00:52
I don't think tokenize.py needs to be changed -- it never interpreted
backslashes in string literals anyway (not even in regular, non-raw
literals).
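
A small sketch of that point, using the standard `tokenize` module on a made-up source line: the STRING token carries the literal exactly as written in the source, with the backslash still in place.

```python
import io
import tokenize

# Source text containing a regular (non-raw) literal with a \t escape.
source = 's = "a\\tb"\n'

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# Collect the text of every string-literal token.
string_tokens = [tok.string for tok in tokens if tok.type == tokenize.STRING]
```

The collected token text still includes the quotes and the two characters `\` and `t`; no tab character has been produced.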

The tokenizer.c cleanup is submitted as revision 59007.

I still am not warming up towards the no-raw-escapes feature, so I'm
closing this as rejected.  Nevertheless, thanks for your efforts!
History
Date User Action Args
2008-01-06 22:29:46 admin set keywords: - py3k
versions: Python 3.0
2007-11-16 00:52:49 gvanrossum set status: open -> closed
resolution: rejected
messages: + msg57579
2007-11-16 00:36:36 ron_adam set files: + no_raw_escapes_patch.diff
2007-11-16 00:36:11 ron_adam set files: + tokenize_cleanup_patch.diff
messages: + msg57578
2007-11-09 00:32:17 gvanrossum set messages: + msg57290
2007-11-08 17:28:22 ron_adam set messages: + msg57262
2007-11-08 14:20:18 christian.heimes set nosy: + christian.heimes
messages: + msg57250
2007-08-30 00:22:41 gvanrossum set versions: + Python 3.0
2007-05-16 22:23:54 ron_adam create