Unicode escape sequences not parsed in raw strings. #46793
Comments
According to
>>> r'\u0020'
'\\u0020'
Expected:
>>> r'\u0020'
' '

You use the "ur" string mode.
>>> print ur"\u0020"
" "
No, it's about Python 3.0. I confirm the problem, and propose a patch:

--- Python/ast.c.original 2008-04-03 15:12:15.548389400 +0200
+++ Python/ast.c 2008-04-03 15:12:28.359475800 +0200
@@ -3232,7 +3232,7 @@
             return NULL;
         }
     }
-    if (!*bytesmode && !rawmode) {
+    if (!*bytesmode) {
         return decode_unicode(s, len, rawmode, encoding);
     }
     if (*bytesmode) {
Thanks for noticing, Amaury, and your patch works for me.

Fixed in r62128.

Sorry, Guido said this is not allowed:

The docs still need to be updated! An entry in what's new in 3.0 should

How's this?

Instead of "ignored" (which might be read ambiguously) how about "not
You also still need to add some words to whatsnew.

"not treated specially" it is!
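For reference, a minimal Python 3 sketch of the behaviour the thread settles on: raw string literals leave \u and \U sequences untouched. The variable names are arbitrary, and the explicit `unicode_escape` decode at the end is only one illustrative way to interpret such a sequence after the fact; it is not something proposed in this issue.

```python
raw = r'\u0020'     # raw literal: a backslash, 'u', and four digits
cooked = '\u0020'   # ordinary literal: the escape becomes a space

print(len(raw))            # 6
print(raw == '\\u0020')    # True
print(cooked == ' ')       # True

# Illustration only: interpret the escape explicitly.
# Encoding to ASCII first is safe here because the text is pure ASCII.
decoded = raw.encode('ascii').decode('unicode_escape')
print(decoded == ' ')      # True
```

The same literal therefore means different things with and without the `r` prefix, which is why the documentation wording ("not treated specially") matters.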
The segment "use different rules for interpreting backslash escape
Also, a few paragraphs later there are more references to raw strings,

I made the requested improvements and mentioned it in NEWS. Is there

What about the "raw-unicode-escape" codec?

To be honest, I don't know what the uses are for that codec.

pickle still uses it when protocol=0 (and cPickle as well, but in trunk/

You can't change the codec - it's being used in other places as well,
Adding a new codec would be fine, though I don't know how this would map

Isn't "unicode-escape" enough for this purpose?

What do you mean by "enough"?
The "raw-unicode-escape" codec is used in Python 2.x to convert literal
The codec is also being used in cPickle, pickle, variants of pickle,
It serves its purpose, just like "unicode-escape" and all the other

I mean: now that raw strings cannot represent all Unicode points (or
Note that pickle does not use "raw-unicode-escape" as is: it replaces
That's why I propose to completely remove raw-unicode-escape, and use

While that's true for cPickle, it is not for pickle. The pickle protocol
Besides, you cannot assume that the Python interpreter itself is the
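To make the codec discussion concrete, here is a small Python 3 sketch (not taken from the thread) that contrasts the two codecs and shows a protocol 0 pickle of a string carrying a \uXXXX escape. The sample values are arbitrary, and the exact pickle byte layout is an implementation detail, so only the round trip should be relied on.

```python
import pickle

data = b'\\n\\u0020'  # eight ASCII bytes: \ n \ u 0 0 2 0

# "unicode-escape" interprets every backslash escape.
print(repr(data.decode('unicode-escape')))       # '\n '

# "raw-unicode-escape" interprets only \uXXXX and \UXXXXXXXX.
print(repr(data.decode('raw-unicode-escape')))   # '\\n '

# Encoding: code points above Latin-1 become \uXXXX escapes.
print('\u20ac'.encode('raw-unicode-escape'))     # b'\\u20ac'

# Protocol 0 is the text-based pickle format mentioned above.
payload = pickle.dumps('\u20ac\n', protocol=0)
print(payload)
assert pickle.loads(payload) == '\u20ac\n'
```

Printing `payload` on a current CPython should show the euro sign written as an ASCII escape rather than as raw bytes, which is the dependency on this codec family that the comments above are debating.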
What is the status of this bug? AFAICT, the code is now correct. Have

It's rejected because the OP wanted unicode escapes to be applied in

Please apply the patch, but rename "Unicode escapes" to "\u and \U

Fixed in r62568.