repr() should not escape non-ASCII characters #46882
Comments
In py3k, repr() escapes non-ASCII characters in Unicode to \uXXXX as In this patch, repr() converts special ascii characters such as "\t", This patch breaks five regr tests on my environment. [1] http://mail.python.org/pipermail/python-dev/2002-October/029443.html |
I think this has potential, but it is too liberal. There are many more OTOH there are other potential use cases where it would be nice to see The backslashreplace trick is nice, I didn't even know about that. :-) |
What if we turn on the backslashreplace trick for some operations only? |
As Michael Urman pointed out, we can use Unicode properties.
For such cases, print(s.encode("ascii", "backslashreplace")) might work. |
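For illustration, here is a minimal sketch of what the `backslashreplace` error handler produces. The sample string is an arbitrary choice, not taken from the patch. Note that in Python 3 `str.encode()` returns a `bytes` object, so the result is decoded back to `str` before it is displayed as text.

```python
# Sample text: "café" followed by HIRAGANA LETTER A (an arbitrary example).
s = "caf\u00e9 \u3042"

# Characters the ASCII codec cannot encode are replaced by backslash escapes.
escaped = s.encode("ascii", "backslashreplace")

# escaped is a bytes object; decode it to get a printable ASCII-only str.
as_text = escaped.decode("ascii")
print(as_text)
```

The escape form depends on the code point: `\xNN` below U+0100, `\uNNNN` up to U+FFFF, and `\UNNNNNNNN` beyond that.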
It would be difficult, since the *_repr() APIs don't know who the caller is. |
Atsuo: I missed Michael Urman's comment. Can you copy it here, or Amaury: I think it would be okay to use backslashreplace as the default |
Okay, I'll revise a patch later today. |
I revised a patch against Python 3.0a4.
|
I think sys.stdout needs to have the backslashreplace error handler. |
I don't think this is a good idea; I've explained why earlier on this issue. |
Sorry, I forgot to write "for interactive session". |
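To make the suggestion concrete, the sketch below shows what a text stream with the `backslashreplace` error handler does when its encoding cannot represent a character. It uses an in-memory `io.TextIOWrapper` as a stand-in for `sys.stdout`. The helper name `write_with_handler` is hypothetical, and the ASCII encoding is an assumption chosen to force the error path.

```python
import io

def write_with_handler(text, errors):
    """Write text to an ASCII-only stream (hypothetical helper) and return the raw bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors=errors, newline="\n")
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()

# With backslashreplace, the unencodable character is escaped instead of failing.
lenient_output = write_with_handler("caf\u00e9", "backslashreplace")

# With the default strict handler, the same write raises UnicodeEncodeError.
try:
    write_with_handler("caf\u00e9", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

A stream configured this way never fails on output, at the cost of showing escapes for anything outside its encoding.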
While it may be desirable to have repr(unicode) return a non-ASCII repr() is usually used in logging and applications/users/tools don't If you do want to have this more flexible, then make the encoding used Users who wish to see non-ASCII repr(unicode) data can then adjust the This is both more flexible and backwards compatible with 2.x. Also note that the separation of the Unicode database from the BTW: "printable" is not a defined term in Unicode. What is or is not |
Turning the code in unicode_repr() into a codec is a good idea. I'll write two |
Is a codec whose encode() returns a Unicode object allowed in Python 3? I |
On 2008-04-18 05:35, atsuo ishimoto wrote:
Sure, why not ? I think you have to ask another question: Is repr() allowed to If not, then unicode_repr() will have to check the return value of
That's what we currently have with unicode_repr(), but it doesn't |
New patch against the current py3k branch. All the regression tests that failed with my patch are now fixed as far as I |
On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg
Actually, it is not. In Py3k, x.encode() always requires x to be a str
In Py3k, "strings" *are* unicode. The str data type is Unicode. If you're asking about repr() possibly returning a bytes instance,
What codec?
I'm lost here. PS. Atsuo's PEP has now been checked in as PEP-3138. Discussion should |
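The type rule described in this reply ("x.encode() always requires x to be a str") can be sketched as follows. This reflects how the `str` and `bytes` methods behave in Python 3; the sample text is an arbitrary choice.

```python
text = "caf\u00e9"

# str.encode() goes from text to bytes.
data = text.encode("utf-8")

# bytes.decode() goes from bytes back to text.
round_tripped = data.decode("utf-8")

# The methods exist only in one direction on each type.
str_has_decode = hasattr(str, "decode")
bytes_has_encode = hasattr(bytes, "encode")

# repr() of a str is itself a str, never a bytes instance.
repr_type = type(repr(text))
```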
FWIW, I've uploaded diff3.txt to Rietveld: Code review comments should be reflected here. I had to skip the change to Modules/unicodename_db.h which was too |
I forgot to mention Modules/unicodename_db.h. The current unicodename_db.h looks like it was generated |
No need to change anything, the diff is just too big for the code |
On 2008-05-06 00:07, Guido van Rossum wrote:
So you've limited the codec design to just doing Unicode<->bytes The original codec design was to have the codec decide which
With "strings" I always refer to 8-bit strings, ie. 8-bit data that
The idea is to have a codec which takes the Unicode object and Now, since you apparently cannot With the original design, this extra step wouldn't have been
See my previous replies on this ticket.
|
I wonder why unicodename_db.h has not been updated after |
On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote:
Yes. This was quite a conscious decision that was not taken lightly,
Unfortunately this design made it hard to reason about the correctness IOW, when translated to 3.0, the design violates the general design
You will have to change this habit or you will thoroughly confuse both
Why does everything have to be a codec? |
On 2008-05-06 19:10, Guido van Rossum wrote:
I understand where this concept originates and usually apply this In comparison, the approach is very much like getattr() - you know The reason codecs were designed like this was to be able to I'd suggest to lift the type restrictions from the general If you then also allow those methods to return *both*
Well, I'm confused by the P3k use of terms (esp. because the Perhaps it's better to use PyString and PyUnicode.
It doesn't. It's just that codecs are so easy to add, change In the case addressed by this ticket, I see the usefulness Furthermore, a codec implementation of what Atsuo has in mind |
I'd be happy to have a separate more relaxed API for stackable codecs, |
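As a point of comparison, current CPython 3 releases do offer a more relaxed entry point in the `codecs` module: `codecs.encode()` and `codecs.decode()` accept codecs that are not text encodings, while `str.encode()` rejects them. The sketch below uses the standard `rot_13` codec, which maps `str` to `str`. It illustrates the split between the two APIs and is not necessarily the design the participants had in mind here.

```python
import codecs

# codecs.encode() accepts a str-to-str codec such as rot_13.
scrambled = codecs.encode("hello", "rot_13")
restored = codecs.decode(scrambled, "rot_13")

# str.encode() is restricted to text encodings (str to bytes) and refuses rot_13.
try:
    "hello".encode("rot_13")
    method_rejected = False
except LookupError:
    method_rejected = True
```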
I updated the patch as per the latest PEP.
|
docdiff1.txt contains documentation for the functions I added. |
diff5.txt contains both code and documentation patch for PEP-3138.
|
Review:
should be "the non-ASCII characters in the string returned by :func:`repr` are
Otherwise, the patch is fine IMO. (I'm surprised that only so few tests |
One more thing: with r63891 the encoding and errors arguments for the |
This patch contains the following changes.
|
Thank you for your review! On Tue, Jun 3, 2008 at 7:13 PM, Georg Brandl <report@bugs.python.org> wrote:
Well, my intuition, which came from str.islower(), was wrong. An empty string is
Okay, thank you.
I added the format string in my latest patch.
Okay, thank you.
Fixed in my latest patch.
Okay, thank you.
I think bltinmodule.c is fixed with latest patch, but I don't know what
Okay, thank you.
I don't want to change this, because it is a reversion of rev 63378.
I think sys.stderr should always default to 'backslashreplace'. I'll
Thank you very much! I'll file new patch soon. |
BTW, should the new C APIs and functions be ported to Python 2.6 for |
ascii() should probably be in future_builtins. Whether the C API stuff and .isprintable() should be backported to 2.6 |
I updated the patch as per Georg's advice. |
I'm sorry, I missed a file when uploading. diff7_1.txt is the correct file. |
No way -- while all of this makes sense in Py3k, where all strings are |
stringlib can be compiled for Python 2.6 now, but the '!a' converter is |
Shall the method be called isprintable() or simply printable()? For the |
I would expect "abc".isprintable() to give me a bool and "abc".printable() |
You are right, I had forgotten about lower()/islower(). |
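The naming convention discussed here can be seen in a short sketch: `is*` methods return a `bool`, while the unprefixed counterpart returns a new string. The results below reflect `str.isprintable()` as it exists in Python 3; the sample strings are arbitrary.

```python
# is*() methods are predicates and return bool.
plain = "abc".isprintable()
with_tab = "a\tb".isprintable()      # control characters are not printable
non_ascii = "\u3042".isprintable()   # HIRAGANA LETTER A is printable

# The empty string is printable, unlike islower(), which needs a cased character.
empty_printable = "".isprintable()
empty_lower = "".islower()

# The unprefixed method returns a transformed string, not a bool.
lowered = "ABC".lower()
```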
Patch committed to Py3k branch in r64138. Thanks all! |
Great, thank you! |
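For readers arriving after the fact, the following is a small sketch of the behaviour that Python 3 ended up with under PEP 3138: `repr()` keeps printable non-ASCII characters, `ascii()` escapes them, and the `!a` format converter applies `ascii()`. The sample string is an arbitrary choice.

```python
s = "caf\u00e9\t"

# repr() leaves the printable non-ASCII character alone but still escapes the tab.
shown_repr = repr(s)

# ascii() escapes every non-ASCII character, like repr() did before this change.
shown_ascii = ascii(s)

# The !r and !a converters in format strings map to repr() and ascii().
formatted_r = "{!r}".format(s)
formatted_a = "{!a}".format(s)
```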