Issue 2630: repr() should not escape non-ASCII characters


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.


This issue has been migrated to GitHub: https://github.com/python/cpython/issues/46882

classification

Title:	repr() should not escape non-ASCII characters
Type:	enhancement	Stage:
Components:	None	Versions:	Python 3.0

process

Status:	closed	Resolution:	accepted
Dependencies:		Superseder:
Assigned To:		Nosy List:	amaury.forgeotdarc, eric.smith, georg.brandl, gvanrossum, ishimoto, lemburg, pitrou
Priority:	normal	Keywords:	patch

Created on 2008-04-14 09:54 by ishimoto, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
diff.txt	ishimoto, 2008-04-14 09:54
diff2.txt	ishimoto, 2008-04-15 12:19
diff3.txt	ishimoto, 2008-05-04 15:34
diff4.txt	ishimoto, 2008-05-27 12:55
docdiff1.txt	ishimoto, 2008-05-28 07:39
diff5.txt	ishimoto, 2008-06-01 12:53
diff6.txt	ishimoto, 2008-06-03 10:33
diff7_1.txt	ishimoto, 2008-06-03 18:05
diff8.patch	ishimoto, 2008-06-04 17:52

Messages (43)
msg65461	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-14 09:54
In py3k, repr() escapes non-ASCII characters in Unicode strings to \uXXXX, as Python 2 does. This is an unpleasant feature if you are working with non-Latin characters. This issue was once discussed by Hye-Shik Chang[1], but was rejected. Here's a new attempt to fix the issue for Python 3. In this patch, repr() converts special ASCII characters such as "\t", "\r", "\n", but doesn't convert non-ASCII characters to \uXXXX form. Non-ASCII characters are converted by TextIOWrapper on printing. I set the 'errors' attribute of sys.stdout and sys.stderr to 'backslashreplace', so unprintable characters are converted to '\uXXXX' if your console cannot print such characters. This patch breaks five regression tests in my environment. I'll fix these tests if this patch is acceptable. [1] http://mail.python.org/pipermail/python-dev/2002-October/029443.html http://bugs.python.org/issue479898
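For comparison, here is how the proposed behaviour looks in Python 3 as eventually released (the thread later refers to the accepted design as PEP 3138). This is a small sketch; the sample string is arbitrary, and only ASCII text is printed, so it also runs on a console that cannot display non-ASCII characters.

```python
# Python 3 after PEP 3138: repr() keeps printable non-ASCII characters,
# while the ascii() builtin reproduces Python 2-style escaping.
s = "caf\u00e9 \u65e5\u672c"  # "café 日本", written with escapes

shown = repr(s)      # non-ASCII characters are left as-is
escaped = ascii(s)   # non-ASCII characters become \xNN / \uNNNN escapes

assert shown == "'caf\u00e9 \u65e5\u672c'"
assert escaped == "'caf\\xe9 \\u65e5\\u672c'"
print(escaped)       # ASCII-only, safe on any console
```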
msg65470	Author: Guido van Rossum (gvanrossum) *	Date: 2008-04-14 18:12
I think this has potential, but it is too liberal. There are many more characters that cannot be assumed printable, e.g. many of the Latin-1 characters in the range 0x80 through 0x9F. Isn't there some Unicode data table that shows code points that are safely printable? OTOH there are other potential use cases where it would be nice to see the \u escapes, e.g. when one is concerned about sequences that print the same but don't have the same content (e.g. pre-normalization). The backslashreplace trick is nice, I didn't even know about that. :-)
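Both of Guido's concerns can be observed in current Python 3 with the standard `unicodedata` module. In the sketch below, U+0085 (a C1 control in the 0x80 through 0x9F range) is category "Cc" and is still escaped by repr(). A decomposed "é" ("e" followed by U+0301 COMBINING ACUTE ACCENT), on the other hand, is printable, so repr() shows it raw and it can look identical to the precomposed form; ascii() makes the difference visible.

```python
import unicodedata

# U+0085 NEXT LINE is a C1 control (category "Cc"), so it is not printable
# and repr() still escapes it.
nel = "\x85"
print(unicodedata.category(nel), nel.isprintable(), repr(nel))  # Cc False '\x85'

# Precomposed vs. decomposed "é": usually rendered the same, different content.
composed = "\u00e9"
decomposed = unicodedata.normalize("NFD", composed)  # "e" + U+0301

print(composed == decomposed)  # False
print(ascii(composed))         # '\xe9'
print(ascii(decomposed))       # 'e\u0301'
```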
msg65483	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2008-04-14 21:20
What if we turn on the backslashreplace trick for some operations only? For example: sys_displayhook and sys_excepthook.
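Amaury's suggestion does not require repr() to know its caller, because the escaping can happen where the text is written. Below is a hypothetical replacement for `sys.displayhook`; the name `escaping_displayhook` is made up for this sketch. It leaves repr() unchanged and applies `backslashreplace` only when the interactive interpreter echoes a result.

```python
import builtins
import sys

def escaping_displayhook(value):
    """Hypothetical sys.displayhook replacement: print repr(value) like the
    default hook, but backslash-escape characters that the current stdout
    encoding cannot represent instead of raising UnicodeEncodeError."""
    if value is None:
        return
    builtins._ = None  # like the default hook, clear '_' before printing
    text = repr(value)
    # Fall back to ASCII when the stream reports no encoding (e.g. StringIO).
    encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    safe = text.encode(encoding, "backslashreplace").decode(encoding)
    sys.stdout.write(safe + "\n")
    builtins._ = value  # remember the last result, as the default hook does
```

It would be installed with `sys.displayhook = escaping_displayhook`. Current CPython documents a similar `backslashreplace` fallback on UnicodeEncodeError for its own default `sys.displayhook`.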
msg65490	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-15 01:40
> I think this has potential, but it is too liberal. There are many more > characters that cannot be assumed printable, e.g. many of the Latin-1 > characters in the range 0x80 through 0x9F. Isn't there some Unicode > data table that shows code points that are safely printable? As Michael Urman pointed out, we can use Unicode properties. Or we can define a set of non-printable characters (e.g. sys.nonprintablechars). > OTOH there are other potential use cases where it would be nice to see > the \u escapes, e.g. when one is concerned about sequences that print > the same but don't have the same content (e.g. pre-normalization). For such cases, print(s.encode("ascii", "backslashreplace")) might work.
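In the Python 3 that shipped, `s.encode("ascii", "backslashreplace")` returns bytes, so printing it shows a `b'...'` literal with doubled backslashes. The sketch below adds a small helper that decodes the result back to str. `escape_non_ascii` is an illustrative name, not a stdlib function.

```python
def escape_non_ascii(s: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a
    \\xNN, \\uNNNN or \\UNNNNNNNN escape and return str, not bytes."""
    return s.encode("ascii", "backslashreplace").decode("ascii")

text = "caf\u00e9"
print(text.encode("ascii", "backslashreplace"))  # b'caf\\xe9'  (bytes)
print(escape_non_ascii(text))                    # caf\xe9     (str)
```

Unlike ascii(), this adds no quotes and leaves ASCII control characters and existing backslashes untouched, since `backslashreplace` only rewrites characters the target encoding cannot represent.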
msg65491	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-15 01:48
> What if we turn on the backslashreplace trick for some operations only? > For example: sys_displayhook and sys_excepthook. It would be difficult, since the *_repr() APIs don't know who the caller is.
msg65493	Author: Guido van Rossum (gvanrossum) *	Date: 2008-04-15 03:10
Atsuo: I missed Michael Urman's comment. Can you copy it here, or (better :-) write a patch that uses it? Amaury: I think it would be okay to use backslashreplace as the default error handler for sys.stderr. Probably not for sys.stdout or other files, since I'm sure many users prefer the errors when their data cannot be printed rather than silently writing \u escapes that might cause other code reading their output to choke. For sys.stderr though I think not having exceptions raised when attempting to print errors is very valuable.
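The trade-off Guido describes comes down to the `errors` argument of the text layer. The sketch below uses `io.TextIOWrapper` over an in-memory buffer, with ASCII standing in for a console that cannot display the text. `write_through` is a hypothetical helper name. With `strict` the write raises, as Guido prefers for ordinary output; with `backslashreplace` escapes are written instead, as proposed for sys.stderr.

```python
import io

def write_through(errors: str, text: str) -> bytes:
    """Write text to an ASCII TextIOWrapper with the given error handler and
    return the bytes that reached the underlying buffer."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()

print(write_through("backslashreplace", "caf\u00e9\n"))  # b'caf\\xe9\n'
try:
    write_through("strict", "caf\u00e9\n")
except UnicodeEncodeError as exc:
    print("strict handler raised:", exc.reason)  # ordinal not in range(128)
```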
msg65494	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-15 03:35
Okay, I'll revise a patch later today.
msg65514	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-15 12:19
I revised the patch against Python 3.0a4.
- As per Michael Urman's suggestion, unicode_repr() consults the Unicode database to determine which characters to hex-encode.
- sys.stdout doesn't use 'backslashreplace'.
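This patch eventually converged on a printability rule that Georg's review quotes later: characters in the Unicode categories "Other" and "Separator", except the ASCII space, are not printable. That rule can be approximated in pure Python with `unicodedata.category()`. The sketch below uses it to rebuild the core of Python 3's str repr. `is_printable_char` and `sketch_repr` are hypothetical names, and quote selection and quote escaping are deliberately left out.

```python
import unicodedata

def is_printable_char(ch: str) -> bool:
    """Approximation of str.isprintable() for one character: categories
    "Other" (C*) and "Separator" (Z*) are non-printable, except U+0020."""
    return ch == " " or unicodedata.category(ch)[0] not in ("C", "Z")

# Escapes repr() uses for a few common characters.
_SPECIAL = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\\": "\\\\"}

def sketch_repr(s: str) -> str:
    """Hypothetical re-implementation of repr() for strings containing no
    single quote: printable characters are kept, others are hex-escaped."""
    parts = []
    for ch in s:
        if ch in _SPECIAL:
            parts.append(_SPECIAL[ch])
        elif is_printable_char(ch):
            parts.append(ch)
        else:
            cp = ord(ch)
            if cp < 0x100:
                parts.append(f"\\x{cp:02x}")
            elif cp < 0x10000:
                parts.append(f"\\u{cp:04x}")
            else:
                parts.append(f"\\U{cp:08x}")
    return "'" + "".join(parts) + "'"

sample = "caf\u00e9\t\x85\u200b\u65e5\U000e0001"
print(sketch_repr(sample) == repr(sample))  # True
```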
msg65535	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-16 00:33
I think sys.stdout needs to have the backslashreplace error handler. Without backslashreplace, print(listOfJapaneseString) prints nothing and raises an exception instead. This is worse than Python 2.
msg65536	Author: Guido van Rossum (gvanrossum) *	Date: 2008-04-16 00:44
I don't think this is a good idea; I've explained why earlier on this issue.
msg65542	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-16 02:37
Sorry, I forgot to write "for interactive sessions". I agree that sys.stdout and other files should not default to backslashreplace, but for interactive sessions, I think sys.stdout could use the backslashreplace handler to avoid exceptions.
msg65564	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-04-16 19:37
While it may be desirable to have repr(unicode) return a non-ASCII string, the suggested approach is not suitable to solve the problem. repr() is usually used in logging and applications/users/tools don't expect to suddenly find non-ASCII or even mixed encodings in a log file. If you do want to have this more flexible, then make the encoding used by unicode_repr() adjustable, turn the existing code into a codec (e.g. "unicode-repr") and leave it setup as default. Users who wish to see non-ASCII repr(unicode) data can then adjust the used encoding to their liking. This is both more flexible and backwards compatible with 2.x. Also note that the separation of the Unicode database from the interpreter core was done to keep the interpreter footprint manageable. It's not a good idea to just dump the complete table set into unicodeobject.c via an #include. If you need to reference APIs from modules in C, the usual approach is to create a PyCObject which is then exported by the module (see e.g. the datetime module) and imported by code needing it. BTW: "printable" is not a defined term in Unicode. What is or is not printable really depends on the use case, e.g. there are quite a few code points in Unicode that don't result in any glyph being "printed" to the screen. A Unicode string could then look as if it had fewer code points than it actually does - which is not really what you want when debugging code or sifting through log files.
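In the Python 3 that shipped, Marc-Andre's logging concern can be handled at the call site. The `%a` conversion, added together with ascii() later in this issue, formats a value with ascii() instead of repr(). The sketch below uses the standard logging module; the logger name `repr-demo` is arbitrary.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("repr-demo")  # arbitrary name for this sketch
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

value = "caf\u00e9"
logger.info("repr: %r", value)   # may put non-ASCII text into the log
logger.info("ascii: %a", value)  # always ASCII-only

lines = stream.getvalue().splitlines()
print(lines[1])  # ascii: 'caf\xe9'
```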
msg65573	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-17 05:37
> If you do want to have this more flexible, then make the encoding used > by unicode_repr() adjustable, turn the existing code into a codec (e.g. > "unicode-repr") and leave it setup as default. Turning the code in unicode_repr() into a codec is a good idea. I'll write two codecs (the existing repr and a new Unicode-friendly codec) and post a revised patch later.
msg65601	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-04-18 03:35
Is a codec which encode() returns an Unicode allowed in Python3? I started to think codec is not necessary, but python function is enough.
msg65606	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-04-18 08:46
On 2008-04-18 05:35, atsuo ishimoto wrote: > atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment: > > Is a codec which encode() returns an Unicode allowed in Python3? Sure, why not ? I think you have to ask another question: Is repr() allowed to return a string (instead of Unicode) in Py3k ? If not, then unicode_repr() will have to check the return value of the codec and convert it back to Unicode as necessary. > I started to think codec is not necessary, but python function is enough. That's what we currently have with unicode_repr(), but it doesn't solve the problem.
msg66216	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-05-04 15:34
New patch against the current py3k branch. All the regression tests that failed with my patch are now fixed, as far as I can run them. I also modified the doctest module a bit, so that change should be reviewed by the module owners.
msg66298	Author: Guido van Rossum (gvanrossum) *	Date: 2008-05-05 22:07
On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg <report@bugs.python.org> wrote: > On 2008-04-18 05:35, atsuo ishimoto wrote: > > atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment: > > > > Is a codec which encode() returns an Unicode allowed in Python3? > > Sure, why not ? Actually, it is not. In Py3k, x.encode() always requires x to be a str (i.e. unicode) instance and returns a bytes instance. y.decode() requires y to be a bytes instance and returns a str (i.e. unicode) instance. > I think you have to ask another question: Is repr() allowed to > return a string (instead of Unicode) in Py3k ? In Py3k, "strings" are unicode. The str data type is Unicode. If you're asking about repr() possibly returning a bytes instance, definitely not. > If not, then unicode_repr() will have to check the return value of > the codec and convert it back to Unicode as necessary. What codec? > > I started to think codec is not necessary, but python function is enough. > > That's what we currently have with unicode_repr(), but it doesn't > solve the problem. I'm lost here. PS. Atsuo's PEP has now been checked in as PEP 3138. Discussion should start soon on the python-3000 list.
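The type contract Guido describes is easy to check in current Python 3. In this short sketch the encoding name is arbitrary; the point is only the types.

```python
# Python 3's rule: str.encode() -> bytes, bytes.decode() -> str,
# whichever text encoding is named.
encoded = "caf\u00e9".encode("utf-8")
decoded = encoded.decode("utf-8")

print(type(encoded).__name__, encoded)  # bytes b'caf\xc3\xa9'
print(type(decoded).__name__)           # str

# Each direction exists on one type only.
print(hasattr(b"", "encode"), hasattr("", "decode"))  # False False
```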
msg66299	Author: Guido van Rossum (gvanrossum) *	Date: 2008-05-05 22:17
FWIW, I've uploaded diff3.txt to Rietveld: http://codereview.appspot.com/767 Code review comments should be reflected here. I had to skip the change to Modules/unicodename_db.h, which was too large for Rietveld to handle.
msg66302	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-05-06 04:30
I forgot to mention Modules/unicodename_db.h. The current unicodename_db.h looks like it was generated by an old version of Tools/unicode/makeunicodedata.py. This patch includes a newly generated unicodename_db.h, but we can exclude the change if it's not necessary.
msg66303	Author: Guido van Rossum (gvanrossum) *	Date: 2008-05-06 04:39
No need to change anything, the diff is just too big for the code review tool (Rietveld), but since it consists only of numbers we don't need to review it anyway. :)
msg66307	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-05-06 08:26
On 2008-05-06 00:07, Guido van Rossum wrote: > Guido van Rossum <guido@python.org> added the comment: > > On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg > <report@bugs.python.org> wrote: >> On 2008-04-18 05:35, atsuo ishimoto wrote: >> > atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment: >> > >> > Is a codec which encode() returns an Unicode allowed in Python3? >> >> Sure, why not ? > > Actually, it is not. In Py3k, x.encode() always requires x to be a str > (i.e. unicode) instance and return a bytes instance. y.decode() > requires y to be a bytes instance and returns a str (i.e. unicode) > instance. So you've limited the codec design to just doing Unicode<->bytes conversions ? The original codec design was to have the codec decide which types to take on input and to generate on output, e.g. to escape characters in Unicode (converting Unicode to Unicode), work on compressed 8-bit strings (converting 8-bit strings to 8-bit strings), etc. >> I think you have to ask another question: Is repr() allowed to >> return a string (instead of Unicode) in Py3k ? > > In Py3k, "strings" are unicode. The str data type is Unicode. With "strings" I always refer to 8-bit strings, ie. 8-bit data that is encoded in some encoding. > If you're asking about repr() possibly returning a bytes instance, > definitely not. > >> If not, then unicode_repr() will have to check the return value of >> the codec and convert it back to Unicode as necessary. > > What codec? The idea is to have a codec which takes the Unicode object and converts it to its repr()-value. Now, since you apparently cannot go the direct way anymore (ie. have the codec encode Unicode to Unicode), you'd have to first use a codec which converts the Unicode object to its repr()-value represented as bytes object and then convert the bytes object back to Unicode in unicode_repr(). With the original design, this extra step wouldn't have been necessary. 
>> > I started to think codec is not necessary, but python function is enough. >> >> That's what we currently have with unicode_repr(), but it doesn't >> solve the problem. > > I'm lost here. See my previous replies on this ticket. > PS. Atsuo's PEP has now been checked in as PEP 3138. Discussion should > start soon on the python-3000 list.
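Marc-Andre's two-step route (a codec that produces the escaped form as bytes, followed by a conversion back to str) can be tried with the standard `unicode_escape` codec. This codec escapes every non-ASCII character, so the result resembles the body of a Python 2-style repr() without the quotes, not the new repr(). `escaped_text` is a hypothetical name.

```python
def escaped_text(s: str) -> str:
    """Hypothetical helper: str -> escaped bytes via the 'unicode_escape'
    codec, then bytes -> str again (the output is pure ASCII)."""
    return s.encode("unicode_escape").decode("ascii")

print(escaped_text("caf\u00e9\n\u65e5"))  # caf\xe9\n\u65e5
```

Because the newline comes out as the two characters `\n`, the result also fits the other use Marc-Andre mentions: writing text to a stream without introducing line breaks.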
msg66310	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-05-06 11:43
> No need to change anything, the diff is just too big for the code > review tool (Rietveld), but since it consists only of numbers we don't > need to review it anyway. :) I wonder why unicodename_db.h has not been updated since makeunicodedata.py was modified. If the new makeunicodedata.py breaks something, I should remove the change to unicodename_db.h from this patch (my patch works whether or not unicodename_db.h is updated). I'll post a question to the python-3000 list.
msg66320	Author: Guido van Rossum (gvanrossum) *	Date: 2008-05-06 17:10
On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote: > So you've limited the codec design to just doing Unicode<->bytes > conversions ? Yes. This was quite a conscious decision that was not taken lightly, with lots of community input, quite a while ago. > The original codec design was to have the codec decide which > types to take on input and to generate on output, e.g. to > escape characters in Unicode (converting Unicode to Unicode), > work on compressed 8-bit strings (converting 8-bit strings to > 8-bit strings), etc. Unfortunately this design made it hard to reason about the correctness of code, since (especially in Py3k, where bytes and str are more different than str and unicode were in 2.x) it's hard to write code that uses .encode() or .decode() unless it knows which codec is being used. IOW, when translated to 3.0, the design violates the general design principle that the type of a function's or method's return value should not depend on the value of one of the arguments. > >> I think you have to ask another question: Is repr() allowed to > >> return a string (instead of Unicode) in Py3k ? > > > > In Py3k, "strings" are unicode. The str data type is Unicode. > > With "strings" I always refer to 8-bit strings, ie. 8-bit data that > is encoded in some encoding. You will have to change this habit or you will thoroughly confuse both users and developers of 3.0. "String" refers to the built-in "str" type which in Py3k is PyUnicode. For the PyString type we use the built-in type "bytes". > > If you're asking about repr() possibly returning a bytes instance, > > definitely not. > > > >> If not, then unicode_repr() will have to check the return value of > >> the codec and convert it back to Unicode as necessary. > > > > What codec? > > The idea is to have a codec which takes the Unicode object and > converts it to its repr()-value. > > Now, since you apparently cannot > go the direct way anymore (ie. 
have the codec encode Unicode to > Unicode), you'd have to first use a codec which converts the Unicode > object to its repr()-value represented as bytes object and then > convert the bytes object back to Unicode in unicode_repr(). > > With the original design, this extra step wouldn't have been > necessary. Why does everything have to be a codec?
msg66424	Author: Marc-Andre Lemburg (lemburg) *	Date: 2008-05-08 17:15
On 2008-05-06 19:10, Guido van Rossum wrote: > Guido van Rossum <guido@python.org> added the comment: > > On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote: >> So you've limited the codec design to just doing Unicode<->bytes >> conversions ? > > Yes. This was quite a conscious decision that was not taken lightly, > with lots of community input, quite a while ago. > >> The original codec design was to have the codec decide which >> types to take on input and to generate on output, e.g. to >> escape characters in Unicode (converting Unicode to Unicode), >> work on compressed 8-bit strings (converting 8-bit strings to >> 8-bit strings), etc. > > Unfortunately this design made it hard to reason about the correctness > of code, since (especially in Py3k, where bytes and str are more > different than str and unicode were in 2.x) it's hard to write code > that uses .encode() or .decode() unless it knows which codec is being > used. > > IOW, when translated to 3.0, the design violates the general design > principle that the type of a function's or method's return value > should not depend on the value of one of the arguments. I understand where this concept originates and usually apply this rule to software design as well; however, in the particular case of codecs, the codec registry and its helper functions are merely interfaces to code that is defined elsewhere. In comparison, the approach is very much like getattr() - you know what the attribute is called, but know nothing about its type until you receive it from the function. The reason codecs were designed like this was to be able to easily stack them. For this to work, only the interfaces need to be defined, without restricting the codecs too much in terms of which types may be used. I'd suggest lifting the type restrictions from the general codecs.c access APIs (PyCodec_), since they don't really belong there, and instead only imposing the limitation on the PyUnicode and PyString methods .encode() and .decode().
If you then also allow those methods to return *both* PyUnicode and PyString, you'd still have strong typing (only one of two possible types is allowed) and stacking streams or having codecs that work on PyUnicode->PyUnicode or PyString->PyString would still be accessible via .encode()/.decode(). >> >> I think you have to ask another question: Is repr() allowed to >> >> return a string (instead of Unicode) in Py3k ? >> > >> > In Py3k, "strings" are unicode. The str data type is Unicode. >> >> With "strings" I always refer to 8-bit strings, ie. 8-bit data that >> is encoded in some encoding. > > You will have to change this habit or you will thoroughly confuse both > users and developers of 3.0. "String" refers to the built-in "str" > type which in Py3k is PyUnicode. For the PyString type we use the > built-in type "bytes". Well, I'm confused by the Py3k use of terms (esp. because the C type names don't match the Python ones), which is why I'm talking about 8-bit strings and Unicode. Perhaps it's better to use PyString and PyUnicode. >> > If you're asking about repr() possibly returning a bytes instance, >> > definitely not. >> > >> >> If not, then unicode_repr() will have to check the return value of >> >> the codec and convert it back to Unicode as necessary. >> > >> > What codec? >> >> The idea is to have a codec which takes the Unicode object and >> converts it to its repr()-value. >> >> Now, since you apparently cannot >> go the direct way anymore (ie. have the codec encode Unicode to >> Unicode), you'd have to first use a codec which converts the Unicode >> object to its repr()-value represented as bytes object and then >> convert the bytes object back to Unicode in unicode_repr(). >> >> With the original design, this extra step wouldn't have been >> necessary. > > Why does everything have to be a codec? It doesn't.
It's just that codecs are so easy to add, change and adjust that reusing the existing code is more attractive than reinventing the wheel every time you need to make a conversion from one text form to another adjustable in some way. In the case addressed by this ticket, I see the usefulness of having native-language text written to the console using native glyphs, but there are so many drawbacks to this (see the discussion on the ticket and the mailing list) that I think there needs to be a way to adjust the mechanism, or at least to revert to the existing repr() output. Furthermore, a codec implementation of what Atsuo has in mind would also be useful in other contexts, e.g. where you want to write PyUnicode to a stream without introducing line breaks.
msg66425	Author: Guido van Rossum (gvanrossum) *	Date: 2008-05-08 17:19
I'd be happy to have a separate more relaxed API for stackable codecs, however, the API should not be overloaded on the .encode() and .decode() methods on str and bytes objects.
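In Python 3 as it stands today, str-to-str codecs still exist, but they are reached through the module-level `codecs.encode()` and `codecs.decode()` functions. `str.encode()` rejects them. A quick illustration with the standard `rot13` codec:

```python
import codecs

print(codecs.encode("abc", "rot13"))  # nop  (str in, str out)

try:
    "abc".encode("rot13")             # str.encode() must return bytes
except LookupError as exc:
    print("str.encode() refused:", exc)
```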
msg67409	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-05-27 12:55
I updated the patch as per the latest PEP.
- io.TextIOWrapper doesn't provide an API to change the error handler at this time. I should update this patch after the API is provided.
- This patch contains a fix for Tools/unicode/makeunicodedata.py in rev 63378.
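Later versions do provide the API mentioned here. Since Python 3.7, `io.TextIOWrapper.reconfigure()` can change the error handler of an existing stream, for example `sys.stdout.reconfigure(errors="backslashreplace")`. Here is a self-contained sketch on an in-memory stream:

```python
import io

raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")
out.write("plain ASCII\n")

# Switch the error handler in place (Python 3.7+); the stream is flushed first.
out.reconfigure(errors="backslashreplace")
out.write("caf\u00e9\n")
out.flush()

print(out.errors)      # backslashreplace
print(raw.getvalue())  # b'plain ASCII\ncaf\\xe9\n'
```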
msg67439	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-05-28 07:39
docdiff1.txt contains documentation for the functions I added.
msg67591	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-01 12:53
diff5.txt contains both the code and documentation patches for PEP 3138.
- In this patch, the default error handler of sys.stdout is always 'strict'.
msg67651	Author: Georg Brandl (georg.brandl) *	Date: 2008-06-03 10:13
Review:
* Why is an empty string not printable? In any case, the empty string should be among the test cases for isprintable().
* Why not use PyUnicode_DecodeASCII instead of PyUnicode_FromEncodedObject? It should be a bit faster.
* If old-style string formatting gets "%a", .format() must get a "!a" specifier.
* The ascii() and repr() tests should be expanded so that both test the same set of objects, and the expected differences. Are there tests for failing cases?
* This is just "return ascii" (in builtin_ascii):
    + if (ascii == NULL)
    +     return NULL;
    +
    + return ascii;
* For PyBool_FromLong(1) and PyBool_FromLong(0) there is Py_RETURN_TRUE and Py_RETURN_FALSE. (You're not to blame, the rest of unicodeobject.c seems to use them too, probably a legacy.)
* There appear to be some space indentations in tab-indented files like bltinmodule.c and vice versa (unicodeobject.c).
* C docs/isprintable() docs: The spec
    + Characters defined in the Unicode character database as "Other"
    + or "Separator" other than ASCII space(0x20) are not considered
    + printable.
  is unclear; better say "All characters except those ... are considered printable".
* ascii() docs:
    + the non-ASCII
    + characters in the string returned by :func:`ascii`() are hex-escaped
    + to generate a same string as :func:`repr` in Python 2.
  should be "the non-ASCII characters in the string returned by :func:`repr` are backslash-escaped (with ``\x``, ``\u`` or ``\U``) to generate ...".
* makeunicodedata: len(list(n for n in names if n is not None)) could better be expressed as sum(1 for n in names if n is not None).
Otherwise, the patch is fine IMO. (I'm surprised that only so few tests needed adaptation, that's a sign that we're not testing Unicode enough.)
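Two points from the review can be checked in the released Python 3. First, the empty string counts as printable. Second, ascii() picks a `\x`, `\u` or `\U` escape according to the size of the code point, and repr() does the same for non-printable characters.

```python
# str.isprintable() is True for the empty string (nothing in it is
# non-printable), unlike str.islower(), which needs at least one cased char.
print("".isprintable(), "".islower())  # True False

# ascii() chooses the escape width from the code point.
for ch in ("\u00e9", "\u65e5", "\U0001f600"):
    print(ascii(ch))
# '\xe9'
# '\u65e5'
# '\U0001f600'
```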
msg67653	Author: Georg Brandl (georg.brandl) *	Date: 2008-06-03 10:31
One more thing: with r63891 the encoding and errors arguments for the creation of sys.stderr were made configurable; you'll have to adapt the patch so that it defaults to backslashreplace but can be overridden by PYTHONIOENCODING.
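In current Python 3, PYTHONIOENCODING takes the form `encoding[:errors]`. The sketch below runs a child interpreter with its standard streams forced to ASCII plus `backslashreplace`, so a non-ASCII character comes out escaped instead of raising. The child's code is passed with an escape sequence so that the command line itself stays ASCII.

```python
import os
import subprocess
import sys

env = dict(os.environ, PYTHONIOENCODING="ascii:backslashreplace")
result = subprocess.run(
    [sys.executable, "-c", "print('caf\\u00e9')"],  # child prints "café"
    env=env,
    capture_output=True,
    check=True,
)
print(result.stdout)  # b'caf\\xe9\n' (with \r\n on Windows)
```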
msg67654	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-03 10:33
This patch contains the following changes:
- Added the new C API PyObject_ASCII() for consistency.
- Added the new string formatting operator for str.format() and PyUnicode_FromFormat.
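The `!a` conversion discussed in the review is still part of Python 3 today, alongside `%a`, and f-strings (Python 3.6+) accept it too. A quick comparison:

```python
value = "caf\u00e9"

print("{!a}".format(value))  # 'caf\xe9'   str.format() conversion
print(f"{value!a}")          # 'caf\xe9'   same conversion in an f-string
print("%a" % (value,))       # 'caf\xe9'   printf-style equivalent
```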
msg67655	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-03 11:00
Thank you for your review! I filed a new patch just before I saw your comments. On Tue, Jun 3, 2008 at 7:13 PM, Georg Brandl <report@bugs.python.org> wrote: > > Georg Brandl <georg@python.org> added the comment: > > Review: > > * Why is an empty string not printable? In any case, the empty string > should be among the test cases for isprintable(). Well, my intuition, which came from str.islower(), was wrong. An empty string is printable, of course. > * Why not use PyUnicode_DecodeASCII instead of > PyUnicode_FromEncodedObject? It should be a bit faster. > Okay, thank you. > * If old-style string formatting gets "%a", .format() must get a "!a" > specifier. > I added the format string in my latest patch. > * The ascii() and repr() tests should be expanded so that both test the > same set of objects, and the expected differences. Are there tests for > failing cases? > Okay, thank you. > * This is just "return ascii" (in builtin_ascii): > + if (ascii == NULL) > + return NULL; > + > + return ascii; Fixed in my latest patch. > > * For PyBool_FromLong(1) and PyBool_FromLong(0) there is Py_RETURN_TRUE > and Py_RETURN_FALSE. (You're not to blame, the rest of unicodeobject.c > seems to use them too, probably a legacy.) Okay, thank you. > > * There appear to be some space indentations in tab-indented files like > bltinmodule.c and vice versa (unicodeobject.c). > I think bltinmodule.c is fixed in the latest patch, but I don't know what the correct indentation for unicodeobject.c is. I guess the latest patch is acceptable. > * C docs/isprintable() docs: The spec > + Characters defined in the Unicode character database as "Other" > + or "Separator" other than ASCII space(0x20) are not considered > + printable. > is unclear, better say "All character except those ... are considered > printable". > > * ascii() docs: > + the non-ASCII > + characters in the string returned by :func:`ascii`() are hex-escaped > + to generate a same string as :func:`repr` in Python 2.
> > should be > > "the non-ASCII characters in the string returned by :func:`repr` are > backslash-escaped (with ``\x``, ``\u`` or ``\U``) to generate ...". > Okay, thank you. > * makeunicodedata: len(list(n for n in names if n is not None)) could > better be expressed as sum(1 for n in names if n is not None). I don't want to change this, because it is a reversion of rev 63378. > One more thing: with r63891 the encoding and errors arguments for the > creation of sys.stderr were made configurable; you'll have to adapt the > patch so that it defaults to backslashreplace but can be overridden by > PYTHONIOENCODING. I think sys.stderr should always default to 'backslashreplace'. I'll post a message to the Py3k list later. > > Otherwise, the patch is fine IMO. (I'm surprised that only so few tests > needed adaptation, that's a sign that we're not testing Unicode enough.) > Thank you very much! I'll file a new patch soon.
msg67656	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-03 11:06
BTW, should the new C APIs and functions be ported to Python 2.6 for compatibility, without modifying repr() itself? If so, I'll prepare a patch for Python 2.6.
msg67657	Author: Georg Brandl (georg.brandl) *	Date: 2008-06-03 11:10
ascii() should probably be in future_builtins. Whether the C API stuff and .isprintable() should be backported to 2.6 is something for Guido to decide.
msg67665	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-03 17:50
I updated the patch as per Georg's advice.
msg67667	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-03 18:05
I'm sorry, I forgot to upload a file. diff7_1.txt is the correct file.
msg67670	Author: Guido van Rossum (gvanrossum) *	Date: 2008-06-03 18:48
> Whether the C API stuff and .isprintable() should be backported to 2.6 > is something for Guido to decide. No way -- while all of this makes sense in Py3k, where all strings are Unicode, it would cause no end of problems in 2.6, and it would break backward compatibility badly.
msg67692	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-04 17:52
stringlib can be compiled for Python 2.6 now, but the '!a' converter is disabled by #ifdef for now.
msg67702	Author: Antoine Pitrou (pitrou) *	Date: 2008-06-04 21:30
Shall the method be called isprintable() or simply printable()? For the record, in the io classes, the writable()/readable() convention was chosen.
msg67704	Author: Georg Brandl (georg.brandl) *	Date: 2008-06-04 21:34
I would expect "abc".isprintable() give me a bool and "abc".printable() to return a printable string, as with "abc".lower() and "abc".islower().
msg67705	Author: Antoine Pitrou (pitrou) *	Date: 2008-06-04 21:36
You are right, I had forgotten about lower()/islower().
msg68008	Author: Georg Brandl (georg.brandl) *	Date: 2008-06-11 18:38
Patch committed to Py3k branch in r64138. Thanks all!
msg68047	Author: Atsuo Ishimoto (ishimoto) *	Date: 2008-06-12 02:44
Great, thank you!

History
Date	User	Action	Args
2022-04-11 14:56:33	admin	set	github: 46882
2008-06-12 02:44:45	ishimoto	set	messages: + msg68047
2008-06-11 18:38:55	georg.brandl	set	status: open -> closed resolution: accepted messages: + msg68008
2008-06-04 21:36:49	pitrou	set	messages: + msg67705
2008-06-04 21:34:42	georg.brandl	set	messages: + msg67704
2008-06-04 21:30:58	pitrou	set	nosy: + pitrou messages: + msg67702
2008-06-04 17:52:51	ishimoto	set	files: + diff8.patch messages: + msg67692
2008-06-03 19:06:26	eric.smith	set	nosy: + eric.smith
2008-06-03 18:48:04	gvanrossum	set	messages: + msg67670
2008-06-03 18:05:15	ishimoto	set	files: + diff7_1.txt messages: + msg67667
2008-06-03 17:57:35	ishimoto	set	files: - diff7.txt
2008-06-03 17:50:20	ishimoto	set	files: + diff7.txt messages: + msg67665
2008-06-03 11:10:19	georg.brandl	set	messages: + msg67657
2008-06-03 11:06:49	ishimoto	set	messages: + msg67656
2008-06-03 11:00:51	ishimoto	set	messages: + msg67655
2008-06-03 10:33:40	ishimoto	set	files: + diff6.txt messages: + msg67654
2008-06-03 10:31:23	georg.brandl	set	messages: + msg67653
2008-06-03 10:13:53	georg.brandl	set	nosy: + georg.brandl messages: + msg67651
2008-06-01 12:53:46	ishimoto	set	files: + diff5.txt messages: + msg67591
2008-05-28 07:39:38	ishimoto	set	files: + docdiff1.txt messages: + msg67439
2008-05-27 12:55:58	ishimoto	set	files: + diff4.txt messages: + msg67409
2008-05-08 17:19:54	gvanrossum	set	messages: + msg66425
2008-05-08 17:15:35	lemburg	set	messages: + msg66424
2008-05-06 17:10:26	gvanrossum	set	messages: + msg66320
2008-05-06 11:43:44	ishimoto	set	messages: + msg66310
2008-05-06 08:26:35	lemburg	set	messages: + msg66307
2008-05-06 04:39:17	gvanrossum	set	messages: + msg66303
2008-05-06 04:30:29	ishimoto	set	messages: + msg66302
2008-05-05 22:17:50	gvanrossum	set	messages: + msg66299
2008-05-05 22:07:36	gvanrossum	set	messages: + msg66298
2008-05-04 15:35:11	ishimoto	set	files: + diff3.txt messages: + msg66216
2008-04-18 08:46:11	lemburg	set	messages: + msg65606
2008-04-18 03:35:41	ishimoto	set	messages: + msg65601
2008-04-17 05:37:51	ishimoto	set	messages: + msg65573
2008-04-16 19:37:38	lemburg	set	nosy: + lemburg messages: + msg65564
2008-04-16 02:37:15	ishimoto	set	messages: + msg65542
2008-04-16 00:44:16	gvanrossum	set	messages: + msg65536
2008-04-16 00:33:31	ishimoto	set	messages: + msg65535
2008-04-15 12:19:56	ishimoto	set	files: + diff2.txt messages: + msg65514
2008-04-15 03:35:09	ishimoto	set	messages: + msg65494
2008-04-15 03:10:13	gvanrossum	set	messages: + msg65493
2008-04-15 01:48:46	ishimoto	set	messages: + msg65491
2008-04-15 01:40:26	ishimoto	set	messages: + msg65490
2008-04-14 21:20:11	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc messages: + msg65483
2008-04-14 18:12:23	gvanrossum	set	keywords: + patch nosy: + gvanrossum messages: + msg65470
2008-04-14 09:54:22	ishimoto	create