repr() should not escape non-ASCII characters #46882

atsuoishimoto · 2008-04-14T09:54:23Z

BPO	2630
Nosy	@malemburg, @gvanrossum, @birkenfeld, @atsuoishimoto, @amauryfa, @pitrou, @ericvsmith
Files	diff.txt diff2.txt diff3.txt diff4.txt docdiff1.txt diff5.txt diff6.txt diff7_1.txt diff8.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2008-06-11.18:38:55.202>
created_at = <Date 2008-04-14.09:54:22.958>
labels = ['type-feature']
title = 'repr() should not escape non-ASCII characters'
updated_at = <Date 2008-06-12.02:44:45.865>
user = 'https://github.com/atsuoishimoto'

bugs.python.org fields:

activity = <Date 2008-06-12.02:44:45.865>
actor = 'ishimoto'
assignee = 'none'
closed = True
closed_date = <Date 2008-06-11.18:38:55.202>
closer = 'georg.brandl'
components = ['None']
creation = <Date 2008-04-14.09:54:22.958>
creator = 'ishimoto'
dependencies = []
files = ['10028', '10034', '10193', '10447', '10456', '10491', '10507', '10512', '10518']
hgrepos = []
issue_num = 2630
keywords = ['patch']
message_count = 43.0
messages = ['65461', '65470', '65483', '65490', '65491', '65493', '65494', '65514', '65535', '65536', '65542', '65564', '65573', '65601', '65606', '66216', '66298', '66299', '66302', '66303', '66307', '66310', '66320', '66424', '66425', '67409', '67439', '67591', '67651', '67653', '67654', '67655', '67656', '67657', '67665', '67667', '67670', '67692', '67702', '67704', '67705', '68008', '68047']
nosy_count = 7.0
nosy_names = ['lemburg', 'gvanrossum', 'georg.brandl', 'ishimoto', 'amaury.forgeotdarc', 'pitrou', 'eric.smith']
pr_nums = []
priority = 'normal'
resolution = 'accepted'
stage = None
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue2630'
versions = ['Python 3.0']

atsuoishimoto · 2008-04-14T09:54:21Z

In py3k, repr() escapes non-ASCII characters in Unicode to \uXXXX as
Python 2. This is unpleasant feature if you are working with non-latin
characters. This issue was once discussed by Hye-Shik Chang[1], but was
rejected. Here's a new challenge for Python 3 to fix issue.

In this patch, repr() converts special ascii characters such as "\t",
"\r", "\n", but doesn't convert non-ASCII characters to \uXXXX form.
Non-ASCII characters are converted by TextIOWrapper on printing. I set
'errors' attribute of sys.stdout and sys.stderr to 'backslashreplace', so
un-printable characters are converted to '\uXXXX' if your console
cannot print such characters.

This patch breaks five regr tests on my environment.
I'll fix these tests if this patch is acceptable.

[1] http://mail.python.org/pipermail/python-dev/2002-October/029443.html
http://bugs.python.org/issue479898

gvanrossum · 2008-04-14T18:12:22Z

I think this has potential, but it is too liberal. There are many more
characters that cannot be assumed printable, e.g. many of the Latin-1
characters in the range 0x80 through 0x9F. Isn't there some Unicode
data table that shows code points that are safely printable?

OTOH there are other potential use cases where it would be nice to see
the \u escapes, e.g. when one is concerned about sequences that print
the same but don't have the same content (e.g. pre-normalization).

The backslashreplace trick is nice, I didn't even know about that. :-)

amauryfa · 2008-04-14T21:20:11Z

What if we turn on the backslashreplace trick for some operations only?
For example: sys_displayhook and sys_excepthook.

atsuoishimoto · 2008-04-15T01:40:25Z

I think this has potential, but it is too liberal. There are many more
characters that cannot be assumed printable, e.g. many of the Latin-1
characters in the range 0x80 through 0x9F. Isn't there some Unicode
data table that shows code points that are safely printable?

As Michael Urman pointed out, we can use Unicode properties.
Or we can define a set of non-printable characters (e.g.
sys.nonprintablechars).

OTOH there are other potential use cases where it would be nice to see
the \u escapes, e.g. when one is concerned about sequences that print
the same but don't have the same content (e.g. pre-normalization).

For such cases, print(s.encode("ascii", "backslashreplace")) might work.

atsuoishimoto · 2008-04-15T01:48:46Z

What if we turn on the backslashreplace trick for some operations only?
For example: sys_displayhook and sys_excepthook.

It would be difficult, since *_repr() API don't know who is the caller.

gvanrossum · 2008-04-15T03:10:13Z

Atsuo: I missed Michael Urman's comment. Can you copy it here, or
(better :-) write a patch that uses it?

Amaury: I think it would be okay to use backslashreplace as the default
error handler for sys.stderr. Probably not for sys.stdout or other
files, since I'm sure many users prefer the errors when their data
cannot be printed rather than silently writing \u escapes that might
cause other code reading their output to choke. For sys.stderr though I
think not having exceptions raised when attempting to print errors is
very valuable.

atsuoishimoto · 2008-04-15T03:35:09Z

Okay, I'll revise a patch later today.

atsuoishimoto · 2008-04-15T12:19:54Z

I revised a patch against Python 3.0a4.

As-per suggestion from Michael Urman, unicode_repr()
refers unicode database to determine characters to be
hex-encoded.
sys.stdout doesn't use 'backslashreplace'.

atsuoishimoto · 2008-04-16T00:33:30Z

I think sys.stdout need to have backslashreplace error handler.
Without backslashreplace, print(listOfJapaneseString) prints nothing,
but raises an exception. This is worse than Python2.

gvanrossum · 2008-04-16T00:44:16Z

I don't think this is a good idea; I've explained why earlier on this issue.

atsuoishimoto · 2008-04-16T02:37:15Z

Sorry, I missed to write "for interactive session".
I agree for sys.stdout and other files should not have default
backslashescape, but for iteractive session, I think sys.stdout can
have backslasespape handler to avoid exceptions.

malemburg · 2008-04-16T19:37:37Z

While it may be desirable to to have repr(unicode) return a non-ASCII
string, the suggested approach is not suitable to solve the problem.

repr() is usually used in logging and applications/users/tools don't
expect to suddenly find non-ASCII or even mixed encodings in a log file.

If you do want to have this more flexible, then make the encoding used
by unicode_repr() adjustable, turn the existing code into a codec (e.g.
"unicode-repr") and leave it setup as default.

Users who wish to see non-ASCII repr(unicode) data can then adjust the
used encoding to their liking.

This is both more flexible and backwards compatible with 2.x.

Also note that the separation of the Unicode database from the
interpreter core was done to keep the interpreter footprint manageable.
It's not a good idea to just dump the complete table set into
unicodeobject.c via an #include. If you need to reference APIs from
modules in C, the usual approach is to create a PyCObject which is then
exported by the module (see e.g. the datetime module) and imported by
code needing it.

BTW: "printable" is not a defined term in Unicode. What is or is not
printable really depends on the use case, e.g. there are quite a few
code points in Unicode that don't result in any glyph being "printed" to
the screen. A Unicode string could then look as if it had fewer code
points than it actually does - which is not really what you want when
debugging code or sifting through log files.

atsuoishimoto · 2008-04-17T05:37:51Z

If you do want to have this more flexible, then make the encoding used
by unicode_repr() adjustable, turn the existing code into a codec (e.g.
"unicode-repr") and leave it setup as default.

Turning code in unicode_repr() into a codec is good idea. I'll write two
codecs(existing repr and new Unicode friendly codec) and post a revised
patch later.

atsuoishimoto · 2008-04-18T03:35:40Z

Is a codec which encode() returns an Unicode allowed in Python3? I
started to think codec is not nessesary, but python function is enough.

malemburg · 2008-04-18T08:46:11Z

On 2008-04-18 05:35, atsuo ishimoto wrote:

atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment:

Is a codec which encode() returns an Unicode allowed in Python3?

Sure, why not ?

I think you have to ask another question: Is repr() allowed to
return a string (instead of Unicode) in Py3k ?

If not, then unicode_repr() will have to check the return value of
the codec and convert it back to Unicode as necessary.

I started to think codec is not nessesary, but python function is enough.

That's what we currently have with unicode_repr(), but it doesn't
solve the problem.

atsuoishimoto · 2008-05-04T15:34:52Z

New patch agaist current py3k branch.

All the regr tests faild by my patch is now fixed as far as I
can run.
I also modified a doctest module a bit, so should be reviewed
by module owners.

gvanrossum · 2008-05-05T22:07:32Z

On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:

On 2008-04-18 05:35, atsuo ishimoto wrote:
> atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment:
>
> Is a codec which encode() returns an Unicode allowed in Python3?

Sure, why not ?

Actually, it is not. In Py3k, x.encode() always requires x to be a str
(i.e. unicode) instance and return a bytes instance. y.decode()
requires y to be a bytes instance and returns a str (i.e. unicode)
instance.

I think you have to ask another question: Is repr() allowed to
return a string (instead of Unicode) in Py3k ?

In Py3k, "strings" *are* unicode. The str data type is Unicode.

If you're asking about repr() possibly returning a bytes instance,
definitely not.

If not, then unicode_repr() will have to check the return value of
the codec and convert it back to Unicode as necessary.

What codec?

> I started to think codec is not nessesary, but python function is enough.

That's what we currently have with unicode_repr(), but it doesn't
solve the problem.

I'm lost here.

PS. Atsuo's PEP has now been checked in as PEP-3138. Discussion should
start soon on the python-3000 list.

gvanrossum · 2008-05-05T22:17:33Z

FWIW, I've uploaded diff3.txt to Rietveld:
http://codereview.appspot.com/767

Code review comments should be reflected here.

I had to skip the change to Modules/unicodename_db.h which were too
large for Rietveld to handle.

atsuoishimoto · 2008-05-06T04:30:12Z

I forgot to mention to Modules/unicodename_db.h.

The current unicodename_db.h looks it was generated
by old Tools/unicode/makeunicodedata.py. This patch
includes newly generated unicodename_db.h, but we
can exclude the change if not necessary.

gvanrossum · 2008-05-06T04:39:15Z

No need to change anything, the diff is just too big for the code
review tool (Rietveld), but since it consists only of numbers we don't
need to review it anyway. :)

malemburg · 2008-05-06T08:26:33Z

On 2008-05-06 00:07, Guido van Rossum wrote:

Guido van Rossum <guido@python.org> added the comment:

On Fri, Apr 18, 2008 at 1:46 AM, Marc-Andre Lemburg
<report@bugs.python.org> wrote:
> On 2008-04-18 05:35, atsuo ishimoto wrote:
> > atsuo ishimoto <ishimoto@users.sourceforge.net> added the comment:
> >
> > Is a codec which encode() returns an Unicode allowed in Python3?
>
> Sure, why not ?

Actually, it is not. In Py3k, x.encode() always requires x to be a str
(i.e. unicode) instance and return a bytes instance. y.decode()
requires y to be a bytes instance and returns a str (i.e. unicode)
instance.

So you've limited the codec design to just doing Unicode<->bytes
conversions ?

The original codec design was to have the codec decide which
types to take on input and to generate on output, e.g. to
escape characters in Unicode (converting Unicode to Unicode),
work on compressed 8-bit strings (converting 8-bit strings to
8-bit strings), etc.

> I think you have to ask another question: Is repr() allowed to
> return a string (instead of Unicode) in Py3k ?

In Py3k, "strings" *are* unicode. The str data type is Unicode.

With "strings" I always refer to 8-bit strings, ie. 8-bit data that
is encoded in some encoding.

If you're asking about repr() possibly returning a bytes instance,
definitely not.

> If not, then unicode_repr() will have to check the return value of
> the codec and convert it back to Unicode as necessary.

What codec?

The idea is to have a codec which takes the Unicode object and
converts it to its repr()-value.

Now, since you apparently cannot
go the direct way anymore (ie. have the codec encode Unicode to
Unicode), you'd have to first use a codec which converts the Unicode
object to its repr()-value represented as bytes object and then
convert the bytes object back to Unicode in unicode_repr().

With the original design, this extra step wouldn't have been
necessary.

> > I started to think codec is not nessesary, but python function is enough.
>
> That's what we currently have with unicode_repr(), but it doesn't
> solve the problem.

I'm lost here.

See my previous replies on this ticket.

PS. Atsuo's PEP has now been checked in as PEP-3138. Discussion should
start soon on the python-3000 list.

atsuoishimoto · 2008-05-06T11:43:31Z

No need to change anything, the diff is just too big for the code
review tool (Rietveld), but since it consists only of numbers we don't
need to review it anyway. :)

I wonder why unicodename_db.h have not updated after
makeunicodedata.py was modified. If new makeunicodedata.py
breaks something, I should remove the chage to unicodename_db.h
from this patch (My patch works whether unicodename_db.h is
updated or not.). I'll post a question to python-3000 list.

gvanrossum · 2008-05-06T17:10:20Z

On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote:

So you've limited the codec design to just doing Unicode<->bytes
conversions ?

Yes. This was quite a conscious decision that was not taken lightly,
with lots of community input, quite a while ago.

The original codec design was to have the codec decide which
types to take on input and to generate on output, e.g. to
escape characters in Unicode (converting Unicode to Unicode),
work on compressed 8-bit strings (converting 8-bit strings to
8-bit strings), etc.

Unfortunately this design made it hard to reason about the correctness
of code, since (especially in Py3k, where bytes and str are more
different than str and unicode were in 2.x) it's hard to write code
that uses .encode() or .decode() unless it knows which codec is being
used.

IOW, when translated to 3.0, the design violates the general design
principle that the *type* of a function's or method's return value
should not depend on the *value* of one of the arguments.

>> I think you have to ask another question: Is repr() allowed to
>> return a string (instead of Unicode) in Py3k ?
>
> In Py3k, "strings" *are* unicode. The str data type is Unicode.

With "strings" I always refer to 8-bit strings, ie. 8-bit data that
is encoded in some encoding.

You will have to change this habit or you will thoroughly confuse both
users and developers of 3.0. "String" refers to the built-in "str"
type which in Py3k is PyUnicode. For the PyString type we use the
built-in type "bytes".

> If you're asking about repr() possibly returning a bytes instance,
> definitely not.
>
>> If not, then unicode_repr() will have to check the return value of
>> the codec and convert it back to Unicode as necessary.
>
> What codec?

The idea is to have a codec which takes the Unicode object and
converts it to its repr()-value.

Now, since you apparently cannot
go the direct way anymore (ie. have the codec encode Unicode to
Unicode), you'd have to first use a codec which converts the Unicode
object to its repr()-value represented as bytes object and then
convert the bytes object back to Unicode in unicode_repr().

With the original design, this extra step wouldn't have been
necessary.

Why does everything have to be a codec?

malemburg · 2008-05-08T17:15:31Z

On 2008-05-06 19:10, Guido van Rossum wrote:

Guido van Rossum <guido@python.org> added the comment:

On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote:
> So you've limited the codec design to just doing Unicode<->bytes
> conversions ?

Yes. This was quite a conscious decision that was not taken lightly,
with lots of community input, quite a while ago.

> The original codec design was to have the codec decide which
> types to take on input and to generate on output, e.g. to
> escape characters in Unicode (converting Unicode to Unicode),
> work on compressed 8-bit strings (converting 8-bit strings to
> 8-bit strings), etc.

Unfortunately this design made it hard to reason about the correctness
of code, since (especially in Py3k, where bytes and str are more
different than str and unicode were in 2.x) it's hard to write code
that uses .encode() or .decode() unless it knows which codec is being
used.

IOW, when translated to 3.0, the design violates the general design
principle that the *type* of a function's or method's return value
should not depend on the *value* of one of the arguments.

I understand where this concept originates and usual apply this
rule to software design as well, however, in the particular case
of codecs, the codec registry and its helper functions are merely
interfaces to code that is defined elsewhere.

In comparison, the approach is very much like getattr() - you know
what the attribute is called, but know nothing about its type
until you receive it from the function.

The reason codecs where designed like this was to be able to
easily stack them. For this to work, only the interfaces need
to be defined, without restricting the codecs too much in terms
of which types may be used.

I'd suggest to lift the type restrictions from the general
codecs.c access APIs (PyCodec_*), since they don't really belong
there and instead only impose the limitation on PyUnicode and
PyString methods .encode() and .decode().

If you then also allow those methods to return *both*
PyUnicode and PyString, you'd still have strong typing
(only 1 of two possible types is allowed) and stacking
streams or having codecs that work on PyUnicode->PyUnicode
or PyString->PyString would still be accessible via
.encode()/.decode().

> >> I think you have to ask another question: Is repr() allowed to
> >> return a string (instead of Unicode) in Py3k ?
> >
> > In Py3k, "strings" *are* unicode. The str data type is Unicode.
>
> With "strings" I always refer to 8-bit strings, ie. 8-bit data that
> is encoded in some encoding.

You will have to change this habit or you will thoroughly confuse both
users and developers of 3.0. "String" refers to the built-in "str"
type which in Py3k is PyUnicode. For the PyString type we use the
built-in type "bytes".

Well, I'm confused by the P3k use of terms (esp. because the
C type names don't match the Python ones), which is why I'm
talking about 8-bit strings and Unicode.

Perhaps it's better to use PyString and PyUnicode.

> > If you're asking about repr() possibly returning a bytes instance,
> > definitely not.
> >
> >> If not, then unicode_repr() will have to check the return value of
> >> the codec and convert it back to Unicode as necessary.
> >
> > What codec?
>
> The idea is to have a codec which takes the Unicode object and
> converts it to its repr()-value.
>
> Now, since you apparently cannot
> go the direct way anymore (ie. have the codec encode Unicode to
> Unicode), you'd have to first use a codec which converts the Unicode
> object to its repr()-value represented as bytes object and then
> convert the bytes object back to Unicode in unicode_repr().
>
> With the original design, this extra step wouldn't have been
> necessary.

Why does everything have to be a codec?

It doesn't. It's just that codecs are so easy to add, change
and adjust that reusing the existing code is more attractive
than reinventing the wheel every time you need to make
a conversion from one text form to another adjustable in
some way.

In the case addresses by this ticket, I see the usefulness
of having native language being written to the console using
native glyphs, but there are so many drawbacks to this (see the
discussion on the ticket and the mailing list), that
I think there needs to be a way to adjust the mechanism
or at least be able to revert to the existing repr() output.

Furthermore, a codec implementation of what Atsuo has in mind
would also be useful in other contexts, e.g. where you want
to write PyUnicode to a stream without introducing line breaks.

gvanrossum · 2008-05-08T17:19:54Z

I'd be happy to have a separate more relaxed API for stackable codecs,
however, the API should not be overloaded on the .encode() and .decode()
methods on str and bytes objects.

atsuoishimoto · 2008-05-27T12:55:51Z

I updated a patch as per latest PEP.

io.TextIOWrapper doesn't provide API to change error handler
at this time. I should update this patch after the API is
provided.
This patch contains a fix for Tools/unicode/makeunicodedata.py
in rev 63378.

atsuoishimoto · 2008-05-28T07:39:37Z

docdiff1.txt contains a documentation for functions I added.

atsuoishimoto · 2008-06-01T12:53:36Z

diff5.txt contains both code and documentation patch for PEP-3138.

In this patch, default error-handler of sys.stdout is always 'strict'.

birkenfeld · 2008-06-03T10:13:49Z

Review:

Why is an empty string not printable? In any case, the empty string
should be among the test cases for isprintable().
Why not use PyUnicode_DecodeASCII instead of
PyUnicode_FromEncodedObject? It should be a bit faster.
If old-style string formatting gets "%a", .format() must get a "!a"
specifier.
The ascii() and repr() tests should be expanded so that both test the
same set of objects, and the expected differences. Are there tests for
failing cases?
This is just "return ascii" (in builtin_ascii):
+ if (ascii == NULL)
+ return NULL;
+
+ return ascii;
For PyBool_FromLong(1) and PyBool_FromLong(0) there is Py_RETURN_TRUE
and Py_RETURN_FALSE. (You're not to blame, the rest of unicodeobject.c
seems to use them too, probably a legacy.)
There appear to be some space indentations in tab-indented files like
bltinmodule.c and vice versa (unicodeobject.c).
C docs/isprintable() docs: The spec
+ Characters defined in the Unicode character database as "Other"
+ or "Separator" other than ASCII space(0x20) are not considered
+ printable.
is unclear, better say "All character except those ... are considered
printable".
ascii() docs:
+ the non-ASCII
+ characters in the string returned by :func:`ascii`() are hex-escaped
+ to generate a same string as :func:`repr` in Python 2.

should be

"the non-ASCII characters in the string returned by :func:`repr` are
backslash-escaped (with ``\x``, ``\u`` or ``\U``) to generate ...".

makeunicodedata: len(list(n for n in names if n is not None)) could
better be expressed as sum(1 for n in names if n is not None).

Otherwise, the patch is fine IMO. (I'm surprised that only so few tests
needed adaptation, that's a sign that we're not testing Unicode enough.)

birkenfeld · 2008-06-03T10:31:22Z

One more thing: with r63891 the encoding and errors arguments for the
creation of sys.stderr were made configurable; you'll have to adapt the
patch so that it defaults to backslashescape but can be overridden by
PYTHONIOENCODING.

atsuoishimoto · 2008-06-03T10:33:02Z

This patch contains following changes.

Added the new C API PyObject_ASCII() for consistency.
Added the new string formatting operater for str.format() and
PyUnicode_FromFormat.

atsuoishimoto · 2008-06-03T11:00:48Z

Thank you for your review!
I filed a new patch just before I see your comments.

On Tue, Jun 3, 2008 at 7:13 PM, Georg Brandl <report@bugs.python.org> wrote:

Georg Brandl <georg@python.org> added the comment:

Review:

Why is an empty string not printable? In any case, the empty string
should be among the test cases for isprintable().

Well, my intuition came from str.islower() was wrong. An empty string is
printable, of cource.

Why not use PyUnicode_DecodeASCII instead of
PyUnicode_FromEncodedObject? It should be a bit faster.

Okay, thank you.

If old-style string formatting gets "%a", .format() must get a "!a"
specifier.

I added the format string in my latest patch.

The ascii() and repr() tests should be expanded so that both test the
same set of objects, and the expected differences. Are there tests for
failing cases?

Okay, thank you.

This is just "return ascii" (in builtin_ascii):
  if (ascii == NULL)
      return NULL;
  return ascii;

Fixed in my latest patch.

For PyBool_FromLong(1) and PyBool_FromLong(0) there is Py_RETURN_TRUE
and Py_RETURN_FALSE. (You're not to blame, the rest of unicodeobject.c
seems to use them too, probably a legacy.)

Okay, thank you.

There appear to be some space indentations in tab-indented files like
bltinmodule.c and vice versa (unicodeobject.c).

I think bltinmodule.c is fixed with latest patch, but I don't know what
is correct indentation for unicodeobject.c. I guess latest patch is
acceptable.

C docs/isprintable() docs: The spec

Characters defined in the Unicode character database as "Other"

or "Separator" other than ASCII space(0x20) are not considered

printable.
is unclear, better say "All character except those ... are considered
printable".

ascii() docs:

the non-ASCII

characters in the string returned by :func:`ascii`() are hex-escaped

to generate a same string as :func:`repr` in Python 2.

should be

"the non-ASCII characters in the string returned by :func:`repr` are
backslash-escaped (with ``\x``, ``\u`` or ``\U``) to generate ...".

Okay, thank you.

makeunicodedata: len(list(n for n in names if n is not None)) could
better be expressed as sum(1 for n in names if n is not None).

I don't want to change here, because this is reversion of rev 63378.

One more thing: with r63891 the encoding and errors arguments for the
creation of sys.stderr were made configurable; you'll have to adapt the
patch so that it defaults to backslashescape but can be overridden by
PYTHONIOENCODING.

I think sys.stderr should be default to 'backslashreplace' always. I'll
post a messege to Py3k-list later.

Otherwise, the patch is fine IMO. (I'm surprised that only so few tests
needed adaptation, that's a sign that we're not testing Unicode enough.)

Thank you very much! I'll file new patch soon.

atsuoishimoto · 2008-06-03T11:06:49Z

BTW, are new C APIs and functions should be ported to Python 2.6 for
compatibility, without modifing repr() itself? If so, I'll prepare a
patch for Python 2.6.

birkenfeld · 2008-06-03T11:10:11Z

ascii() should probably be in future_builtins.

Whether the C API stuff and .isprintable() should be backported to 2.6
is something for Guido to decide.

atsuoishimoto · 2008-06-03T17:50:11Z

I updated the patch as per Georg's advice.

atsuoishimoto · 2008-06-03T18:05:13Z

I'm sorry, I missed a file to be uploaded. diff7_1.txt is correct file.

gvanrossum · 2008-06-03T18:48:04Z

Whether the C API stuff and .isprintable() should be backported to 2.6
is something for Guido to decide.

No way -- while all of this makes sense in Py3k, where all strings are
Unicode, it would cause no end of problems in 2.6, and it would break
backward compatibility badly.

atsuoishimoto · 2008-06-04T17:52:32Z

stringlib can be compiled for Python 2.6 now, but the '!a' converter is
disabled by #ifdef for now.

pitrou · 2008-06-04T21:30:58Z

Shall the method be called isprintable() or simply printable()? For the
record, in the io classes, the writable()/readable() convention was chosen.

birkenfeld · 2008-06-04T21:34:43Z

I would expect "abc".isprintable() give me a bool and "abc".printable()
to return a printable string, as with "abc".lower() and "abc".islower().

pitrou · 2008-06-04T21:36:50Z

You are right, I had forgotton about lower()/islower().

birkenfeld · 2008-06-11T18:38:54Z

Patch committed to Py3k branch in r64138. Thanks all!

atsuoishimoto · 2008-06-12T02:44:46Z

Great, thank you!

atsuoishimoto mannequin added the type-feature A feature request or enhancement label Apr 14, 2008

birkenfeld closed this as completed Jun 11, 2008

ezio-melotti transferred this issue from another repository Apr 10, 2022

repr() should not escape non-ASCII characters #46882

repr() should not escape non-ASCII characters #46882

Comments

atsuoishimoto mannequin commented Apr 14, 2008

atsuoishimoto mannequin commented Apr 14, 2008

gvanrossum commented Apr 14, 2008

amauryfa commented Apr 14, 2008

atsuoishimoto mannequin commented Apr 15, 2008

atsuoishimoto mannequin commented Apr 15, 2008

gvanrossum commented Apr 15, 2008

atsuoishimoto mannequin commented Apr 15, 2008

atsuoishimoto mannequin commented Apr 15, 2008

atsuoishimoto mannequin commented Apr 16, 2008

gvanrossum commented Apr 16, 2008

atsuoishimoto mannequin commented Apr 16, 2008

malemburg commented Apr 16, 2008

atsuoishimoto mannequin commented Apr 17, 2008

atsuoishimoto mannequin commented Apr 18, 2008

malemburg commented Apr 18, 2008

atsuoishimoto mannequin commented May 4, 2008

gvanrossum commented May 5, 2008

gvanrossum commented May 5, 2008

atsuoishimoto mannequin commented May 6, 2008

gvanrossum commented May 6, 2008

malemburg commented May 6, 2008

atsuoishimoto mannequin commented May 6, 2008

gvanrossum commented May 6, 2008

malemburg commented May 8, 2008

gvanrossum commented May 8, 2008

atsuoishimoto mannequin commented May 27, 2008

atsuoishimoto mannequin commented May 28, 2008

atsuoishimoto mannequin commented Jun 1, 2008

birkenfeld commented Jun 3, 2008

birkenfeld commented Jun 3, 2008

atsuoishimoto mannequin commented Jun 3, 2008

atsuoishimoto mannequin commented Jun 3, 2008

atsuoishimoto mannequin commented Jun 3, 2008

birkenfeld commented Jun 3, 2008

atsuoishimoto mannequin commented Jun 3, 2008

atsuoishimoto mannequin commented Jun 3, 2008

gvanrossum commented Jun 3, 2008

atsuoishimoto mannequin commented Jun 4, 2008

pitrou commented Jun 4, 2008

birkenfeld commented Jun 4, 2008

pitrou commented Jun 4, 2008

birkenfeld commented Jun 11, 2008

atsuoishimoto mannequin commented Jun 12, 2008