Message 109523 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	lemburg
Recipients	Rhamphoryncus, amaury.forgeotdarc, bupjae, ezio.melotti, lemburg, vstinner
Date	2010-07-08.09:15:59
SpamBayes Score	0.008116373
Marked as misclassified	No
Message-id	<4C35974D.7020309@egenix.com>
In-reply-to	<1278577000.33.0.00795425454255.issue5127@psf.upfronthosting.co.za>

Content

Ezio Melotti wrote:
> 
> Ezio Melotti <ezio.melotti@gmail.com> added the comment:
> 
> [This should probably be discussed on python-dev or in another issue, so feel free to move the conversation there.]
> 
> The current implementation considers printable """all the characters except those characters defined in the Unicode character database as following categories are considered printable.
>   * Cc (Other, Control)
>   * Cf (Other, Format)
>   * Cs (Other, Surrogate)
>   * Co (Other, Private Use)
>   * Cn (Other, Not Assigned)
>   * Zl Separator, Line ('\u2028', LINE SEPARATOR)
>   * Zp Separator, Paragraph ('\u2029', PARAGRAPH SEPARATOR)
>   * Zs (Separator, Space) other than ASCII space('\x20')."""
> 
> We could also arbitrarily exclude all the non-BMP chars, but that shouldn't be based on the availability of the fonts IMHO.
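
For reference, the quoted definition can be sketched as a category test with `unicodedata.category()`. This is an illustration of the rule as quoted, not the CPython implementation. `is_printable` and `NON_PRINTABLE` are hypothetical names. The loop compares the sketch with the built-in `str.isprintable()` on a few sample code points.

```python
import unicodedata

# Categories the quoted definition treats as non-printable.
NON_PRINTABLE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}

def is_printable(ch: str) -> bool:
    """Hypothetical helper: apply the quoted rule to a single code point."""
    if ch == " ":  # ASCII space is the one Zs exception
        return True
    return unicodedata.category(ch) not in NON_PRINTABLE

# Letter, ASCII space, NO-BREAK SPACE, LINE SEPARATOR, BELL, private use, EURO SIGN
for ch in ["a", " ", "\u00a0", "\u2028", "\x07", "\ue000", "\u20ac"]:
    print(ascii(ch), is_printable(ch), ch.isprintable())
```

Note that nothing in this test looks at fonts. A code point passes as long as its category is not in the excluded set.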

Without fonts, you can't print the code points, even if the Unicode
database does not place them in any of the above categories. That's
probably also the reason why the Unicode database doesn't define a
printable property :-)

I also find the use of Zl, Zp and Zs in the definition somewhat
arbitrary: whitespace is certainly printable. This also doesn't
match the C library's isprint() API:

http://www.cplusplus.com/reference/clibrary/cctype/isprint/

"A printable character is any character that is not a control character."
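
The two definitions can be contrasted with a small sketch. It assumes, for illustration only, that "control character" maps to Unicode category Cc. The real C isprint()/iswprint() behaviour is locale-dependent. `is_printable_c_style` is a hypothetical name. The separators disagree between the two definitions, while BELL is rejected by both.

```python
import unicodedata

def is_printable_c_style(ch: str) -> bool:
    """Hypothetical: 'printable' means 'not a control character' (assumed: category Cc)."""
    return unicodedata.category(ch) != "Cc"

samples = {
    "NO-BREAK SPACE": "\u00a0",       # Zs
    "EM SPACE": "\u2003",             # Zs
    "LINE SEPARATOR": "\u2028",       # Zl
    "PARAGRAPH SEPARATOR": "\u2029",  # Zp
    "BELL": "\x07",                   # Cc
}
for name, ch in samples.items():
    print(f"{name:20} c-style={is_printable_c_style(ch)!s:5} isprintable={ch.isprintable()}")
```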

>> Note that Python3 will send printable code points as-is to the
>> console, so whether or not a code point is considered printable
>> should take the common availability of fonts being able to display
>> the code point into account. Otherwise, a user would just see a
>> square box instead of the much more useful escape sequence
> 
> If the concern is about the usefulness of repr() in the console, note that on the Windows terminal trying to display most of the characters results in an error (see #5110), and that makes repr() barely usable.
> ascii() might be an alternative if the user wants to see the escape sequence instead of a square box.

That's a different problem, but it is indeed related to the
printable property, which was introduced as part of the Unicode repr()
change: if the console encoding cannot represent the printable
code points, you get an error.
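
A minimal sketch of that failure mode, assuming a console using code page 437, which has no EURO SIGN. EURO SIGN is printable, so repr() passes it through unescaped, and encoding that output for the console fails. ascii() escapes every non-ASCII code point, so its output always encodes.

```python
s = "price: \u20ac5"   # EURO SIGN is printable, so repr() keeps it as-is
r = repr(s)
a = ascii(s)           # escapes every non-ASCII code point

try:
    r.encode("cp437")  # assumed console code page without EURO SIGN
except UnicodeEncodeError as exc:
    print("repr() output cannot be written:", exc.reason)

print(a.encode("cp437"))  # always succeeds: ascii() output is pure ASCII
```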

I was never a fan of the Unicode repr() change to begin with. The
repr() of an object should work in almost all cases; being able to
read it in clear text is only secondary. IMHO, allowing all printable
code points to pass through unescaped was not beneficial: we have
str() for getting readable representations of objects. Anyway, we're
stuck with it now, so we have to work around the issues...
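
One possible workaround, sketched under the assumption that the caller knows the target encoding. This is not an existing stdlib API. The hypothetical `safe_repr` helper keeps the unescaped repr() wherever the encoding can represent it. It escapes only the code points that would otherwise raise, using the standard `backslashreplace` error handler.

```python
import sys

def safe_repr(obj, encoding=None):
    """Hypothetical helper: a repr() that never fails to encode."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    text = repr(obj)
    # Escape only the code points the target encoding cannot represent.
    return text.encode(encoding, "backslashreplace").decode(encoding)

print(safe_repr("caf\u00e9 \u20ac", "cp437"))
```

With cp437, the e-acute survives because the code page contains it, while the EURO SIGN is written as `\u20ac`.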

History
Date	User	Action	Args
2010-07-08 09:16:03	lemburg	set	recipients: + lemburg, amaury.forgeotdarc, Rhamphoryncus, vstinner, ezio.melotti, bupjae
2010-07-08 09:16:00	lemburg	link	issue5127 messages
2010-07-08 09:15:59	lemburg	create