classification
Title: string.printable.isprintable() returns False
Type: behavior
Stage:
Components: Documentation, Library (Lib), Unicode
Versions: Python 3.4, Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: docs@python
Nosy List: akira, bru, docs@python, ezio.melotti, georg.brandl, planet36, r.david.murray, steven.daprano, vstinner
Priority: normal
Keywords: patch

Created on 2014-12-09 03:52 by planet36, last changed 2014-12-13 15:30 by akira.

Files
File name Uploaded Description
bug-string-ascii.py planet36, 2014-12-09 03:51 Test case shows that string.printable has control characters
0001-Fix-string.printable-respect-POSIX-spec.patch bru, 2014-12-09 14:42
docs-string.printable.diff akira, 2014-12-13 15:30
Messages (4)
msg232343 - Author: Steve Ward (planet36) Date: 2014-12-09 03:51
string.printable includes all whitespace characters.  However, the only whitespace character that is printable is the space (0x20).


By definition, the only ASCII characters considered printable are:
    alphanumeric characters
    punctuation characters
    the space character (not all whitespace characters)
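
The report can be reproduced in a few lines. This is a minimal sketch rather than the attached bug-string-ascii.py, and the name `offenders` is only illustrative:

```python
import string

# The constant as a whole fails str.isprintable().
print(string.printable.isprintable())

# Collect the members of string.printable that str.isprintable() rejects.
offenders = [c for c in string.printable if not c.isprintable()]
print(offenders)
```

On CPython 3 the rejected characters are the tab, newline, carriage return, vertical tab, and form feed; the space is accepted.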


Source:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap07.html#tag_07_03

7.2 POSIX Locale

Conforming systems shall provide a POSIX locale, also known as the C locale.


7.3.1 LC_CTYPE

space
    Define characters to be classified as white-space characters.

    In the POSIX locale, exactly <space>, <form-feed>, <newline>, <carriage-return>, <tab>, and <vertical-tab> shall be included.

cntrl
    Define characters to be classified as control characters.

    In the POSIX locale, no characters in classes alpha or print shall be included.

graph
    Define characters to be classified as printable characters, not including the <space>.

    In the POSIX locale, all characters in classes alpha, digit, and punct shall be included; no characters in class cntrl shall be included.

print
    Define characters to be classified as printable characters, including the <space>.

    In the POSIX locale, all characters in class graph shall be included; no characters in class cntrl shall be included.


LC_CTYPE Category in the POSIX Locale

# "print" is by default "alnum", "punct", and the <space>
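
The class definitions quoted above can be written out by code point. The sketch below assumes 7-bit US-ASCII in the POSIX ("C") locale; the set names are illustrative and not taken from any library:

```python
import string

# Assumption: 7-bit US-ASCII in the POSIX ("C") locale.
ascii_chars = {chr(i) for i in range(128)}
posix_print = {chr(i) for i in range(0x20, 0x7F)}  # <space> through '~'
posix_graph = posix_print - {" "}                  # print without <space>
posix_cntrl = ascii_chars - posix_print            # 0x00-0x1F and 0x7F
posix_space = set(" \f\n\r\t\v")

# cntrl and print share no characters, as the quoted text requires.
assert posix_print.isdisjoint(posix_cntrl)

# Every white-space character except <space> is a control character.
assert posix_space - {" "} <= posix_cntrl

# Members of string.printable that fall outside the POSIX print class.
extra = set(string.printable) - posix_print
print(sorted(extra))
```

Under these assumptions, `extra` holds exactly the five white-space control characters.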
msg232376 - Author: Bruno Cauet (bru) Date: 2014-12-09 14:42
Here is a simple fix for the issue, plus a test.
It does not break any unit tests, but it raises a backwards-compatibility problem. I would therefore advise applying it only to Python 3.5+, not to 3.4.
msg232382 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2014-12-09 15:14
This is a bit of a conundrum.  Our (string module) definition of printable is very clear, and it includes the other whitespace characters.

We could document that this does not match the POSIX definition of printable.  It also does not match the RFC 5822 definition of printable (for example), which does *not* include whitespace characters (not even space), but the POSIX definition is a more likely source of confusion.

str.isprintable is newer than string.printable, and serves a different purpose.  I suppose that when PEP 3138 was written and implemented, the disconnect between the two definitions was not noticed.

For backward compatibility reasons I suspect we are stuck with the discrepancy, but perhaps others will think it worth the pain of changing string.printable.  I kind of doubt it, though.
msg232613 - Author: Akira Li (akira) Date: 2014-12-13 15:30
The C standard defines locale-specific *printing characters*, which are
[ -~] in the "C" locale for implementations that use the 7-bit US-ASCII
character set, i.e., SP (space, 0x20) is a printing character in C
(isprint() returns nonzero).

There is also an isgraph() function, which returns zero for the space
but is otherwise equivalent to isprint().

The POSIX definition is aligned with the ISO C standard.

I don't know what RFC 5822 has to do with this issue, but the RFC
contradicts itself: in one place it has "printable US-ASCII
characters except SP", which implies that SP *is* printable, but in
other places it treats isprint == isgraph. The authors probably meant
characters for which isgraph() is nonzero when they wrote "printable
US-ASCII" (which is incorrect according to the C standard).

Tests from issue9770 show the relation between C character classes and
string constants [1]:

  set(string.printable) == set(C['graph']) | set(C['space'])

where C['space'] is '\t\n\v\f\r ' (the standard C whitespace).

This is documented behavior [2]:

  This is a combination of digits, ascii_letters, punctuation,
  and whitespace

where *whitespace* is C['space'].
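
The relation can be checked directly. In this sketch the dict `C` is a hypothetical stand-in for the C character-class table from the tests, built from string-module constants instead of the C library:

```python
import string

# Hypothetical stand-in for the C character classes (ASCII, "C" locale).
C = {
    "graph": string.digits + string.ascii_letters + string.punctuation,
    "space": "\t\n\v\f\r ",
}

# string.whitespace is the standard C whitespace set.
assert set(string.whitespace) == set(C["space"])

# Python sets are combined with | (union); + is not defined for sets.
assert set(string.printable) == set(C["graph"]) | set(C["space"])
```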

In Python 2, *printable* is locale-dependent, and it coincides with the
corresponding Python 3 definition in the "C" locale with the ASCII charset.

Unlike other string constants, *printable* differs from C['print'] on
both Python 2 and 3 because it includes whitespace characters other than
space.

str.isprintable [3] obeys C['print'] (in the ASCII range) and considers
SP to be printable.
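
A quick way to see this is to compare str.isprintable() against the range [ -~] (0x20-0x7E) for every 7-bit code point; the name `mismatches` is illustrative:

```python
# Code points where str.isprintable() disagrees with the "C" locale
# print class, taken here to be 0x20 through 0x7E inclusive.
mismatches = [i for i in range(128)
              if chr(i).isprintable() != (0x20 <= i <= 0x7E)]
print(mismatches)
```

An empty list means the two definitions agree over the whole ASCII range.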

---

It might be too late to change string.printable to correspond to C
isprint() (for ASCII characters).

I've uploaded a documentation patch that mentions that string.printable
and str.isprintable differ.
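
Code that needs the C isprint() meaning can derive it from the existing constant without any change to the string module. This is only a sketch; `posix_printable` is a hypothetical name, not something the module provides:

```python
import string

# Hypothetical helper constant, not part of the string module:
# string.printable restricted to characters str.isprintable() accepts.
posix_printable = "".join(c for c in string.printable if c.isprintable())

print(posix_printable.isprintable())
# Number of characters dropped from string.printable.
print(len(string.printable) - len(posix_printable))
```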

[1] http://bugs.python.org/review/9770/diff/12212/Lib/test/test_curses_ascii.py
[2] https://hg.python.org/cpython/file/3.4/Doc/library/string.rst#l62
[3] https://docs.python.org/3.4/library/stdtypes.html#str.isprintable
History
Date User Action Args
2014-12-13 15:30:05 akira set files: + docs-string.printable.diff
    nosy: + akira
    messages: + msg232613
2014-12-13 03:32:41 steven.daprano set nosy: + steven.daprano
2014-12-09 15:38:49 vstinner set nosy: + ezio.melotti, vstinner
    components: + Unicode
2014-12-09 15:14:34 r.david.murray set nosy: + r.david.murray, docs@python
    messages: + msg232382
    assignee: docs@python
    components: + Documentation
2014-12-09 14:42:29 bru set files: + 0001-Fix-string.printable-respect-POSIX-spec.patch
    versions: + Python 3.5
    nosy: + bru
    messages: + msg232376
    keywords: + patch
2014-12-09 11:50:38 serhiy.storchaka set nosy: + georg.brandl
2014-12-09 03:52:01 planet36 create