
classification
Title: whitespace in strip()/lstrip()/rstrip()
Type: enhancement Stage: patch review
Components: Documentation Versions: Python 3.8, Python 3.7, Python 2.7
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Dimitri Papadopoulos Orfanos, docs@python, edmundselliot@gmail.com, ezio.melotti, joel.johnson, jwilk, nitishch, oulenz, rompe
Priority: normal Keywords: easy, patch

Created on 2015-10-18 12:15 by Dimitri Papadopoulos Orfanos, last changed 2022-04-11 14:58 by admin.

Files
File name Uploaded Description
whitespace_regex.py edmundselliot@gmail.com, 2018-08-28 18:25
Pull Requests
URL Status Linked
PR 14753 closed python-dev, 2019-07-13 15:51
PR 14771 open rompe, 2019-07-14 09:49
PR 14775 closed rompe, 2019-07-14 11:58
Messages (8)
msg253152 - Author: Dimitri Papadopoulos Orfanos (Dimitri Papadopoulos Orfanos) Date: 2015-10-18 12:15
The documentation of strip() / lstrip() / rstrip() should define "whitespace" more precisely.

The Python 3 documentation refers to "ASCII whitespace" for bytes.strip() / bytes.lstrip() / bytes.rstrip() and "whitespace" for str.strip() / str.lstrip() / str.rstrip(). I suggest the following improvements:
* add a link from "ASCII whitespace" to string.whitespace or bytes.isspace(),
* define plain "whitespace" more precisely (possibly with a link to str.isspace()).

The Python 2 documentation refers to plain "whitespace". As far as I know, strip() removes only ASCII whitespace. If so, please:
* add a link to string.whitespace or str.isspace(),
* improve the string.whitespace documentation and explain that it is locale-dependent (see documentation of str.isspace()).
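
To illustrate the Python 3 distinction described above, here is a minimal sketch. The sample values are made up for this example and do not come from the issue. It suggests that str.strip() removes Unicode whitespace such as NO-BREAK SPACE, whereas bytes.strip() removes ASCII whitespace only.

```python
# Sample values are illustrative assumptions, not taken from the issue.
text = "\u00a0\u2003foo \t\n"   # NO-BREAK SPACE, EM SPACE, then ASCII whitespace
data = b"\xa0 foo \t\n"         # byte 0xA0 is not ASCII whitespace

stripped_text = text.strip()    # str: Unicode-aware notion of whitespace
stripped_data = data.strip()    # bytes: ASCII whitespace only

print(repr(stripped_text))
print(repr(stripped_data))
```

The leading 0xA0 byte survives bytes.strip(), which is exactly the kind of difference the documentation could spell out.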
msg257449 - Author: Dimitri Papadopoulos Orfanos (Dimitri Papadopoulos Orfanos) Date: 2016-01-04 09:42
In Python 2, as far as I can understand, string.whitespace and str.isspace() are different:
* str.isspace() is built upon the C isspace() function and is therefore locale-dependent. Python heavily relies on isspace() to detect "whitespace" characters.
* string.whitespace is a list of "ASCII whitespace characters" carved in stone. As far as I can see string.whitespace is defined but not used anywhere in Python source code.

See source code:
* Modules/stringobject.c around line 3319:
  [...]
  string_isspace(PyStringObject *self)
  {
  [...]
      e = p + PyString_GET_SIZE(self);
      for (; p < e; p++) {
          if (!isspace(*p))
              return PyBool_FromLong(0);
      }
      return PyBool_FromLong(1);
  [...]
* Lib/string.py near line 23:
  whitespace = ' \t\n\r\v\f'

The strip()/lstrip()/rstrip() methods rely on the same C isspace() function as str.isspace() and have nothing to do with string.whitespace:

* Modules/stringobject.c around line 1861:
  [...]
  do_strip(PyStringObject *self, int striptype)
  {
  [...]
      i = 0;
      if (striptype != RIGHTSTRIP) {
          while (i < len && isspace(Py_CHARMASK(s[i]))) {
              i++;
          }
      }
  [...]

Therefore I suggest the documentation of Python 2.7 points to str.isspace() wherever the term "whitespace" is used in the documentation - including this specific case of strip()/lstrip()/rstrip().
msg257450 - Author: Dimitri Papadopoulos Orfanos (Dimitri Papadopoulos Orfanos) Date: 2016-01-04 10:06
In Python 3 the situation is similar:
* The Py_UNICODE_ISSPACE macro is used internally to define str.isspace() and wherever Python needs to detect "whitespace" characters in strings.
* There is an equivalent function Py_ISSPACE for bytes/bytearray.
* The bytearray.strip() implementation relies on hardcoded ASCII whitespace characters instead of Py_ISSPACE.
* string.whitespace is a list of "ASCII whitespace characters" carved in stone. As far as I can see string.whitespace is defined but not used anywhere in Python source code.

Therefore I suggest the documentation of Python 3 points to str.isspace() wherever the term "whitespace" is used in any documentation related to strings - including this specific case of strip()/lstrip()/rstrip().
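
As a rough check of the gap between the two definitions discussed above, the following sketch enumerates every code point that str.isspace() accepts and compares the result with string.whitespace. The variable names are chosen for this example only, and the exact set of extra characters depends on the Unicode database shipped with the interpreter.

```python
import string
import sys

# Every code point accepted by str.isspace() on this interpreter.
unicode_ws = {chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()}
ascii_ws = set(string.whitespace)

# Characters that str.isspace() accepts but string.whitespace does not list.
only_unicode = sorted(unicode_ws - ascii_ws)

print(ascii_ws <= unicode_ws)   # is string.whitespace a subset?
print([f"U+{ord(c):04X}" for c in only_unicode])
```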
msg314668 - Author: Joel Johnson (joel.johnson) Date: 2018-03-29 19:59
I have started working on this and will have a pull request submitted by the end of the week. 

The term "whitespace" appears in several contexts throughout the documentation. While all of them would benefit from the definition of "whitespace" contained in the str.isspace() documentation, not all of them would benefit from a link to str.isspace(), whose primary goal is to document the str.isspace() method and not to provide a global definition of what a whitespace character is.

Therefore I suggest the documentation of Python 3 create a new glossary entry for "whitespace" (containing the definition currently in the str.isspace() documentation) and point to it wherever the term "whitespace" is used in any documentation related to strings - including this specific case of strip()/lstrip()/rstrip().
msg314681 - Author: Dimitri Papadopoulos Orfanos (Dimitri Papadopoulos Orfanos) Date: 2018-03-30 06:39
I agree on avoiding a link to str.isspace() and defining "whitespace" instead.

However, please note there are many de facto definitions of "whitespace". All of them must be documented, or at least the conceptual classes of "whitespace" must be, with a clarification of which class each of the following belongs to:

* Unicode whitespace is by far the most common: str.isspace(), strip()/lstrip()/rstrip(), Py_UNICODE_ISSPACE.

* Py_ISSPACE targets bytes/bytearray but is never used!

* bytearray.strip() does not use Py_ISSPACE but a hardcoded list of ASCII whitespaces instead.

* finally string.whitespace is probably equivalent to the list used by bytearray.strip().

Beyond the docs, I think Python 3 should rationalize bytearray.strip() / Py_ISSPACE / string.whitespace, probably by having bytearray.strip() rely on Py_ISSPACE, and Py_ISSPACE rely on string.whitespace unless string.whitespace is made obsolete.
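
The guess that string.whitespace matches the list used by the bytes-like strip() methods can be spot-checked empirically. The sketch below, with names invented for this example, finds which single byte values strip() removes when called without arguments and compares them with string.whitespace. It reflects the behaviour of whichever interpreter runs it and is an observation, not a documented guarantee.

```python
import string

# Byte values removed by strip() with no arguments, found by trying all 256.
bytes_ws = {b for b in range(256) if bytes([b]).strip() == b""}
bytearray_ws = {b for b in range(256) if bytearray([b]).strip() == bytearray()}

# string.whitespace is pure ASCII, so it can be encoded for comparison.
string_ws = set(string.whitespace.encode("ascii"))

print(sorted(bytes_ws))
print(bytes_ws == bytearray_ws == string_ws)
```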
msg315193 - Author: Oliver Urs Lenz (oulenz) Date: 2018-04-11 14:49
Slightly tangential, but it would be great if the documentation of lstrip() and rstrip() could include an equivalent definition in terms of re.sub(), e.g.:

foo.lstrip() == re.sub(r'(?u)\A\s*', '', foo)
foo.rstrip() == re.sub(r'(?u)\s*\Z', '', foo)

(Or whatever else is correct.)
msg324272 - Author: Elliot Edmunds (edmundselliot@gmail.com) Date: 2018-08-28 18:25
Not sure how helpful it would be to have the re.sub expressions for lstrip and rstrip, but I think it would look like:

l_stripped = re.sub(r'^\s*', '', foo)
r_stripped = re.sub(r'\s*$', '', foo)
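
The proposed equivalence can be spot-checked on a handful of strings. This is only a sketch: the helper names re_lstrip and re_rstrip and the sample strings are invented for this example, and agreement on a few samples does not prove that the re module and str.strip() share one definition of whitespace.

```python
import re

def re_lstrip(s):
    # \A anchors at the very start of the string.
    return re.sub(r"\A\s*", "", s)

def re_rstrip(s):
    # \Z anchors at the very end of the string.
    return re.sub(r"\s*\Z", "", s)

# Hypothetical samples: ASCII whitespace, Unicode whitespace, empty, all-blank.
samples = ["  foo  ", "\tfoo bar\n", "\u2003foo\u00a0", "", "   "]
for s in samples:
    print(repr(s), re_lstrip(s) == s.lstrip(), re_rstrip(s) == s.rstrip())
```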
msg347853 - Author: Ulf Rompe (rompe) * Date: 2019-07-13 18:38
Using a re.sub() call as documentation:

1. wouldn't be helpful for many developers. If they need to look up the documentation of a simple method they shouldn't be forced to learn about a more complex one as well to understand it.
2. would be wild guessing since the re module defines its own whitespace as well.

I have created a pull request that aligns the strip methods of bytearray to those of bytes objects. The implementation found there is cleaner and even a little bit faster.

Current master:

./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.lstrip()'
1000000 loops, best of 5: 245 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.rstrip()'
1000000 loops, best of 5: 245 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.strip()'
1000000 loops, best of 5: 260 nsec per loop

Using my patch:

./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.lstrip()'
1000000 loops, best of 5: 235 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.rstrip()'
1000000 loops, best of 5: 235 nsec per loop
./python -m timeit -s 'bla = bytearray(b"  foo  ")' 'bla.strip()'
1000000 loops, best of 5: 239 nsec per loop
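
For anyone who prefers to rerun the comparison from a script, the command lines above can be approximated with the timeit module. This is a sketch only: the loop count of 100000 and the results dictionary are choices made for this example, and absolute numbers will differ between machines and builds.

```python
import timeit

setup = 'bla = bytearray(b"  foo  ")'
loops = 100_000   # assumption: fewer loops than the CLI run, to finish quickly

results = {}
for method in ("lstrip", "rstrip", "strip"):
    # Best of 5 repeats, mirroring what the timeit command line reports.
    timings = timeit.repeat(f"bla.{method}()", setup=setup, number=loops, repeat=5)
    results[method] = min(timings) / loops

for method, seconds in results.items():
    print(f"{method}: {seconds * 1e9:.0f} nsec per loop")
```

Run it once on each build (master and the patched branch) and compare the printed figures.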


I have also updated the documentation, adding "whitespace" to the glossary and linking to it from many places in the documentation of standard types.
History
Date User Action Args
2022-04-11 14:58:22 admin set github: 69619
2019-07-14 11:58:14 rompe set pull_requests: + pull_request14569
2019-07-14 09:49:57 rompe set pull_requests: + pull_request14564
2019-07-13 18:38:27 rompe set nosy: + rompe; messages: + msg347853
2019-07-13 15:51:33 python-dev set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request14548
2018-08-28 18:25:07 edmundselliot@gmail.com set files: + whitespace_regex.py; nosy: + edmundselliot@gmail.com; messages: + msg324272
2018-04-11 14:49:18 oulenz set nosy: + oulenz; messages: + msg315193
2018-03-31 22:32:24 jwilk set nosy: + jwilk
2018-03-31 20:57:20 nitishch set nosy: + nitishch
2018-03-30 06:39:52 Dimitri Papadopoulos Orfanos set messages: + msg314681
2018-03-29 19:59:00 joel.johnson set nosy: + joel.johnson; messages: + msg314668
2018-03-19 17:51:21 cheryl.sabella set versions: + Python 3.7, Python 3.8, - Python 3.5, Python 3.6
2016-01-04 10:06:11 Dimitri Papadopoulos Orfanos set messages: + msg257450
2016-01-04 09:42:13 Dimitri Papadopoulos Orfanos set messages: + msg257449
2016-01-04 03:41:04 ezio.melotti set keywords: + easy; nosy: + ezio.melotti; stage: needs patch; versions: + Python 3.6
2015-10-18 12:15:36 Dimitri Papadopoulos Orfanos create