classification
Title: csv.skipinitialspace only skips spaces, not "whitespace" in general
Type: behavior Stage: patch review
Components: Documentation, Library (Lib) Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: docs@python Nosy List: Andy.Almonte, Daniel.Andersson, berker.peksag, docs@python, jbmilam, terry.reedy
Priority: normal Keywords: patch

Created on 2014-04-18 11:52 by Daniel.Andersson, last changed 2021-03-25 22:59 by iritkatriel.

Files
File name Uploaded Description
csv_skipinitialspace_testing.py jbmilam, 2015-05-29 20:31 test code
csv_skipinitialspace_testing.csv jbmilam, 2015-05-29 20:31 csv file for the test code
csv_skipinitialspace_docfix.patch jbmilam, 2015-05-29 20:32 patch fix
skipinitialspace_test.patch jbmilam, 2015-06-18 22:24
Messages (6)
msg216780 - Author: Daniel Andersson (Daniel.Andersson) Date: 2014-04-18 11:52
Regarding the `skipinitialspace` parameter to the different CSV reader dialects in the `csv` module, the official documentation asserts:

    When True, whitespace immediately following the delimiter is ignored.

and the `help(csv)` style module documentation says:

    * skipinitialspace - specifies how to interpret whitespace which
      immediately follows a delimiter.  It defaults to False, which
      means that whitespace immediately following a delimiter is part
      of the following field.

"Whitespace" is a bit too general in both cases (at least a red herring in the second case), since it only skips spaces and not e.g. tabs [1].

The source in `Modules/_csv.c` describes the parameter more accurately. At line 81:

    int skipinitialspace;       /* ignore spaces following delimiter? */

and the actual implementation at line 638:

    else if (c == ' ' && dialect->skipinitialspace)
        /* ignore space at start of field */
        ;

Probably no one will assume that the whole Unicode range of "whitespace" is skipped, but at least I initially assumed that the tab character was included.

[1]: http://en.wikipedia.org/wiki/Whitespace_character
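
The behavior described above can be checked with a short script. This is a minimal sketch, not part of the original report; the helper name `parse_line` is made up for illustration, and only the documented `csv.reader` API is used.

```python
import csv


def parse_line(line, **fmtparams):
    """Parse a single CSV line and return its list of fields."""
    return next(csv.reader([line], **fmtparams))


# Default dialect: the space after the delimiter stays in the field.
default_fields = parse_line("a, b")                        # ['a', ' b']

# skipinitialspace=True drops a space (U+0020) after the delimiter ...
space_fields = parse_line("a, b", skipinitialspace=True)   # ['a', 'b']

# ... but a tab (U+0009) is kept as part of the following field.
tab_fields = parse_line("a,\tb", skipinitialspace=True)    # ['a', '\tb']

print(default_fields, space_fields, tab_fields)
```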
msg216820 - Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2014-04-19 00:35
Do I understand correctly that only one space is ignored?
msg216907 - Author: Daniel Andersson (Daniel.Andersson) Date: 2014-04-20 15:47
No, multiple spaces are ignored as advertised (according to actual tests; not just reading the code), but only spaces (U+0020) and not e.g. tabs (U+0009), which are also included in the term "whitespace", along with several other characters.
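
A small sketch of that kind of test, assuming nothing beyond the standard `csv.reader` call (the sample lines are invented for illustration): a run of spaces after the delimiter is dropped entirely, while a tab ends the skipping and becomes part of the field.

```python
import csv

lines = [
    "a,     b",   # several spaces after the delimiter
    "a,  \tb",    # two spaces followed by a tab
]

rows = list(csv.reader(lines, skipinitialspace=True))

for row in rows:
    # repr() makes the retained tab visible as '\t'
    print(repr(row))
```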

In light of your followup question, the internal comment at `Modules/_csv.c`, line 639:

    /* ignore space at start of field */

could perhaps be clarified to say "spaces" instead of "space", but the code context makes it quite clear, and it is not user-facing anyway. The main point of this issue is the wording in the module docstring and the official docs regarding "whitespace" versus "space".
msg244415 - Author: Brandon Milam (jbmilam) * Date: 2015-05-29 20:31
This code shows what Daniel Andersson was talking about. I changed the "whitespace" references in the documentation that Daniel mentioned to say "spaces". I also changed "ignore space at the start of the field" to "ignore spaces at the start of the field" because of Terry's confusion.

Let me know of any errors or extra changes that are needed.
msg244866 - Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-06-05 12:33
The patch looks good to me, thanks! Could you also convert your test script to a test case and add it in Lib/test/test_csv.py?
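
For illustration only, a test case along these lines might look like the sketch below. It is not the content of the attached patch; the class and method names are hypothetical, and it relies only on `csv.reader` and `unittest`.

```python
import csv
import unittest


class SkipInitialSpaceTest(unittest.TestCase):
    """Hypothetical test case; names are illustrative, not from the patch."""

    def _read(self, line, **fmtparams):
        # Parse one CSV line and return its fields.
        return next(csv.reader([line], **fmtparams))

    def test_spaces_are_skipped(self):
        self.assertEqual(
            self._read("a,   b", skipinitialspace=True), ["a", "b"])

    def test_tab_is_not_skipped(self):
        self.assertEqual(
            self._read("a,\tb", skipinitialspace=True), ["a", "\tb"])

    def test_default_keeps_spaces(self):
        self.assertEqual(self._read("a, b"), ["a", " b"])
```

Saved to a file, the class can be run with `python -m unittest <module name>`.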
msg245484 - Author: Brandon Milam (jbmilam) * Date: 2015-06-18 22:24
This is my first attempt at working with the test suite, but I believe this is what you were asking for. Since this is my first attempt at writing tests, I have included it as a separate patch file. If any further changes are needed, just let me know.
History
Date User Action Args
2021-03-25 22:59:21 iritkatriel set versions: + Python 3.8, Python 3.9, Python 3.10, - Python 2.7, Python 3.4, Python 3.5, Python 3.6
2015-06-18 22:24:47 jbmilam set files: + skipinitialspace_test.patch; messages: + msg245484
2015-06-05 12:33:54 berker.peksag set versions: + Python 3.6; nosy: + berker.peksag; messages: + msg244866; stage: needs patch -> patch review
2015-05-29 20:32:18 jbmilam set files: + csv_skipinitialspace_docfix.patch; keywords: + patch
2015-05-29 20:31:58 jbmilam set files: + csv_skipinitialspace_testing.csv
2015-05-29 20:31:04 jbmilam set files: + csv_skipinitialspace_testing.py; nosy: + jbmilam; messages: + msg244415
2014-06-22 18:18:54 Andy.Almonte set nosy: + Andy.Almonte
2014-04-20 15:47:57 Daniel.Andersson set messages: + msg216907
2014-04-19 00:35:26 terry.reedy set nosy: + terry.reedy; title: skipinitialspace in the csv module only skips spaces, not "whitespace" in general -> csv.skipinitialspace only skips spaces, not "whitespace" in general; messages: + msg216820; versions: - Python 3.1, Python 3.2, Python 3.3; stage: needs patch
2014-04-18 11:52:21 Daniel.Andersson create