Message 370359 - Python tracker

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	danieljewell
Recipients	bobbyocean, danieljewell, r.david.murray, serhiy.storchaka, skip.montanaro
Date	2020-05-30.06:06:28
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1590818789.14.0.720058282601.issue27580@roundup.psfhosted.org>
In-reply-to

Content
Forgive my frustration, but @Skip I really don't see how the definition of CSV relating to Excel (or Gnumeric or LibreOffice) has any relevance as to whether or not the module (and perhaps Python more generally) supports chr(0x00) as a delimiter. (Neither you nor I get to decide how someone else might write output data...) While the module is called CSV, it's really not just Comma Separated Values - rather it's a rough approximation of a database table with an optional header row where rows/records are separated by <record separator> and fields are separated by <field separator>. Sometimes the record separator is chr(0x2c) (e.g. a comma) sometimes it's chr(0x09) (e.g. a tab - or in ASCII parlance "Horizontal Tab/HT") ... or maybe even the actual ASCII "Record Separator" character (e.g. chr(0x1e)) ... or maybe NUL chr(0x00). (1) The module should be 100% agnostic about the separator - the current (3.8.3) error text when trying to use csv.reader(..., delimiter=chr(0x00)) is "TypeError: "delimiter" must be a 1-character string" ... well, chr(0x00) is a 1-character string. It's not a 1-character printable string... But then again neither is chr(0x1e) (ASCII "RS" Record Separator) .. and csv.reader(..., delimiter=chr(0x1e)) appears to work (I haven't tried actual data yet). (1a) The use of chr(0x00) or '\0' is used quite often in the NIX world as a convenient record separator that doesn't have escaping problems because by it's very nature it's non-printable. e.g. "find . -iname "something" -print0 \| xargs -0 <program>" ... As to the difficulty in handling 0x00 characters, I dunno ... it appears that GNU find, xargs, gawk... same with FreeBSD. FreeBSD writes the output for "-print0" like this: https://github.com/freebsd/freebsd/blob/508f3673dec94b03f89b9ce9569390d6d9b86a89/usr.bin/find/function.c#L1383 ... and bsd xargs handles it too. I haven't looked at the CPython source to see what's going on - it might be tricky to modify the code to support this... (but then again, IMHO, this sort of thing should have been a consideration in the first place....) I suppose in many ways, the very existence of this specific issue at all is just one example of what seems to be a larger issue with Python's overall development: It's a great language for many* things and in many ways. But I've run into so many little fringe "gotchas" where something doesn't work or is limited in some way because, seemingly, functionality is designed around/defined by a practical-example-use-case and not what is or might be possible (e.g. the CSV-as-only-a-spreadsheet-interface example -- and I really don't mean that as a personal attack @Skip - I am very appreciative of the time and effort you and everyone has poured into the project...) Is it possible to write a NUL (0x00) character to a file? Through a *NIX pipe? You bet. (I got a little rant-y .. sorry... I'm sure there's a _lot_ more going on underneath the covers and there are a lot of factors - not limited to just the csv module - as you mentioned. I just really feel like something is "off". Maybe it's my brain - ha. :))

Forgive my frustration, but @Skip I really don't see how the definition of CSV relating to Excel (or Gnumeric or LibreOffice) has any relevance as to whether or not the module (and perhaps Python more generally) supports chr(0x00) as a delimiter. (Neither you nor I get to decide how someone else might write output data...) 

 While the module is called CSV, it's really not just *Comma* Separated Values - rather it's a rough approximation of a database table with an optional header row where rows/records are separated by <record separator> and fields are separated by <field separator>. Sometimes the record separator is chr(0x2c) (e.g. a comma) sometimes it's chr(0x09) (e.g. a tab - or in ASCII parlance "Horizontal Tab/HT") ... or maybe even the actual ASCII "Record Separator" character (e.g. chr(0x1e)) ... or maybe NUL chr(0x00). 

(1) The module should be 100% agnostic about the separator - the current (3.8.3) error text when trying to use csv.reader(..., delimiter=chr(0x00)) is "TypeError: "delimiter" must be a 1-character string" ... well, chr(0x00) *is* a 1-character string. It's not a 1-character *printable* string... But then again neither is chr(0x1e) (ASCII "RS" Record Separator) .. and csv.reader(..., delimiter=chr(0x1e)) appears to work (I haven't tried actual data yet). 


(1a) The use of chr(0x00) or '\0' is used quite often in the *NIX world as a convenient record separator that doesn't have escaping problems because by it's very nature it's non-printable. e.g. "find . -iname "*something*" -print0 | xargs -0 <program>" ... 

As to the difficulty in handling 0x00 characters, I dunno ... it appears that GNU find, xargs, gawk... same with FreeBSD. FreeBSD writes the output for "-print0" like this: https://github.com/freebsd/freebsd/blob/508f3673dec94b03f89b9ce9569390d6d9b86a89/usr.bin/find/function.c#L1383 ... and bsd xargs handles it too. I haven't looked at the CPython source to see what's going on - it might be tricky to modify the code to support this... (but then again, IMHO, this sort of thing should have been a consideration in the first place....) 

I suppose in many ways, the very existence of this specific issue at all is just one example of what seems to be a larger issue with Python's overall development: It's a great language for *many* things and in many ways. But I've run into so many little fringe "gotchas" where something doesn't work or is limited in some way because, seemingly, functionality is designed around/defined by a practical-example-use-case and not what is or might be *possible* (e.g. the CSV-as-only-a-spreadsheet-interface example -- and I really *don't* mean that as a personal attack @Skip - I am very appreciative of the time and effort you and everyone has poured into the project...) Is it possible to write a NUL (0x00) character to a file? Through a *NIX pipe? You bet. 

(I got a little rant-y .. sorry... I'm sure there's a _lot_ more going on underneath the covers and there are a lot of factors - not limited to just the csv module - as you mentioned. I just really feel like something is "off". Maybe it's my brain - ha. :))

History
Date	User	Action	Args
2020-05-30 06:06:29	danieljewell	set	recipients: + danieljewell, skip.montanaro, r.david.murray, serhiy.storchaka, bobbyocean
2020-05-30 06:06:29	danieljewell	set	messageid: <1590818789.14.0.720058282601.issue27580@roundup.psfhosted.org>
2020-05-30 06:06:29	danieljewell	link	issue27580 messages
2020-05-30 06:06:28	danieljewell	create