Message 87823 - Python tracker



Author	nessus42
Recipients	benjamin.peterson, facundobatista, georg.brandl, ncoghlan, nessus42, pitrou, r.david.murray, rhettinger
Date	2009-05-15.17:46:21
Message-id	<1242409585.02.0.949900911428.issue1152248@psf.upfronthosting.co.za>

Content

Antoine Pitrou <report@bugs.python.org> wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:

> > Note that the problem with the read()+split() approach is that you
> > either have to read the whole file into memory (which this RFE is
> > trying to avoid) or you have to do your own buffering and so forth to
> > split records as you go. Since the latter is both difficult to get
> > right and very similar to what the IO module already has to do for
> > readlines(), it makes sense to include the extra complexity there.
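The "do your own buffering" approach described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the io-module change this RFE asks for: `read_records` is a hypothetical helper name, and it assumes a binary stream read in fixed-size chunks.

```python
import io
from typing import BinaryIO, Iterator


def read_records(stream: BinaryIO, sep: bytes = b"\0",
                 chunk_size: int = 8192) -> Iterator[bytes]:
    """Hypothetical helper: yield sep-delimited records from a binary
    stream without reading the whole stream into memory.

    Separators are stripped; empty records between consecutive
    separators are kept, and a final record lacking a trailing
    separator is still yielded.
    """
    if not sep:
        raise ValueError("separator must be non-empty")
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # EOF
            break
        buf += chunk
        # Everything before the last separator is complete; the tail
        # may be a partial record (or a separator split across chunks).
        *records, buf = buf.split(sep)
        yield from records
    if buf:
        yield buf


# Tiny chunks force records to straddle chunk boundaries.
print(list(read_records(io.BytesIO(b"one\0two\0three"), chunk_size=2)))
# [b'one', b'two', b'three']
```

Joining the leftover tail with the next chunk before splitting is what keeps a multi-byte separator such as `b"\r\n"` from being missed when it straddles two reads.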

> I wonder how often this use case happens though.

Every day for me.  The reason that I originally brought up this request
some years back on comp.lang.python was that I wanted to be able to use
Python easily like I use the xargs program.

E.g.,

   find -type f -regex 'myFancyRegex' -print0 | stuff-to-do-on-each-file.py

With "-print0" the separator is changed from newline to the null
character, so that you can deal with filenames that have newlines in them.

("find" and "xargs" traditionally have used newline to separate files,
but that fails in the face of filenames that have newlines in them, so
the -print0 argument to find and the "-0" argument to xargs were
thankfully eventually added as a fix for this issue.  Nulls are not
allowed in filenames.  At least not on Unix.)
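For the `find -print0` case, the simple read()+split() approach looks roughly like this. It is a sketch that assumes the input is small enough to hold in memory (the limitation quoted above); `null_separated_paths` is a hypothetical name, and the `io.BytesIO` sample stands in for `sys.stdin.buffer`.

```python
import io
import os
from typing import BinaryIO, List


def null_separated_paths(stream: BinaryIO) -> List[str]:
    """Hypothetical helper: split `find -print0` output into paths.

    Reads the whole stream at once (the read()+split() approach).
    The empty piece after the final NUL is dropped, and os.fsdecode
    turns the raw bytes back into str filenames.
    """
    data = stream.read()
    return [os.fsdecode(p) for p in data.split(b"\0") if p]


# Stand-in for sys.stdin.buffer; the second name contains a newline.
sample = io.BytesIO(b"./a.txt\0./odd\nname.txt\0")
for path in null_separated_paths(sample):
    print(repr(path))  # repr() keeps the embedded newline visible
```

In an actual `stuff-to-do-on-each-file.py` the call would be `null_separated_paths(sys.stdin.buffer)`. Because the whole input is read up front, this is exactly the memory trade-off that a streaming, separator-aware readline would avoid.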

> When you don't split on lines, conversely, you probably have a binary
> format,

That's not true for the daily use case I just mentioned.

|>ouglas

P.S. I wrote my own version of readlines, of course, as the archives of
comp.lang.python will show.  I just don't feel that everyone should be
required to do the same, when this is the sort of thing that sysadmins
and other Unix-savvy folks are wont to do on a daily basis.

P.P.S. Another use case is that I often end up with files that have
been transferred back and forth between Unix and Windows and
god-knows-what-else, and the newlines end up being some weird mixture of
carriage returns and line feeds (and sometimes some other stray
characters such as "=20" or somesuch) that many programs seem to have a
hard time recognizing as newlines.
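For the mixed line-ending case, Python's own universal-newline handling already covers the common mixes of `\r\n`, `\r`, and `\n`. A small sketch follows; the sample bytes are made up for illustration.

```python
import io

# Made-up sample mixing Unix (\n), Windows (\r\n) and old Mac (\r) endings.
raw = b"unix\nwindows\r\nold mac\rlast"

# str.splitlines() treats \n, \r\n and \r (among others) as line breaks.
lines = raw.decode("ascii").splitlines()
print(lines)  # ['unix', 'windows', 'old mac', 'last']

# Universal-newlines text mode (newline=None) translates all three to "\n".
with io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None) as f:
    translated = f.read()
print(translated.split("\n"))  # ['unix', 'windows', 'old mac', 'last']
```

A reversed `\n\r` pair is still read as two breaks (an extra blank line), and stray `=20` sequences (the quoted-printable escape for a space) are a separate encoding problem that neither mechanism touches.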
