Message 189675 - Python tracker



Author	Michael.Fox
Recipients	Arfrever, Michael.Fox, nadeem.vawda, pitrou, rhettinger, serhiy.storchaka, vstinner
Date	2013-05-20.16:41:58
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<CABbL6oZRuVE7WznVmKyjbSMNbFcjpr660PJpgZWGij_c7VHZtA@mail.gmail.com>
In-reply-to	<1369060963.08.0.432691945326.issue18003@psf.upfronthosting.co.za>

Content

You're right. In fact, what doesn't make sense is to be doing
line-oriented reads on a binary file. Why was I doing that?
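A minimal sketch of the difference, assuming a throwaway file in a temporary directory (the name `demo.txt.xz` is hypothetical): iterating over what lzma.open() returns in its default 'rb' mode yields bytes lines, whereas 'rt' mode returns an io.TextIOWrapper that decodes the data and yields str lines.

```python
import io
import lzma
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, used only for this demonstration.
    path = os.path.join(tmpdir, "demo.txt.xz")
    with lzma.open(path, "wt", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    # Default mode 'rb': the LZMAFile yields bytes, one line at a time.
    with lzma.open(path) as f:
        binary_lines = list(f)

    # Mode 'rt': the stream is wrapped in io.TextIOWrapper, which
    # decodes the bytes and yields str lines.
    with lzma.open(path, "rt", encoding="utf-8") as f:
        wrapper_type = type(f)
        text_lines = list(f)

print(binary_lines)  # [b'first line\n', b'second line\n']
print(text_lines)    # ['first line\n', 'second line\n']
print(wrapper_type is io.TextIOWrapper)  # True
```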

I do have another quibble though. The open() function is like this:

open(file, mode='r', buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None) -> file object

The lzma.open() function is like this:

lzma.open(filename, mode='rb', *, format=None, check=-1,
          preset=None, filters=None, encoding=None, errors=None, newline=None)
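One way to make the mismatch concrete is to compare the two signatures programmatically. This is a sketch that assumes inspect.signature() can introspect the built-in open() (CPython exposes a text signature for it; other implementations may not), and the exact parameter lists may vary between Python versions.

```python
import inspect
import lzma

builtin_sig = inspect.signature(open)
lzma_sig = inspect.signature(lzma.open)

builtin_params = set(builtin_sig.parameters)
lzma_params = set(lzma_sig.parameters)

# Parameters accepted by one function but not by the other.
only_builtin = sorted(builtin_params - lzma_params)
only_lzma = sorted(lzma_params - builtin_params)

# The default mode differs: text for open(), binary for lzma.open().
builtin_mode = builtin_sig.parameters["mode"].default
lzma_mode = lzma_sig.parameters["mode"].default

print("only in open():", only_builtin)
print("only in lzma.open():", only_lzma)
print("default modes:", repr(builtin_mode), repr(lzma_mode))
```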

It seems to me that it would be best for them to be as congruent as
possible, because people will try to do this (I did):

import lzma

if filename.endswith('.xz'):
    f = lzma.open(filename)
else:
    f = open(filename)
for line in f: ...

And then they will be in for a surprise. Would you consider changing
the default mode of lzma.open() to 'rt' and implementing the 'buffering'
parameter as it is implemented in open()? And further, can we discuss
whether "duck typing" is becoming generally problematic in an
expanding standard library and whether there should be some process,
language, testing or something to ensure the ducks really quack the
same?
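Until (or unless) the defaults are aligned, a caller can sidestep the surprise by requesting text mode explicitly in both branches. The helper `open_text` below is a hypothetical name used for illustration only; it is not part of the standard library, and the UTF-8 default encoding is an assumption.

```python
import lzma
import os
import tempfile


def open_text(filename, encoding="utf-8"):
    """Hypothetical helper: open a plain or .xz file in text mode.

    Both branches pass mode 'rt' explicitly, so iterating the result
    yields str lines whichever opener is chosen.
    """
    if filename.endswith(".xz"):
        return lzma.open(filename, "rt", encoding=encoding)
    return open(filename, "rt", encoding=encoding)


with tempfile.TemporaryDirectory() as tmpdir:
    plain_path = os.path.join(tmpdir, "data.txt")
    xz_path = os.path.join(tmpdir, "data.txt.xz")
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\n")
    with lzma.open(xz_path, "wt", encoding="utf-8") as f:
        f.write("alpha\nbeta\n")

    with open_text(plain_path) as f:
        plain_lines = list(f)
    with open_text(xz_path) as f:
        xz_lines = list(f)

print(plain_lines == xz_lines)  # True
```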

For example, there could be a standard test suite that everything
purporting to implement an open() function should be subject to.
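As a rough illustration of what one check in such a suite might look like, here is a sketch. The function `quacks_like_open` and the two properties it tests (the default mode yields str lines, and a `buffering` parameter is accepted) are assumptions chosen for illustration; a real suite would first need agreement on which behaviours are actually required.

```python
import inspect
import lzma
import os
import tempfile


def quacks_like_open(opener, path):
    """Hypothetical conformance check for an open()-like callable.

    Returns a list of failure messages; an empty list means every
    check passed.
    """
    failures = []
    # Check 1: opening with no mode argument should yield str lines.
    with opener(path) as f:
        first = next(iter(f))
    if not isinstance(first, str):
        failures.append("default mode yields %s, not str" % type(first).__name__)
    # Check 2: the signature should accept a 'buffering' parameter.
    if "buffering" not in inspect.signature(opener).parameters:
        failures.append("no 'buffering' parameter")
    return failures


with tempfile.TemporaryDirectory() as tmpdir:
    plain_path = os.path.join(tmpdir, "sample.txt")
    xz_path = os.path.join(tmpdir, "sample.txt.xz")
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write("hello\n")
    with lzma.open(xz_path, "wt", encoding="utf-8") as f:
        f.write("hello\n")

    builtin_failures = quacks_like_open(open, plain_path)
    lzma_failures = quacks_like_open(lzma.open, xz_path)

print("open():", builtin_failures)
print("lzma.open():", lzma_failures)
```

Under these assumed checks, the built-in open() passes both, while lzma.open() fails both because its default mode is binary and it has no `buffering` parameter.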

On Mon, May 20, 2013 at 7:42 AM, Nadeem Vawda <report@bugs.python.org> wrote:
>
> Nadeem Vawda added the comment:
>
> No, that is the intended behavior for binary streams - they operate at
> the level of individual bytes. If you want to treat your input file as
> Unicode-encoded text, you should open it in text mode. This will return a
> TextIOWrapper which handles the decoding and line splitting properly.
>
> ----------
>
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue18003>
> _______________________________________

-- 

-
Michael

History
Date	User	Action	Args
2013-05-20 16:41:58	Michael.Fox	set	recipients: + Michael.Fox, rhettinger, pitrou, vstinner, nadeem.vawda, Arfrever, serhiy.storchaka
2013-05-20 16:41:58	Michael.Fox	link	issue18003 messages
2013-05-20 16:41:58	Michael.Fox	create