msg124870 - (view) |
Author: David Beazley (dabeaz) |
Date: 2010-12-29 19:53 |
Is something like this supposed to work:
>>> import gzip
>>> import io
>>> f = io.TextIOWrapper(gzip.open("foo.gz"),encoding='ascii'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: readable
In a nutshell--reading a .gz file as text.
|
msg124875 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-12-29 21:59 |
Since GZipFile inherits from BufferedIOBase, and TextIOWrapper is supposed to be designed to wrap a BufferedIOBase object, I would say yes it ought to work. On the other hand there may also be a doc error there, since it may be that TextIOWrapper actually needs to wrap one of the subclasses of BufferedIOBase.
|
msg124876 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-12-29 22:00 |
Oops. It only has that inheritance in 3.2.
|
msg124877 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-12-29 22:01 |
Heh, and 2.7. Fixing versions yet again.
|
msg124878 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-12-29 22:12 |
This should be easy to fix, if only the "readable" and "writable" methods are needed. Do you want to try writing a patch?
|
msg124879 - (view) |
Author: David Beazley (dabeaz) |
Date: 2010-12-29 22:49 |
It goes without saying that this also needs to be checked with the bz2 module. A quick check seems to indicate that it has the same problem.
While you're at it, maybe someone could add an 'open' function to bz2 to make it symmetrical with gzip as well :-).
|
msg124880 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-12-29 22:55 |
bz2 is a pure C module, so that's a very different situation.
|
msg124881 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-12-29 22:56 |
> While you're at it, maybe someone could add an 'open' function to bz2
> to make it symmetrical with gzip as well :-).
That's a nice idea, but quite orthogonal to this issue.
|
msg124882 - (view) |
Author: David Beazley (dabeaz) |
Date: 2010-12-29 22:58 |
C or not, wrapping a BZ2File instance with a TextIOWrapper to get text still seems like something that someone might want to do. I doubt it would take much modification to give BZ2File instances the required set of methods.
|
msg124884 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2010-12-29 23:05 |
Right, but in the bz2 case I think it is a feature request rather than a bugfix. In any case it should be a separate issue.
|
msg124885 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-12-29 23:10 |
> C or not, wrapping a BZ2File instance with a TextIOWrapper to get text
> still seems like something that someone might want to do. I doubt it
> would take much modification to give BZ2File instances the required
> set of methods.
BZ2File uses FILE pointers internally so it may be more complicated than
it looks to be (because the methods may not have the right semantics).
|
msg124886 - (view) |
Author: David Beazley (dabeaz) |
Date: 2010-12-29 23:12 |
Do Python devs really view gzip and bz2 as two totally completely different animals? They both have the same functionality and would be used for the same kinds of things. Maybe I'm missing something.
|
msg124887 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2010-12-29 23:14 |
> Do Python devs really view gzip and bz2 as two totally completely
> different animals? They both have the same functionality and would be
> used for the same kinds of things. Maybe I'm missing something.
Well, the reality of divergent implementation strategies trumps the
theory of API compatibility :) The approach taken by bz2 is IMO
regrettable, but it's not a ten minutes job to write it again from
scratch.
|
msg124888 - (view) |
Author: David Beazley (dabeaz) |
Date: 2010-12-29 23:17 |
Hmmm. Interesting. In the big picture, it might be an interesting project for someone (not necessarily the core devs) to sit down and refactor both of these modules so that they play nice with Python 3 I/O system. Obviously that's a project outside the scope of this bug or the 3.2 release for that matter.
|
msg129187 - (view) |
Author: David Beazley (dabeaz) |
Date: 2011-02-23 13:14 |
Bump. This is still broken in Python 3.2.
|
msg129198 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-02-23 14:29 |
If a patch had been proposed it probably would have gotten in to 3.2. Maybe someone (perhaps you?) will find the time before 3.2.1.
Someone has decided to work on the bz2 rewrite, by the way (issue 5863).
|
msg129199 - (view) |
Author: David Beazley (dabeaz) |
Date: 2011-02-23 14:37 |
If I can find some time, I may took a look at this. I just noticed that similar problems arise trying to wrap TextIOWrapper around the file-like objects returned by urllib.request.urlopen as well.
In the big picture, some discussion of what it means to be "file-like" might be in order. If something is "file-like" and binary, should that always imply that I be able to wrap a TextIOWrapper object around it in order to encode/decode text? I would argue "yes", but I'd be curious to know what others think.
|
msg129200 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-02-23 14:42 |
What is the problem with Python 3.2? It works correctly here:
$ cat bla.txt
bli
blo
bla
$ gzip bla.txt
$ ./python
Python 3.3a0 (unknown, Feb 23 2011, 13:03:50)
>>> import gzip, io
>>> f = io.TextIOWrapper(gzip.open("bla.txt.gz"),encoding='ascii')
>>> f.read()
'bli\nblo\nbla\n'
If someone added Python 3.2 in the Versions field because of an issue with bz2: please open a new issue instead.
|
msg129201 - (view) |
Author: David Beazley (dabeaz) |
Date: 2011-02-23 14:46 |
Python 3.2 (r32:88445, Feb 20 2011, 21:51:21)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> import io
>>> f = io.TextIOWrapper(gzip.open("file.gz"),encoding='latin-1')
>>> f.readline()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: read1
>>>
|
msg129203 - (view) |
Author: R. David Murray (r.david.murray) * |
Date: 2011-02-23 14:56 |
Yes, a clear definition of the minimum requirements for being wrapped by TextIOWrapper sounds like a necessary thing to have (and I'd be inclined to agree with your assertion, but I didn't work on the IO library :). It would be best to open a new issue for that.
|
msg129205 - (view) |
Author: STINNER Victor (vstinner) * |
Date: 2011-02-23 15:04 |
> Yes, a clear definition of the minimum requirements for being wrapped
> by TextIOWrapper sounds like a necessary thing to have
About that: is read1() argument mandatory or not?
In _pyio, BufferedIOBase.read1() argument is optional (default: None); BytesIO.read1(), BufferedReader.read1(), BufferedRWPair.read1(), BufferedRandom.read1() argument is mandatory.
In _io, BufferedIOBase.read1() raises directly an exception, without checking the arguments; BufferedReader.read1() argument is mandatory.
In the io doc, BufferedIOBase.read1() argument is optional (default: -1), BytesIO.read1() has no argument (!) and BufferedReader.read1() argument is mandatory.
|
msg129215 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-02-23 16:52 |
It would probably be ok to fallback on read() when read1() isn't implemented. read1() is supposed to be implemented by all BufferedIO-compliant classes, but in all honesty I don't think it's very useful in practice. It's supposed to be an optimization, and I think it's a misguided one; the generalized prefetch() primitive I proposed last year would certainly be more useful: see http://mail.python.org/pipermail/python-dev/2010-September/104194.html
|
msg132757 - (view) |
Author: Filip Gruszczyński (gruszczy) |
Date: 2011-04-01 16:52 |
Is following change in GzipFile class enough:
def read1(self, n):
return self.read(n)
? This satisfies TextIOWrapper to run readline correctly.
|
msg132899 - (view) |
Author: Nadeem Vawda (nadeem.vawda) * |
Date: 2011-04-03 22:28 |
> Is following change in GzipFile class enough:
>
> def read1(self, n):
> return self.read(n)
>
> ? This satisfies TextIOWrapper to run readline correctly.
Looks good to me.
By the way, BZ2File now works correctly - the fix for issue5863 adds read1().
|
msg132902 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-04-03 22:41 |
> Nadeem Vawda <nadeem.vawda@gmail.com> added the comment:
>
> > Is following change in GzipFile class enough:
> >
> > def read1(self, n):
> > return self.read(n)
> >
> > ? This satisfies TextIOWrapper to run readline correctly.
>
> Looks good to me.
Well, ideally, read1() should satisfy the condition stated in the
BufferedIOBase documentation - namely, that it issues at most one read()
call on the underlying stream.
|
msg132931 - (view) |
Author: Nadeem Vawda (nadeem.vawda) * |
Date: 2011-04-04 10:56 |
Here's an implementation of read1() that satisfies that condition, along with
some relevant unit tests.
|
msg132948 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-04-04 14:18 |
> Here's an implementation of read1() that satisfies that condition, along with
> some relevant unit tests.
Something looks fishy: what happens if size is -1 and EOFError is not
raised?
|
msg132953 - (view) |
Author: Nadeem Vawda (nadeem.vawda) * |
Date: 2011-04-04 17:01 |
> Something looks fishy: what happens if size is -1 and EOFError is not raised?
You're right - I missed that possibility. In that case, extrasize and offset get
updated incorrectly, which will break subsequent calls to seek() and tell().
However, it seems that subsequent reads work fine, because slicing a bytes object
with a too-large upper bound doesn't raise an exception.
The attached patch fixes this bug, and updates test_read1() to catch regressions.
|
msg132959 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-04-04 19:01 |
New changeset 9775d67c9af9 by Antoine Pitrou in branch 'default':
Issue #10791: Implement missing method GzipFile.read1(), allowing GzipFile
http://hg.python.org/cpython/rev/9775d67c9af9
|
msg132960 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-04-04 19:02 |
Patch now committed, thank you!
Since the patch adds a new API (GzipFile.read1()), I think it's better not to backport it.
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:10 | admin | set | github: 55000 |
2011-04-07 13:20:42 | jcea | set | nosy:
+ jcea
|
2011-04-04 19:02:53 | pitrou | set | status: open -> closed versions:
- Python 2.7, Python 3.2 type: behavior -> enhancement messages:
+ msg132960
resolution: fixed stage: needs patch -> resolved |
2011-04-04 19:01:58 | python-dev | set | nosy:
+ python-dev messages:
+ msg132959
|
2011-04-04 17:01:25 | nadeem.vawda | set | files:
+ gzipfile_read1.diff
messages:
+ msg132953 |
2011-04-04 14:18:21 | pitrou | set | messages:
+ msg132948 |
2011-04-04 10:56:01 | nadeem.vawda | set | files:
+ gzipfile_read1.diff keywords:
+ patch messages:
+ msg132931
|
2011-04-03 22:41:02 | pitrou | set | messages:
+ msg132902 |
2011-04-03 22:28:48 | nadeem.vawda | set | messages:
+ msg132899 |
2011-04-02 10:27:46 | nadeem.vawda | set | nosy:
+ nadeem.vawda
|
2011-04-01 16:52:06 | gruszczy | set | nosy:
+ gruszczy messages:
+ msg132757
|
2011-02-23 18:26:10 | alex | set | nosy:
+ alex
|
2011-02-23 16:52:33 | pitrou | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages:
+ msg129215 |
2011-02-23 15:04:04 | vstinner | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages:
+ msg129205 |
2011-02-23 14:56:00 | r.david.murray | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages:
+ msg129203 versions:
+ Python 3.2, Python 3.3 |
2011-02-23 14:46:41 | dabeaz | set | nosy:
pitrou, vstinner, eric.araujo, r.david.murray, dabeaz messages:
+ msg129201 |
2011-02-23 14:42:58 | vstinner | set | nosy:
+ vstinner
messages:
+ msg129200 versions:
- Python 3.2 |
2011-02-23 14:37:08 | dabeaz | set | nosy:
pitrou, eric.araujo, r.david.murray, dabeaz messages:
+ msg129199 |
2011-02-23 14:29:18 | r.david.murray | set | nosy:
pitrou, eric.araujo, r.david.murray, dabeaz messages:
+ msg129198 |
2011-02-23 13:14:41 | dabeaz | set | nosy:
pitrou, eric.araujo, r.david.murray, dabeaz messages:
+ msg129187 |
2011-01-02 18:34:37 | eric.araujo | set | nosy:
+ eric.araujo
|
2010-12-29 23:17:37 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124888 |
2010-12-29 23:14:21 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124887 |
2010-12-29 23:12:11 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124886 |
2010-12-29 23:10:48 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124885 |
2010-12-29 23:05:18 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124884 |
2010-12-29 22:58:09 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124882 |
2010-12-29 22:56:33 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124881 |
2010-12-29 22:55:00 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124880 |
2010-12-29 22:49:32 | dabeaz | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124879 |
2010-12-29 22:12:09 | pitrou | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124878 |
2010-12-29 22:01:39 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124877 versions:
+ Python 2.7 |
2010-12-29 22:00:18 | r.david.murray | set | nosy:
pitrou, r.david.murray, dabeaz messages:
+ msg124876 versions:
- Python 3.1, Python 2.7 |
2010-12-29 21:59:11 | r.david.murray | set | versions:
+ Python 3.1, Python 2.7 nosy:
+ r.david.murray, pitrou
messages:
+ msg124875
stage: needs patch |
2010-12-29 19:53:51 | dabeaz | create | |