Wrapping TextIOWrapper around gzip files #55000
Is something like this supposed to work?
>>> import gzip
>>> import io
>>> f = io.TextIOWrapper(gzip.open("foo.gz"), encoding='ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: readable
In a nutshell: reading a .gz file as text. |
Since GZipFile inherits from BufferedIOBase, and TextIOWrapper is supposed to be designed to wrap a BufferedIOBase object, I would say yes it ought to work. On the other hand there may also be a doc error there, since it may be that TextIOWrapper actually needs to wrap one of the subclasses of BufferedIOBase. |
Oops. It only has that inheritance in 3.2. |
Heh, and 2.7. Fixing versions yet again. |
This should be easy to fix, if only the "readable" and "writable" methods are needed. Do you want to try writing a patch? |
It goes without saying that this also needs to be checked with the bz2 module. A quick check seems to indicate that it has the same problem. While you're at it, maybe someone could add an 'open' function to bz2 to make it symmetrical with gzip as well :-). |
bz2 is a pure C module, so that's a very different situation. |
That's a nice idea, but quite orthogonal to this issue. |
C or not, wrapping a BZ2File instance with a TextIOWrapper to get text still seems like something that someone might want to do. I doubt it would take much modification to give BZ2File instances the required set of methods. |
Right, but in the bz2 case I think it is a feature request rather than a bugfix. In any case it should be a separate issue. |
BZ2File uses FILE pointers internally so it may be more complicated than |
Do Python devs really view gzip and bz2 as two totally completely different animals? They both have the same functionality and would be used for the same kinds of things. Maybe I'm missing something. |
Well, the reality of divergent implementation strategies trumps the |
Hmmm. Interesting. In the big picture, it might be an interesting project for someone (not necessarily the core devs) to sit down and refactor both of these modules so that they play nice with Python 3 I/O system. Obviously that's a project outside the scope of this bug or the 3.2 release for that matter. |
Bump. This is still broken in Python 3.2. |
If a patch had been proposed it probably would have gotten into 3.2. Maybe someone (perhaps you?) will find the time before 3.2.1. Someone has decided to work on the bz2 rewrite, by the way (bpo-5863). |
If I can find some time, I may take a look at this. I just noticed that similar problems arise when trying to wrap TextIOWrapper around the file-like objects returned by urllib.request.urlopen as well. In the big picture, some discussion of what it means to be "file-like" might be in order. If something is "file-like" and binary, should that always imply that I can wrap a TextIOWrapper object around it in order to encode/decode text? I would argue "yes", but I'd be curious to know what others think. |
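To make the "file-like" question concrete, here is a minimal sketch of a binary object that TextIOWrapper in current CPython will accept. The class name BytesSource is hypothetical, and the set of methods shown (readable, read, read1, with everything else inherited from io.BufferedIOBase) is an observation about the current implementation, not a documented minimum.

```python
import io


class BytesSource(io.BufferedIOBase):
    """Hypothetical read-only binary stream backed by an in-memory bytes object."""

    def __init__(self, payload):
        self._payload = payload
        self._pos = 0

    def readable(self):
        # TextIOWrapper asks the wrapped object whether it can be read from.
        return True

    def read(self, size=-1):
        # A negative or missing size means "read everything that is left".
        if size is None or size < 0:
            size = len(self._payload) - self._pos
        chunk = self._payload[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk

    def read1(self, size=-1):
        # Simplest possible read1(): delegate to read().
        return self.read(size)


text = io.TextIOWrapper(BytesSource(b"bli\nblo\n"), encoding="ascii")
lines = text.readlines()
```

Note that inheriting from io.BufferedIOBase without overriding read1() is not enough here, because the base class read1() raises io.UnsupportedOperation.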
What is the problem with Python 3.2? It works correctly here:
$ cat bla.txt
bli
blo
bla
$ gzip bla.txt
$ ./python
Python 3.3a0 (unknown, Feb 23 2011, 13:03:50)
>>> import gzip, io
>>> f = io.TextIOWrapper(gzip.open("bla.txt.gz"),encoding='ascii')
>>> f.read()
'bli\nblo\nbla\n'
If someone added Python 3.2 in the Versions field because of an issue with bz2, please open a new issue instead. |
Python 3.2 (r32:88445, Feb 20 2011, 21:51:21)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> import io
>>> f = io.TextIOWrapper(gzip.open("file.gz"),encoding='latin-1')
>>> f.readline()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: read1
>>> |
Yes, a clear definition of the minimum requirements for being wrapped by TextIOWrapper sounds like a necessary thing to have (and I'd be inclined to agree with your assertion, but I didn't work on the IO library :). It would be best to open a new issue for that. |
About that: is the read1() argument mandatory or not? In _pyio, the BufferedIOBase.read1() argument is optional (default: None), while the argument of BytesIO.read1(), BufferedReader.read1(), BufferedRWPair.read1() and BufferedRandom.read1() is mandatory. In _io, BufferedIOBase.read1() raises an exception directly, without checking its arguments, and the BufferedReader.read1() argument is mandatory. In the io doc, the BufferedIOBase.read1() argument is optional (default: -1), BytesIO.read1() has no argument (!) and the BufferedReader.read1() argument is mandatory. |
It would probably be ok to fall back on read() when read1() isn't implemented. read1() is supposed to be implemented by all BufferedIO-compliant classes, but in all honesty I don't think it's very useful in practice. It's supposed to be an optimization, and I think it's a misguided one; the generalized prefetch() primitive I proposed last year would certainly be more useful: see http://mail.python.org/pipermail/python-dev/2010-September/104194.html |
Is the following change in the GzipFile class enough?
def read1(self, n):
    return self.read(n)
This is enough for TextIOWrapper to run readline correctly. |
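For anyone who wants to try the delegation idea without patching the standard library, a rough sketch as a subclass follows. Read1GzipFile is a hypothetical name, and the default n=-1 is an assumption added so the method can also be called without an argument. Current Python versions already ship their own GzipFile.read1(), so the override is only illustrative there.

```python
import gzip
import io
import os
import tempfile


class Read1GzipFile(gzip.GzipFile):
    """Hypothetical subclass: read1() simply delegates to read()."""

    def read1(self, n=-1):
        return self.read(n)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bla.txt.gz")

    # Create a small gzip file to read back.
    with gzip.open(path, "wb") as out:
        out.write(b"bli\nblo\nbla\n")

    # Wrap the binary gzip stream to get decoded text lines.
    with io.TextIOWrapper(Read1GzipFile(path), encoding="latin-1") as f:
        first_line = f.readline()
```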
Looks good to me. By the way, BZ2File now works correctly - the fix for bpo-5863 adds read1(). |
Well, ideally, read1() should satisfy the condition stated in the |
Here's an implementation of read1() that satisfies that condition, along with |
Something looks fishy: what happens if size is -1 and EOFError is not |
You're right - I missed that possibility. In that case, extrasize and offset get The attached patch fixes this bug, and updates test_read1() to catch regressions. |
New changeset 9775d67c9af9 by Antoine Pitrou in branch 'default': |
Patch now committed, thank you! |
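For later readers, a quick way to check the behaviour discussed in this issue on a current Python is sketched below. The file name and contents are just the sample data from the earlier comment. The second half uses gzip.open() in text mode ("rt"), which is available from Python 3.3 onward and is usually the shorter route.

```python
import gzip
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bla.txt.gz")

    # Write the sample data from the earlier comment as a gzip file.
    with gzip.open(path, "wb") as out:
        out.write(b"bli\nblo\nbla\n")

    # The pattern from the original report: wrap the binary GzipFile.
    with io.TextIOWrapper(gzip.open(path), encoding="ascii") as f:
        wrapped = f.read()

    # Alternative: let gzip.open() do the text wrapping itself.
    with gzip.open(path, "rt", encoding="ascii") as f:
        text_mode = f.read()
```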