gzip.open breaks with 'U' flag #49398
Comments
If you pass the 'U' (Universal newlines) flag into gzip.open(), the flag In virtually all of my code that reads text files, I use the 'U' flag to Anyway, we added such support to some matplotlib methods, and found that
I took a look at the Python SVN (2.5.4 and 2.6.1) for the gzip lib. I
    # guarantee the file is opened in binary mode on platforms
    # that care about that sort of thing
    if mode and 'b' not in mode:
        mode += 'b'
    if fileobj is None:
        fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
this is going to break for 'U' == you'll get 'rUb'. I tested So:
That latter seems a better idea -- this issue could certainly come up in
I haven't touched py3 yet, so I have no idea if this issue is different
NOTE: passing in the 'U' flag doesn't guarantee that gzip will break.
The very simple patch: Add: mode = mode.replace('U', '') to the above code before opening the file.
But we may want to do something smarter... see the (limited) discussion at: http://mail.python.org/pipermail/python-dev/2009-January/085662.html
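The simple patch described above can be sketched as a standalone helper. This is only an illustration of the idea, not the code that was committed: the function name `normalize_gzip_mode` is hypothetical, and it assumes the only goal is to drop 'U' and force binary mode before the underlying file is opened.

```python
def normalize_gzip_mode(mode):
    """Return a mode string safe to pass to the underlying binary open().

    Hypothetical helper: strips the 'U' (universal newlines) flag and
    appends 'b' when it is missing, mirroring the check quoted above.
    """
    if mode:
        # str.replace returns a new string, so the result must be reassigned.
        mode = mode.replace("U", "")
    if mode and "b" not in mode:
        mode += "b"
    # Fall back to 'rb' when no usable mode is left, as in "mode or 'rb'".
    return mode or "rb"
```

With this helper, a caller passing 'rU' ends up opening the compressed file as 'rb' instead of 'rUb', so no newline translation is applied to the compressed bytes.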
Seems like this should be fairly easy to do right. 'U' needs to be
Here's a patch against trunk. Extra test case and minor doc tweak
Same bug in 2.5, I don't know if the patch applies to 2.5
Unfortunately universal newlines are more complicated than replace() can
The problem appears to be that the gzip module simply doesn't support
I'm currently working on the zipfile module's universal newline support
I'm not sure if file object's open() behavior when presented with 'rUb'
>>> f = open("test.txt", "w").write("blah\r\nblah\rblah\nblah\r\n")
>>> f = open("test.txt", "rUb")
>>> f.read()
'blah\nblah\nblah\nblah\n'
Since 'U' and 'b' are conceptually mutually exclusive on platforms where
New changeset e647229c422b by Nadeem Vawda in branch '2.7':
The data corruption issue is now fixed in the 2.7 branch. In 3.x, using a mode containing 'U' results in an exception rather than silent data corruption. Additionally, gzip.open() has supported text modes ("rt"/"wt"/"at") and newline translation since 3.3 [bpo-13989].
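For readers on Python 3.3 or later, the text-mode support mentioned above can be tried with a short sketch. The helper names `read_gzip_text` and `demo`, the temporary file, and the ASCII encoding are assumptions made for illustration; the sample data is the same mixed-newline string used earlier in this thread.

```python
import gzip
import os
import tempfile


def read_gzip_text(path):
    """Read a gzip file as text with universal newline translation."""
    # newline=None asks the text layer to translate '\r\n' and '\r' to '\n'.
    with gzip.open(path, "rt", encoding="ascii", newline=None) as f:
        return f.read()


def demo():
    """Write mixed line endings into a gzip file and read them back as text."""
    raw = b"blah\r\nblah\rblah\nblah\r\n"
    fd, path = tempfile.mkstemp(suffix=".gz")
    os.close(fd)
    try:
        # Write the raw bytes in binary mode so no translation happens here.
        with gzip.open(path, "wb") as f:
            f.write(raw)
        return read_gzip_text(path)
    finally:
        os.remove(path)
```

Here the translation is applied to the decompressed text rather than to the compressed byte stream, which is the distinction the original 'rUb' behaviour got wrong.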