New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zipfile.ZipExtFile.read() is missing universal newline support #51008
Comments
The zipfile.ZipFile.open() behavior with mode 'U' or 'rU' is not quite http://docs.python.org/library/zipfile.html#zipfile.ZipFile.open Here is an example: $ echo -ne "This is an example\r\nWhich demonstrates a problem\r\nwith
ZipFile.open(..., 'U')\r\n" > foo.txt
$ cat -v foo.txt
This is an example^M
Which demonstrates a problem^M
with ZipFile.open(..., 'U')^M
$ zip foo.zip foo.txt
adding: foo.txt (deflated 1%)
$ python
Python 2.6.2 (r262:71600, Aug 21 2009, 17:52:12)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("foo.txt", 'U').read()
"This is an example\nWhich demonstrates a problem\nwith
ZipFile.open(..., 'U')\n"
>>> from zipfile import ZipFile
>>> ZipFile("foo.zip").open("foo.txt", 'U').read()
"This is an example\r\nWhich demonstrates a problem\r\nwith
ZipFile.open(..., 'U')\r\n"
>>> The open() method was added here: http://bugs.python.org/issue1121142 The cause is that the universal newline implementation is specific to Note that test_zipfile.UniversalNewlineTests.readTest() passes. This is |
Patch for both zipfile.py and test_zipfile.py attached.
|
Hi Art, Thanks for working on this. I've taken a look at the patch. The fix to read_test looks correct. Of course, I would consider a more The changes to read() are an improvement, but I think we need to be Another thing to work out is the lack of the 'newlines' attribute, I've noted that bz2 seems to do a pretty good job with universal newline It's in C, however, and I don't think there's actually anything in the |
The python version of the io module does (_pyio.py). I've set the stage to 'test needed' because the test needs to be turned |
Hi Ryan, Thanks for the feedback. I've attached a new patch that fixes the read(nbytes) behavior--It will I also added the newlines attribute per PEP-278 and a corresponding test. I'm not sure I understood David's comment that read_test needed to be |
My apologies, I misread the patch. |
Thanks for working on this. Comments on patch: (1) I think you should retain the full "ugly check" comment explaining (2) Your code for setting the newlines attributes and the tests for it (3) Sorry for being dense, but I don't understand your code to adjust |
Hi David, Thanks for the review. Patch attached. (1) I've moved that comment to the check's new location. (2) Fixed the bug and added tests for only one separator. Also added (3) I changed this so that when the file is opened with universal |
Just found another bug in the code that sets the newlines attribute. |
Latest patch attached.
|
Well, you could try applying the patch and see if it applies cleanly and passes the tests. But it would be helpful to have it reworked for py3k, since we now commit first to py3k and then backport to 2.7. I have not reviewed the updated patch. |
Support lines in zipfile looks contradictory and buggy. This complicates the code and makes the behavior of zipfile.ZipExtFile incompatible with the interface of other file-like objects. For example, the behavior of the read, read1 and peek differ from one described in io module. If we are working with binary data, conversion of newlines is meaningless (and how about newlines in comments?). If we are working with text, the bytes must be decoded to str. This will help io.TextIOWrapper. I suggest two alternatives:
|
This would be fine with me. |
It may be worth to deprecate PEP-278? Oh, only ten years have passed |
Well, I don't know if PEPs ever get deprecated. In this case, PEP-3116 |
I know how to remove universal newline support, I know how after this With regard to this issue, I note that there are tests that check that |
Mode 'U' is deprecated now (bpo-15204). |
Support of the 'U' mode is removed in 3.6 (bpo-27029). |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: