New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove redundant coding cookies from 3.x stdlib #53554
Comments
On Mon, Jul 19, 2010 at 2:45 AM, Guido van Rossum <guido@python.org> wrote:
|
In the attached patch, I've removed encoding cookies in all files listed in the message above except
The bad_coding contains bad encoding cookie which is being tested, in badsyntax_3131, the cookie could be removed, but I think it clarifies the meaning of the file. It can probably be replaced with a comment explaining what is being tested. The shlex.py was reverted in r82560, so I am not touching it, but I suspect that the logic used to build wordchars is not correct on systems supporting unicode. I am adding Peter Åstrand to "nosy" because his name was miscoded in Lib/getopt.py and I fixed it by copying from Misc/ACKS. |
It turns out that test_imp tests that pydoc.py is in iso-8859-1. This is not right. The test should use a dedicated file under Lib/test for that. |
This is not easy after all, kudos for taking care of that. The shlex fix was reverted IIRC because Mark was unsure of the fix, since our diff tools try to be too clever and decode files, not showing us the bytes differences. I don’t trust my editor either, nor the file command (which says very funny things about Python files). I only trust Python’s open function to give me the exact bytes, which I can compare with 'æ'.encode('utf-8') and 'æ'.encode('latin1') to make up my mind. In this case, I would trust the left-hand side of the diff seen in r82560 as a definition of what those characters are supposed to be, since I don’t have POSIX handy <wink>, and add the \x escapes for those characters (per Guido’s recommendation—I regret we’ve switched to UTF-8 but can’t use it). |
On Mon, Jul 19, 2010 at 4:33 PM, Éric Araujo <report@bugs.python.org> wrote:
Well, it is not clear what wordchars should be in shlex. I would 62 while according to str.isalnum(), three are 71 alphanumeric characters 71 I don't know what POSIX definition of word character is. |
Tracker eating python prompt prefixed lines is truly annoying. Here are the missing lines: >>> len(shlex(posix=True).wordchars) - len(shlex().wordchars)
62
>>> len([chr(i) for i in range(128, 256) if chr(i).isalnum()])
71 |
Thank you for explaining that, I had a suspicion I had not understood “the logic used to build wordchars is not correct”. So our options here are to ignore the issue and keep statu quo, remove this strange POSIX wordchars concept, find some POSIX docs, or contact esr directly. I’m tentatively adding him to nosy, hoping he’ll see the email and come here enlighten us. Eric, I hope this doesn’t annoy you. |
I don't think posix mode was added by ESR, but I cannot check ATM. |
I shouldn’t edit bugs when tired, sorry for the noise. hg log tells me that those extra characters have been added in r32284 (2003) following bug bpo-722686 by Gustavo Niemeyer following a discussion on “the mailing list”. Adjusting nosy. |
I wonder whether 3.x shlex working with unicode files is just incidental to 3.x string model and is not really correct. I think bpo-1170 is a better place to have this discussion. I'll add a comment there. |
I am attaching a new patch, issue9308a.diff which fixes test_imp by introducing a new package, test.encoded_modules which currently contains a module encoded with iso-8859-1 and another with a somewhat more interesting encoding, koi8-r. I think it would be interesting to add one or two multi-byte encodings to the mix. The test_import_encoded_module test case probably does not belong in test_imp and should be moved to test_import or test_importlib, but I would like to see the reaction to the idea of introducing test.encoded_modules before investing more time into polishing the tests. |
Could you add comments in encoded_modules/init.py? I don’t understand it, so I can’t make a useful comment. |
Please see updated issue9308a.diff. |
I think this new test is a good idea. |
Is anyone interested in reviewing this patch before it is committed? Since there are no user-visible changes and the only non-trivial change simply adds new tests, I think this can go in. Any refinements can be done later. |
I am attaching an updated patch, issue9308b.diff. Compared to the "a" patch, I added test/encoded_modules to the makefile so that it gets installed and removed cookies from some more test and Tools files. |
Committed in r85537. |
I yhink you need to add an svnignore property to that directory for __pycache__. |
On Sat, Oct 16, 2010 at 1:34 PM, R. David Murray <report@bugs.python.org> wrote:
r85576 (I hope I got it right.) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: