classification
Title: Remove redundant coding cookies from 3.x stdlib
Type: Stage: commit review
Components: Library (Lib), Tests Versions: Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: astrand, belopolsky, eric.araujo, mark.dickinson, niemeyer, r.david.murray
Priority: normal Keywords: patch

Created on 2010-07-19 18:57 by belopolsky, last changed 2010-10-16 20:15 by belopolsky. This issue is now closed.

Files
File name Uploaded Description Edit
issue9308.diff belopolsky, 2010-07-19 20:12
issue9308a.diff belopolsky, 2010-07-20 16:54
issue9308b.diff belopolsky, 2010-10-15 15:40
Messages (19)
msg110798 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-19 18:57
On Mon, Jul 19, 2010 at 2:45 AM, Guido van Rossum <guido@python.org> wrote:
> Sounds like a good idea to try to remove redundant cookies *and* to
> remove most occasional use of non-ASCII characters outside comments
> (except for unittests specifically trying to test Unicode features).
> Personally I would use \xXX escapes instead of spelling out the
> characters in shlex.py, for example.
>
> Both with or without the coding cookies, many ways of displaying text
> files garble characters outside the ASCII range, so it's better to
> stick to ASCII as much as possible.
>
> --Guido
>
> On Mon, Jul 19, 2010 at 1:21 AM, Alexander Belopolsky
> <alexander.belopolsky@gmail.com> wrote:
>> I was looking at the inspect module and noticed that it's source
>> starts with "# -*- coding: iso-8859-1 -*-".   I have checked and there
>> are no non-ascii characters in the file.   There are several other
>> modules that still use the cookie:
>>
>> Lib/ast.py:# -*- coding: utf-8 -*-
>> Lib/getopt.py:# -*- coding: utf-8 -*-
>> Lib/inspect.py:# -*- coding: iso-8859-1 -*-
>> Lib/pydoc.py:# -*- coding: latin-1 -*-
>> Lib/shlex.py:# -*- coding: iso-8859-1 -*-
>> Lib/encodings/punycode.py:# -*- coding: utf-8 -*-
>> Lib/msilib/__init__.py:# -*- coding: utf-8 -*-
>> Lib/sqlite3/__init__.py:#-*- coding: ISO-8859-1 -*-
>> Lib/sqlite3/dbapi2.py:#-*- coding: ISO-8859-1 -*-
>> Lib/test/bad_coding.py:# -*- coding: uft-8 -*-
>> Lib/test/badsyntax_3131.py:# -*- coding: utf-8 -*-
>>
>> I understand that coding: utf-8 is strictly redundant in 3.x.  There
>> are cases such as Lib/shlex.py where using encoding other than utf-8
>> is justified.  (See
>> http://svn.python.org/view?view=rev&revision=82560).  What are the
>> guidelines for other cases?  Should redundant cookies be removed?
>> Since not all editors respect the  -*- cookie, I think the answer
>> should be "yes" particularly when the cookie is setting encoding other
>> than utf-8.
>> _______________________________________________
>> Python-Dev mailing list
>> Python-Dev@python.org
>> http://mail.python.org/mailman/listinfo/python-dev
>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org
>>
>
>
>
> --
> --Guido van Rossum (python.org/~guido)
>
msg110804 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-19 20:12
In the attached patch, I've removed encoding cookies in all files listed in the message above except

>> Lib/test/bad_coding.py:# -*- coding: uft-8 -*-
>> Lib/test/badsyntax_3131.py:# -*- coding: utf-8 -*-
>> Lib/shlex.py:# -*- coding: iso-8859-1 -*-

The bad_coding contains bad encoding cookie which is being tested, in badsyntax_3131, the cookie could be removed, but I think it clarifies the meaning of the file.  It can probably be replaced with a comment explaining what is being tested.

The shlex.py was reverted in r82560, so I am not touching it, but I suspect that the logic used to build wordchars is not correct on systems supporting unicode.

I am adding Peter Åstrand to "nosy" because his name was miscoded in Lib/getopt.py and I fixed it by copying from Misc/ACKS.
msg110805 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-19 20:20
It turns out that test_imp tests that pydoc.py is in iso-8859-1.  This is not right.  The test should use a dedicated file under Lib/test for that.
msg110807 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-19 20:33
This is not easy after all, kudos for taking care of that.

The shlex fix was reverted IIRC because Mark was unsure of the fix, since our diff tools try to be too clever and decode files, not showing us the bytes differences. I don’t trust my editor either, nor the file command (which says very funny things about Python files). I only trust Python’s open function to give me the exact bytes, which I can compare with 'æ'.encode('utf-8') and 'æ'.encode('latin1') to make up my mind.

In this case, I would trust the left-hand side of the diff seen in r82560 as a definition of what those characters are supposed to be, since I don’t have POSIX handy <wink>, and add the \x escapes for those characters (per Guido’s recommendation—I regret we’ve switched to UTF-8 but can’t use it).
msg110828 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-19 22:50
On Mon, Jul 19, 2010 at 4:33 PM, Éric Araujo <report@bugs.python.org> wrote:
..
> The shlex fix was reverted IIRC because Mark was unsure of the fix, since our diff tools
> try to be too clever and decode files, not showing us the bytes differences.

Well, it is not clear what wordchars should be in shlex.  I would
think (c in wordchar) should be the same as (c.isalnum() or c == '_'),
but there are only 62 characters added  added in in posix mode:

62

while according to str.isalnum(), three are 71 alphanumeric characters
at code points between 128 and 255:

71

I don't know what POSIX definition of word character is.
msg110830 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-19 22:55
Tracker eating python prompt prefixed lines is truly annoying.  Here are the missing lines:

>>> len(shlex(posix=True).wordchars) - len(shlex().wordchars)
62
>>> len([chr(i) for i in range(128, 256) if chr(i).isalnum()])
71
msg110835 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-19 23:16
Thank you for explaining that, I had a suspicion I had not understood “the logic used to build wordchars is not correct”. So our options here are to ignore the issue and keep statu quo, remove this strange POSIX wordchars concept, find some POSIX docs, or contact esr directly. I’m tentatively adding him to nosy, hoping he’ll see the email and come here enlighten us. Eric, I hope this doesn’t annoy you.
msg110839 - (view) Author: Alexander Belopolsky (Alexander.Belopolsky) Date: 2010-07-19 23:21
>
> Changes by Éric Araujo <merwok@netwok.org>:
> ----------
> nosy: +esr

I don't think posix mode was added by ESR, but I cannot check ATM.
msg110840 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-19 23:30
I shouldn’t edit bugs when tired, sorry for the noise. hg log tells me that those extra characters have been added in r32284 (2003) following bug #722686 by Gustavo Niemeyer following a discussion on “the mailing list”. Adjusting nosy.
msg110853 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-20 02:55
I wonder whether 3.x shlex working with unicode files is just incidental to 3.x string model and is not really correct.  I think issue1170 is a better place to have this discussion.  I'll add a comment there.
msg110860 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-20 03:44
I am attaching a new patch, issue9308a.diff which fixes test_imp by introducing a new package, test.encoded_modules which currently contains a module encoded with iso-8859-1 and another with a somewhat more interesting encoding, koi8-r.  I think it would be interesting to add one or two multi-byte encodings to the mix.  The test_import_encoded_module test case probably does not belong in test_imp and should be moved to test_import or test_importlib, but I would like to see the reaction to the idea of introducing test.encoded_modules before investing more time into polishing the tests.
msg110907 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-20 15:10
Could you add comments in encoded_modules/__init__.py? I don’t understand it, so I can’t make a useful comment.
msg110921 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-20 16:54
> Could you add comments in encoded_modules/__init__.py?

Please see updated issue9308a.diff.
msg110922 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-07-20 17:03
I think this new test is a good idea.
msg111203 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-22 17:57
Is anyone interested in reviewing this patch before it is committed?  Since there are no user-visible changes and the only non-trivial change simply adds new tests, I think this can go in.  Any refinements can be done later.
msg118785 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-15 15:40
I am attaching an updated patch, issue9308b.diff.  Compared to the "a" patch, I added test/encoded_modules to the makefile so that it gets installed and removed cookies from some more test and Tools files.
msg118797 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-15 16:29
Committed in r85537.
msg118878 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-16 17:34
I yhink you need to add an svnignore property to that directory for __pycache__.
msg118889 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-16 20:15
On Sat, Oct 16, 2010 at 1:34 PM, R. David Murray <report@bugs.python.org> wrote:
..
> I yhink you need to add an svnignore property to that directory for __pycache__.

r85576 (I hope I got it right.)
History
Date User Action Args
2010-10-16 20:15:48belopolskysetmessages: + msg118889
2010-10-16 17:34:52r.david.murraysetnosy: + r.david.murray
messages: + msg118878
2010-10-15 16:29:28belopolskysetstatus: open -> closed

messages: + msg118797
2010-10-15 15:41:00belopolskysetfiles: + issue9308b.diff

messages: + msg118785
2010-07-22 17:57:18belopolskysetnosy: niemeyer, astrand, mark.dickinson, belopolsky, eric.araujo
messages: + msg111203

components: + Tests
keywords: - needs review
stage: patch review -> commit review
2010-07-20 17:03:59eric.araujosetmessages: + msg110922
2010-07-20 16:54:59belopolskysetfiles: - issue9308a.diff
2010-07-20 16:54:42belopolskysetfiles: + issue9308a.diff

messages: + msg110921
2010-07-20 15:10:58eric.araujosetmessages: + msg110907
2010-07-20 03:46:27belopolskysetkeywords: + needs review
stage: commit review -> patch review
2010-07-20 03:44:53belopolskysetfiles: + issue9308a.diff

messages: + msg110860
2010-07-20 02:55:47belopolskysetnosy: - Alexander.Belopolsky
messages: + msg110853
2010-07-19 23:30:11eric.araujosetnosy: + niemeyer, - esr
messages: + msg110840
2010-07-19 23:21:07Alexander.Belopolskysetnosy: + Alexander.Belopolsky
messages: + msg110839
2010-07-19 23:16:46eric.araujosetnosy: + esr
2010-07-19 23:16:15eric.araujosetmessages: + msg110835
2010-07-19 22:55:36belopolskysetmessages: + msg110830
2010-07-19 22:50:43belopolskysetmessages: + msg110828
2010-07-19 20:33:21eric.araujosetkeywords: - easy
nosy: + mark.dickinson
messages: + msg110807

2010-07-19 20:20:40belopolskysetmessages: + msg110805
2010-07-19 20:12:18belopolskysetfiles: + issue9308.diff

nosy: + astrand
messages: + msg110804

keywords: + patch
stage: needs patch -> commit review
2010-07-19 19:52:08eric.araujosetversions: + Python 3.2
nosy: + eric.araujo

keywords: + easy
resolution: accepted
stage: needs patch
2010-07-19 18:57:07belopolskycreate