Message 283132 - Python tracker


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author	terry.reedy
Recipients	Ivan.Pozdeev, serhiy.storchaka, terry.reedy
Date	2016-12-13.19:05:06
Message-id	<1481655906.71.0.387962225031.issue28923@psf.upfronthosting.co.za>
In-reply-to

Content

I reread
https://docs.python.org/27/reference/lexical_analysis.html#encoding-declarations
A first or second line must be a comment matching "coding[=:]\s*([-\w.]+)" (which IDLE uses) and the captured name "must be recognized by Python".
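A minimal sketch of that check, assuming a simplified reading of the rule (only comment lines among the first two are examined). The names CODING_RE and declared_encoding are hypothetical, not taken from IDLE or the docs; only the pattern itself is quoted from above.

```python
import re

# Pattern quoted from the language reference; CODING_RE is a hypothetical name.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")

def declared_encoding(source_lines):
    """Return the name captured from a coding comment in the first
    two lines, or None if neither line declares one."""
    for line in source_lines[:2]:
        # Only comment lines can carry an encoding declaration.
        if line.lstrip().startswith("#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return None
```

Note that this only extracts the name; it says nothing about whether the name "is recognized by Python".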

I also did some experiments.  Apparently, "iso-latin-1-unix" is recognized by Python.  On Windows, from an IDLE editor,
  # coding: iso-latin-1-unix
runs, while 
  # coding: xiso-latin-1-unix
raises, during the compile(..., 'file', 'exec') call:
  SyntaxError: unknown encoding: xiso-latin-1-unix
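
The experiment can be repeated outside IDLE with something like the following sketch. It assumes CPython 3 and passes the source as bytes, since a coding comment in a str source is not used for decoding. The helper name compiles_with_declaration is hypothetical.

```python
def compiles_with_declaration(encoding_name):
    """Return True if compile() accepts a source whose first line
    declares the given encoding name, False if it raises SyntaxError."""
    # The source must be bytes for the declaration to be honored.
    source = ("# coding: %s\nx = 1\n" % encoding_name).encode("ascii")
    try:
        compile(source, "file", "exec")
    except SyntaxError:
        return False
    return True

print(compiles_with_declaration("iso-latin-1-unix"))
print(compiles_with_declaration("xiso-latin-1-unix"))
```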

Since codecs.lookup() raises the same kind of error for both names, for example
  LookupError: unknown encoding: iso-latin-1-unix
compile() must be doing something other than simply calling codecs.lookup().  I suspect it somehow recognizes 'iso', 'latin-1', and 'unix' as valid chunks of an encoding name.  (The last might even be an obsolete legacy item.)  Whatever it is, it is not obviously available to tools written in Python.
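
For comparison, a small sketch of the codecs.lookup() side of the experiment. The helper name lookup_failure is hypothetical; it just reports the LookupError message, if any, instead of letting the exception propagate.

```python
import codecs

def lookup_failure(name):
    """Return the LookupError message codecs.lookup() raises for
    the given name, or None if the lookup succeeds."""
    try:
        codecs.lookup(name)
    except LookupError as exc:
        return str(exc)
    return None

for name in ("latin-1", "iso-latin-1-unix", "xiso-latin-1-unix"):
    print(name, "->", lookup_failure(name))
```

A tool that validates the declared name with codecs.lookup() alone would therefore reject a declaration that compile() accepts.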

Note that 'recognized as a legitimate encoding name' and 'available on a particular installation' are different concepts. I believe codecs.lookup implements the latter.

History
Date	User	Action	Args
2016-12-13 19:05:06	terry.reedy	set	recipients: + terry.reedy, serhiy.storchaka, Ivan.Pozdeev
2016-12-13 19:05:06	terry.reedy	set	messageid: <1481655906.71.0.387962225031.issue28923@psf.upfronthosting.co.za>
2016-12-13 19:05:06	terry.reedy	link	issue28923 messages
2016-12-13 19:05:06	terry.reedy	create