Issue 1278: imp.find_module() ignores -*- coding: Latin-1 -*-


This issue tracker has been migrated to GitHub, and is currently read-only.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/45619

classification

Title:	imp.find_module() ignores -*- coding: Latin-1 -*-
Type:	behavior	Stage:
Components:	Interpreter Core	Versions:	Python 3.0

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	alexandre.vassalotti, brett.cannon, christian.heimes, gvanrossum
Priority:	normal	Keywords:

Created on 2007-10-15 01:34 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (9)
msg56431 - Author: Christian Heimes (christian.heimes) *	Date: 2007-10-15 01:34
imp.find_module() returns an io.TextIOWrapper instance as its first value. The encoding of the TextIOWrapper isn't set from a "-*- coding: Latin-1 -*-" line.
    >>> import imp
    >>> imp.find_module("heapq")
    (<io.TextIOWrapper object at 0xb7c8f50c>, '/home/heimes/dev/python/py3k/Lib/heapq.py', ('.py', 'U', 1))
    >>> imp.find_module("heapq")[0].read()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/heimes/dev/python/py3k/Lib/io.py", line 1224, in read
        res += decoder.decode(self.buffer.read(), True)
      File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1428-1430: invalid data
    >>> imp.find_module("heapq")[0].encoding
    'UTF-8'
    >>> imp.find_module("heapq")[0].readline()
    '# -*- coding: Latin-1 -*-\n'
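You can reproduce the failure without imp (which was removed in Python 3.12) by working on raw bytes. This is a minimal sketch. It uses a small hypothetical Latin-1 source string in place of the real heapq.py. Decoding that string as UTF-8 fails much like the read() above. The standard-library tokenize.detect_encoding() reads the coding cookie instead. The names SOURCE, decode_naively and decode_with_cookie are illustrative only.

```python
import io
import tokenize

# Hypothetical stand-in for a Latin-1 module such as heapq.py:
# a coding cookie on line 1 and a non-ASCII character encoded as Latin-1.
SOURCE = "# -*- coding: Latin-1 -*-\n# Fran\u00e7ois\nx = 1\n".encode("latin-1")


def decode_naively(data: bytes) -> str:
    """Hypothetical helper: decode as UTF-8 and ignore the cookie (the reported bug)."""
    return data.decode("utf-8")


def decode_with_cookie(data: bytes) -> str:
    """Hypothetical helper: honour the PEP 263 cookie via tokenize.detect_encoding()."""
    encoding, _consumed_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return data.decode(encoding)


try:
    decode_naively(SOURCE)
except UnicodeDecodeError as exc:
    print("UTF-8 decoding fails:", exc.reason)

# detect_encoding() normalizes "Latin-1" to "iso-8859-1".
print(tokenize.detect_encoding(io.BytesIO(SOURCE).readline)[0])
print(decode_with_cookie(SOURCE).splitlines()[1])
```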
msg56451 - Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 17:29
Can you suggest a patch? Adding Brett Cannon to the list, possibly his import-in-python would supersede this?
msg56453 - Author: Christian Heimes (christian.heimes) *	Date: 2007-10-15 17:47
> Can you suggest a patch?
>
> Adding Brett Cannon to the list, possibly his import-in-python would
> supersede this?
No, I can't suggest a patch. I don't know how we could get the encoding from the tokenizer or AST. Brett is obviously the best man to fix the problem. :)
Christian
msg56457 - Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 18:02
> No, I can't suggest a patch. I don't know how we could get the encoding
> from the tokenizer or AST.
Try harder. :-) Look at the code that accomplishes this feat in the regular parser...
msg56459 - Author: Christian Heimes (christian.heimes) *	Date: 2007-10-15 18:30
> Try harder. :-) Look at the code that accomplishes this feat in the
> regular parser...
I've already found the methods that find the encoding in Parser/tokenizer.c: check_coding_spec() and friends.
But it seems like a waste of time to use PyTokenizer_FromFile() just to find the encoding. *reading* Mmh ... It's not a waste of time if I can stop the tokenizer. I think it may be possible to use the tokenizer to get the encoding efficiently. I could read until tok_state->read_coding_spec or tok_state->indent != 0.
Do you know a better way to stop the tokenizer when the line isn't a special comment line "# -*-"?
Christian
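For readers following the C discussion: check_coding_spec() in Parser/tokenizer.c looks for a PEP 263 coding declaration in a comment line. Below is a rough Python analogue. It assumes the regular expression given in PEP 263, which the stdlib tokenize module also uses. It is not a port of the C code, and the name match_coding_spec is hypothetical.

```python
import re
from typing import Optional

# PEP 263 coding-cookie pattern (the same expression the stdlib tokenize module uses).
COOKIE_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)


def match_coding_spec(line: str) -> Optional[str]:
    """Hypothetical helper, loosely analogous to check_coding_spec() in C.

    Returns the declared encoding name if *line* is a comment carrying a
    coding declaration, otherwise None.
    """
    match = COOKIE_RE.match(line)
    return match.group(1) if match else None


for sample in (
    "# -*- coding: Latin-1 -*-\n",        # Emacs style, as in heapq.py
    "# vim: set fileencoding=utf-8 :\n",  # Vim style
    "x = 1  # coding: latin-1\n",         # not a comment-only line: ignored
):
    print(repr(sample), "->", match_coding_spec(sample))
```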
msg56461 - Author: Guido van Rossum (gvanrossum) *	Date: 2007-10-15 19:30
Call PyTokenizer_Get until the line number is > 2?
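This suggestion works because PEP 263 only allows the declaration on the first or second line. The sketch below shows that bounded scan, assuming a readline callable that returns bytes. The name find_source_encoding is hypothetical. The function returns the name exactly as written (e.g. "Latin-1") without normalizing it. The stdlib tokenize.detect_encoding() performs a fuller version of the same check, including BOM handling, name normalization and validation.

```python
import io
import re
from typing import Callable

# Bytes versions of the PEP 263 cookie pattern and a blank/comment-line test.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)
BLANK_OR_COMMENT_RE = re.compile(rb"^[ \t\f]*(?:[#\r\n]|$)", re.ASCII)


def find_source_encoding(readline: Callable[[], bytes],
                         default: str = "utf-8") -> str:
    """Hypothetical helper: look for a coding cookie in at most two lines."""
    for _ in range(2):              # stop once the line number would exceed 2
        line = readline()
        if not line:                # end of input
            break
        match = COOKIE_RE.match(line)
        if match:
            return match.group(1).decode("ascii")
        if not BLANK_OR_COMMENT_RE.match(line):
            break                   # a code line: no cookie may follow it
    return default


src = b"#!/usr/bin/env python\n# -*- coding: Latin-1 -*-\nx = 1\n"
print(find_source_encoding(io.BytesIO(src).readline))
```

The default parameter stands in for "the expected encoding" when no cookie is found (UTF-8 for Py3K).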
msg56462 - Author: Brett Cannon (brett.cannon) *	Date: 2007-10-15 19:34
No, my work has the exact same problem. Actually, this bug report has confirmed for me why heapq could not be imported when I accidentally forced all open text files to use UTF-8. I just have not gotten around to trying to solve this issue yet. But since importlib just uses open() directly, it has the same problems.
Since it looks like TextIOWrapper does not let one change the encoding after it has been set, some subclass might need to be written that reads ahead looking for the stanza, or else immediately stops and uses the expected encoding (UTF-8 in the case of Py3K or ASCII for 2.6). That, or expose some C function that takes a file path or open file and returns a code object.
But I have bigger fish to fry: my attempt to get around open() being defined in site.py is actually failing once I clobbered my .pyc files, since codecs requires importing modules, even for ASCII encoding.
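The point here is that the encoding has to be known before the TextIOWrapper is created. The sketch below shows that "detect first, then wrap" order, assuming a seekable binary stream. It peeks at the raw bytes with tokenize.detect_encoding(), rewinds, and only then builds the wrapper. The name open_source_text is hypothetical. Later Python versions (3.2+) provide tokenize.open(), which does essentially this for files on disk.

```python
import io
import tokenize
from typing import BinaryIO


def open_source_text(buffer: BinaryIO) -> io.TextIOWrapper:
    """Hypothetical helper: wrap a seekable binary stream using its PEP 263 encoding.

    The encoding is chosen *before* the TextIOWrapper exists, because a
    wrapper's encoding cannot be changed once data has been read through it.
    """
    start = buffer.tell()
    encoding, _consumed_lines = tokenize.detect_encoding(buffer.readline)
    buffer.seek(start)              # rewind so the cookie line is not lost
    return io.TextIOWrapper(buffer, encoding=encoding)


raw = io.BytesIO("# -*- coding: Latin-1 -*-\n# Fran\u00e7ois\n".encode("latin-1"))
text = open_source_text(raw)
print(text.encoding)
print(text.read().splitlines()[1])
```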
msg56463 - Author: Christian Heimes (christian.heimes) *	Date: 2007-10-15 19:36
> Call PyTokenizer_Get until the line number is > 2?
That's too easy :] I'm going to implement the fix tonight.
Christian
msg56575 - Author: Christian Heimes (christian.heimes) *	Date: 2007-10-19 23:22
The bug was fixed in r58553 together with http://bugs.python.org/issue1267. Please close this bug.

History
Date	User	Action	Args
2022-04-11 14:56:27	admin	set	github: 45619
2007-10-19 23:36:58	gvanrossum	set	status: open -> closed resolution: fixed
2007-10-19 23:22:26	christian.heimes	set	messages: + msg56575
2007-10-16 01:15:09	alexandre.vassalotti	set	nosy: + alexandre.vassalotti
2007-10-15 19:36:01	christian.heimes	set	messages: + msg56463
2007-10-15 19:34:55	brett.cannon	set	messages: + msg56462
2007-10-15 19:30:23	gvanrossum	set	messages: + msg56461
2007-10-15 18:30:56	christian.heimes	set	messages: + msg56459
2007-10-15 18:02:59	gvanrossum	set	messages: + msg56457
2007-10-15 17:47:13	christian.heimes	set	messages: + msg56453
2007-10-15 17:29:19	gvanrossum	set	nosy: + brett.cannon, gvanrossum messages: + msg56451
2007-10-15 01:34:28	christian.heimes	create