Message 106134 - Python tracker


Author	ysj.ray
Recipients	meatballhat, ysj.ray
Date	2010-05-20.09:08:50
Message-id	<1274346534.56.0.254674868282.issue8774@psf.upfronthosting.co.za>
In-reply-to

Content
The problem is in the tabnanny module: it always reads the Python source file as text in the platform-dependent default encoding, that is, it opens the file with the builtin open() and passes no encoding argument. It doesn't parse the encoding cookie at the beginning of the source file! So if a Python source file contains characters that cannot be decoded with that platform-dependent encoding, tabnanny fails when checking the file. This affects not only heapq.py but also several other standard modules.

That platform-dependent encoding is determined in the following order:
1. os.device_encoding(fd)
2. locale.getpreferredencoding()
3. ascii
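
The lookup order listed above can be sketched roughly as follows. This is only an illustration of the described fallback chain, not the actual code inside open(); the helper name default_text_encoding is made up for this example, and newer Python versions may choose the default differently.

```python
import locale
import os
import tempfile


def default_text_encoding(fd):
    """Mirror the fallback order described above (hypothetical helper, not a stdlib API)."""
    # 1. Encoding of the device behind fd if it is a terminal, otherwise None.
    encoding = os.device_encoding(fd)
    if encoding:
        return encoding
    # 2. The locale's preferred encoding.
    encoding = locale.getpreferredencoding()
    if encoding:
        return encoding
    # 3. Last resort.
    return "ascii"


# A temporary file is not a terminal, so step 1 normally yields None here.
with tempfile.TemporaryFile() as f:
    encoding = default_text_encoding(f.fileno())
print(encoding)
```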

I wonder why tabnanny works this way. Is this the intended behaviour? On my platform, if I use tabnanny to check a source file that contains Chinese characters and is encoded in 'gbk', a UnicodeDecodeError is raised.
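
A minimal way to see this kind of failure without involving tabnanny itself is sketched below. It assumes "ascii" as a stand-in for a platform default encoding that cannot decode GBK bytes; the file contents and variable names are invented for the example.

```python
import os
import tempfile

# A source file with a gbk encoding cookie and some Chinese text.
source = "# -*- coding: gbk -*-\nname = '中文'\n"

with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
    f.write(source.encode("gbk"))
    path = f.name

try:
    # Opening in text mode with a non-matching encoding ignores the cookie,
    # which is what happens when the platform default is used.
    with open(path, encoding="ascii") as f:
        f.read()
    error = None
except UnicodeDecodeError as exc:
    error = exc
finally:
    os.remove(path)

print(type(error).__name__)
```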

If this is not the intended behaviour, then to fix the problem we have to change the way tabnanny reads the source file, so that it works like the Python compiler: first open the file in "rb" mode, then detect the encoding with tokenize.detect_encoding(), then reopen the source file in text mode using the detected encoding.
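
The two-pass approach proposed above might look something like this. It is a sketch of the idea only, not a patch for tabnanny; read_source is a hypothetical helper name and the sample file is invented for the demonstration.

```python
import os
import tempfile
import tokenize


def read_source(path):
    """Read a Python source file using the encoding its cookie or BOM declares."""
    # Pass 1: binary mode, so nothing is decoded yet.
    with open(path, "rb") as f:
        encoding, _ = tokenize.detect_encoding(f.readline)
    # Pass 2: text mode with the detected encoding.
    with open(path, encoding=encoding) as f:
        return f.read()


# Demonstration with a gbk-encoded file that carries an encoding cookie.
source = "# -*- coding: gbk -*-\nname = '中文'\n"

with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as f:
    f.write(source.encode("gbk"))
    path = f.name

try:
    text = read_source(path)
finally:
    os.remove(path)

print(text == source)
```

Python 3.2 and later also provide tokenize.open(), which performs the same detection and returns a file object already opened in text mode, so the two passes can be collapsed into one call where that function is available.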

History
Date	User	Action	Args
2010-05-20 09:08:55	ysj.ray	set	recipients: + ysj.ray, meatballhat
2010-05-20 09:08:54	ysj.ray	set	messageid: <1274346534.56.0.254674868282.issue8774@psf.upfronthosting.co.za>
2010-05-20 09:08:52	ysj.ray	link	issue8774 messages
2010-05-20 09:08:50	ysj.ray	create