Message 74197 - Python tracker

Author	vstinner
Recipients	loewis, vstinner
Date	2008-10-02.21:49:10
SpamBayes Score	0.0
Marked as misclassified	No
Message-id	<1222984154.8.0.04402572573.issue4008@psf.upfronthosting.co.za>
In-reply-to

Content
loewis wrote:
> Notice that there is also IOBinding.coding_spec.
> Not sure whether this or the one in tokenize is more correct.

Oh! IOBinding reimplements many features that are now available in
Python itself, like universal newlines or functions to write Unicode
strings to a file. But I don't want to rewrite IDLE; I just want to fix
the initial problem: IDLE is unable to open a non-ASCII file that
declares its encoding with a "#coding:" header.
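For comparison, here is a minimal sketch of the two built-in Python 3 features mentioned above. This is not IDLE code; the helper name roundtrip() and the sample text are hypothetical. open() in text mode writes str objects using an explicit encoding, and reading with the default newline=None gives universal newlines, so "\r\n" and "\r" both come back as "\n".

```python
import os
import tempfile

def roundtrip(text, encoding):
    # Hypothetical helper: write a str with an explicit encoding, then read it back.
    fd, path = tempfile.mkstemp(suffix='.py')
    os.close(fd)
    try:
        # newline='' disables newline translation on write, so '\r\n' is stored as-is
        with open(path, 'w', encoding=encoding, newline='') as f:
            f.write(text)
        # The default newline=None enables universal newlines on read:
        # '\r\n' and '\r' are both returned as '\n'
        with open(path, 'r', encoding=encoding) as f:
            return f.read()
    finally:
        os.remove(path)

print(repr(roundtrip('# coding: latin-1\r\ns = "caf\u00e9"\r\n', 'latin-1')))
# prints '# coding: latin-1\ns = "café"\n'
```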

It turns out IDLE reimplemented encoding detection twice: once in
IOBinding and once in ScriptBinding. So I wrote a new version of my
patch that removes all of that code and reuses
tokenize.detect_encoding() instead.
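A short illustration of what tokenize.detect_encoding() does, assuming a current Python 3 release (the helper encoding_of() and the sample inputs are hypothetical). It takes a readline callable that returns bytes, reads at most two lines, and returns the encoding declared by a UTF-8 BOM or a PEP 263 coding cookie, falling back to 'utf-8'.

```python
import io
import tokenize

def encoding_of(source):
    # tokenize.detect_encoding() wants a readline callable returning bytes;
    # it returns (encoding, list_of_raw_lines_it_read)
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding

print(encoding_of(b'# -*- coding: latin-1 -*-\nx = 1\n'))      # iso-8859-1 (normalized name)
print(encoding_of(b'#!/usr/bin/env python\n# coding: utf-8\n'))  # utf-8 (cookie on line 2)
print(encoding_of(b'\xef\xbb\xbfx = 1\n'))                       # utf-8-sig (BOM found)
print(encoding_of(b'x = 1\n'))                                   # utf-8 (nothing declared)
```

Note that the last case returns the same value as an explicit "coding: utf-8" cookie, so the return value alone does not tell a caller whether anything was declared.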

I changed IDLE's behaviour: IOBinding._decode() used to fall back to
the locale encoding when it could not detect the encoding from a UTF-8
BOM or a #coding: header. Since I also read the note "Finally, try the
locale's encoding. This is deprecated", I prefer to remove that
fallback. If you want to keep the current behaviour, use:
-------------------------
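import re
import tokenize

# PEP 263 coding cookie, in the same form tokenize looks for
CODING_RE = re.compile(rb'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')

def detect_encoding(filename, default=None):
    with open(filename, 'rb') as f:
        encoding, lines = tokenize.detect_encoding(f.readline)
    # current Python returns 'utf-8-sig' for a BOM; else look for a cookie
    declared = encoding == 'utf-8-sig' or any(CODING_RE.match(line) for line in lines)
    if default and not declared:
        return default
    return encoding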
...
            encoding = detect_encoding(filename, locale_encoding)
-------------------------

Please review and test my patch (which keeps getting longer and longer) :-)

History
Date	User	Action	Args
2008-10-02 21:49:15	vstinner	set	recipients: + vstinner, loewis
2008-10-02 21:49:14	vstinner	set	messageid: <1222984154.8.0.04402572573.issue4008@psf.upfronthosting.co.za>
2008-10-02 21:49:13	vstinner	link	issue4008 messages
2008-10-02 21:49:13	vstinner	create