Author stijn
Recipients BreamoreBoy, David.Sankel, Drekin, akira, amaury.forgeotdarc, christian.heimes, christoph, davidsarah, ezio.melotti, flox, giampaolo.rodola, hippietrail, lemburg, mark, mhammond, ncoghlan, pitrou, santoso.wijaya, smerlin, ssbarnea, steve.dower, stijn, terry.reedy, tim.golden, tzot, v+python, wiz21
Date 2014-10-02.08:50:54
Message-id <1412239855.03.0.467892886165.issue1602@psf.upfronthosting.co.za>
In-reply-to
Content
I'm new here, but I think this is the correct issue to ask about this Unicode problem. On the Windows console:

> chcp
Active code page: 437

> type utf.txt
Привет

> chcp 65001
Active code page: 65001

> type utf.txt
Привет

> python --version
Python 3.5.0a0

> cat utf.py
f = open('utf.txt')
l = f.readline()
print(l)
print(len(l))

> python utf.py
Привет
�²ÐµÑ‚
�‚


13

> cat utf_explicit.py
import codecs
f = codecs.open('utf.txt', encoding='utf-8', mode='r')
l = f.readline()
print(l)
print(len(l))

> python utf_explicit.py
Привет
ет


7

I read through part of this page, but these things are a bit above my head. Could anyone explain:
- How can I figure out which codec is used by the file objects returned by open()?
- Is there a way to change it globally to utf-8?
- The last case is almost correct: it has the correct number of characters, but print() still does something wrong. I got this working by using the stream patch, but I have another example for which the output is still not correct; see below. Is there any way around this?
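
On the first question, a minimal sketch that may help: text files returned by open() expose the codec they were opened with through their `encoding` attribute, and passing `encoding='utf-8'` to the built-in open() is an alternative to codecs.open(). The helper names below are hypothetical and only for illustration; this inspects how the file is decoded and does not address how print() writes to the console.

```python
def default_text_encoding(path):
    """Return the codec open() picks when no encoding argument is given."""
    # Without an explicit encoding, open() falls back to a platform default,
    # which is why the result can differ between machines and consoles.
    with open(path) as f:
        return f.encoding


def read_first_line_utf8(path):
    """Read the first line, decoding explicitly as UTF-8."""
    # Assumption: the file really is UTF-8 encoded, as utf.txt appears to be.
    with open(path, encoding='utf-8') as f:
        return f.readline()
```

For a UTF-8 file containing Привет followed by a newline, `read_first_line_utf8` should return 7 characters, matching the length printed by utf_explicit.py above.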

> type utf2.txt
aαbβcγdδ

> cat utf2.py
import streams
import codecs
streams.enable()
f = codecs.open('utf2.txt', encoding='utf-8', mode='r')
print(f.read(1))
print(f.read(1))
print(f.read(2))
print(f.read(4))

> python utf2.py
a
α
bβc
γdδ
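
For comparison, here is a minimal sketch of the chunks one would expect if read(n) counted characters. It uses the built-in io text layer over an in-memory buffer instead of codecs.open() and the stream patch, so it is an assumption about the intended result, not a reproduction of the setup above. As far as I know, the first positional argument of read() on a codecs stream reader is a size hint rather than an exact character count, which may explain the uneven chunks. The function name `read_in_chunks` is hypothetical.

```python
import io


def read_in_chunks(data, sizes, encoding='utf-8'):
    """Decode bytes through a text layer and read() the given character counts."""
    # TextIOWrapper.read(n) returns at most n decoded characters, not bytes.
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding)
    return [text.read(n) for n in sizes]


# Same content as utf2.txt and the same read sizes as utf2.py.
chunks = read_in_chunks('aαbβcγdδ'.encode('utf-8'), [1, 1, 2, 4])
```

Here `chunks` holds one string per read() call, so the third entry has two characters and the fourth has four. Whether these then display correctly on the Windows console is a separate matter.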