Message228191
New here, but I think this is the correct issue to ask about this Unicode problem. On the Windows console:
> chcp
Active code page: 437
> type utf.txt
Привет
> chcp 65001
Active code page: 65001
> type utf.txt
Привет
> python --version
Python 3.5.0a0
> cat utf.py
f = open('utf.txt')
l = f.readline()
print(l)
print(len(l))
> python utf.py
Привет
�²ÐµÑ‚
�‚
13
> cat utf_explicit.py
import codecs
f = codecs.open('utf.txt', encoding='utf-8', mode='r')
l = f.readline()
print(l)
print(len(l))
> python utf_explicit.py
Привет
ет
7
I read through part of this page, but these things are a bit above my head. Could anyone explain:
- how to figure out which codec is used by the file objects returned by open()?
- is there a way to change it globally to utf-8?
- the last case is almost correct: it has the correct number of characters, but print() still does something wrong. I got this working by using the stream patch, but found another example for which the output is not correct, see below. Is there any way around this?
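On the first two questions, a minimal sketch, not a definitive answer: every text file object has an `encoding` attribute naming the codec it decodes with, and `locale.getpreferredencoding(False)` reports what `open()` falls back to when no encoding is passed. As far as I know, Python 3.5 has no global switch for `open()` (`PYTHONIOENCODING` only affects stdin/stdout/stderr), so pinning the encoding in a small wrapper is one workaround; Python 3.7 and later add a UTF-8 mode (`-X utf8` or `PYTHONUTF8=1`). The name `open_utf8` and the temporary demo file are assumptions made for illustration.

```python
import functools
import locale
import os
import tempfile

# Hypothetical helper: the built-in open() with the encoding pinned to UTF-8.
open_utf8 = functools.partial(open, encoding="utf-8")

# Recreate a file like utf.txt in a temporary directory (demo data only).
demo_path = os.path.join(tempfile.mkdtemp(), "utf.txt")
with open_utf8(demo_path, "w") as f:
    f.write("Привет\n")

# Encoding that open() falls back to when none is passed.
default_encoding = locale.getpreferredencoding(False)

# Every text file object reports the codec it decodes with.
with open(demo_path) as f:
    implicit_encoding = f.encoding

with open_utf8(demo_path) as f:
    explicit_encoding = f.encoding
    line = f.readline()

# Only ASCII is printed here, so the console code page does not matter.
print(default_encoding, implicit_encoding, explicit_encoding, len(line))
```

If `implicit_encoding` is something like cp1252, each UTF-8 byte is decoded as a separate character, which would be consistent with the length of 13 instead of 7 in the first run.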
> type utf2.txt
aαbβcγdδ
> cat utf2.py
import streams
import codecs
streams.enable()
f = codecs.open('utf2.txt', encoding='utf-8', mode='r')
print(f.read(1))
print(f.read(1))
print(f.read(2))
print(f.read(4))
> python utf2.py
a
α
bβc
γdδ
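To separate decoding problems from console problems, one option is to print `ascii()` of each chunk, which emits only ASCII escapes and so does not depend on the console code page. The sketch below uses an in-memory stand-in for utf2.txt (an assumption, so that it runs anywhere) and an `io.TextIOWrapper`, the same kind of object the built-in `open()` returns in text mode, whose `read(n)` counts characters. In contrast, the first argument of `read()` on a `codecs.open()` stream is documented as an approximate byte count, which may or may not be related to the grouping seen above.

```python
import io

# In-memory stand-in for utf2.txt (assumed content, taken from the transcript).
text = "aαbβcγdδ"
stream = io.TextIOWrapper(io.BytesIO(text.encode("utf-8")), encoding="utf-8")

# Same read sizes as in utf2.py; read(n) here returns up to n characters.
chunks = [stream.read(n) for n in (1, 1, 2, 4)]

for chunk in chunks:
    # ascii() escapes non-ASCII characters, bypassing the console encoding.
    print(ascii(chunk))
```

If the escaped output shows the expected characters while plain `print(chunk)` does not, the file was decoded correctly and the remaining problem is in writing to the console.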
Date | User | Action | Args
2014-10-02 08:50:55 | stijn | set | recipients: + stijn, lemburg, mhammond, terry.reedy, tzot, amaury.forgeotdarc, ncoghlan, pitrou, giampaolo.rodola, christian.heimes, tim.golden, mark, christoph, ezio.melotti, v+python, hippietrail, ssbarnea, flox, davidsarah, santoso.wijaya, akira, BreamoreBoy, David.Sankel, smerlin, Drekin, steve.dower, wiz21
2014-10-02 08:50:55 | stijn | set | messageid: <1412239855.03.0.467892886165.issue1602@psf.upfronthosting.co.za>
2014-10-02 08:50:55 | stijn | link | issue1602 messages
2014-10-02 08:50:54 | stijn | create |