Message 214268 - Python tracker


Author	Daniel.U..Thibault
Recipients	Daniel.U..Thibault, docs@python, georg.brandl, r.david.murray
Date	2014-03-20.20:00:38
Message-id	<1395345638.14.0.0710363520699.issue20686@psf.upfronthosting.co.za>
In-reply-to

Content
>>> mystring="äöü"
>>> myustring=u"äöü"

>>> mystring
'\xc3\xa4\xc3\xb6\xc3\xbc'
>>> myustring
u'\xe4\xf6\xfc'

>>> str(mystring)
'\xc3\xa4\xc3\xb6\xc3\xbc'
>>> str(myustring)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

>>> f = open('workfile', 'w')
>>> f.write(mystring)
>>> f.close()
>>> f = open('workufile', 'w')
>>> f.write(myustring)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> f.close()

workfile contains the bytes C3 A4 C3 B6 C3 BC (hex), which is the UTF-8 encoding of äöü.

So the Unicode string (myustring) does indeed get converted to ASCII when written to a file, but not when just printed.
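
Both tracebacks above are consistent with an implicit encode step that uses the default 'ascii' codec. A minimal sketch that makes the step explicit (the names ascii_error and utf8_bytes are mine, for illustration only; the string is written with escapes so the snippet does not depend on the source file or terminal encoding):

```python
myustring = u"\xe4\xf6\xfc"  # the same three code points as u"äöü"

# Encoding explicitly with the 'ascii' codec fails just like the
# implicit conversions in the session above.
try:
    myustring.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly as UTF-8 succeeds and gives the byte sequence
# that mystring holds in the session above.
utf8_bytes = myustring.encode("utf-8")
```

The exception's start and end attributes cover the same range as "position 0-2" in the tracebacks above.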

It seems really strange that non-Unicode strings (mystring) should actually be more flexible than Unicode strings...
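
For comparison, here is a sketch of one way to write the Unicode string without hitting the ASCII conversion: name the encoding when opening the file. It assumes io.open (available since Python 2.6) and a throwaway temporary directory; the helper names write_text_utf8 and read_bytes are hypothetical and not part of any library.

```python
import io
import os
import tempfile


def write_text_utf8(path, text):
    """Write a Unicode string to path, encoded explicitly as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_bytes(path):
    """Return the raw bytes stored in path."""
    with io.open(path, "rb") as f:
        return f.read()


myustring = u"\xe4\xf6\xfc"  # the same three code points as u"äöü"

# Use a temporary directory so the example does not touch the
# current working directory.
path = os.path.join(tempfile.mkdtemp(), "workufile")
write_text_utf8(path, myustring)
data = read_bytes(path)
```

With this approach workufile ends up holding the same six bytes as workfile, because the encoding is chosen by the caller instead of falling back to the default.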
