classification
Title: encoding error trying to save string to file
Type: behavior
Stage: resolved
Components: Unicode
Versions: Python 3.4

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: Gravitania, ezio.melotti, martin.panter, serhiy.storchaka, vstinner
Priority: normal
Keywords:

Created on 2015-02-27 21:13 by Gravitania, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name: Redear_Carpetotas.py
Uploaded: Gravitania, 2015-02-27 21:13
Description: Reads content of a folder and writes to a text file.
Messages (4)
msg236841 - Author: Rosa Maria (Gravitania) Date: 2015-02-27 21:13
I wrote a program that reads which files are in a Windows folder and saves the list to a file so it can be printed, but when it tries to write to the file the following error appears:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position 8: character maps to <undefined>

I expected that, with Python 3 being natively UTF-8, I would not have this problem.

I am attaching the Python script and some examples of the files it reads.
One example that fails is the file named:

'LKC.6558‐100‐HD‐P‐101_C.xlsx\n'
which appears in Windows as:
'LKC6558100HDP101_C.xlsx\n'
msg236844 - Author: STINNER Victor (vstinner) (Python committer) Date: 2015-02-27 21:36
Please provide the full traceback. The error can come from a lot of
functions. Which function fails?
msg236856 - Author: Martin Panter (martin.panter) (Python committer) Date: 2015-02-27 23:19
Python 3 will only use UTF-8 encoding if you ask it to, or if the default locale encoding happens to be UTF-8. I suspect one of the file names in the “hay” list must contain a Unicode hyphen (U+2010), and your default encoding is some single-byte encoding that cannot encode the hyphen. If you definitely want to write a UTF-8 file, pass encoding="utf-8" to open().
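
A minimal sketch of the failure and the suggested fix, for illustration only. It assumes cp1252 as a stand-in for the reporter's default encoding (the actual locale is not stated in the report). The output file name listing.txt and the helper write_names are hypothetical, not taken from the attached script.

```python
import os
import tempfile

# File name from the report; the hyphens are U+2010, not ASCII '-'.
name = 'LKC.6558\u2010100\u2010HD\u2010P\u2010101_C.xlsx\n'


def write_names(path, names, encoding):
    """Write each name to path using an explicit text encoding."""
    with open(path, 'w', encoding=encoding) as f:
        f.writelines(names)


# Hypothetical output file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'listing.txt')

# Assumption: cp1252 stands in for a single-byte default encoding.
try:
    write_names(path, [name], 'cp1252')
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# Asking for UTF-8 explicitly lets the same name be written.
write_names(path, [name], 'utf-8')
with open(path, encoding='utf-8') as f:
    round_trip = f.read()
```

When reading the file back, the same encoding="utf-8" argument is needed, otherwise the default encoding is used again.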
msg252394 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2015-10-06 14:09
Indeed, the name that looks like ASCII actually contains non-ASCII characters.

>>> print(ascii('LKC.6558‐100‐HD‐P‐101_C.xlsx\n'))
'LKC.6558\u2010100\u2010HD\u2010P\u2010101_C.xlsx\n'

Martin suggests the correct solution.
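
As a complement to the ascii() check, here is a short sketch that lists where the non-ASCII characters sit in a string and what they are. The helper non_ascii_chars is hypothetical and only illustrates the idea; it uses the standard unicodedata module.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, 'U+%04X' % ord(ch), unicodedata.name(ch, 'UNKNOWN'))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# File name from the report, with the U+2010 hyphens written as escapes.
name = 'LKC.6558\u2010100\u2010HD\u2010P\u2010101_C.xlsx\n'
hits = non_ascii_chars(name)
for index, code_point, char_name in hits:
    print(index, code_point, char_name)
```

Running this over every name in a directory listing shows which entries would fail under a single-byte encoding.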
History
Date User Action Args
2022-04-11 14:58:13 admin set github: 67731
2015-10-06 14:09:11 serhiy.storchaka set status: open -> closed; type: crash -> behavior; nosy: + serhiy.storchaka; messages: + msg252394; resolution: not a bug; stage: resolved
2015-02-27 23:19:16 martin.panter set nosy: + martin.panter; messages: + msg236856
2015-02-27 21:36:06 vstinner set messages: + msg236844
2015-02-27 21:13:19 Gravitania create