
classification
Title: errors='replace' does not work at Windows command line
Type: behavior
Stage: resolved
Components: Windows
Versions: Python 3.1

process
Status: closed
Resolution: not a bug
Dependencies:
Superseder:
Assigned To:
Nosy List: ezio.melotti, jvanpraag, r.david.murray
Priority: normal
Keywords:

Created on 2010-06-30 15:22 by jvanpraag, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: junk.txt
Uploaded: jvanpraag, 2010-06-30 15:22
Description: Text file with 'bad' characters in 3rd line.
Messages (4)
msg108987 - (view) Author: John Van Praag (jvanpraag) Date: 2010-06-30 15:22
The declaration errors='replace' works from within IDLE but not at the Windows command line. I am attaching a program and text file that demonstrate the problem. The error shows up at the Windows command line as follows:

C:\Users\John\Documents\Python\bug_reports\001>python -m read_my_file
aaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaa

bbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb

Traceback (most recent call last):
  File "C:\Python31\lib\runpy.py", line 128, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python31\lib\runpy.py", line 34, in _run_code
    exec(code, run_globals)
  File "C:\Users\John\Documents\Python\bug_reports\001\read_my_file.py", line 20, in <module>
    readf()
  File "C:\Users\John\Documents\Python\bug_reports\001\read_my_file.py", line 17, in readf
    print(line)
  File "C:\Python31\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 10-11: character maps to <undefined>


NOTE: It appears I can only attach 1 file to this report. So I am copying the program here. The text file to read is attached.

'''
read_my_file.py:
Reads lines from faulty file.
Hangs at line 3 when run from Windows command line.
Platforms:
Windows Vista Ultimate 64-bit
Python 3.1.2
'''
#The file to read.
my_file = 'junk.txt'

def readf():
	#The declaration "errors='replace'" is supposed to replace characters the reader does not recognize with a dummy character such as a question mark.
	#This fix works in the interpreter, but not from the Windows command line.
	fh_read = open(my_file, errors='replace')
	for line in fh_read:
		print(line)

#Run.
readf()
msg109024 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2010-06-30 22:29
The problem is not in the reading part, but in the print().
Since the default encoding of your terminal is cp437 and cp437 is not able to encode the "bad character" (U+2019 RIGHT SINGLE QUOTATION MARK), an error is raised.
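
The point above can be checked without a Windows console. The following is a minimal sketch, assuming a made-up sample string (`text` is not taken from junk.txt): it shows that U+2019 is a perfectly valid character in a Python string, and that the failure only appears when that string is encoded to cp437, which is what print() has to do for a cp437 terminal.

```python
# A string containing U+2019 RIGHT SINGLE QUOTATION MARK.
# The sample text is hypothetical; it only stands in for line 3 of junk.txt.
text = "It\u2019s here"

# Encoding to cp437 is the step that fails, because cp437 has no
# mapping for U+2019.
try:
    text.encode("cp437")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print("encode failed:", exc.reason)

# The same encode with errors='replace' substitutes '?' instead of raising.
replaced = text.encode("cp437", errors="replace")
print(replaced)
```

Note that the errors argument here is passed to the encode step, not to open(); the two are independent.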
msg109048 - (view) Author: John Van Praag (jvanpraag) Date: 2010-07-01 13:54
According to the documentation of the open function:

errors is an optional string that specifies how encoding and decoding
errors are to be handled–this cannot be used in binary mode. Pass
'strict' to raise a ValueError exception if there is an encoding error
(the default of None has the same effect), or pass 'ignore' to ignore
errors. (Note that ignoring encoding errors can lead to data loss.)
'replace' causes a replacement marker (such as '?') to be inserted where
there is malformed data. 

If a replacement marker such as '?' were replacing the bad characters,
the print function would not have a problem. The open function is not
working as described in the documentation.
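
As a rough illustration of what the quoted documentation covers, the sketch below decodes the same bytes two ways. The byte string and the choice of cp1252 are assumptions made for the example; the thread does not state which encoding open() used for junk.txt. With errors='replace', a replacement marker is inserted only where a byte sequence is malformed for the chosen encoding. Bytes that decode cleanly are left alone, even if the resulting characters cannot later be printed on a given terminal.

```python
# Hypothetical raw bytes: 0x92 is a curly apostrophe in cp1252,
# but a malformed (stray continuation) byte in UTF-8.
raw = b"caf\xe9 \x92quoted\x92"

# Decoding as cp1252: every byte is valid, so nothing is replaced
# and 0x92 becomes U+2019.
as_cp1252 = raw.decode("cp1252", errors="replace")

# Decoding as UTF-8: the bytes are malformed, so each bad byte is
# replaced with U+FFFD REPLACEMENT CHARACTER.
as_utf8 = raw.decode("utf-8", errors="replace")

print(ascii(as_cp1252))
print(ascii(as_utf8))
```

On decoding, the marker Python inserts is U+FFFD rather than '?', and U+FFFD itself has no cp437 mapping, so even a replaced character would not print on a cp437 terminal without an error handler on the output side.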

msg109050 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-07-01 14:04
The characters are fine when you read them (that is, they decode correctly to Unicode). They are only invalid when you write them to the Windows terminal, which can't encode all of the valid characters in the file. The IDLE output window uses a more capable character set and can display those characters.
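
One possible way to get the behavior the reporter expected is to apply the 'replace' error handler on the output side. This is a sketch under stated assumptions, not a fix proposed in this thread: `console_safe` is a hypothetical helper name, and the cp437 terminal is simulated with an in-memory `io.BytesIO` so the example runs on any platform.

```python
import io
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot
    represent replaced by '?' (hypothetical helper)."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


# Option 1: sanitize each string before printing it.
safe = console_safe("It\u2019s here", "cp437")
print(safe)

# Option 2: write through a text stream that uses errors='replace'.
# A BytesIO stands in for a cp437 terminal here (an assumption for the demo).
buffer = io.BytesIO()
console = io.TextIOWrapper(buffer, encoding="cp437",
                           errors="replace", newline="\n")
print("It\u2019s here", file=console)
console.flush()
print(buffer.getvalue())
```

In the reporter's program, option 1 would amount to calling print(console_safe(line)) inside the loop, leaving the open() call unchanged.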
History
Date User Action Args
2022-04-11 14:57:03  admin  set  github: 53372
2010-07-01 14:04:08  r.david.murray  set  nosy: + r.david.murray; messages: + msg109050
2010-07-01 13:54:21  jvanpraag  set  messages: + msg109048
2010-06-30 22:29:12  ezio.melotti  set  status: open -> closed; type: behavior; nosy: + ezio.melotti; messages: + msg109024; resolution: not a bug; stage: resolved
2010-06-30 16:51:42  benjamin.peterson  link  issue9029 superseder
2010-06-30 15:22:45  jvanpraag  create