Issue 32723: codecs.open silently ignores argument errors

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76904

classification

Title:	codecs.open silently ignores argument errors
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.7, Python 3.6

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	josh.r, lemburg, vstinner, xiang.zhang
Priority:	normal	Keywords:

Created on 2018-01-30 06:13 by xiang.zhang, last changed 2022-04-11 14:58 by admin.

Messages (6)
msg311240 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2018-01-30 06:13
>>> import codecs
>>> f = codecs.open('/tmp/a', 'w', errors='replace')
>>> f.errors
'strict'

Passing errors to codecs.open without encoding doesn't work. I can't find this in the docs, and I don't think the argument should be silently ignored.
msg311248 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2018-01-30 07:27
On both Py2 and Py3, calling codecs.open without passing an encoding argument is equivalent to adding 'b' to the mode string and calling the built-in open function directly; it returns a plain binary mode file object. The errors parameter is meaningless because no decoding is being performed (there will never be any errors to handle).
msg311263 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2018-01-30 10:32
I don't understand, Josh. Looking at the code, binary mode is forced only when an encoding is passed, although the comment says "always".

>>> f = codecs.open('/tmp/a', 'w')
>>> f
<open file '/tmp/a', mode 'w' at 0x7f516efbb6f0>

For example, if I want to use 'replace' instead of 'strict' with the default encoding, I can't simply do:

>>> import codecs
>>> f = codecs.open('/tmp/a', 'w', errors='replace')
>>> f.write(u'\udc80')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\udc80' in position 0: ordinal not in range(128)

I have to specify the default encoding explicitly to make errors take effect:

>>> f = codecs.open('/tmp/a', 'w', encoding='ascii', errors='replace')
>>> f.write(u'\udc80')
msg311307 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2018-01-31 00:41
Ah, my mistake. That said, when you're not passing an encoding, you're still just calling regular open; it's not doing any special codecs.open wrapping (on Python 2, this meant you weren't decoding the input at all; on Python 3, it means you're decoding with the default encoding). You may as well just call open: codecs.open is pointless as soon as io.open exists, since codecs.open is slower and has so many weird quirks. While I acknowledge codecs.open is misbehaving here, I'm not sure fixing it is a great idea, since the function is effectively a legacy function (especially when used without an encoding argument), and io.open works correctly.
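As an illustration of the suggestion above, here is a minimal Python 3 sketch of using the builtin open (the same function as io.open) with an errors handler. The helper names `errors_without_encoding` and `write_with_replace` are hypothetical, and the temporary file path is only for demonstration.

```python
import os
import tempfile


def errors_without_encoding(path):
    """Open with errors= but no encoding= and report the handler in effect."""
    with open(path, "w", errors="replace") as f:
        # The builtin open keeps the errors argument even without encoding.
        return f.errors


def write_with_replace(path, text, encoding="ascii"):
    """Write text with errors='replace' and return the raw bytes on disk."""
    with open(path, "w", encoding=encoding, errors="replace") as f:
        f.write(text)
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "a")
    print(errors_without_encoding(demo_path))
    # The lone surrogate cannot be encoded as ASCII, so it is replaced.
    print(write_with_replace(demo_path, "a\udc80b"))
```

The encoding is passed explicitly in `write_with_replace` only so that the bytes written do not depend on the platform's default encoding.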
msg311308 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2018-01-31 01:06
On rereading the docs for Python 3 codecs.open, they don't seem to document the whole "no encoding means no codecs.StreamReaderWriter wrapping" behavior at all.

First off, any fix would only apply to Python 3 (I've removed 2.7 from the versions). Both Python 2 and Python 3 call the plain builtin open function with just filename, mode, and buffering when no encoding is provided. On Python 2, it's impossible to use the errors keyword (because the plain built-in open doesn't do decoding, it doesn't accept an errors parameter); on Python 3, you could, but you'd be adding to the behavioral discrepancies with Python 2.

The docs (just above codecs.open) already state: "the builtin open() and the associated io module are the recommended approach for working with encoded text files"; personally, I'm inclined to just wash my hands of codecs.open (perhaps moving the note about builtin open down inside codecs.open's docs, so people who get a direct link don't have to scroll up to notice the note). codecs.open was and is underspecified, never worked right (e.g. #8260, which, despite its status, is not really fixed; see https://stackoverflow.com/a/46438434/364696), and the code which uses it is likely already working around its quirks, making fixes difficult.
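To make the wrapping behavior discussed above easier to inspect, the following Python 3 sketch reports what kind of object codecs.open returns and which errors handler it carries. The helper `describe_codecs_open` is hypothetical. The DeprecationWarning filter is an assumption, added because newer Python releases deprecate codecs.open.

```python
import codecs
import os
import tempfile
import warnings


def describe_codecs_open(path, **kwargs):
    """Return (type name, errors attribute) of what codecs.open gives back."""
    with warnings.catch_warnings():
        # Assumption: silence the deprecation warning on newer Pythons.
        warnings.simplefilter("ignore", DeprecationWarning)
        f = codecs.open(path, "w", **kwargs)
    try:
        return type(f).__name__, getattr(f, "errors", None)
    finally:
        f.close()


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "a")
    # With an encoding, the file is wrapped and errors is honored.
    print(describe_codecs_open(demo_path, encoding="ascii", errors="replace"))
    # Without an encoding, this thread reports that errors is ignored;
    # the result may differ on Python versions where that has changed.
    print(describe_codecs_open(demo_path, errors="replace"))
```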
msg311334 - (view)	Author: Xiang Zhang (xiang.zhang) *	Date: 2018-01-31 14:34
Honestly speaking, I am not interested in Python 3. I think codecs.open will be deprecated one day in Python 3; Victor raised this long ago, see #8796. In Python 2, codecs.open is still in use, and errors could still take effect when you are writing (encoding), as in the example I gave. Right now, if encoding is not given, the builtin open is used, regardless of whether errors is given. Would it be reasonable to change the logic so that if either encoding or errors is given, we use the StreamReaderWriter wrapper instead of the builtin open?
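The proposed logic could be sketched roughly as follows in Python 3. This is not the stdlib implementation: `open_sketch` is a hypothetical name, the `errors=None` default is an assumption (needed to tell "not given" apart from 'strict'), and falling back to `sys.getdefaultencoding()` when only errors is passed is also an assumption, since the thread does not settle which encoding should apply.

```python
import builtins
import codecs
import os
import sys
import tempfile


def open_sketch(filename, mode="r", encoding=None, errors=None, buffering=-1):
    """Hypothetical codecs.open variant: wrap if encoding OR errors is given."""
    if encoding is None and errors is None:
        # Neither argument given: plain builtin open, as today.
        return builtins.open(filename, mode, buffering)
    if encoding is None:
        # Assumption: use the interpreter default encoding.
        encoding = sys.getdefaultencoding()
    if errors is None:
        errors = "strict"
    if "b" not in mode:
        # The wrapper does the encoding itself, so open the file in binary mode.
        mode += "b"
    file = builtins.open(filename, mode, buffering)
    try:
        info = codecs.lookup(encoding)
        srw = codecs.StreamReaderWriter(
            file, info.streamreader, info.streamwriter, errors
        )
        srw.encoding = encoding
        return srw
    except Exception:
        file.close()
        raise


def demo():
    """Write a lone surrogate with only errors= given; return the bytes."""
    path = os.path.join(tempfile.mkdtemp(), "a")
    with open_sketch(path, "w", errors="replace") as f:
        f.write("\udc80")
    with builtins.open(path, "rb") as f:
        return f.read()
```

With this variant, passing errors alone is enough to get the wrapper, so the 'replace' handler applies to the write in `demo`.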

History
Date	User	Action	Args
2022-04-11 14:58:57	admin	set	github: 76904
2018-01-31 14:34:27	xiang.zhang	set	messages: + msg311334
2018-01-31 01:06:53	josh.r	set	versions: - Python 2.7
2018-01-31 01:06:31	josh.r	set	messages: + msg311308
2018-01-31 00:41:20	josh.r	set	messages: + msg311307
2018-01-30 10:32:34	xiang.zhang	set	messages: + msg311263
2018-01-30 07:27:51	josh.r	set	nosy: + josh.r messages: + msg311248
2018-01-30 06:13:54	xiang.zhang	set	type: behavior title: codecs.open -> codecs.open silently ignores argument errors
2018-01-30 06:13:32	xiang.zhang	create