Issue 960874: codecs.lookup can raise exceptions other than LookupError

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/40295

classification

Title:	codecs.lookup can raise exceptions other than LookupError
Type:		Stage:
Components:	Unicode	Versions:

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:
Assigned To:	lemburg	Nosy List:	jpe, lemburg, mwh
Priority:	normal	Keywords:

Created on 2004-05-26 14:37 by jpe, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (8)
msg20893 - (view)	Author: John Ehresman (jpe) *	Date: 2004-05-26 14:37
codecs.lookup raises ValueError when given an empty string and UnicodeEncodeError when given a unicode object that can't be converted to a str in the default encoding. I'd expect it to raise LookupError when passed any basestring instance. For example:

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> codecs.lookup('')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\python23\lib\encodings\__init__.py", line 84, in search_function
    globals(), locals(), _import_tail)
ValueError: Empty module name
>>> codecs.lookup(u'\uabcd')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\uabcd' in position 0: ordinal not in range (128)
>>>
msg20894 - (view)	Author: Michael Hudson (mwh)	Date: 2004-05-26 16:32
Logged In: YES user_id=6656 What exactly are you complaining about? I'd expect codecs.lookup to raise TypeError if called with no arguments or an integer. I believe it's documented somewhere that encoding names must be ascii only, but I must admit I don't recall where.
msg20895 - (view)	Author: John Ehresman (jpe) *	Date: 2004-05-26 17:09
Logged In: YES user_id=22785 The other exceptions occur when strings or unicode objects are passed in as an argument. The string that it fails on is the empty string (''). I can see disallowing non-ascii names, but '' should raise a LookupError. My use case is to see if a user-supplied unicode string is a valid encoding, so any check that the lookup function does not do, I will need to do before calling it.
msg20896 - (view)	Author: Michael Hudson (mwh)	Date: 2004-05-26 17:13
Logged In: YES user_id=6656 This much seems to be fixed in CVS, actually :-)
msg20897 - (view)	Author: John Ehresman (jpe) *	Date: 2004-05-26 18:47
Logged In: YES user_id=22785 Yes, it does look like lookup('') is fixed in CVS. So the question is whether lookup() of something that isn't convertible in the current encoding to a char* should raise a LookupError. I can live with it not doing so, though if it did, it would be a bit easier to determine whether an arbitrary unicode string is the name of a supported encoding. I'm willing to put together a patch to raise LookupError if that's what the behavior should be.
msg20898 - (view)	Author: Michael Hudson (mwh)	Date: 2004-05-26 18:53
Logged In: YES user_id=6656 Well, I don't think that's a particularly good idea. I don't know if Marc-André feels differently.
msg20899 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-05-26 19:17
Logged In: YES user_id=38388 I don't think we should change anything. First of all, the lookup function interfaces to a codec search function and these can raise all kinds of errors, so it is not guaranteed that you will only see LookupErrors (the same is true for most other Python APIs, e.g. most can generate MemoryErrors). Possible other errors are ValueErrors, NameErrors, ImportErrors, etc., depending on the search function that happens to process your request. Second, the name you enter as argument usually maps to a Python module and/or package name, so it has to be ASCII. You can enter Unicode names for the codec name only by virtue of the automagical conversion of Unicode to strings. Again, this happens in a lot of places in Python and is not specific to lookup(). Closing this request.
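
The first point above can be illustrated with a minimal sketch. It assumes Python 3 rather than the Python 2.3 interpreter discussed in this issue, and both the search function `broken_search` and the codec name `demo_broken_codec` are hypothetical names made up for the example. It only shows that an exception raised inside a registered search function reaches the caller of codecs.lookup unchanged.

```python
import codecs


def broken_search(name):
    # Hypothetical search function: fails for one made-up codec name and
    # declines (returns None) for every other name.
    if name == "demo_broken_codec":
        raise ValueError("search function failed for %r" % name)
    return None


codecs.register(broken_search)

try:
    codecs.lookup("demo_broken_codec")
except LookupError:
    outcome = "LookupError"
except Exception as exc:
    # Errors raised inside a search function are not converted to LookupError.
    outcome = type(exc).__name__
else:
    outcome = "found"
finally:
    # codecs.unregister only exists on Python 3.10+; skip cleanup elsewhere.
    if hasattr(codecs, "unregister"):
        codecs.unregister(broken_search)

print(outcome)
```

The built-in search function is consulted first and returns None for the unknown name, so the hypothetical one is reached and its ValueError is what the caller sees.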
msg20900 - (view)	Author: John Ehresman (jpe) *	Date: 2004-05-26 19:33
Logged In: YES user_id=22785 Okay, that works for me. We might want to update the documentation, which seems to imply that LookupError will be raised if the name is invalid -- my mental model was that it acted more like a dictionary. I was just trying to avoid a catch-all handler to catch expected failures (an encoding being unavailable is expected because I know I may be feeding junk to it; but out of memory wouldn't be, though I know it can happen anywhere). Thanks for the quick response :).
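
For the use case described in this thread (checking whether a user-supplied string names a supported encoding without a catch-all handler), one possible approach is sketched below. It assumes Python 3, and the helper name `is_known_encoding` is hypothetical. The pre-checks and the particular exception types caught are assumptions based on the errors mentioned in the discussion, not a documented contract of codecs.lookup.

```python
import codecs


def is_known_encoding(name):
    """Return True if `name` resolves to a codec, False for expected failures."""
    # Do the checks that lookup() is not guaranteed to do for us.
    if not isinstance(name, str) or not name:
        return False
    try:
        # Codec names map to module names, so require ASCII up front.
        name.encode("ascii")
    except UnicodeEncodeError:
        return False
    try:
        codecs.lookup(name)
    except (LookupError, ValueError, ImportError):
        # Failures expected when feeding junk to the search functions.
        # MemoryError and other unrelated errors still propagate.
        return False
    return True


print(is_known_encoding("utf-8"))
print(is_known_encoding("no-such-codec-xyz"))
```

Listing the expected exception types explicitly keeps genuinely unexpected conditions, such as running out of memory, visible to the caller.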

History
Date	User	Action	Args
2022-04-11 14:56:04	admin	set	github: 40295
2004-05-26 14:37:36	jpe	create