Title: codecs.lookup can raise exceptions other than LookupError
Type: Stage:
Components: Unicode Versions:
Status: closed Resolution: wont fix
Dependencies: Superseder:
Assigned To: lemburg Nosy List: jpe, lemburg, mwh
Priority: normal Keywords:

Created on 2004-05-26 14:37 by jpe, last changed 2004-05-26 19:33 by jpe. This issue is now closed.

Messages (8)
msg20893 Author: John Ehresman (jpe) * Date: 2004-05-26 14:37
codecs.lookup raises ValueError when given an empty 
string and UnicodeEncodeError when given a unicode 
object that can't be converted to a str in the default 
encoding.  I'd expect it to raise LookupError when 
passed any basestring instance.

For example:
Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> codecs.lookup('')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\python23\lib\encodings\", line 84, in 
    globals(), locals(), _import_tail)
ValueError: Empty module name
>>> codecs.lookup(u'\uabcd')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\uabcd' in position 0: ordinal not in range
msg20894 Author: Michael Hudson (mwh) (Python committer) Date: 2004-05-26 16:32
What exactly are you complaining about?  I'd expect codecs.lookup 
to raise TypeError if called with no arguments or an integer.

I believe it's documented somewhere that encoding names must 
be ascii only, but I must admit I don't recall where.
msg20895 Author: John Ehresman (jpe) * Date: 2004-05-26 17:09
The other exceptions occur when strings or unicode objects 
are passed in as an argument.  The string that it fails on is 
the empty string ('').  I can see disallowing non-ascii names, 
but '' should raise a LookupError.

My use case is checking whether a user-supplied unicode string
names a valid encoding, so any check that the lookup function
does not do, I will need to do before calling it.
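A minimal sketch of that pre-check, written for modern Python 3 (3.7+ for `str.isascii`) rather than the 2.3 interpreter in this report. The helper name `is_codec_name` and the particular checks (text type, non-empty, ASCII-only, no NUL) are assumptions for illustration, not part of the `codecs` API; only a `LookupError` from `codecs.lookup` itself is treated as "not an encoding".

```python
import codecs


def is_codec_name(name):
    """Return True if `name` is accepted by codecs.lookup.

    Hypothetical helper: the pre-checks below are assumptions meant to
    filter out inputs that lookup might reject with an exception other
    than LookupError.
    """
    # Non-text arguments would make lookup raise TypeError.
    if not isinstance(name, str):
        return False
    # Empty and non-ASCII names cannot map to an encodings module name.
    if not name or not name.isascii():
        return False
    # An embedded NUL may be reported with an error other than LookupError.
    if "\x00" in name:
        return False
    try:
        codecs.lookup(name)
    except LookupError:
        # The expected failure: no registered search function knows the name.
        return False
    return True


if __name__ == "__main__":
    for candidate in ["utf-8", "Latin-1", "", "\uabcd", "no-such-codec"]:
        print(repr(candidate), is_codec_name(candidate))
```

Everything other than `LookupError` (for example a `MemoryError`) still propagates, so the caller does not need a catch-all handler for the expected "unknown encoding" case.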
msg20896 Author: Michael Hudson (mwh) (Python committer) Date: 2004-05-26 17:13
This much seems to be fixed in CVS, actually :-)
msg20897 Author: John Ehresman (jpe) * Date: 2004-05-26 18:47
Yes, it does look like lookup('') is fixed in CVS.  So the
question is whether lookup() of something that isn't
convertible to a char* in the current default encoding should
raise a LookupError.  I can live with it not, though if it did,
it would make it a bit easier to determine whether an arbitrary
unicode string is the name of a supported encoding.

I'm willing to put together a patch to raise LookupError if
that's what the behavior should be.
msg20898 Author: Michael Hudson (mwh) (Python committer) Date: 2004-05-26 18:53
Well, *I* don't think that's a particularly good idea.  I don't know if 
Marc-André feels differently.
msg20899 Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2004-05-26 19:17
I don't think we should change anything.

First of all, the lookup function interfaces to a codec
search function and these can raise all kinds of errors, so
it is not guaranteed that you will only see LookupErrors
(the same is true for most other Python APIs, e.g. most can
generate MemoryErrors). Possible other errors are
ValueErrors, NameErrors, ImportErrors, etc. etc. depending
on the search function that happens to process your request.

Second, the name you enter as argument usually maps to a
Python module and/or package name, so it *has* to be ASCII.
The fact that you can enter Unicode codec names at all is
only by virtue of the automagical conversion of Unicode
to strings. Again, this happens in a lot of places in Python
and is not specific to lookup().
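A small sketch of the name-to-module mapping in modern CPython. It leans on `encodings.normalize_encoding`, an internal helper of the `encodings` package rather than documented `codecs` API, so its exact behavior may vary between versions (its docstring in current versions notes that encoding names should be ASCII only). The helper `codec_module_for` is hypothetical and deliberately skips the alias table that the real search function consults.

```python
import codecs
import encodings
import importlib


def codec_module_for(name):
    """Sketch: import the encodings submodule a simple codec name maps to.

    Hypothetical helper; it ignores encodings.aliases, so alias-only
    names such as 'latin1' would fail here with an ImportError.
    """
    # Collapse punctuation runs into '_' (e.g. 'UTF-8' -> 'UTF_8').
    module_name = encodings.normalize_encoding(name).lower()
    # The resulting string is used directly as a Python module name.
    return importlib.import_module("encodings." + module_name)


if __name__ == "__main__":
    print(codec_module_for("UTF-8").__name__)  # encodings.utf_8
    print(codecs.lookup("UTF-8").name)         # utf-8
```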

Closing this request.
msg20900 Author: John Ehresman (jpe) * Date: 2004-05-26 19:33
Okay, that works for me.  We might want to update the
documentation, which seems to imply that LookupError will be
raised if the name is invalid -- my mental model was that it
acted more like a dictionary.  I was just trying to avoid a
catch-all handler for expected failures (an encoding being
unavailable is expected, because I know I may be feeding
junk to it; running out of memory wouldn't be, though I know
it can happen anywhere).

Thanks for the quick response :).
Date                 User  Action  Args
2004-05-26 14:37:36  jpe   create