classification
Title: unicode('foo', '.utf99') does not raise LookupError
Type: Stage:
Components: Unicode Versions: Python 2.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: akuchling Nosy List: akuchling, doerwalter, georg.brandl, nnorwitz, osvenskan
Priority: release blocker Keywords:

Created on 2006-03-09 00:55 by osvenskan, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
UnexpectedValueError.txt osvenskan, 2006-03-09 00:55 demo of the ValueError
encodings-search.diff georg.brandl, 2006-08-17 19:17 patch
Messages (8)
msg27716 - (view) Author: Philip Semanchuk (osvenskan) * Date: 2006-03-09 00:55
A very minor inconsistency -- when I call unicode()
with an encoding that Python doesn't know about, it
usually raises a LookupError (e.g., LookupError:
unknown encoding: utf99). But when the encoding name begins
with a dot (ASCII 0x2e), Python instead raises
ValueError: Empty module name. Python is certainly correct
to raise an error, but it should be a LookupError,
not a ValueError.

I've recreated this under Python 2.4.1/FreeBSD 6.0 and
2.3/OS X. See attachment for recreation steps.
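
For readers without the attachment, here is a minimal sketch of the comparison described above. It is written for Python 3, where unicode() no longer exists, so it uses bytes.decode() on the assumption that it goes through the same codec lookup. The helper name decode_error_type is hypothetical. On interpreters that include the fix, both calls are expected to report LookupError; on the affected 2.3/2.4 releases the second call was the one that produced the ValueError.

```python
def decode_error_type(data, encoding):
    """Return the name of the exception raised while decoding, or None."""
    try:
        data.decode(encoding)
    except Exception as exc:  # broad on purpose: we only report the type
        return type(exc).__name__
    return None


# Unknown encoding without a leading dot.
print(decode_error_type(b"foo", "utf99"))
# Unknown encoding with a leading dot (the case from this report).
print(decode_error_type(b"foo", ".utf99"))
```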

msg27717 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2006-03-09 08:12
The problem is that after normalizing the encoding name a
module with this name is imported. Maybe
encodings/__init__.py:search_function should do:

if ".".join(filter(None, modname.split("."))) != modname:
   return None
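
A small standalone sketch of what the check above tests for, pulled out of search_function so it can be tried in isolation. The function name has_empty_component is hypothetical and is not part of the encodings package. The expression is true whenever splitting on dots yields an empty piece, that is, for a leading, trailing or doubled dot.

```python
def has_empty_component(modname):
    """Return True if a dotted name contains an empty part.

    Splitting ".utf99" on "." gives ["", "utf99"]; dropping the empty
    piece and re-joining yields "utf99", which differs from the input.
    """
    return ".".join(filter(None, modname.split("."))) != modname


for name in (".utf99", "utf99", "utf99.", "a..b", "ansi_x3.4_1968"):
    print(repr(name), has_empty_component(name))
```

Note that a completely empty name is not flagged by this expression, since it round-trips to itself, so it would need a separate check.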
msg27718 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-03-09 08:16
Is it possible for an encoding name to contain dots at all?

If not, this would do too:
if '.' in modname: continue
msg27719 - (view) Author: Philip Semanchuk (osvenskan) * Date: 2006-03-09 15:04
There are encoding names that contain dots, such as
ANSI_X3.4-1968, ANSI_X3.4-1986 and ISO_646.IRV:1991 (as
reported by iconv). There are none in iconv's list that
begin with a dot. 
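
To illustrate why dotted names cannot simply be rejected wholesale, here is a rough sketch that asks the codec registry about the names listed above. It assumes a current CPython 3 with the standard encodings package; the helper name resolves is hypothetical. codecs.lookup() and the .name attribute of the returned CodecInfo are the only library calls used.

```python
import codecs


def resolves(name):
    """Return the canonical codec name for an encoding name, or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


for name in ("ANSI_X3.4-1968", "ANSI_X3.4-1986", "ISO_646.IRV:1991", ".utf99"):
    print(name, "->", resolves(name))
```

On such an interpreter the dotted ANSI name should resolve through the alias table, while the name with a leading dot is reported as unknown rather than failing inside the import machinery.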

Please note that the behavior of this function has been
discussed before in Python bugs 513666 and 960874. Apologies
for not referencing them in my original report. 

Having stepped through the code, I understand how the
ValueError is getting generated. My frustration with this as
a programmer is that I want to write specific except clauses
for each possible exception that a method can raise, but
that's impractical if any exception is fair game on any
method. So I'm forced to use a catch-all except clause about
which the Python documentation says (wisely, IMHO), "Use
this with extreme caution, since it is easy to mask a real
programming error in this way!" While it is helpful to
document errors that a method is *likely* to raise, my code
needs to handle all possibilities, not just likely ones.

Perhaps the answer is just, "This is how Python works" and
if I feel it is a weakness in the language I need to take it
up on a different level. 
msg27720 - (view) Author: Philip Semanchuk (osvenskan) * Date: 2006-04-06 14:45
I noticed that the documentation for unicode() says, "if the
encoding is not known, LookupError is raised". Regarding the
3rd parameter ("errors") to unicode(), the docs say, "Error
handling is done according to errors; this specifies the
treatment of characters which are invalid in the input
encoding. If errors is 'strict' (the default), a ValueError
is raised on errors..."
ref: http://docs.python.org/lib/built-in-funcs.html

That makes the code's current behavior doubly confusing,
because the documentation says that ValueError is
reserved for indicating an undecodable byte sequence, not an
unknown encoding name.
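
A short sketch of the distinction the documentation draws, written for Python 3 with bytes.decode() standing in for unicode() (an assumption). The helper name try_decode and its string labels are hypothetical. With the lookup behaving as documented, the two failure modes can be handled by separate except clauses: LookupError for an unknown encoding name, and UnicodeDecodeError (a subclass of ValueError) for bytes that are invalid in a known encoding.

```python
def try_decode(data, encoding):
    """Decode data strictly and classify the outcome."""
    try:
        return ("ok", data.decode(encoding, "strict"))
    except LookupError:
        # The encoding name is not known to the codec registry.
        return ("unknown-encoding", None)
    except UnicodeDecodeError:
        # The encoding is known, but the bytes are not valid in it.
        return ("bad-bytes", None)


print(try_decode(b"foo", "ascii"))
print(try_decode(b"foo", "utf99"))
print(try_decode(b"\xff", "ascii"))
```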
msg27721 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-08-17 19:17
I'd say that this should be fixed before 2.5 final.

Attached patch (the modname that's used for import may not
contain a dot anymore...)
msg27722 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2006-09-30 11:22
Fixed in rev. 52075, 52076 (2.4), 52077 (2.5).
msg27723 - (view) Author: Neal Norwitz (nnorwitz) * (Python committer) Date: 2006-10-04 05:58
I thought part of this was reverted, but I'm not sure if it
was reverted in 2.4.  I know I had starred this for some
reason, but I don't recall exactly.  This should be
investigated.  I'm not sure there was a test for this either.
History
Date User Action Args
2022-04-11 14:56:15 admin set github: 43000
2006-03-09 00:55:01 osvenskan create