
Author a.badger
Recipients a.badger, loewis, vstinner
Date 2008-11-24.06:40:36
Message-id <1227508839.13.0.529572776917.issue4006@psf.upfronthosting.co.za>
In-reply-to
Content
Is it a bug?  If so, then it should be retargeted to 3.1 instead of
closed wontfix.  If it's not a bug then there should be an explanation
of why it's not a bug.

As for fixing it, there are several inelegant methods that are better
than silently ignoring the problem:

1) return mixed unicode and byte types in os.environ
2) return only byte types in os.environ
3) raise an exception if someone attempts to access an environment
variable that cannot be decoded to unicode via the system encoding and
allow the value to be accessed as a byte string via another method.
4) silently ignore the non-decodable variables when accessing os.environ
the normal way but have another method of accessing it that returns all
values as byte strings.

#4 is closest to what was done with os.listdir().  However, I think that
approach is wrong for both os.listdir() and os.environ because it leads
to code that works in simple testing but can start failing mysteriously
once it is used in more environments.  The os.listdir() approach will
lead to lots of people writing code that uses the byte methods on Unix
and does its own conversion, because that is the only thing guaranteed
to work on Unix, and the unicode methods on Windows, because that is the
only thing guaranteed to work there.  It degenerates to case #2, except
that it is harder to debug and requires more platform-specific knowledge
from the programmer.
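
To make the failure mode of #4 concrete, here is a minimal sketch.  It
assumes a plain dict of bytes standing in for the raw environment and a
fixed UTF-8 encoding; decode_environ is a hypothetical helper, not
anything in the os module.

```python
def decode_environ(raw, encoding="utf-8"):
    """Hypothetical helper: decode a bytes mapping, dropping what fails."""
    decoded = {}
    for key, value in raw.items():
        try:
            decoded[key.decode(encoding)] = value.decode(encoding)
        except UnicodeDecodeError:
            # Option #4: the variable silently vanishes from the text view.
            continue
    return decoded


# Assumed sample data; b"caf\xe9" is Latin-1 and is not valid UTF-8.
raw = {b"HOME": b"/home/user", b"BAD": b"caf\xe9"}
visible = decode_environ(raw)
```

Code that only ever looks at `visible` cannot tell whether BAD was never
set or was set to something undecodable, which is exactly the kind of
failure that does not show up in simple testing.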

#3 seems like the best choice to me, as it provides a way for the
programmer to discover what's wrong and provide a fix.  However, people
seem to have learned the wrong lessons from the python2
UnicodeEncode/Decode problems, so it might not have a large following
other than me....
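
A rough sketch of what #3 could look like follows.  StrictEnviron and
get_bytes are hypothetical names invented for illustration, and the raw
environment is again modelled as a dict of bytes with an assumed UTF-8
system encoding.

```python
class StrictEnviron:
    """Hypothetical wrapper illustrating option #3; not a real os API."""

    def __init__(self, raw, encoding="utf-8"):
        # raw maps bytes names to bytes values, as received from the OS.
        self._raw = dict(raw)
        self._encoding = encoding

    def __getitem__(self, name):
        # Normal access: decoding failures propagate as UnicodeDecodeError.
        value = self._raw[name.encode(self._encoding)]
        return value.decode(self._encoding)

    def get_bytes(self, name):
        # Escape hatch: return the undecoded bytes for the same variable.
        return self._raw[name.encode(self._encoding)]


# Assumed sample data; b"caf\xe9" is not valid UTF-8.
env = StrictEnviron({b"HOME": b"/home/user", b"BAD": b"caf\xe9"})
home = env["HOME"]
try:
    env["BAD"]
    bad_error = None
except UnicodeDecodeError as exc:
    bad_error = exc
bad_raw = env.get_bytes("BAD")
```

The exception tells the programmer which variable is the problem, and
the byte accessor gives them a way to handle it explicitly.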

#2 is conceptually correct since environment variables are a point where
you're receiving bytes from a non-python environment.  However, it's
very annoying for the common case where everything in the environment
has a single encoding.
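
The annoyance with #2 can be sketched as follows, assuming a bytes-only
mapping and a hypothetical getenv_text helper that every caller would
end up writing for themselves.

```python
# Assumed sample data standing in for a bytes-only os.environ.
raw_environ = {b"LANG": b"en_US.UTF-8", b"NAME": b"caf\xc3\xa9"}


def getenv_text(name, encoding="utf-8"):
    """Hypothetical helper: look up by text name, decode the bytes value."""
    return raw_environ[name.encode(encoding)].decode(encoding)


lang = getenv_text("LANG")
name = getenv_text("NAME")
```

Every lookup has to encode the name and decode the value, even when the
whole environment uses one encoding.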

#1 is the easiest for simplistic code to deal with but seems to violate
the python3 philosophy the most.  I don't like it as it takes us to one
of the real failings of python2's unicode handling: not knowing what
type of data you're going to get back from a method and therefore not
knowing if you have to convert it before passing it on.  Please don't do
this one as it's two steps forward and one step backwards from where we
are now.