
Author vstinner
Recipients baikie, loewis, vstinner
Date 2009-01-07.11:30:38
Message-id <1231327841.3.0.949817677105.issue4859@psf.upfronthosting.co.za>
In-reply-to
Content
About pwd, we have 7 fields:
 - username: the regex looks like « [a-zA-Z0-9_.@][a-zA-Z0-9_.@\/]*$? »,
   so it's ASCII only
 - password: ASCII only? On my Ubuntu, /etc/passwd uses "x" for all
   passwords, and /etc/shadow uses an MD5 hash with a prefix
   like "$1$x6vJEXyc$" (MD5 marker + salt)
 - user identifier: integer (ASCII)
 - main group identifier: integer (ASCII)
 - GECOS: user text
 - shell: filename
 - home directory: filename
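
As a rough illustration of the field list above, one /etc/passwd-style
line can be split into its seven fields while the text fields stay raw
bytes. The helper name split_passwd_line and the sample entry are made
up for this sketch; the real pwd module asks the C library instead of
parsing the file itself.

```python
from collections import namedtuple

# Field order as stored in /etc/passwd (home directory precedes shell).
PasswdEntry = namedtuple(
    "PasswdEntry", "name passwd uid gid gecos home shell"
)


def split_passwd_line(line: bytes) -> PasswdEntry:
    """Split one passwd line; keep text fields as bytes, parse uid/gid."""
    parts = line.rstrip(b"\n").split(b":")
    if len(parts) != 7:
        raise ValueError("expected 7 colon-separated fields")
    name, passwd, uid, gid, gecos, home, shell = parts
    # uid and gid are ASCII digits, so int() can parse them directly.
    return PasswdEntry(name, passwd, int(uid), int(gid), gecos, home, shell)


# Hypothetical sample entry with a latin-1 encoded GECOS (0xE9 is "é").
sample = b"rene:x:1000:1000:Ren\xe9 Dupont:/home/rene:/bin/bash\n"
entry = split_passwd_line(sample)
```

Only uid and gid have an unambiguous interpretation here; the GECOS,
home and shell bytes still need an encoding before they become text.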

We can expect GECOS and filenames to be encoded in the "default system 
locale" (e.g. latin-1 or UTF-8). A user is allowed to change their own 
GECOS field. If the user account uses a different locale and sets a 
non-ASCII GECOS, decoding the string (to unicode) can fail.
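
A small sketch of that failure mode, assuming a user whose terminal 
writes latin-1 on a system that decodes with UTF-8 (the name "René" is 
only sample data):

```python
# GECOS bytes written by a user whose terminal uses latin-1.
gecos_latin1 = "René".encode("latin-1")   # b'Ren\xe9'

# Reading them back under a UTF-8 system locale raises an error.
try:
    gecos_latin1.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# The opposite mismatch raises nothing, but silently yields wrong text.
gecos_utf8 = "René".encode("utf-8")       # b'Ren\xc3\xa9'
mojibake = gecos_utf8.decode("latin-1")
```

The second case is arguably worse than the first, because nothing 
signals that the text is wrong.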

Your patch latin1.diff is wrong: the charset is not always latin-1 or 
always UTF-8; it depends on the system default charset. You should use 
sys.getfilesystemencoding() or locale.getpreferredencoding() to get 
the right encoding. If you used latin-1 as an automagic charset to 
pass bytes through as text, that is not the right solution: use the 
bytes type to get real bytes (as you implemented with your get*b() 
functions).
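
For illustration, both functions mentioned above can be queried 
directly, and the latin-1 trick can be shown to round-trip bytes 
without producing the intended text. The sample values (raw_home, 
raw_gecos) are assumptions for this sketch, not data from the patch.

```python
import locale
import sys

# Encodings the interpreter derives from the current system configuration.
fs_encoding = sys.getfilesystemencoding()
preferred = locale.getpreferredencoding()

# Hypothetical raw home-directory field, decoded with the system encoding.
raw_home = b"/home/alice"
home = raw_home.decode(fs_encoding)

# Why latin-1 "always works": every byte value maps to a code point, so the
# round trip is lossless even though the decoded text is not what was typed.
raw_gecos = "René".encode("utf-8")
smuggled = raw_gecos.decode("latin-1")
round_trip_ok = smuggled.encode("latin-1") == raw_gecos
```

The round trip succeeds, but smuggled is not the string "René", which 
is why a str obtained this way is really bytes in disguise.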

The situation is similar to the bytes/unicode filename debate (see 
issue #3187). I think we can assume that a correctly configured 
system uses the same locale for all user accounts => use unicode. 
But for compatibility with old systems mixing different locales, or 
new systems with locale problems => use bytes.

The default should be unicode, but we need to be able to get all 
fields as bytes. Example:
  pwd.getpwnam(str) -> str fields (and integers for uid/gid)
  pwd.getpwnamb(bytes) -> bytes fields (and integers for uid/gid)
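
A minimal sketch of how such a pair could relate to each other, using 
a hypothetical in-memory table in place of the real passwd database. 
The names getpwnam_sketch, getpwnamb_sketch and _FAKE_DB are invented 
here, and the encoding parameter is an assumption; a real 
implementation would take the encoding from the system configuration.

```python
# Hypothetical stand-in for the passwd database, stored as raw bytes.
# Field order follows pwd: name, passwd, uid, gid, gecos, dir, shell.
_FAKE_DB = {
    b"alice": (b"alice", b"x", 1000, 1000,
               b"Alice Example", b"/home/alice", b"/bin/sh"),
}


def getpwnamb_sketch(name: bytes) -> tuple:
    """Bytes in, bytes out: nothing is decoded, so no decoding error."""
    return _FAKE_DB[name]


def getpwnam_sketch(name: str, encoding: str = "utf-8") -> tuple:
    """Str in, str out: decode text fields, keep uid/gid as integers."""
    raw = getpwnamb_sketch(name.encode(encoding))
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in raw
    )
```

Building the str variant on top of the bytes variant keeps a single 
lookup path, and the decoding step is the only place that can raise 
UnicodeDecodeError.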

We already have bytes/unicode function pairs using the "b" suffix: 
os.getcwd()->str and os.getcwdb()->bytes.
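
The existing pair in the os module, os.getcwd() and os.getcwdb(), 
shows the convention in practice:

```python
import os

# Same information, two representations.
cwd_text = os.getcwd()     # str, decoded with the filesystem encoding
cwd_bytes = os.getcwdb()   # bytes, exactly as the OS reports it

print(type(cwd_text).__name__, type(cwd_bytes).__name__)
```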

Note: The GECOS field problem was already reported in issue #3023 (by 
baikie).