This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: pwd, spwd, grp functions vulnerable to denial of service
Type: security Stage:
Components: Extension Modules Versions: Python 3.0, Python 3.1
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: loewis Nosy List: baikie, loewis, vstinner
Priority: high Keywords: patch

Created on 2009-01-06 20:52 by baikie, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
bytes.diff baikie, 2009-01-06 20:52 Add functions to return bytes objects
minor.diff baikie, 2009-01-06 20:53 Fix refcount error and fd leak
latin1.diff baikie, 2009-01-06 20:54 Make pwd use Latin-1
surrogateescape-fields.diff baikie, 2009-05-24 18:07 Decode strings using surrogateescape
surrogateescape-args.diff baikie, 2009-05-24 18:08 Encode string arguments using surrogateescape
Messages (10)
msg79286 - (view) Author: David Watson (baikie) Date: 2009-01-06 20:52
The pwd (and spwd and grp) modules deal with data from
/etc/passwd (and/or other sources) that can be supplied by users
on the system.  Specifically, users can often change the data in
their GECOS fields without the OS requiring that it conform to a
specific encoding, and given some automated account signup
system, it's conceivable that arbitrary data could even be placed
in the username field.

This causes a problem since the functions in these modules try to
decode the data into str objects, and if a user has placed data
in /etc/passwd, say, that does not conform to the relevant
encoding, the function will raise UnicodeDecodeError and thus
prevent the program from learning the relevant mapping between
username and UID, etc. (or crash the program if it wasn't
expecting this).  For a system program written in Python, this
can amount to a denial of service attack, especially if the
program uses the get*all() functions.

Currently, the pwd module tries to decode the string fields using
the Unicode-escape codec, i.e. like a Python string literal, and
this can fail when given an invalid backslash escape.  You can
see this by running chfn(1), entering something like "\ux" in one
of the fields, and then calling pwd.getpwnam(yourname) or
pwd.getpwall().  Perhaps the use of this codec is a mistake,
given that spwd and grp decode the string fields as UTF-8, but
chfn could also be used to enter non-UTF-8 data in the GECOS
field.  You can see similar failures in the grp and spwd modules
after adding a user with a non-UTF-8 name (do something like
"useradd $'\xff'" in bash).

A debug build of Python also reports a reference counting error
in grp (count goes to -1) when its functions fail on non-UTF-8
data; what I think is going on is that in mkgrent(),
PyStructSequence_SET_ITEM steals the reference to "w", meaning
the second "Py_DECREF(w)" shouldn't be there.  Also, getpwall()
and getgrall() leave file descriptors open when they fail, since
they don't call end*ent() in this case.  The attached minor.diff
fixes both of these problems, I think.

I've also written a patch (bytes.diff, attached) that would add
new functions pwd.getpwnamb(), etc. (analogous to os.getcwdb())
to return bytes objects for the text fields, thus avoiding these
problems - what do you think?  The patch also makes pwd's
original string functions use UTF-8 like the other modules.

Alternatively or in addition, a quick "fix" for the GECOS problem
might be for the pwd module to decode the text fields as Latin-1,
since in the absence of backslash escapes this is what the
Unicode-escape encoding is equivalent to.  This would at least
block any DoS attempts using the GECOS field (or attempts to add
extra commas with \x2c, etc.) without changing the behaviour
much.  The attached latin1.diff does this.
msg79316 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-01-07 10:16
Any decision on this issue should be deferred until a PEP has been
written and accepted that decides on usage of bytes in Unix APIs.
msg79319 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-07 11:02
baikie: Open a separated issue for the refcount error and fd leak.
msg79320 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-07 11:05
> it's conceivable that arbitrary data could even be 
> placed in the username field.

On Ubuntu, it's not possible to create an user with a non-ASCII name:

$ sudo adduser é --no-create-home
adduser: To avoid problems, the username should consist only of
letters, digits, underscores, periods, at signs and dashes, and not
start with a dash (as defined by IEEE Std 1003.1-2001). For
compatibility with Samba machine accounts $ is also supported at 
the end of the username
msg79322 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-07 11:30
About pwd, we have 7 fields:
 - username: the regex looks like « [a-zA-Z0-9_.@]
[a-zA-Z0-9_.@\/]*$? », so it's ASCII only
 - password: ASCII only? on my Ubuntu, /etc/passwd uses "x" for all 
passwords, and /etc/shadow uses MD5 hash with a like 
like "$1$x6vJEXyc$" (MD5 marker + salt)
 - user identifier: integer (ASCII)
 - main group identifier: integer (ASCII)
 - GECOS: user text
 - shell: filename
 - home directory: filename

We can expect GECOS and filenames to be encoded in the "default system 
locale" (eg. latin-1 or UTF-8). An user is allowed to change its GECOS 
field. If the user account use a different locale and set a non-ASCII 
GECOS, decoding the string (to unicode) will fail.

Your patch latin1.diff is wrong: the charset is not always latin-1 or 
always utf-8: it depends on the system default charset. You should use 
sys.getfilesystemencoding() or locale.getpreferredencoding() to get 
the right encoding. If you used latin-1 as automagic charset to get 
text as bytes, it's not the good solution: use the bytes type to get 
real bytes (as you implemented with your get*b() functions).

The situation is similar to the bytes/unicode filename debate (see 
issue #3187). I think that we can consider that a system correctly 
configured will use the same locale for all users accounts => use 
unicode. But for compatibility with old systems mixing different 
locales / or new system with locale problems => use bytes.

The default should be unicode, but we need to be able get all fields 
as bytes. Example:
  pwd.getpwnam(str) -> str fields (and integers for uid/gid)
  pwd.getpwnamb(bytes) -> bytes fields (and integers for uid/gid)

We have already bytes/unicode functions using the "b" suffix: 
os.getpwd()->str and os.getpwdb()->bytes.

Note: The GECOS field problem was already reported in issue #3023 (by 
baikie).
msg79323 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2009-01-07 11:33
> For a system program written in Python, this
> can amount to a denial of service attack, especially 
> if the program uses the get*all() functions

I don't think that it can be called a "denial of service attack".
msg79376 - (view) Author: David Watson (baikie) Date: 2009-01-07 22:27
> baikie: Open a separated issue for the refcount error and fd leak.

OK.  It does affect 2.x as well, come to think of it.

> On Ubuntu, it's not possible to create an user with a non-ASCII
> name:
>
> $ sudo adduser é --no-create-home
>
> adduser: To avoid problems, the username should consist only of...

Well, good for Ubuntu :)  But you can still add one with the
lower-level useradd command, and not everyone uses Ubuntu.

> Your patch latin1.diff is wrong

Yes, I know it's "wrong" - I just thought of it as a stopgap
measure until some sort of bytes functionality is added (since
pwd already decodes everything as Latin-1, but tries to interpret
backslash escapes).  But yeah, if it's going to be changed later,
then I suppose there's not much point.

> I don't think that it can be called a "denial of service attack".

It depends on how the program uses these functions.  Obviously
Python itself is only vulnerable to a DoS if the interpreter
crashes or something, but what I'm saying is that there should be
a way for Python programs to access the password database that is
not subject to denial of service attacks.  If someone changes
their GECOS field they can make pwd.getpwall() fail for another
user's program, and if the program relies on pwd.getpwall()
working, then that's a DoS.
msg88273 - (view) Author: David Watson (baikie) Date: 2009-05-24 18:07
Patch to make pwd, spwd and grp decode their string fields using
the file system encoding and the "surrogateescape" error handler,
as per PEP 383.
msg88274 - (view) Author: David Watson (baikie) Date: 2009-05-24 18:08
Patch to make get*nam() functions encode their arguments using
the file system encoding and "surrogateescape" error handler, so
that they correctly handle the user/group name fields returned by
each other.
msg88511 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-29 15:23
Thanks for the patches. Committed as r73015.
History
Date User Action Args
2022-04-11 14:56:43adminsetgithub: 49109
2009-05-29 15:23:38loewissetstatus: open -> closed
resolution: accepted
messages: + msg88511
2009-05-26 17:46:57loewissetpriority: high
assignee: loewis
2009-05-24 18:08:56baikiesetfiles: + surrogateescape-args.diff

messages: + msg88274
2009-05-24 18:07:27baikiesetfiles: + surrogateescape-fields.diff

messages: + msg88273
2009-01-07 22:27:08baikiesetmessages: + msg79376
2009-01-07 11:33:29vstinnersetmessages: + msg79323
2009-01-07 11:30:40vstinnersetmessages: + msg79322
2009-01-07 11:05:47vstinnersetmessages: + msg79320
2009-01-07 11:02:27vstinnersetnosy: + vstinner
messages: + msg79319
2009-01-07 10:16:48loewissetnosy: + loewis
messages: + msg79316
2009-01-06 20:54:18baikiesetfiles: + latin1.diff
2009-01-06 20:53:32baikiesetfiles: + minor.diff
2009-01-06 20:52:30baikiecreate