Message 79286 - Python tracker


Author	baikie
Recipients	baikie
Date	2009-01-06.20:52:24
Message-id	<1231275151.13.0.805176401894.issue4859@psf.upfronthosting.co.za>
In-reply-to

Content

The pwd (and spwd and grp) modules deal with data from
/etc/passwd (and/or other sources) that can be supplied by users
on the system.  Specifically, users can often change the data in
their GECOS fields without the OS requiring that it conform to a
specific encoding, and given some automated account signup
system, it's conceivable that arbitrary data could even be placed
in the username field.

This causes a problem since the functions in these modules try to
decode the data into str objects, and if a user has placed data
in /etc/passwd, say, that does not conform to the relevant
encoding, the function will raise UnicodeDecodeError and thus
prevent the program from learning the relevant mapping between
username and UID, etc. (or crash the program if it wasn't
expecting this).  For a system program written in Python, this
can amount to a denial of service attack, especially if the
program uses the get*all() functions.
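
To illustrate the point, here is a minimal sketch that simulates the
failure without touching the real account database.  RAW_ENTRIES,
decode_all_strict() and uid_map_or_error() are hypothetical names
invented for this example, and UTF-8 is assumed as the decoding:

```python
# Hypothetical raw records (name, uid, gecos) standing in for what the C
# library would hand back; the second entry holds bytes that are not UTF-8.
RAW_ENTRIES = [
    (b"alice", 1000, b"Alice Example"),
    (b"mallory", 1001, b"\xff\xfe not UTF-8"),
    (b"bob", 1002, b"Bob Example"),
]


def decode_all_strict(entries, encoding="utf-8"):
    """Decode every entry up front, the way a get*all() style call would."""
    return [
        (name.decode(encoding), uid, gecos.decode(encoding))
        for name, uid, gecos in entries
    ]


def uid_map_or_error(entries):
    """Build a name -> uid mapping, or return the exception that stopped it."""
    try:
        return {name: uid for name, uid, _ in decode_all_strict(entries)}
    except UnicodeDecodeError as exc:
        return exc


# One user's bad GECOS field hides alice and bob as well.
result = uid_map_or_error(RAW_ENTRIES)
```

Here result is the exception rather than a mapping, so the caller
learns nothing about the well-formed accounts either.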

Currently, the pwd module tries to decode the string fields using
the Unicode-escape codec, i.e. like a Python string literal, and
this can fail when given an invalid backslash escape.  You can
see this by running chfn(1), entering something like "\ux" in one
of the fields, and then calling pwd.getpwnam(yourname) or
pwd.getpwall().  Perhaps the use of this codec is a mistake,
given that spwd and grp decode the string fields as UTF-8, but
switching pwd to UTF-8 would not be enough on its own, since chfn
can also be used to enter non-UTF-8 data in the GECOS field.  You
can see similar failures in the grp and spwd modules
after adding a user with a non-UTF-8 name (do something like
"useradd $'\xff'" in bash).

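The two decoding failures can be reproduced on plain bytes objects,
without editing /etc/passwd.  This is only a sketch: try_decode() is a
hypothetical helper, and the byte strings are the examples mentioned
above:

```python
def try_decode(raw, encoding):
    """Return the decoded string, or the UnicodeDecodeError that was raised."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Backslash, 'u', 'x' as typed into chfn: a truncated \uXXXX escape.
gecos = b"\\ux"
bad_escape = try_decode(gecos, "unicode_escape")

# The user name created by: useradd $'\xff'
name = b"\xff"
bad_utf8 = try_decode(name, "utf-8")
```

Both bad_escape and bad_utf8 end up holding a UnicodeDecodeError
instance instead of a string.
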
A debug build of Python also reports a reference counting error
in grp (count goes to -1) when its functions fail on non-UTF-8
data; what I think is going on is that in mkgrent(),
PyStructSequence_SET_ITEM steals the reference to "w", meaning
the second "Py_DECREF(w)" shouldn't be there.  Also, getpwall()
and getgrall() leave file descriptors open when they fail, since
they don't call end*ent() in this case.  The attached minor.diff
fixes both of these problems, I think.
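
The descriptor leak is in the C code, but the shape of the fix can be
sketched in Python as an analogy.  FakeDatabase and get_all() are
hypothetical stand-ins for the set*ent()/get*ent()/end*ent() cycle, not
code from the patch:

```python
class FakeDatabase:
    """Hypothetical stand-in for an open passwd/group enumeration."""

    def __init__(self, records):
        self._records = records
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def __iter__(self):
        return iter(self._records)


def get_all(db, encoding="utf-8"):
    """Decode every record, closing the database even if decoding fails."""
    db.open()
    try:
        return [raw.decode(encoding) for raw in db]
    finally:
        # Plays the role of end*ent(): runs on success and on error.
        db.close()


db = FakeDatabase([b"ok", b"\xff"])
try:
    get_all(db)
    failed = False
except UnicodeDecodeError:
    failed = True
```

After the failed call, db.is_open is False again, which is the
behaviour the error path needs.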

I've also written a patch (bytes.diff, attached) that would add
new functions pwd.getpwnamb(), etc. (analogous to os.getcwdb())
to return bytes objects for the text fields, thus avoiding these
problems - what do you think?  The patch also makes pwd's
original string functions use UTF-8 like the other modules.
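
As a rough sketch of what a bytes-returning interface buys, the
following parses one passwd(5)-style line while leaving the text fields
as bytes.  parse_passwd_line_bytes() is a hypothetical helper written
for this example; it is not the proposed pwd.getpwnamb() and does not
call it.  os.getcwdb() is the existing function used as the model:

```python
import os

# Existing precedent: the bytes counterpart of os.getcwd().
cwd_bytes = os.getcwdb()


def parse_passwd_line_bytes(line):
    """Split a passwd(5)-style line; text fields stay bytes, IDs become int."""
    name, passwd, uid, gid, gecos, home, shell = line.rstrip(b"\n").split(b":")
    return (name, passwd, int(uid), int(gid), gecos, home, shell)


# A name and GECOS field that are not valid UTF-8 pass through untouched.
entry = parse_passwd_line_bytes(b"\xff:x:1001:1001:caf\xe9:/home/u:/bin/sh\n")
```

No decoding happens, so nothing here can raise UnicodeDecodeError, and
the caller can still map the name to its UID.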

Alternatively or in addition, a quick "fix" for the GECOS problem
might be for the pwd module to decode the text fields as Latin-1,
since in the absence of backslash escapes this is what the
Unicode-escape encoding is equivalent to.  This would at least
block any DoS attempts using the GECOS field (or attempts to add
extra commas with \x2c, etc.) without changing the behaviour
much.  The attached latin1.diff does this.
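
The claims about Latin-1 can be checked on plain bytes objects.  This is
a small sketch; the variable names and the sample field are made up for
the example:

```python
# Every byte value except the backslash (0x5C).
no_backslash = bytes(b for b in range(256) if b != 0x5C)

# Without backslashes, the two codecs produce the same string.
same = no_backslash.decode("unicode_escape") == no_backslash.decode("latin-1")

# Latin-1 maps each of the 256 byte values to a code point, so it never fails.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")

# A field carrying an escaped comma, as a user might enter it with chfn.
field = b"Name\\x2cRoom 1"
via_escape = field.decode("unicode_escape")  # escape is interpreted
via_latin1 = field.decode("latin-1")         # backslash stays literal
```

With the Unicode-escape codec the \x2c turns into a real comma, while
with Latin-1 the field keeps the literal backslash sequence.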

History
Date	User	Action	Args
2009-01-06 20:52:32	baikie	set	recipients: + baikie
2009-01-06 20:52:31	baikie	set	messageid: <1231275151.13.0.805176401894.issue4859@psf.upfronthosting.co.za>
2009-01-06 20:52:30	baikie	link	issue4859 messages
2009-01-06 20:52:28	baikie	create