Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pwd, spwd, grp functions vulnerable to denial of service #49109

Closed
baikie mannequin opened this issue Jan 6, 2009 · 10 comments
Closed

pwd, spwd, grp functions vulnerable to denial of service #49109

baikie mannequin opened this issue Jan 6, 2009 · 10 comments
Assignees
Labels
extension-modules C modules in the Modules dir type-security A security issue

Comments

@baikie
Copy link
Mannequin

baikie mannequin commented Jan 6, 2009

BPO 4859
Nosy @loewis, @vstinner
Files
  • bytes.diff: Add functions to return bytes objects
  • minor.diff: Fix refcount error and fd leak
  • latin1.diff: Make pwd use Latin-1
  • surrogateescape-fields.diff: Decode strings using surrogateescape
  • surrogateescape-args.diff: Encode string arguments using surrogateescape
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/loewis'
    closed_at = <Date 2009-05-29.15:23:38.047>
    created_at = <Date 2009-01-06.20:52:30.365>
    labels = ['type-security', 'extension-modules']
    title = 'pwd, spwd, grp functions vulnerable to denial of service'
    updated_at = <Date 2009-05-29.15:23:38.046>
    user = 'https://bugs.python.org/baikie'

    bugs.python.org fields:

    activity = <Date 2009-05-29.15:23:38.046>
    actor = 'loewis'
    assignee = 'loewis'
    closed = True
    closed_date = <Date 2009-05-29.15:23:38.047>
    closer = 'loewis'
    components = ['Extension Modules']
    creation = <Date 2009-01-06.20:52:30.365>
    creator = 'baikie'
    dependencies = []
    files = ['12621', '12622', '12623', '14055', '14056']
    hgrepos = []
    issue_num = 4859
    keywords = ['patch']
    message_count = 10.0
    messages = ['79286', '79316', '79319', '79320', '79322', '79323', '79376', '88273', '88274', '88511']
    nosy_count = 3.0
    nosy_names = ['loewis', 'vstinner', 'baikie']
    pr_nums = []
    priority = 'high'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue4859'
    versions = ['Python 3.0', 'Python 3.1']

    @baikie
    Copy link
    Mannequin Author

    baikie mannequin commented Jan 6, 2009

    The pwd (and spwd and grp) modules deal with data from
    /etc/passwd (and/or other sources) that can be supplied by users
    on the system. Specifically, users can often change the data in
    their GECOS fields without the OS requiring that it conform to a
    specific encoding, and given some automated account signup
    system, it's conceivable that arbitrary data could even be placed
    in the username field.

    This causes a problem since the functions in these modules try to
    decode the data into str objects, and if a user has placed data
    in /etc/passwd, say, that does not conform to the relevant
    encoding, the function will raise UnicodeDecodeError and thus
    prevent the program from learning the relevant mapping between
    username and UID, etc. (or crash the program if it wasn't
    expecting this). For a system program written in Python, this
    can amount to a denial of service attack, especially if the
    program uses the get*all() functions.

    Currently, the pwd module tries to decode the string fields using
    the Unicode-escape codec, i.e. like a Python string literal, and
    this can fail when given an invalid backslash escape. You can
    see this by running chfn(1), entering something like "\ux" in one
    of the fields, and then calling pwd.getpwnam(yourname) or
    pwd.getpwall(). Perhaps the use of this codec is a mistake,
    given that spwd and grp decode the string fields as UTF-8, but
    chfn could also be used to enter non-UTF-8 data in the GECOS
    field. You can see similar failures in the grp and spwd modules
    after adding a user with a non-UTF-8 name (do something like
    "useradd $'\xff'" in bash).

    A debug build of Python also reports a reference counting error
    in grp (count goes to -1) when its functions fail on non-UTF-8
    data; what I think is going on is that in mkgrent(),
    PyStructSequence_SET_ITEM steals the reference to "w", meaning
    the second "Py_DECREF(w)" shouldn't be there. Also, getpwall()
    and getgrall() leave file descriptors open when they fail, since
    they don't call end*ent() in this case. The attached minor.diff
    fixes both of these problems, I think.

    I've also written a patch (bytes.diff, attached) that would add
    new functions pwd.getpwnamb(), etc. (analogous to os.getcwdb())
    to return bytes objects for the text fields, thus avoiding these
    problems - what do you think? The patch also makes pwd's
    original string functions use UTF-8 like the other modules.

    Alternatively or in addition, a quick "fix" for the GECOS problem
    might be for the pwd module to decode the text fields as Latin-1,
    since in the absence of backslash escapes this is what the
    Unicode-escape encoding is equivalent to. This would at least
    block any DoS attempts using the GECOS field (or attempts to add
    extra commas with \x2c, etc.) without changing the behaviour
    much. The attached latin1.diff does this.

    @baikie baikie mannequin added extension-modules C modules in the Modules dir type-security A security issue labels Jan 6, 2009
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Jan 7, 2009

    Any decision on this issue should be deferred until a PEP has been
    written and accepted that decides on usage of bytes in Unix APIs.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2009

    baikie: Open a separated issue for the refcount error and fd leak.

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2009

    it's conceivable that arbitrary data could even be
    placed in the username field.

    On Ubuntu, it's not possible to create an user with a non-ASCII name:

    $ sudo adduser é --no-create-home
    adduser: To avoid problems, the username should consist only of
    letters, digits, underscores, periods, at signs and dashes, and not
    start with a dash (as defined by IEEE Std 1003.1-2001). For
    compatibility with Samba machine accounts $ is also supported at 
    the end of the username

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2009

    About pwd, we have 7 fields:

    • username: the regex looks like « [a-zA-Z0-9_.@]
      [a-zA-Z0-9_.@\/]*$? », so it's ASCII only
    • password: ASCII only? on my Ubuntu, /etc/passwd uses "x" for all
      passwords, and /etc/shadow uses MD5 hash with a like
      like "$1$x6vJEXyc$" (MD5 marker + salt)
    • user identifier: integer (ASCII)
    • main group identifier: integer (ASCII)
    • GECOS: user text
    • shell: filename
    • home directory: filename

    We can expect GECOS and filenames to be encoded in the "default system
    locale" (eg. latin-1 or UTF-8). An user is allowed to change its GECOS
    field. If the user account use a different locale and set a non-ASCII
    GECOS, decoding the string (to unicode) will fail.

    Your patch latin1.diff is wrong: the charset is not always latin-1 or
    always utf-8: it depends on the system default charset. You should use
    sys.getfilesystemencoding() or locale.getpreferredencoding() to get
    the right encoding. If you used latin-1 as automagic charset to get
    text as bytes, it's not the good solution: use the bytes type to get
    real bytes (as you implemented with your get*b() functions).

    The situation is similar to the bytes/unicode filename debate (see
    issue bpo-3187). I think that we can consider that a system correctly
    configured will use the same locale for all users accounts => use
    unicode. But for compatibility with old systems mixing different
    locales / or new system with locale problems => use bytes.

    The default should be unicode, but we need to be able get all fields
    as bytes. Example:
    pwd.getpwnam(str) -> str fields (and integers for uid/gid)
    pwd.getpwnamb(bytes) -> bytes fields (and integers for uid/gid)

    We have already bytes/unicode functions using the "b" suffix:
    os.getpwd()->str and os.getpwdb()->bytes.

    Note: The GECOS field problem was already reported in issue bpo-3023 (by
    baikie).

    @vstinner
    Copy link
    Member

    vstinner commented Jan 7, 2009

    For a system program written in Python, this
    can amount to a denial of service attack, especially
    if the program uses the get*all() functions

    I don't think that it can be called a "denial of service attack".

    @baikie
    Copy link
    Mannequin Author

    baikie mannequin commented Jan 7, 2009

    baikie: Open a separated issue for the refcount error and fd leak.

    OK. It does affect 2.x as well, come to think of it.

    On Ubuntu, it's not possible to create an user with a non-ASCII
    name:

    $ sudo adduser é --no-create-home

    adduser: To avoid problems, the username should consist only of...

    Well, good for Ubuntu :) But you can still add one with the
    lower-level useradd command, and not everyone uses Ubuntu.

    Your patch latin1.diff is wrong

    Yes, I know it's "wrong" - I just thought of it as a stopgap
    measure until some sort of bytes functionality is added (since
    pwd already decodes everything as Latin-1, but tries to interpret
    backslash escapes). But yeah, if it's going to be changed later,
    then I suppose there's not much point.

    I don't think that it can be called a "denial of service attack".

    It depends on how the program uses these functions. Obviously
    Python itself is only vulnerable to a DoS if the interpreter
    crashes or something, but what I'm saying is that there should be
    a way for Python programs to access the password database that is
    not subject to denial of service attacks. If someone changes
    their GECOS field they can make pwd.getpwall() fail for another
    user's program, and if the program relies on pwd.getpwall()
    working, then that's a DoS.

    @baikie
    Copy link
    Mannequin Author

    baikie mannequin commented May 24, 2009

    Patch to make pwd, spwd and grp decode their string fields using
    the file system encoding and the "surrogateescape" error handler,
    as per PEP-383.

    @baikie
    Copy link
    Mannequin Author

    baikie mannequin commented May 24, 2009

    Patch to make get*nam() functions encode their arguments using
    the file system encoding and "surrogateescape" error handler, so
    that they correctly handle the user/group name fields returned by
    each other.

    @loewis loewis mannequin self-assigned this May 26, 2009
    @loewis
    Copy link
    Mannequin

    loewis mannequin commented May 29, 2009

    Thanks for the patches. Committed as r73015.

    @loewis loewis mannequin closed this as completed May 29, 2009
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant