Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os.path.normpath doesn't preserve unicode #50077

Closed
mgiuca mannequin opened this issue Apr 24, 2009 · 7 comments
Closed

os.path.normpath doesn't preserve unicode #50077

mgiuca mannequin opened this issue Apr 24, 2009 · 7 comments
Assignees
Labels
stdlib Python modules in the Lib dir topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@mgiuca
Copy link
Mannequin

mgiuca mannequin commented Apr 24, 2009

BPO 5827
Nosy @loewis, @ezio-melotti
Files
  • normpath.patch: (Obsolete) Fix for posixpath.normpath and ntpath.normpath.
  • normpath.2.patch: Fix for posixpath.normpath and ntpath.normpath.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/ezio-melotti'
    closed_at = <Date 2010-01-12.04:07:34.664>
    created_at = <Date 2009-04-24.04:23:57.878>
    labels = ['type-bug', 'library', 'expert-unicode']
    title = "os.path.normpath doesn't preserve unicode"
    updated_at = <Date 2010-01-12.04:07:34.663>
    user = 'https://bugs.python.org/mgiuca'

    bugs.python.org fields:

    activity = <Date 2010-01-12.04:07:34.663>
    actor = 'ezio.melotti'
    assignee = 'ezio.melotti'
    closed = True
    closed_date = <Date 2010-01-12.04:07:34.664>
    closer = 'ezio.melotti'
    components = ['Library (Lib)', 'Unicode']
    creation = <Date 2009-04-24.04:23:57.878>
    creator = 'mgiuca'
    dependencies = []
    files = ['13757', '15362']
    hgrepos = []
    issue_num = 5827
    keywords = ['patch']
    message_count = 7.0
    messages = ['86395', '95223', '95459', '95490', '95493', '95495', '97621']
    nosy_count = 5.0
    nosy_names = ['loewis', 'kcwu', 'ezio.melotti', 'mgiuca', 'sandberg']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue5827'
    versions = ['Python 2.6', 'Python 2.7']

    @mgiuca
    Copy link
    Mannequin Author

    mgiuca mannequin commented Apr 24, 2009

    In the Python 2.x branch, os.path.normpath will sometimes return a str
    even if given a unicode. This is not an issue in the Python 3.0 branch.

    This happens specifically when it throws away all string data and
    constructs its own:

    >>> os.path.normpath(u'')
    '.'
    >>> os.path.normpath(u'.')
    '.'
    >>> os.path.normpath(u'/')
    '/'

    This is a problem if working with code which expects all strings to be
    unicode strings (sometimes, functions raise exceptions if given a str,
    when expecting a unicode).

    I have attached patches (with test cases) for posixpath and ntpath which
    correctly preserve the unicode-ness of the input string, such that the
    new behaviour is:

    >>> os.path.normpath(u'')
    u'.'
    >>> os.path.normpath(u'.')
    u'.'
    >>> os.path.normpath(u'/')
    u'/'

    I tried it on os2emxpath and plat-riscos/riscospath (the other two
    OS-specific path modules), and it already worked fine for them.
    Therefore, this patch fixes all necessary OS-specific versions of os.path.

    @mgiuca mgiuca mannequin added stdlib Python modules in the Lib dir topic-unicode type-bug An unexpected behavior, bug, or error labels Apr 24, 2009
    @ezio-melotti
    Copy link
    Member

    Thanks for the patch, I tried it on Linux and it seems to solve the problem.

    A few comments about it:

    1. I'd change all the self.assertEqual(type(posixpath.normpath(u"")),
      unicode) to self.assertTrue(isinstance(posixpath.normpath(u""), unicode));
    2. a test for normpath(u'.') should probably be added;
    3. in ntpath.py the 'slash' is actually a backslash, so the name of the
      var should be changed;

    @mgiuca
    Copy link
    Mannequin Author

    mgiuca mannequin commented Nov 19, 2009

    Thanks Ezio.

    I've updated the patch to incorporate your suggestions.

    Note that I too have only tested it on Linux, but I tested both
    posixpath and ntpath (and there is no OS-specific code, except for the
    filenames themselves).

    I'm not sure if using assertTrue(isinstance ...) is better than
    assertEqual(type ...), because the type equality checking produces this
    error:
    AssertionError: <type 'str'> != <type 'unicode'>
    while isinstance produces this unhelpful error:
    AssertionError: False is not True

    But oh well, I made the change anyway as most test cases use isinstance.

    @ezio-melotti
    Copy link
    Member

    assertTrue() also accepts a 'msg' argument where to explain what went
    wrong in case of failure 1.

    @sandberg
    Copy link
    Mannequin

    sandberg mannequin commented Nov 19, 2009

    Also, assertTrue has an alias failUnless which I personally find more
    descriptive (I don't know if either form is preferred for inclusion in
    Python though).

    @ezio-melotti
    Copy link
    Member

    failUnless is deprecated in Python3.1 1. The assert* methods are
    preferred over the fail* ones that are now deprecated.

    @ezio-melotti ezio-melotti self-assigned this Jan 7, 2010
    @ezio-melotti
    Copy link
    Member

    Fixed in r77442 (trunk) and r77443 (release26-maint), thanks!

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir topic-unicode type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant