classification
Title: os.path.normpath change some characters of a path into kinda 'hex number'
Type: behavior
Stage: resolved
Components:
Versions: Python 3.5
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: Yugi, eryksun, terry.reedy, xtreak
Priority: normal
Keywords:

Created on 2019-08-24 14:54 by Yugi, last changed 2019-08-30 20:51 by terry.reedy. This issue is now closed.

Files
File name: Screenshot from 2019-08-24 21-41-52.png
Uploaded: Yugi, 2019-08-24 14:54
Description: screenshot for system information and terminal output
Messages (5)
msg350371 - Author: Yugi Date: 2019-08-24 14:54
I was trying to handle paths so that they work with both '/' and '\', but when I ran the code suggested at https://stackoverflow.com/questions/3167154/how-to-split-a-dos-path-into-its-components-in-python
the terminal on my PC did not give the same output that everybody was discussing when the question and answers were posted, and I have no idea why.
OS: Ubuntu 16.04 LTS, Intel Core i5-7500, 16GB/1TB, Intel HD Graphics 630
Python version: 3.5.2
I borrowed a 2015 Mac Pro to check whether it gave the same output as my PC, but it did not. My friend has Python 3.7.1 installed, and the output there is: ['d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt'] (on my PC, it is: ['d:\\stuff\\morestuff\x0curtherdown\\THEFILE.txt']). I'm totally new to Python, and I'm very sorry if this issue has already been reported. Thank you!
msg350373 - Author: Karthikeyan Singaravelan (xtreak) (Python triager) Date: 2019-08-24 15:07
I guess '\f' is interpreted as the escape sequence \x0c (form feed); using a raw string avoids this.

>>> import os
>>> ord('\f')
12
>>> '\f'
'\x0c'
>>> var = "d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> var
'd:\\stuff\\morestuff\x0curtherdown\\THEFILE.txt'
>>> print(os.path.normpath(var))
d:\stuff\morestuff
                  urtherdown\THEFILE.txt
>>> os.path.normpath(var)
'd:\\stuff\\morestuff\x0curtherdown\\THEFILE.txt'

# Use a raw string

>>> var = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"
>>> var
'd:\\stuff\\morestuff\\furtherdown\\THEFILE.txt'
>>> print(os.path.normpath(var))
d:\stuff\morestuff\furtherdown\THEFILE.txt
>>> os.path.normpath(var)
'd:\\stuff\\morestuff\\furtherdown\\THEFILE.txt'

# Or escape the backslashes

>>> var = "d:\\stuff\\morestuff\\furtherdown\\THEFILE.txt"
>>> var
'd:\\stuff\\morestuff\\furtherdown\\THEFILE.txt'
>>> print(os.path.normpath(var))
d:\stuff\morestuff\furtherdown\THEFILE.txt
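
Since the original goal was to split a DOS-style path into its components, here is a minimal sketch of one way to do that on any OS. It assumes the standard-library `pathlib.PureWindowsPath` class, which is not mentioned in this thread, and it reuses the path from the report written as a raw string.

```python
from pathlib import PureWindowsPath

# Raw string: backslashes stay literal, so "\f" is not turned into a form feed.
raw_path = r"d:\stuff\morestuff\furtherdown\THEFILE.txt"

# PureWindowsPath applies Windows path rules even when run on Linux or macOS.
components = PureWindowsPath(raw_path).parts

# Forward slashes are accepted as separators too and give the same components.
components_from_slashes = PureWindowsPath(
    "d:/stuff/morestuff/furtherdown/THEFILE.txt"
).parts

print(components)
```

Because `PureWindowsPath` never touches the file system, this should behave the same on Ubuntu and macOS as on Windows.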
msg350382 - Author: Eryk Sun (eryksun) (Python triager) Date: 2019-08-24 17:31
As Karthikeyan noted, in a regular string literal, backslash is an escape character that's used in the following escape sequences:

    \N{name}   : named character
    \UXXXXXXXX : 32-bit hexadecimal ordinal (e.g. \U0010ffff)
    \uXXXX     : 16-bit hexadecimal ordinal (e.g. \uffff)
    \xXX       : 8-bit hexadecimal ordinal (e.g. \xff)
    \OOO       : 9-bit octal ordinal (e.g. \777)
    \OO        : 6-bit octal ordinal (e.g. \77)
    \O         : 3-bit octal ordinal (e.g. \7)

    \a : \x07, \N{BEL}, \N{ALERT}
    \b : \x08, \N{BS}, \N{BACKSPACE}
    \t : \x09, \N{HT}, \N{TAB}, \N{CHARACTER TABULATION}, \N{HORIZONTAL TABULATION}
    \n : \x0a, \N{LF}, \N{NL}, \N{LINE FEED}, \N{NEW LINE}
    \v : \x0b, \N{VT}, \N{LINE TABULATION}, \N{VERTICAL TABULATION}
    \f : \x0c, \N{FF}, \N{FORM FEED}
    \r : \x0d, \N{CR}, \N{CARRIAGE RETURN}
    \" : \x22, \N{QUOTATION MARK}
    \' : \x27, \N{APOSTROPHE}
    \\ : \x5c, \N{REVERSE SOLIDUS}
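
A small sketch of how the table applies to the reported path: in a regular literal, `\f` collapses to a single form-feed character, while a raw literal keeps the backslash and the `f`. The variable names are illustrative only.

```python
# A regular literal: "\f" is one character, FORM FEED (code point 12).
regular = "morestuff\furtherdown"

# A raw literal: the backslash and the "f" are kept as two characters.
raw = r"morestuff\furtherdown"

# Different spellings of the same single character, per the table above.
form_feed_spellings = ["\f", "\x0c", "\14", "\u000c", "\N{FORM FEED}"]

print(repr(regular))
print(repr(raw))
print(len(regular), len(raw))
```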

For a Windows path, we can either use a normal string literal with the backslash path separators escaped by doubling them, or use a raw string literal.

One corner case with a raw string literal is that it can't end with an odd number of backslashes. We can address this in one of two ways. Either rely on the compiler's implicit concatenation of string literals, or rely on the system's path normalization to collapse multiple path separators (except at the beginning of a path). For example:

    >>> print(r'C:\Users' '\\')
    C:\Users\
    >>> print(r'C:\Users\\')
    C:\Users\\

The system normalizes the second case to collapse repeated backslashes. For example:

    >>> print(os.path._getfullpathname(r'C:\Users\\'))
    C:\Users\
    >>> os.path.samefile(r'C:\Users\\', r'C:\Users' '\\')
    True
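
The raw-literal corner case can also be checked on any platform. This is a minimal sketch that assumes `ntpath.normpath` as a stand-in for normalization; it is pure string processing in Python, not the Windows API call shown above, and unlike the system it also drops the trailing separator.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# Implicit concatenation: a raw literal plus a normal literal for the last backslash.
with_trailing = r"C:\Users" "\\"

# An even number of trailing backslashes is legal in a raw literal.
doubled = r"C:\Users\\"

# A raw literal ending in a single backslash does not compile.
try:
    compile("r'C:\\Users\\'", "<example>", "eval")
    single_backslash_ok = True
except SyntaxError:
    single_backslash_ok = False

# ntpath.normpath collapses the repeated separators (and strips the trailing one).
collapsed = ntpath.normpath(doubled)

print(with_trailing, doubled, collapsed, single_backslash_ok)
```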

We can also use forward slash as the path separator for file-system paths (but not registry paths), such as paths that we're passing to open() or os functions. I don't recommend this if a file-system path is to be passed as a command-line argument. Some programs use forward slash as a switch for command-line options. In this case first normalize the path via os.path.normpath, or via replace('/', '\\').
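
A minimal sketch of the two normalizations mentioned above, using `ntpath` so that it runs on any OS (on Windows, `os.path` is the same module). The path and the tool name are hypothetical.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# Hypothetical file-system path written with forward slashes.
mixed = "C:/Users/example/report.txt"

# Option 1: normpath converts "/" to "\" (and also resolves "." and ".." parts).
normalized = ntpath.normpath(mixed)

# Option 2: a plain character replacement, with no other normalization.
replaced = mixed.replace("/", "\\")

# Hypothetical command line: the path argument can no longer be mistaken
# for a "/switch" by the program that receives it.
command = ["some_tool.exe", "/verbose", normalized]

print(normalized)
```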

In some cases a path may be returned to us in Windows with a "\\?\" prefix (backslash only), which is sometimes referred to as an extended path. (More specifically, it's a native path in the device namespace.) This tells the Windows API to skip path normalization. If a path begins with exactly this prefix, then appending components to it with forward slash results in a path that will not work. Use os.path.join, or normalize the path via os.path.normpath to ensure the final path uses only backslash as the path separator.
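
To illustrate the extended-path case, here is a minimal sketch that uses `ntpath.join` (the Windows implementation of `os.path.join`) so that it runs on any OS. The path is hypothetical, and the `replace` call is shown only as a simple string-level repair.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

# Hypothetical extended path, i.e. \\?\C:\Temp
extended = "\\\\?\\C:\\Temp"

# Naive concatenation leaves forward slashes in a path that skips normalization.
naive = extended + "/logs/out.txt"

# join inserts backslashes, so the result stays usable with the prefix.
joined = ntpath.join(extended, "logs", "out.txt")

# Repairing the naive result by replacing the separators.
repaired = naive.replace("/", "\\")

print(joined)
```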
msg350386 - Author: Eryk Sun (eryksun) (Python triager) Date: 2019-08-24 17:58
>    \N{name}   : named character
>    \UXXXXXXXX : 32-bit hexadecimal ordinal (e.g. \U0010ffff)
>    \uXXXX     : 16-bit hexadecimal ordinal (e.g. \uffff)
>    \xXX       : 8-bit hexadecimal ordinal (e.g. \xff)
>    \OOO       : 9-bit octal ordinal (e.g. \777)
>    \OO        : 6-bit octal ordinal (e.g. \77)
>    \O         : 3-bit octal ordinal (e.g. \7)

Note that bytes literals do not implement \N, \U, and \u escape sequences -- e.g. b'\N{SPACE}' is literally just those 9 bytes, not b' '. Also, in bytes literals 9-bit octal sequences wrap around for the [256, 511] range -- e.g. b'\400' == b'\000' == b'\x00' and b'\777' == b'\377' == b'\xff'. I don't know whether the latter is intentional. I'd prefer for the compiler to raise a syntax error in this case. Asking for a byte value in the range [256, 511] is nonsense.
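
The bytes-literal behavior described above can be observed with the sketch below. It evaluates the literals from source text via `ast.literal_eval` with warnings suppressed, on the assumption that the interpreter still accepts them; some Python versions warn about these escapes, and a later version may reject them outright. The helper name is illustrative.

```python
import ast
import warnings


def parse_bytes_literal(source):
    """Evaluate a bytes literal given as source text, ignoring escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# \N is not an escape in bytes literals, so the text is kept byte for byte.
named = parse_bytes_literal(r"b'\N{SPACE}'")

# Octal values in [256, 511] keep only their low 8 bits.
wrapped_low = parse_bytes_literal(r"b'\400'")
wrapped_high = parse_bytes_literal(r"b'\777'")

print(named, len(named), wrapped_low, wrapped_high)
```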
msg350887 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2019-08-30 20:51
3.5 (and 3.6) only get security fixes. From the report, the bug is fixed in 3.7.

FWIW, I agree about the 9-bit octal thing. There is another issue about this.
History
Date / User / Action / Args
2019-08-30 20:51:23 / terry.reedy / set / status: open -> closed; nosy: + terry.reedy; messages: + msg350887; resolution: out of date; stage: resolved
2019-08-24 17:58:58 / eryksun / set / messages: + msg350386
2019-08-24 17:31:18 / eryksun / set / nosy: + eryksun; messages: + msg350382
2019-08-24 15:07:55 / xtreak / set / nosy: + xtreak; messages: + msg350373
2019-08-24 14:54:52 / Yugi / create