This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

classification
Title: os.path.splitext with more dots
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.10
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: Malcolm Smith, eryksun, iritkatriel, lys.nikolaou, miss-islington, pitrou, serhiy.storchaka, xnovakj, xtreak
Priority: normal Keywords: patch

Created on 2018-10-08 12:06 by xnovakj, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 30347 merged iritkatriel, 2022-01-03 00:30
PR 30368 merged miss-islington, 2022-01-03 20:10
PR 30369 merged miss-islington, 2022-01-03 20:10
Messages (13)
msg327346 - (view) Author: Jan Novak (xnovakj) Date: 2018-10-08 12:06
There are some old tickets from 2007 about changing splitext():
https://bugs.python.org/issue1115886
https://bugs.python.org/issue1681842

The current Python documentation says:
Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').
Changed in version 2.6: Earlier versions could produce an empty root when the only period was the first character.

But nobody has dealt with names that have more than one leading dot.
For example, this is a possible valid filename:

>>> os.path.splitext('....jpg')
('....jpg', '')

So the present function is insufficient (buggy) for detecting the right extension.
Maybe a new parameter would be helpful for that?
Something like the "preserve_dotfiles" parameter discussed in 2007.

And what should be done with degenerate filenames such as '.', '..', '...', and so on?
msg328722 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2018-10-28 18:23
IMHO this is not a bug. Every file starting with a dot is hidden by default, so this is somewhat correct behaviour. Since it is documented correctly, I don't think anything needs to change here.

Then again, could someone more experienced offer their opinion as well?
msg328810 - (view) Author: Jan Novak (xnovakj) Date: 2018-10-29 11:55
Yes, dot behaviour is well documented.

But splitext() is a function for splitting a name from its extension; it should not be concerned with hidden files.

Hiding files whose names start with a dot is a feature of Unix-like systems.
https://en.wikipedia.org/wiki/Hidden_file_and_hidden_directory

Hidden files can also have an extension.
For example, try creating a file named .somename.jpg (or renaming one to that) in Ubuntu.
It is a normal image with a .jpg extension; you can open it, and so on.

You can't use the standard Python splitext() function to detect the extension.

I know that changing this standard Python function is probably impossible because of backward compatibility. But extending it with a new parameter could be possible. The change in 2007 was unfortunate/incomplete.
msg328820 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-10-29 12:31
Related issues: issue536120, issue1115886, issue1462106, issue1681842, issue19191.

Python-Dev discussions:

    https://mail.python.org/pipermail/python-dev/2007-March/071557.html
    https://mail.python.org/pipermail/python-dev/2007-March/071935.html
msg370262 - (view) Author: Malcolm Smith (Malcolm Smith) Date: 2020-05-28 18:59
> For example, try creating a file named .somename.jpg (or renaming one to that) in Ubuntu.
> It is a normal image with a .jpg extension; you can open it, and so on.
>
> You can't use the standard Python splitext() function to detect the extension.

Yes you can (Python 3.8.2):

>>> splitext(".somename.jpg")
('.somename', '.jpg')

Only leading dots are treated specially. If there are non-leading dots later in the name, then the filename will be split at the last one.

So the only remaining question is weird filenames like ".....jpg". There's no way to know whether that's supposed to be a name with an extension, or a dot-file with multiple leading dots. Either choice would be reasonable, but we've already gone with the second one. 

Maybe sometimes you might prefer the other way, but such filenames are so rare that I can't imagine we'd ever add an option for this. So I think this issue can be closed.
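
The rule described above can be checked directly. This is a minimal sketch that only calls os.path.splitext() on the filenames mentioned in this thread, plus one ordinary multi-dot name added for contrast; the `names` and `results` variables exist only for illustration:

```python
import os.path

# Filenames from this discussion, plus "archive.tar.gz" added for contrast.
names = [".cshrc", ".somename.jpg", "..some.jpg", "....jpg", "archive.tar.gz"]

# Leading dots of the name never start an extension;
# otherwise the name is split at its last dot.
results = {name: os.path.splitext(name) for name in names}

for name, (root, ext) in results.items():
    print(f"{name!r:>18} -> root={root!r}, ext={ext!r}")
```

Only "....jpg" and ".cshrc" come back with an empty extension, because every dot in those names is either a leading dot or preceded only by leading dots.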
msg409569 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-01-03 08:44
There are other issues with the documentation of splitext().

1. It uses the term "extension" (it is even part of the function name), but the term is vague and usually does not include a period. On Windows the extension of "python.exe" is "exe", not ".exe". On Unix the term "suffix" is commonly used; ".exe" is a suffix. That term is also used in pathlib. I suggest replacing "extension" with "suffix".

2. It is not specified that only part of the last path component is included in the suffix, and that leading periods of the last path component are ignored, not just leading periods of the whole path. So splitext('mail.dir/') == ('mail.dir/', '') and splitext('/home/user/.etc') == ('/home/user/.etc', ''). It is not documented that splitext() works with multi-component paths at all.
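
To illustrate point 2, here is a small sketch using posixpath, so that '/' is the separator regardless of platform. The first two paths are the ones quoted above; the other two are made-up examples added for contrast:

```python
import posixpath

paths = [
    "mail.dir/",             # trailing separator: the last component is empty
    "/home/user/.etc",       # leading period of the last component
    "/home/user/.etc.conf",  # hypothetical: a dotfile that also has an extension
    "mail.dir/inbox",        # hypothetical: the only dot is in a directory name
]

# Only dots in the last path component are considered by splitext().
split = {p: posixpath.splitext(p) for p in paths}

for p, pair in split.items():
    print(p, "->", pair)
```

In every case the dot in a directory name is left alone, and only "/home/user/.etc.conf" yields a non-empty extension.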
msg409575 - (view) Author: Eryk Sun (eryksun) * (Python triager) Date: 2022-01-03 10:35
> On Windows the extension of "python.exe" is "exe", not ".exe". 

FWIW, a file extension in Windows includes the dot. Trailing dots are stripped from filenames, so a file can't be named "python." because it's the same as just "python". (It's possible to prevent stripping trailing dots from a name by using a \\?\ literal path, but creating such a filename is a bad idea.) 

The shell API's file associations include the dot in the file extension. Also, the PATHEXT environment variable (i.e. the list of extensions that a CLI shell should try appending in a PATH search) includes the dot in each extension. In both of the latter cases, an extension that's just "." matches a filename that has no extension. In other words, the "." file extension can be used to associate files that have no extension with a ProgID (i.e. a programmatic identifier, which defines properties and actions for a file type), and adding "." to PATHEXT includes files that have no extension in a PATH search.

To clarify further, here are some results from PathCchFindExtension() [1]:

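    >>> import ctypes
    >>> # Windows only: PathCchFindExtension is exported by this API set.
    >>> path = ctypes.OleDLL('api-ms-win-core-path-l1-1-0')
    >>> s = (ctypes.c_wchar * 100)()  # writable wide-character buffer for the path
    >>> ext = ctypes.c_wchar_p()      # receives a pointer to the extension within s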

    >>> s.value = 'python.exe'
    >>> _ = path.PathCchFindExtension(s, len(s), ctypes.byref(ext))
    >>> ext.value
    '.exe'

    >>> s.value = '...exe'
    >>> _ = path.PathCchFindExtension(s, len(s), ctypes.byref(ext))
    >>> ext.value
    '.exe'

    >>> s.value = 'python.'
    >>> _ = path.PathCchFindExtension(s, len(s), ctypes.byref(ext))
    >>> ext.value
    '.'

---
[1] https://docs.microsoft.com/en-us/windows/win32/api/pathcch/nf-pathcch-pathcchfindextension
msg409576 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2022-01-03 11:27
Well, so we can keep the term "extension". But I think it is worth clarifying that "leading periods" refers to the last component, not the whole path. That is related to the original issue.
msg409615 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-03 20:10
New changeset 51700bf08b0dd4baf998440b2ebfaa488a2855ba by Irit Katriel in branch 'main':
bpo-34931: [doc] clarify behavior of os.path.splitext() on paths with multiple leading periods (GH-30347)
https://github.com/python/cpython/commit/51700bf08b0dd4baf998440b2ebfaa488a2855ba
msg409617 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-03 20:36
New changeset 8184a613b93d54416b954e667951cdf3d069cc13 by Miss Islington (bot) in branch '3.10':
bpo-34931: [doc] clarify behavior of os.path.splitext() on paths with multiple leading periods (GH-30347) (GH-30368)
https://github.com/python/cpython/commit/8184a613b93d54416b954e667951cdf3d069cc13
msg409618 - (view) Author: Irit Katriel (iritkatriel) * (Python committer) Date: 2022-01-03 20:39
New changeset 4a792ca95c1a994b07d18fe06e2104d5b1e0b796 by Miss Islington (bot) in branch '3.9':
bpo-34931: [doc] clarify behavior of os.path.splitext() on paths with multiple leading periods (GH-30347) (GH-30369)
https://github.com/python/cpython/commit/4a792ca95c1a994b07d18fe06e2104d5b1e0b796
msg409622 - (view) Author: Jan Novak (xnovakj) Date: 2022-01-03 21:53
Thank you all for the discussion, for the partial solution in the latest Python versions, and for extending the documentation.

For the future development of Python, the initial question remains:
how can one easily detect the extension of any file with a standard library function, without writing one's own function to work around it?

Filenames with multiple dots can exist in both the Unix and Windows worlds.
Nobody can tell users (for example, web app users): "Please do not use those files."

Python 3.10.1
Works fine:
>>> os.path.splitext('.some.jpg')
('.some', '.jpg')
>>> os.path.splitext('..some.jpg')
('..some', '.jpg')

Not usable:
>>> os.path.splitext('....jpg')
('....jpg', '')

There are some possible ways forward:
- a new parameter
- a new function
- break backward compatibility
- stay buggy forever

Thank you
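
For anyone who needs something like the "new function" option today, one possible workaround is a small helper built on os.path.split(). This is only a sketch: splitext_last_dot is a hypothetical name, not part of the standard library, and the decision that a dot at the very start or very end of the name does not begin an extension is an assumption made for this example:

```python
import os.path


def splitext_last_dot(path):
    """Split at the last dot of the final path component.

    Hypothetical helper, not a standard library function. Unlike
    os.path.splitext(), a run of leading dots does not suppress the split.
    """
    head, tail = os.path.split(path)
    i = tail.rfind(".")
    # Assumption: require at least one character before the dot and
    # at least one character after it.
    if 0 < i < len(tail) - 1:
        cut = len(path) - len(tail) + i
        return path[:cut], path[cut:]
    return path, ""


for name in ["....jpg", "path/....jpg", ".cshrc", "..", "mail.dir/"]:
    print(repr(name), "->", splitext_last_dot(name))
```

With this rule '....jpg' splits into '...' and '.jpg', while '.cshrc', '..' and 'mail.dir/' are still returned with an empty extension.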
msg409625 - (view) Author: Jan Novak (xnovakj) Date: 2022-01-03 22:25
It is interesting that pathlib.Path works fine:

>>> pathlib.Path('....jpg').suffix
'.jpg'
>>> pathlib.Path('path/....jpg').suffix
'.jpg'
History
Date User Action Args
2022-04-11 14:59:06 admin set github: 79112
2022-01-03 23:23:53 iritkatriel set resolution: fixed ->
components: + Library (Lib)
2022-01-03 22:25:37 xnovakj set messages: + msg409625
2022-01-03 21:53:03 xnovakj set status: closed -> open
messages: + msg409622
versions: + Python 3.10
2022-01-03 20:40:02 iritkatriel set status: open -> closed
resolution: fixed
stage: patch review -> resolved
2022-01-03 20:39:12 iritkatriel set messages: + msg409618
2022-01-03 20:36:49 iritkatriel set messages: + msg409617
2022-01-03 20:10:37 miss-islington set pull_requests: + pull_request28582
2022-01-03 20:10:32 miss-islington set nosy: + miss-islington
pull_requests: + pull_request28581
2022-01-03 20:10:18 iritkatriel set messages: + msg409615
2022-01-03 11:27:39 serhiy.storchaka set messages: + msg409576
2022-01-03 10:35:46 eryksun set nosy: + eryksun
messages: + msg409575
2022-01-03 08:44:42 serhiy.storchaka set nosy: + pitrou
messages: + msg409569
2022-01-03 00:30:37 iritkatriel set keywords: + patch
nosy: + iritkatriel
pull_requests: + pull_request28559
stage: patch review
2020-05-28 18:59:05 Malcolm Smith set nosy: + Malcolm Smith
messages: + msg370262
2018-10-29 12:31:32 serhiy.storchaka set nosy: + serhiy.storchaka
messages: + msg328820
2018-10-29 11:55:07 xnovakj set messages: + msg328810
2018-10-28 18:23:31 lys.nikolaou set nosy: + lys.nikolaou
messages: + msg328722
2018-10-08 13:59:44 xtreak set nosy: + xtreak
2018-10-08 12:06:01 xnovakj create