msg327346 - (view) |
Author: Jan Novak (xnovakj) |
Date: 2018-10-08 12:06 |
There are some old tickets about changing splitext() in 2007:
https://bugs.python.org/issue1115886
https://bugs.python.org/issue1681842
The current Python documentation says:
Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').
Changed in version 2.6: Earlier versions could produce an empty root when the only period was the first character.
But nobody considered names with more than one leading dot.
For example, this is a possible valid filename:
>>> os.path.splitext('....jpg')
('....jpg', '')
So the current function is insufficient (buggy) for detecting the correct extension.
Maybe a new parameter would be helpful for that?
Like the "preserve_dotfiles" parameter discussed in 2007.
And what should be done with problematic filenames such as '.', '..', '...', and so on?
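The behaviour described above can be reproduced with a short script. This is only an illustrative sketch; the sample names are taken from this message and from the documentation quote, plus one ordinary name ("photo.jpg") added for contrast.

```python
import os.path

# Names with zero, one, and several leading periods.
samples = ["photo.jpg", ".cshrc", ".somename.jpg", "....jpg", "..", "..."]

# Map each name to the (root, ext) pair returned by splitext().
results = {name: os.path.splitext(name) for name in samples}

for name, (root, ext) in results.items():
    print(f"{name!r:18} -> root={root!r}, ext={ext!r}")
```

Names whose only periods are leading ones come back with an empty extension, which is the case this report is about.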
|
msg328722 - (view) |
Author: Lysandros Nikolaou (lys.nikolaou) *  |
Date: 2018-10-28 18:23 |
IMHO this is not a bug. Every file starting with a dot is hidden by default, so this is somewhat correct behaviour. Since it is documented correctly, I don't think anything needs to change here.
Then again, could someone more experienced offer their opinion as well?
|
msg328810 - (view) |
Author: Jan Novak (xnovakj) |
Date: 2018-10-29 11:55 |
Yes, dot behaviour is well documented.
But splitext() is a function for splitting a name from its extension; it should not be concerned with hidden files.
Hidden files starting with a dot are a feature of Unix-like systems.
https://en.wikipedia.org/wiki/Hidden_file_and_hidden_directory
Hidden files can also have an extension.
For example, try creating a file named .somename.jpg (or renaming one to that) in Ubuntu.
It is a normal image with a .jpg extension. You can open it, etc.
You can't use the standard Python splitext() function to detect the extension.
I know that changing this standard Python function is probably impossible due to backward compatibility. But extending it with a new parameter could be possible. The change in 2007 was unfortunate/incomplete.
|
msg328820 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-10-29 12:31 |
Related issues: issue536120, issue1115886, issue1462106, issue1681842, issue19191.
Python-Dev discussions:
https://mail.python.org/pipermail/python-dev/2007-March/071557.html
https://mail.python.org/pipermail/python-dev/2007-March/071935.html
|
msg370262 - (view) |
Author: Malcolm Smith (Malcolm Smith) |
Date: 2020-05-28 18:59 |
> For example, try creating a file named .somename.jpg (or renaming one to that) in Ubuntu.
> It is a normal image with a .jpg extension. You can open it, etc.
>
> You can't use the standard Python splitext() function to detect the extension.
Yes you can (Python 3.8.2):
>>> splitext(".somename.jpg")
('.somename', '.jpg')
Only leading dots are treated specially. If there are non-leading dots later in the name, then the filename will be split at the last one.
So the only remaining question is weird filenames like ".....jpg". There's no way to know whether that's supposed to be a name with an extension, or a dot-file with multiple leading dots. Either choice would be reasonable, but we've already gone with the second one.
Maybe sometimes you might prefer the other way, but such filenames are so rare that I can't imagine we'd ever add an option for this. So I think this issue can be closed.
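For callers who do prefer the other interpretation, a small helper outside the standard library is enough. The sketch below is hypothetical (the name splitext_last_dot and its rules are assumptions for illustration, not an existing or proposed API): it splits at the last dot of the final path component, but still returns no extension for a name with a single leading dot (".cshrc") or a name made only of dots ("..", "...").

```python
import os.path

def splitext_last_dot(path):
    """Hypothetical variant of splitext() that always splits at the last dot.

    Assumption for this sketch: names with a single leading dot and no other
    dot, and names consisting only of dots, have no extension.
    """
    tail = os.path.basename(path)          # last path component
    stem, dot, ext = tail.rpartition(".")  # split at the last dot
    if not dot or not stem or not tail.strip("."):
        return path, ""
    cut = len(path) - len(ext) - 1         # index of the last dot in path
    return path[:cut], path[cut:]

for name in ["....jpg", ".cshrc", "..", ".some.jpg", "path/....jpg"]:
    print(name, "->", splitext_last_dot(name))
```

Slicing the original string, rather than rejoining directory and name, keeps the root and extension concatenating back to the input, as os.path.splitext() guarantees.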
|
msg409569 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2022-01-03 08:44 |
There are other issues with the documentation of splitext().
1. It uses the term "extension" (it is even part of the function name), but the term is vague and usually does not include a period. On Windows the extension of "python.exe" is "exe", not ".exe". On Unix the term "suffix" is commonly used; ".exe" is a suffix. It is also used in pathlib. I suggest replacing "extension" with "suffix".
2. It is not specified that only part of the last path component is included in the suffix, and that leading periods of the last path component are ignored, not just leading periods of the path. So splitext('mail.dir/') == ('mail.dir/', '') and splitext('/home/user/.etc') == ('/home/user/.etc', ''). It is not documented that splitext() works with multi-component paths at all.
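Point 2 can be checked with a few multi-component paths. This is a minimal sketch; the first two paths are the ones quoted above, and the other two are made-up examples added for contrast.

```python
import os.path

paths = [
    "mail.dir/",              # dot is in a directory component, last component is empty
    "/home/user/.etc",        # leading period of the last component is ignored
    "mail.dir/inbox",         # dot in an earlier component never starts an extension
    "archive.d/file.tar.gz",  # only the last dot of the last component counts
]

# Map each path to the (root, ext) pair returned by splitext().
split = {p: os.path.splitext(p) for p in paths}

for p, (root, ext) in split.items():
    print(f"{p!r:26} -> ({root!r}, {ext!r})")
```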
|
msg409575 - (view) |
Author: Eryk Sun (eryksun) *  |
Date: 2022-01-03 10:35 |
> On Windows the extension of "python.exe" is "exe", not ".exe".
FWIW, a file extension in Windows includes the dot. Trailing dots are stripped from filenames, so a file can't be named "python." because it's the same as just "python". (It's possible to prevent stripping trailing dots from a name by using a \\?\ literal path, but creating such a filename is a bad idea.)
The shell API's file associations include the dot in the file extension. Also, the PATHEXT environment variable (i.e. the list of extensions that a CLI shell should try appending in a PATH search) includes the dot in each extension. In both of the latter cases, an extension that's just "." matches a filename that has no extension. In other words, the "." file extension can be used to associate files that have no extension with a ProgID (i.e. a programmatic identifier, which defines properties and actions for a file type), and adding "." to PATHEXT includes files that have no extension in a PATH search.
To clarify further, here are some results from PathCchFindExtension() [1]:
import ctypes
path = ctypes.OleDLL('api-ms-win-core-path-l1-1-0')
s = (ctypes.c_wchar * 100)()
ext = ctypes.c_wchar_p()
>>> s.value = 'python.exe'
>>> _ = path.PathCchFindExtension(s, len(s), ctypes.byref(ext))
>>> ext.value
'.exe'
>>> s.value = '...exe'
>>> _ = path.PathCchFindExtension(s, len(s), ctypes.byref(ext))
>>> ext.value
'.exe'
>>> s.value = 'python.'
>>> _ = path.PathCchFindExtension(s, len(s), ctypes.byref(ext))
>>> ext.value
'.'
---
[1] https://docs.microsoft.com/en-us/windows/win32/api/pathcch/nf-pathcch-pathcchfindextension
|
msg409576 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2022-01-03 11:27 |
Well, so we can keep the term "extension". But I think it is worth clarifying that "leading periods" refers to the last component, not the whole path. This is related to the original issue.
|
msg409615 - (view) |
Author: Irit Katriel (iritkatriel) *  |
Date: 2022-01-03 20:10 |
New changeset 51700bf08b0dd4baf998440b2ebfaa488a2855ba by Irit Katriel in branch 'main':
bpo-34931: [doc] clarify behavior of os.path.splitext() on paths with multiple leading periods (GH-30347)
https://github.com/python/cpython/commit/51700bf08b0dd4baf998440b2ebfaa488a2855ba
|
msg409617 - (view) |
Author: Irit Katriel (iritkatriel) *  |
Date: 2022-01-03 20:36 |
New changeset 8184a613b93d54416b954e667951cdf3d069cc13 by Miss Islington (bot) in branch '3.10':
bpo-34931: [doc] clarify behavior of os.path.splitext() on paths with multiple leading periods (GH-30347) (GH-30368)
https://github.com/python/cpython/commit/8184a613b93d54416b954e667951cdf3d069cc13
|
msg409618 - (view) |
Author: Irit Katriel (iritkatriel) *  |
Date: 2022-01-03 20:39 |
New changeset 4a792ca95c1a994b07d18fe06e2104d5b1e0b796 by Miss Islington (bot) in branch '3.9':
bpo-34931: [doc] clarify behavior of os.path.splitext() on paths with multiple leading periods (GH-30347) (GH-30369)
https://github.com/python/cpython/commit/4a792ca95c1a994b07d18fe06e2104d5b1e0b796
|
msg409622 - (view) |
Author: Jan Novak (xnovakj) |
Date: 2022-01-03 21:53 |
Thank you all for the discussion, the partial solution in the latest Python versions, and the extended documentation.
For the future development of Python, the initial question remains:
how to easily detect the extension of any file with a standard Python library function, without writing your own function to work around it.
Filenames with multiple dots can exist in both the Unix and Windows worlds.
Nobody can tell users (for example, web app users): please do not use such files.
Python 3.10.1
Works fine:
>>> os.path.splitext('.some.jpg')
('.some', '.jpg')
>>> os.path.splitext('..some.jpg')
('..some', '.jpg')
Not usable:
>>> os.path.splitext('....jpg')
('....jpg', '')
There are some possible ways forward:
- a new parameter
- a new function
- a backward-incompatible change
- staying buggy forever
Thank you
|
msg409625 - (view) |
Author: Jan Novak (xnovakj) |
Date: 2022-01-03 22:25 |
It is interesting that pathlib.Path works fine:
>>> pathlib.Path('....jpg').suffix
'.jpg'
>>> pathlib.Path('path/....jpg').suffix
'.jpg'
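A side-by-side comparison makes the difference easy to check on any interpreter. This is a sketch only: the helper name compare is made up for illustration, and the result for names made mostly of dots was reported above for Python 3.10.1 and may differ in other versions, so the script prints the values rather than assuming them.

```python
import os.path
from pathlib import PurePosixPath

def compare(name):
    """Return (os.path.splitext extension, pathlib suffix) for a name."""
    return os.path.splitext(name)[1], PurePosixPath(name).suffix

for name in [".some.jpg", "..some.jpg", "....jpg", "path/....jpg"]:
    ext, suffix = compare(name)
    marker = "same" if ext == suffix else "DIFFERENT"
    print(f"{name!r:16} splitext={ext!r:8} pathlib={suffix!r:8} {marker}")
```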
|
Date | User | Action | Args
2022-04-11 14:59:06 | admin | set | github: 79112
2022-01-03 23:23:53 | iritkatriel | set | resolution: fixed ->; components: + Library (Lib)
2022-01-03 22:25:37 | xnovakj | set | messages: + msg409625
2022-01-03 21:53:03 | xnovakj | set | status: closed -> open; messages: + msg409622; versions: + Python 3.10
2022-01-03 20:40:02 | iritkatriel | set | status: open -> closed; resolution: fixed; stage: patch review -> resolved
2022-01-03 20:39:12 | iritkatriel | set | messages: + msg409618
2022-01-03 20:36:49 | iritkatriel | set | messages: + msg409617
2022-01-03 20:10:37 | miss-islington | set | pull_requests: + pull_request28582
2022-01-03 20:10:32 | miss-islington | set | nosy: + miss-islington; pull_requests: + pull_request28581
2022-01-03 20:10:18 | iritkatriel | set | messages: + msg409615
2022-01-03 11:27:39 | serhiy.storchaka | set | messages: + msg409576
2022-01-03 10:35:46 | eryksun | set | nosy: + eryksun; messages: + msg409575
2022-01-03 08:44:42 | serhiy.storchaka | set | nosy: + pitrou; messages: + msg409569
2022-01-03 00:30:37 | iritkatriel | set | keywords: + patch; nosy: + iritkatriel; pull_requests: + pull_request28559; stage: patch review
2020-05-28 18:59:05 | Malcolm Smith | set | nosy: + Malcolm Smith; messages: + msg370262
2018-10-29 12:31:32 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: + msg328820
2018-10-29 11:55:07 | xnovakj | set | messages: + msg328810
2018-10-28 18:23:31 | lys.nikolaou | set | nosy: + lys.nikolaou; messages: + msg328722
2018-10-08 13:59:44 | xtreak | set | nosy: + xtreak
2018-10-08 12:06:01 | xnovakj | create |