This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: An 'ascii_alphanumerics' variable is missing in the 'strings' lib
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.11
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, corona10, pauloxnet, serhiy.storchaka, skip.montanaro, terry.reedy
Priority: normal Keywords: patch

Created on 2021-10-29 16:11 by pauloxnet, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 29317 open pauloxnet, 2021-10-29 16:24
Messages (9)
msg405315 - (view) Author: Paolo Melchiorre (pauloxnet) * Date: 2021-10-29 16:11
It is very common to construct a variable containing alphanumeric values as a basis for generating random strings, especially in the web environment as a slug to be used in URLs:

>>> import string
>>> ascii_alphanumerics = string.ascii_letters + string.digits
>>> ascii_alphanumerics
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'


I suggest inserting a variable for just this purpose directly into Python's "string" module:

>>> import string
>>> string.ascii_alphanumerics
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
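
For illustration, the use case described above (random alphanumeric slugs) can be sketched as follows. The helper name `random_slug` and the default length are assumptions made for this example, not part of the proposal or of the linked PR.

```python
import secrets
import string

# Same alphabet as in the report above: 52 letters plus 10 digits.
ALPHABET = string.ascii_letters + string.digits


def random_slug(length=8):
    """Return a random alphanumeric string (hypothetical helper)."""
    # secrets.choice draws from the OS random source, which suits tokens
    # that should be hard to guess.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


slug = random_slug(12)
print(len(slug))  # 12
```

With the proposed constant, only the definition of `ALPHABET` would change; the rest of the helper stays the same.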
msg405353 - (view) Author: Batuhan Taskaya (BTaskaya) * (Python committer) Date: 2021-10-29 22:58
I've found 81 occurrences of this pattern among 52 projects on a dataset of top PyPI packages (~3.5K). So I'd say +1 on including this.
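
The message does not say how the occurrences were counted. One way such a search could be approximated is with the standard `ast` module; the function below is a hypothetical sketch that counts additions whose two operands are named `ascii_letters` and `digits`, in either order.

```python
import ast

TARGETS = {"ascii_letters", "digits"}


def _trailing_name(node):
    """Return the last identifier of a Name or Attribute node, else None."""
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return None


def count_alnum_concats(source):
    """Count `ascii_letters + digits` style additions in Python source."""
    count = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            if {_trailing_name(node.left), _trailing_name(node.right)} == TARGETS:
                count += 1
    return count


sample = (
    "import string\n"
    "a = string.ascii_letters + string.digits\n"
    "b = string.ascii_letters + string.digits + '_'\n"
)
print(count_alnum_concats(sample))  # 2
```

Note that this sketch also counts cases where more characters are appended afterwards, so it measures the concatenation pattern rather than uses of the bare alphanumeric set.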
msg405359 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-10-30 04:55
+1 from me as well
msg405360 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-10-30 04:58
I always end up writing trivial code to generate fixed-length random strings from ascii_alphanumerics.
This constant would cover similar uses and help a lot of packages.
msg405365 - (view) Author: Skip Montanaro (skip.montanaro) * (Python triager) Date: 2021-10-30 12:09
I'll be the wet blanket here and say -1. This doesn't seem at all necessary. 81 occurrences in ~3.5k PyPI packages? That's hardly an overwhelming endorsement. To top it off, since this can't be backported to 3.10 and earlier, it creates a needless (trivial) difference. Package authors who would like to use this but support earlier versions of Py3 will need to do something like this:

import string  # needed by the fallback branch

try:
    from string import ascii_alphanumerics
except ImportError:
    ascii_alphanumerics = string.ascii_letters + string.digits

They get no benefit from the addition. In fact, their code gets marginally harder to read.
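
The same fallback can also be written as a single expression with `getattr`. This is only an alternative sketch of the workaround; it assumes the constant, if it were ever added, would be exposed as `string.ascii_alphanumerics`, as proposed in this issue.

```python
import string

# Use the proposed attribute when the running Python provides it;
# otherwise fall back to the explicit concatenation.
ascii_alphanumerics = getattr(
    string, "ascii_alphanumerics", string.ascii_letters + string.digits
)

print(len(ascii_alphanumerics))
```

Either form has to be repeated in every package that supports older versions, which is the cost the message above points out.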
msg405368 - (view) Author: Dong-hee Na (corona10) * (Python committer) Date: 2021-10-30 13:22
@skip.montanaro

I agree with your concern: if authors want to use this constant,
they will need the workaround you mentioned to import it.
But we have already added such constants in cases where we considered them useful.

So I think the main question for this issue is how useful it will be.
In my case, whenever I write a utility to generate random authentication tokens in Python, I end up writing the same trivial code to build ascii_alphanumerics; that's why I thought this constant could be useful.
FYI, we already have the same use case in the official documentation [1].

But if other core devs also think that this is not a common enough use case, I will change my mind to -1, since we cannot support all ASCII variations (e.g. ascii_number_special), and I agree with that point.

[1] https://docs.python.org/3/library/secrets.html#recipes-and-best-practices
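
The recipes section linked as [1] builds its alphabet the same way. The sketch below follows the spirit of the constrained-password recipe there; the wrapper function `make_password` is a hypothetical name added so the loop can be reused.

```python
import secrets
import string

alphabet = string.ascii_letters + string.digits


def make_password(length=10):
    """Return an alphanumeric password with at least one lowercase letter,
    one uppercase letter, and three digits (hypothetical wrapper)."""
    while True:
        password = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until the randomly drawn candidate meets all constraints.
        if (any(c.islower() for c in password)
                and any(c.isupper() for c in password)
                and sum(c.isdigit() for c in password) >= 3):
            return password


print(len(make_password()))  # 10
```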
msg405369 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2021-10-30 13:49
I am also against this feature. My reasons:

1. There will not be much benefit. ascii_alphanumerics is too long, so it does not save much typing, and there is a larger chance of a typo. You will not be able to use it for several years, until support for Python 3.10 is dropped, without making the code uglier.

2. What exactly should it include? In virtually all use cases in the stdlib and tests (except two tests) you need a set which contains ascii_letters, digits, and some other characters (usually "_", but sometimes more). Should "_" be included in ascii_alphanumerics? Whatever we choose, it will be less explicit than ascii_letters + digits, and people will need to look it up in the documentation or risk making errors. And every reader of the code would have doubts.

Lib/http/cookies.py:162:_LegalChars = string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~:"
Lib/crypt.py:19:_saltchars = _string.ascii_letters + _string.digits + './'
Lib/email/_encoded_words.py:75:    safe = b'-!*+/' + ascii_letters.encode('ascii') + digits.encode('ascii')
Lib/email/quoprimime.py:60:for c in b'-!*+/' + ascii_letters.encode('ascii') + digits.encode('ascii'):
Lib/cmd.py:50:IDENTCHARS = string.ascii_letters + string.digits + '_'
Lib/idlelib/autoexpand.py:20:    wordchars = string.ascii_letters + string.digits + "_"
Lib/idlelib/undo.py:254:    alphanumeric = string.ascii_letters + string.digits + "_"
Lib/idlelib/editor.py:809:    IDENTCHARS = string.ascii_letters + string.digits + "_"
Lib/idlelib/hyperparser.py:13:_ASCII_ID_CHARS = frozenset(string.ascii_letters + string.digits + "_")
Lib/idlelib/autocomplete.py:33:ID_CHARS = string.ascii_letters + string.digits + "_"
Lib/test/test_re.py:1011:    LITERAL_CHARS = string.ascii_letters + string.digits + '!"%\',/:;<=>@_`'
Lib/test/test_email/test__header_value_parser.py:50:    rfc_atext_chars = (string.ascii_letters + string.digits +
Lib/test/test_importlib/test_util.py:654:        valid_characters = string.ascii_letters + string.digits
Lib/test/test_secrets.py:115:        legal = string.ascii_letters + string.digits + '-_'
Lib/test/string_tests.py:1219:        s = string.ascii_letters + string.digits
Lib/test/test_shlex.py:328:        safeunquoted = string.ascii_letters + string.digits + '@%_-+=:,./'
Lib/ntpath.py:359:        varchars = bytes(string.ascii_letters + string.digits + '_-', 'ascii')
Lib/ntpath.py:370:        varchars = string.ascii_letters + string.digits + '_-'
Lib/msilib/__init__.py:177:    identifier_chars = string.ascii_letters + string.digits + "._"
Tools/scripts/texi2html.py:1975:goodchars = string.ascii_letters + string.digits + '!@-=+.'

But if other core developers support it, it is okay with me.
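
The ambiguity raised in point 2 can be seen by comparing the explicit concatenation with two existing notions of "alphanumeric" or "word" characters in Python. This is an illustrative sketch only; it says nothing about what the proposed constant would contain.

```python
import re
import string

explicit = set(string.ascii_letters + string.digits)

# The explicit set contains no underscore and no non-ASCII letters.
print("_" in explicit)                 # False
print("é" in explicit)                 # False

# str.isalnum() also rejects "_", but accepts non-ASCII letters.
print("_".isalnum())                   # False
print("é".isalnum())                   # True

# The regex class \w does include the underscore.
print(bool(re.fullmatch(r"\w", "_")))  # True
```

A reader who meets a name such as ascii_alphanumerics cannot tell from the name alone which of these conventions it follows, whereas ascii_letters + digits spells it out.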
msg405389 - (view) Author: Paolo Melchiorre (pauloxnet) * Date: 2021-10-30 22:09
Thank you all for the feedback. Although I have been using Python for years and contributing to other Python-based projects, this is my first PR to CPython, and I don't know whether I am expected to contribute to the discussion.

It seems to me that adding this constant does no harm; in fact, it is an easy win. Still, I will try to clarify some of the doubts raised.

I proposed this new variable to simplify the code in a use case that occurs very often, not so much to save some characters.

I think every new variable, constant, and function has needed a waiting period before it could be used in packages that support several versions of Python (e.g. f-strings in Python 3.6), but it can be used right away in projects that run the latest version of Python, or in Python's own code.

I believe that defining the alphanumeric constant in the standard library can help provide a clearly defined common variable [1], which could be used as a basis for alphabets containing additional symbols (e.g. _- =).

[1] https://en.wikipedia.org/wiki/Alphanumeric
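
As a sketch of the "basis for other alphabets" idea, derived character sets can be built from one shared base. The names `BASE`, `IDENT_CHARS`, `URLSAFE_CHARS`, and `is_identifier_like` are hypothetical and chosen for this example; the base is written out explicitly because the proposed constant does not exist in released Python versions.

```python
import secrets
import string

# Shared base; with the proposal this would be string.ascii_alphanumerics.
BASE = string.ascii_letters + string.digits

# Derived alphabets similar to those in the stdlib list earlier in the thread.
IDENT_CHARS = frozenset(BASE + "_")
URLSAFE_CHARS = frozenset(BASE + "-_")


def is_identifier_like(text):
    """Return True if text is non-empty and uses only IDENT_CHARS."""
    return bool(text) and all(c in IDENT_CHARS for c in text)


print(is_identifier_like("spam_42"))   # True
print(is_identifier_like("spam-42"))   # False

# Tokens from secrets.token_urlsafe stay within the URL-safe alphabet.
token = secrets.token_urlsafe(16)
print(all(c in URLSAFE_CHARS for c in token))  # True
```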
msg405517 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-11-02 16:52
Serhiy, thanks for the list. I can see it as either justifying or refuting the proposal.  So +-1 for the moment.

(I opened #45692 to eliminate the duplication in idlelib.)
History
Date User Action Args
2022-04-11 14:59:51 admin set github: 89832
2021-11-02 16:52:28 terry.reedy set nosy: + terry.reedy, messages: + msg405517
2021-10-30 22:09:27 pauloxnet set messages: + msg405389
2021-10-30 13:49:29 serhiy.storchaka set nosy: + serhiy.storchaka, messages: + msg405369
2021-10-30 13:22:43 corona10 set messages: + msg405368
2021-10-30 12:09:23 skip.montanaro set nosy: + skip.montanaro, messages: + msg405365
2021-10-30 04:58:23 corona10 set messages: + msg405360
2021-10-30 04:55:27 corona10 set nosy: + corona10, messages: + msg405359
2021-10-29 22:58:28 BTaskaya set nosy: + BTaskaya, messages: + msg405353
2021-10-29 16:24:10 pauloxnet set keywords: + patch, stage: patch review, pull_requests: + pull_request27586
2021-10-29 16:11:03 pauloxnet create