Message 191081 - Python tracker



Author	ncoghlan
Recipients	belopolsky, cvrebert, eric.araujo, eric.smith, ezio.melotti, lemburg, mark.dickinson, ncoghlan, skrah, vstinner
Date	2013-06-13.12:06:57
Message-id	<1371125218.18.0.0572478723595.issue10581@psf.upfronthosting.co.za>

Content

I think PEP 393 gives us a quick route to fast parsing: if the max char is < 128, just roll straight into normal processing; otherwise do the normalisation and "all decimal digits are from the same script" steps.

There are almost certainly better ways to do the script translation, but the example below tries to just do the "convert to ASCII" step to avoid duplicating the +/- and decimal point processing logic:

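    import unicodedata

    # parse_ascii_number() stands in for the existing ASCII-only parser.
    # ord(max(arg)) plays the role of PEP 393's max char; guard against "".
    if arg and ord(max(arg)) >= 128:
        arg = unicodedata.normalize("NFKC", arg)
        originals = set()
        converted = []
        for c in arg:
            try:
                d = str(unicodedata.decimal(c))
            except ValueError:
                d = c  # not a decimal digit (sign, decimal point, ...)
            else:
                originals.add(ord(c))  # compare code points, not strings
            converted.append(d)
        # Digits of a single script sit within 10 consecutive code points
        if originals and (max(originals) - min(originals)) >= 10:
            raise ValueError("%s mixes digits from multiple scripts" % arg)
        arg = "".join(converted)
    result = parse_ascii_number(arg)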
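
For anyone who wants to try the idea out, here is a rough self-contained version of the same steps wrapped in a function. It is only a sketch: `parse_number` is a name made up for this example, `parse_ascii_number` is a hypothetical stand-in that simply calls `int()`, and `ord(max(arg))` approximates the PEP 393 max char check from pure Python. The "within 10 code points" test is a heuristic, not a proper script lookup.

```python
import unicodedata


def parse_ascii_number(text):
    # Hypothetical stand-in for the existing ASCII-only parser;
    # int() is used here purely so the sketch runs.
    return int(text)


def parse_number(arg):
    # Fast path: pure-ASCII input skips normalisation and the script check.
    if arg and ord(max(arg)) >= 128:
        arg = unicodedata.normalize("NFKC", arg)
        digit_code_points = set()
        converted = []
        for c in arg:
            value = unicodedata.decimal(c, None)
            if value is None:
                # Not a decimal digit: keep it for the ASCII parser to judge.
                converted.append(c)
            else:
                digit_code_points.add(ord(c))
                converted.append(str(value))
        # Heuristic: digits of one script span at most 10 code points.
        if digit_code_points and (
            max(digit_code_points) - min(digit_code_points) >= 10
        ):
            raise ValueError("%r mixes digits from multiple scripts" % arg)
        arg = "".join(converted)
    return parse_ascii_number(arg)


print(parse_number("-42"))                 # ASCII fast path
print(parse_number("\u0661\u0662\u0663"))  # ARABIC-INDIC DIGITS ONE, TWO, THREE
print(parse_number("\uff11\uff12"))        # fullwidth digits, folded by NFKC
try:
    parse_number("1\u0662")                # ASCII digit mixed with Arabic-Indic
except ValueError as exc:
    print(exc)
```

Because ASCII digits also go through `unicodedata.decimal()`, an ASCII digit mixed with digits from another script is caught by the same range check.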


P.S. I don't think the base argument is especially applicable ('0x' is rejected because 'x' is not a base 10 digit and we allow a base of '0' to request "use int literal base markers").
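
To illustrate the base behaviour mentioned in the P.S., a small sketch using the built-in `int()`; the `try_int` helper is made up for this example and just turns the `ValueError` into `None` so the cases can be printed side by side.

```python
def try_int(text, base=10):
    # Hypothetical helper: return None instead of raising ValueError.
    try:
        return int(text, base)
    except ValueError:
        return None


print(try_int("0x1f"))          # None: 'x' is not a base 10 digit
print(try_int("0x1f", 0))       # 31: base 0 honours int literal base markers
print(try_int("\u0663\u0661"))  # 31: ARABIC-INDIC DIGITS THREE, ONE in base 10
```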

History
Date	User	Action	Args
2013-06-13 12:06:58	ncoghlan	set	recipients: + ncoghlan, lemburg, mark.dickinson, belopolsky, vstinner, eric.smith, ezio.melotti, eric.araujo, cvrebert, skrah
2013-06-13 12:06:58	ncoghlan	set	messageid: <1371125218.18.0.0572478723595.issue10581@psf.upfronthosting.co.za>
2013-06-13 12:06:58	ncoghlan	link	issue10581 messages
2013-06-13 12:06:57	ncoghlan	create