
Author Rosuav
Recipients Rosuav, berker.peksag, ncoghlan
Date 2016-07-21.14:19:03
Message-id <1469110744.57.0.181920176875.issue27582@psf.upfronthosting.co.za>
In-reply-to
Content
Actually pinpointing the invalid character may be impractical, because there are only two failure conditions and each is a plain yes/no: either a UnicodeDecodeError is raised (because you had an invalid UTF-8 stream), or PyUnicode_IsIdentifier returns false. Either way, the failure applies to the whole identifier rather than to one character in it. So there are a few possibilities, corresponding to the patches I'm attaching.
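As a rough Python-level illustration of those two conditions (a sketch only, not the C code in tokenizer.c): `classify_identifier` is a hypothetical helper, and it assumes that `str.isidentifier()` gives the same yes/no answer that PyUnicode_IsIdentifier gives at the C level.

```python
def classify_identifier(raw: bytes) -> str:
    """Report which of the two whole-identifier checks fails, if any.

    Hypothetical helper for illustration; it mirrors the two conditions
    described above, not the actual tokenizer implementation.
    """
    try:
        # Condition 1: the bytes must form a valid UTF-8 stream.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "decode-error"
    # Condition 2: a single boolean answer for the whole string.
    return "ok" if text.isidentifier() else "not-identifier"


print(classify_identifier(b"spam"))                    # valid identifier
print(classify_identifier(b"\xff"))                    # invalid UTF-8
print(classify_identifier("a\u20acb".encode("utf-8")))  # euro sign inside a name
```

The second check only says that the name as a whole is not an identifier, which is why the patches below move the error position to the start of the token instead of to a specific character.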

1) Change the way this one specific error is handled, in tokenizer.c verify_identifier(). If it finds an error, adjust tok->cur to point to the beginning of it. No new failures in test suite.

2) As above, but also change tok->inp, because of this comment at tokenizer.h:31: /* NB If done != E_OK, cur must be == inp!!! */ I have no idea what that comment means. This patch results in truncated error messages, but the comment suggests that method 1 might be breaking an invariant, which could cause breakage elsewhere. If there is such breakage, though, it isn't exercised by 'make test', which may itself be a problem. No new test failures.

3) Change the handling of ALL parser errors, in parsetok.c parsetok(), so now they all point to tok->start. Octal literals with 8s or 9s in them now get the caret pointing to the invalid digit, rather than the end of the literal. Unterminated strings point to the opening quote. And some forms of IndentationError now segfault Python. Test suite fails (unsurprisingly).

4) In response to the above segfault, hack it back to the old way of doing things if there's no tok->start. Maybe the condition should be done differently? No new failures in the test suite.
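To compare where the error position lands under each of these methods, something like the following could be run against each patched build. This is only a sketch: `syntax_error_location` is a hypothetical helper, the sample snippets are my own, and the reported offsets will differ between builds, so no expected output is shown.

```python
def syntax_error_location(source):
    """Return (lineno, offset, text) for the first SyntaxError, or None.

    Hypothetical helper: it reads the attributes that the caret display
    in a traceback is based on.
    """
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.lineno, exc.offset, exc.text
    return None


samples = (
    "x = 0o18\n",            # octal literal containing an 8
    "s = 'unterminated\n",   # unterminated string
    "a\u20acb = 1\n",        # invalid character inside an identifier
)
for snippet in samples:
    print(repr(snippet), syntax_error_location(snippet))
```

Running the same script on an unpatched build and on each patched build makes any change in the caret position easy to see side by side.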

I'd ideally like to use the technique from method 3 (either as patch 4 or with some other guard condition). Failing that, can anyone explain the "NB" above, and what ought to be done to comply with it?
History
Date User Action Args
2016-07-21 14:19:04 Rosuav set recipients: + Rosuav, ncoghlan, berker.peksag
2016-07-21 14:19:04 Rosuav set messageid: <1469110744.57.0.181920176875.issue27582@psf.upfronthosting.co.za>
2016-07-21 14:19:04 Rosuav link issue27582 messages
2016-07-21 14:19:04 Rosuav create