Author djmitche
Recipients Andrew.C, amaury.forgeotdarc, djmitche, effbot, kirkshorts, meador.inge
Date 2015-04-14.16:07:58
Message-id <1429027680.06.0.0199156518725.issue3353@psf.upfronthosting.co.za>
In-reply-to
Content
Here's an updated patch for #1:

Existing Patch:
 - Move tokenizer.h from Parser/ to Include/
 - Add PyAPI_FUNC to export the tokenizer functions

New:
 - Remove the unused, undefined PyTokenizer_RestoreEncoding
 - Include PyTokenizer_State with limited ABI compatibility (but still undocumented)
 - Namespace the struct name (PyTokenizer_State)
 - Add documentation

I'd appreciate a close review of the tokenizer documentation: I'm not entirely confident that I have documented the functions correctly. In particular, I'm not sure how PyTokenizer_FromString handles encodings.
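For reviewers who want a reference point on the encoding question, here is a minimal sketch using the pure-Python tokenize module from the standard library, which detects a source encoding from a UTF-8 BOM or a PEP 263 coding cookie. This is not the C tokenizer touched by the patch, and whether PyTokenizer_FromString behaves the same way is an assumption that would need checking against the C source. The helper names detect_source_encoding and name_tokens are made up for illustration.

```python
import io
import tokenize


def detect_source_encoding(source: bytes) -> str:
    """Return the encoding the pure-Python tokenizer would pick for `source`."""
    # detect_encoding reads at most two lines, looking for a UTF-8 BOM or a
    # PEP 263 coding cookie, and falls back to UTF-8 when neither is present.
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


def name_tokens(source: bytes) -> list:
    """Tokenize `source` (bytes) and return the strings of its NAME tokens."""
    # tokenize.tokenize decodes the bytes itself using the detected encoding.
    tokens = tokenize.tokenize(io.BytesIO(source).readline)
    return [tok.string for tok in tokens if tok.type == tokenize.NAME]


if __name__ == "__main__":
    plain = b"x = 1\n"
    latin1 = b"# -*- coding: latin-1 -*-\nna\xefve = 1\n"
    print(detect_source_encoding(plain))
    print(detect_source_encoding(latin1))
    print(name_tokens(latin1))
```

The second sample contains a byte that is not valid UTF-8, so it only tokenizes because the cookie on its first line is honoured. The same kind of input would be a useful test case for the C function's documentation.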

There's a further iteration possible here, but it's beyond my understanding of the tokenizer and of possible uses of the API: exposing some of the tokenizer state fields and documenting them, either as part of the limited ABI or even the stable API.  In particular, about half a dozen struct fields are used by the parser, and those would be good candidates for addition to the public API.

If that's desirable, I'd prefer to merge a revision of my patch first, and keep the issue open for subsequent improvement.
History
Date                 User      Action  Args
2015-04-14 16:08:00  djmitche  set     recipients: + djmitche, effbot, amaury.forgeotdarc, kirkshorts, meador.inge, Andrew.C
2015-04-14 16:08:00  djmitche  set     messageid: <1429027680.06.0.0199156518725.issue3353@psf.upfronthosting.co.za>
2015-04-14 16:08:00  djmitche  link    issue3353 messages
2015-04-14 16:07:59  djmitche  create