Rietveld Code Review Tool

Unified Diff: Doc/c-api/tokenizer.rst

Issue 3353: make built-in tokenizer available via Python C API
Patch Set: Created 4 years, 10 months ago
new file mode 100644
--- /dev/null
+++ b/Doc/c-api/tokenizer.rst
@@ -0,0 +1,96 @@
+.. highlightlang:: c
+
+.. _tokenizer:
+
+Tokenizing Python Code
+======================
+
+.. sectionauthor:: Dustin J. Mitchell <dustin@cs.uchicago.edu>
+
+.. index::
+   tokenizer
+
+These routines allow C code to break Python code into a stream of tokens.
+The token constants match those defined in :mod:`token`, but with a ``PYTOK_`` prefix.
+
+.. c:type:: PyTokenizer_State
+
+   The C structure used to represent the state of a tokenizer.
+
+.. c:function:: PyTokenizer_State *PyTokenizer_FromString(const char *string, int exec_input)
+
+   :param string: string to convert to tokens
+   :param exec_input: true if the input is from an ``exec`` call
+
+   Initialize a tokenizer to read from a C string.
+   If ``exec_input`` is true, an implicit newline is added to the end of the string.
+
+.. c:function:: PyTokenizer_State *PyTokenizer_FromUTF8String(const char *string, int exec_input)
+
+   :param string: UTF-8 encoded string to convert to tokens
+   :param exec_input: true if the input is from an ``exec`` call
+
+   Initialize a tokenizer to read from a UTF-8 encoded C string.
+   If ``exec_input`` is true, an implicit newline is added to the end of the string.
+
+.. c:function:: PyTokenizer_State *PyTokenizer_FromFile(FILE *fp, const char *encoding, const char *ps1, const char *ps2)
+
+   :param fp: file to tokenize
+   :param encoding: encoding of the file contents
+   :param ps1: initial-line interactive prompt
+   :param ps2: subsequent-line interactive prompt
+
+   Initialize a tokenizer to read from a file.
+   The file data is decoded using ``encoding``, if it is not NULL.
+   If ``ps1`` and ``ps2`` are not NULL, the tokenizer operates in interactive mode.
+
+.. c:function:: void PyTokenizer_Free(PyTokenizer_State *state)
+
+   :param state: tokenizer state
+
+   Free the given tokenizer state.
+
+.. c:function:: int PyTokenizer_Get(PyTokenizer_State *state, char **p_start, char **p_end)
+
+   :param state: tokenizer state
+   :param p_start: (output) pointer to the first character of the returned token
+   :param p_end: (output) pointer to the first character following the returned token
+   :return: token type
+
+   Get the next token from the tokenizer.
+   The ``p_start`` and ``p_end`` output parameters give the boundaries of the returned token's text.
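+   The text between ``p_start`` and ``p_end`` is not necessarily NUL-terminated,
+   so a caller that wants the token text as a C string must copy it out first.
+   A minimal helper might look like this (``copy_token`` is a hypothetical
+   convenience function, not part of the proposed API)::
+
+      #include <stdlib.h>
+      #include <string.h>
+
+      /* Copy the half-open range [p_start, p_end) into a freshly
+       * allocated NUL-terminated string.  Returns NULL on allocation
+       * failure; the caller owns the result and must free() it. */
+      static char *
+      copy_token(const char *p_start, const char *p_end)
+      {
+          size_t len = (size_t)(p_end - p_start);
+          char *buf = malloc(len + 1);
+          if (buf == NULL)
+              return NULL;
+          memcpy(buf, p_start, len);
+          buf[len] = '\0';
+          return buf;
+      }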
+
+.. c:function:: PYTOK_ISTERMINAL(x)
+
+   Return true for terminal token values.
+
+.. c:function:: PYTOK_ISNONTERMINAL(x)
+
+   Return true for non-terminal token values.
+
+.. c:function:: PYTOK_ISEOF(x)
+
+   Return true if *x* is the marker indicating the end of input.
+
+Putting all of that together::
+
+   PyTokenizer_State *tokenizer;
+   int tok;
+   int nest_level;
+   char *p_start, *p_end;
+
+   tokenizer = PyTokenizer_FromString("((1+2)+(3+4))", 1);
+
+   nest_level = 0;
+   while (1) {
+       tok = PyTokenizer_Get(tokenizer, &p_start, &p_end);
+       if (PYTOK_ISEOF(tok))
+           break;
+       switch (tok) {
+       case PYTOK_LPAR: nest_level++; break;
+       case PYTOK_RPAR: nest_level--; break;
+       }
+   }
+
+   PyTokenizer_Free(tokenizer);
+   printf("final nesting level: %d\n", nest_level);
