Rietveld Code Review Tool

Delta Between Two Patch Sets: Doc/c-api/tokenizer.rst

Issue 3353: make built-in tokenizer available via Python C API
Left Patch Set: Created 4 years, 10 months ago
Right Patch Set: Created 4 years, 10 months ago
LEFT | RIGHT
1 .. highlightlang:: c 1 .. highlightlang:: c
2 2
3 .. _tokenizer: 3 .. _tokenizer:
4 4
5 Tokenizing Python Code 5 Tokenizing Python Code
Nick Coghlan 2015/04/14 18:25:39 This could use a usage example to demonstrate chai
6 ====================== 6 ======================
7 7
8 .. sectionauthor:: Dustin J. Mitchell <dustin@cs.uchicago.edu> 8 .. sectionauthor:: Dustin J. Mitchell <dustin@cs.uchicago.edu>
9 9
10 .. index:: 10 .. index::
11 tokenizer 11 tokenizer
12 12
13 These routines allow C code to break Python code into a stream of tokens. 13 These routines allow C code to break Python code into a stream of tokens.
14 The token constants match those defined in :mod:`token`. 14 The token constants match those defined in :mod:`token`, but with a ``PYTOK_`` prefix.
15 15
16 .. c:type:: PyTokenizer_State 16 .. c:type:: PyTokenizer_State
17 17
18 The C structure used to represent the state of a tokenizer. 18 The C structure used to represent the state of a tokenizer.
19 19
20 .. c:function:: PyTokenizer_State *PyTokenizer_FromString(string, exec_input) 20 .. c:function:: PyTokenizer_State *PyTokenizer_FromString(string, exec_input)
21 21
22 :param string: string to convert to tokens 22 :param string: string to convert to tokens
23 :param exec_input: true if the input is from an ``exec`` call 23 :param exec_input: true if the input is from an ``exec`` call
24 24
(...skipping 18 matching lines...)
43 Initialize a tokenizer to read from a file. 43 Initialize a tokenizer to read from a file.
44 The file data is decoded using ``encoding``, if given. 44 The file data is decoded using ``encoding``, if given.
45 If ``ps1`` and ``ps2`` are not NULL, the tokenizer will operate in interactive mode. 45 If ``ps1`` and ``ps2`` are not NULL, the tokenizer will operate in interactive mode.
46 46
47 .. c:function:: PyTokenizer_Free(PyTokenizer_State *state) 47 .. c:function:: PyTokenizer_Free(PyTokenizer_State *state)
48 48
49 :param state: tokenizer state 49 :param state: tokenizer state
50 50
51 Free the given tokenizer. 51 Free the given tokenizer.
52 52
53 .. c:function:: int PyTokenizer_Get(PyTokenizer_State, *state, char **p_start, char **p_end) 53 .. c:function:: int PyTokenizer_Get(PyTokenizer_State *state, char **p_start, char **p_end)
54 54
55 :param state: tokenizer state 55 :param state: tokenizer state
56 :param p_start: (output) first character of the returned token 56 :param p_start: (output) first character of the returned token
57 :param p_end: (output) first character following the returned token 57 :param p_end: (output) first character following the returned token
58 :return: token 58 :return: token
59 59
60 Get the next token from the tokenizer. 60 Get the next token from the tokenizer.
61 The ``p_start`` and ``p_end`` output parameters give the boundaries of the returned token. 61 The ``p_start`` and ``p_end`` output parameters give the boundaries of the returned token.
62
63 .. c:function:: PYTOK_ISTERMINAL(x)
64
65 Return true for terminal token values.
66
67 .. c:function:: PYTOK_ISNONTERMINAL(x)
68
69 Return true for non-terminal token values.
70
71 .. c:function:: PYTOK_ISEOF(x)
72
73 Return true if *x* is the marker indicating the end of input.
74
75 Putting all of that together::
76
77 PyTokenizer_State *tokenizer;
78 int tok;
79 int nest_level;
80 char *p_start, *p_end;
81
82 tokenizer = PyTokenizer_FromString("((1+2)+(3+4))", 1);
83
84 nest_level = 0;
85 while (1) {
86 tok = PyTokenizer_Get(tokenizer, &p_start, &p_end);
87 if (PYTOK_ISEOF(tok))
88 break;
89 switch (tok) {
90 case PYTOK_LPAR: nest_level++; break;
91 case PYTOK_RPAR: nest_level--; break;
92 }
93 }
94
95 PyTokenizer_Free(tokenizer);
96 printf("final nesting level: %d\n", nest_level);
