Rietveld Code Review Tool

Delta Between Two Patch Sets: Doc/c-api/tokenizer.rst

Issue 3353: make built-in tokenizer available via Python C API
Left Patch Set: Created 5 years, 8 months ago
Right Patch Set: Created 4 years, 10 months ago
.. highlightlang:: c

.. _tokenizer:

Tokenizing Python Code
======================

.. sectionauthor:: Dustin J. Mitchell <dustin@cs.uchicago.edu>

.. index::
   tokenizer

These routines allow C code to break Python code into a stream of tokens.
The token constants match those defined in :mod:`token`, but with a
``PYTOK_`` prefix; for example, the counterpart of :data:`token.NAME` is
``PYTOK_NAME``.

.. c:type:: PyTokenizer_State

   The C structure used to represent the state of a tokenizer.

.. c:function:: PyTokenizer_State *PyTokenizer_FromString(const char *string, int exec_input)

   :param string: string to convert to tokens
   :param exec_input: true if the input is from an ``exec`` call

   Initialize a tokenizer to read from a C string.
   If ``exec_input`` is true, an implicit newline is added to the end of the string.

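   For example (a minimal sketch, assuming only the API described above), a
   tokenizer for a short piece of code can be created and released like this::

      PyTokenizer_State *tok_state;

      /* exec_input=1: an implicit newline terminates the final line. */
      tok_state = PyTokenizer_FromString("x = 1", 1);
      /* ... call PyTokenizer_Get() on tok_state as needed ... */
      PyTokenizer_Free(tok_state);
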
.. c:function:: PyTokenizer_State *PyTokenizer_FromUTF8String(const char *string, int exec_input)

   :param string: UTF-8 encoded string to convert to tokens
   :param exec_input: true if the input is from an ``exec`` call

   Initialize a tokenizer to read from a UTF-8 encoded C string.
   If ``exec_input`` is true, an implicit newline is added to the end of the string.

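   As a brief sketch (the helper name ``tokenizer_from_unicode`` is
   hypothetical, and error handling beyond the UTF-8 conversion is omitted),
   a tokenizer can be built from the UTF-8 form of a Python string object::

      static PyTokenizer_State *
      tokenizer_from_unicode(PyObject *str)
      {
          /* Borrowed buffer, valid while str is alive. */
          const char *utf8 = PyUnicode_AsUTF8(str);
          if (utf8 == NULL)
              return NULL;    /* error already set by PyUnicode_AsUTF8 */
          return PyTokenizer_FromUTF8String(utf8, 1);
      }
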
.. c:function:: PyTokenizer_State *PyTokenizer_FromFile(FILE *fp, const char *encoding, const char *ps1, const char *ps2)

   :param fp: file to tokenize
   :param encoding: encoding of the file contents
   :param ps1: initial-line interactive prompt
   :param ps2: subsequent-line interactive prompt

   Initialize a tokenizer to read from a file.
   The file data is decoded using ``encoding``, if given.
   If ``ps1`` and ``ps2`` are not NULL, the tokenizer operates in interactive mode.

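   For instance (a sketch only; the file name is hypothetical and error
   handling is kept minimal), a source file on disk can be tokenized in
   non-interactive mode by passing NULL prompts::

      FILE *fp = fopen("example.py", "rb");
      if (fp != NULL) {
          /* UTF-8 source, no prompts: non-interactive mode. */
          PyTokenizer_State *tok_state = PyTokenizer_FromFile(fp, "utf-8", NULL, NULL);
          /* ... call PyTokenizer_Get() on tok_state ... */
          PyTokenizer_Free(tok_state);
          fclose(fp);
      }
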
.. c:function:: void PyTokenizer_Free(PyTokenizer_State *state)

   :param state: tokenizer state

   Free the given tokenizer.

.. c:function:: int PyTokenizer_Get(PyTokenizer_State *state, char **p_start, char **p_end)

   :param state: tokenizer state
   :param p_start: (output) first character of the returned token
   :param p_end: (output) first character following the returned token
   :return: token

   Get the next token from the tokenizer.
   The ``p_start`` and ``p_end`` output parameters give the boundaries of the returned token.

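   For example (a minimal sketch, relying on the ``PYTOK_ISEOF`` macro
   described below), the text of each token can be printed by slicing
   between the two output pointers::

      PyTokenizer_State *tok_state = PyTokenizer_FromString("a + b", 1);
      char *p_start, *p_end;
      int tok;

      while (1) {
          tok = PyTokenizer_Get(tok_state, &p_start, &p_end);
          if (PYTOK_ISEOF(tok))
              break;
          /* p_end points one past the last character of the token. */
          printf("token %d: %.*s\n", tok, (int)(p_end - p_start), p_start);
      }
      PyTokenizer_Free(tok_state);
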
.. c:function:: PYTOK_ISTERMINAL(x)

   Return true for terminal token values.

.. c:function:: PYTOK_ISNONTERMINAL(x)

   Return true for non-terminal token values.

.. c:function:: PYTOK_ISEOF(x)

   Return true if *x* is the marker indicating the end of input.

Putting all of that together::

   PyTokenizer_State *tokenizer;
   int tok;
   int nest_level;
   char *p_start, *p_end;

   /* Tokenize a small nested expression. */
   tokenizer = PyTokenizer_FromString("((1+2)+(3+4))", 1);

   /* Track the parenthesis nesting depth across the token stream. */
   nest_level = 0;
   while (1) {
       tok = PyTokenizer_Get(tokenizer, &p_start, &p_end);
       if (PYTOK_ISEOF(tok))
           break;
       switch (tok) {
       case PYTOK_LPAR: nest_level++; break;
       case PYTOK_RPAR: nest_level--; break;
       }
   }

   PyTokenizer_Free(tokenizer);
   printf("final nesting level: %d\n", nest_level);
