| --- a/Doc/library/shlex.rst Sat Jun 02 18:22:31 2012 +0200 |
| +++ b/Doc/library/shlex.rst Sun Jun 03 18:46:34 2012 +0100 |
| @@ -71,7 +71,7 @@ |
| The :mod:`shlex` module defines the following class: |
| -.. class:: shlex(instream=None, infile=None, posix=False) |
| +.. class:: shlex(instream=None, infile=None, posix=False, punctuation_chars=False) |
| A :class:`shlex` instance or subclass instance is a lexical analyzer object. |
| The initialization argument, if present, specifies where to read characters |
| @@ -81,10 +81,22 @@ |
| string, which sets the initial value of the :attr:`infile` attribute. If the |
| *instream* argument is omitted or equal to ``sys.stdin``, this second |
| argument defaults to "stdin". The *posix* argument defines the operational |
| - mode: when *posix* is not true (default), the :class:`shlex` instance will |
| + mode: when *posix* is false (the default), the :class:`shlex` instance will |
| operate in compatibility mode. When operating in POSIX mode, :class:`shlex` |
| - will try to be as close as possible to the POSIX shell parsing rules. |
| + will try to be as close as possible to the POSIX shell parsing rules. The |
| + *punctuation_chars* argument provides a way to make the behaviour even |
| + closer to how real shells parse. This can take a number of values: the |
| + default value, ``False``, preserves the behaviour seen under Python 3.2 and |
| + earlier. If set to ``True``, then parsing of the characters ``();<>|&`` is |
| + changed: any run of these characters (considered punctuation characters) is |
| + returned as a single token. If set to a non-empty string of characters, |
| + those characters will be used as the punctuation characters. Any characters |
| + in the :attr:`wordchars` attribute that appear in *punctuation_chars* will |
| + be removed from :attr:`wordchars`. See :ref:`improved-shell-compatibility` |
| + for more information. |
| + .. versionchanged:: 3.3 |
| + The *punctuation_chars* parameter was added. |
| .. seealso:: |
| @@ -186,7 +198,13 @@ |
| .. attribute:: shlex.wordchars |
| The string of characters that will accumulate into multi-character tokens. By |
| - default, includes all ASCII alphanumerics and underscore. |
| + default, includes all ASCII alphanumerics and underscore. In POSIX mode, the |
| + accented characters in the Latin-1 set are also included. If |
| + :attr:`punctuation_chars` is not empty, the characters ``~-./*?=``, which can |
| + appear in filename specifications and command line parameters, will also be |
| + included in this attribute, and any characters which appear in |
| + ``punctuation_chars`` will be removed from ``wordchars`` if they are present |
| + there. |
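A minimal sketch of the ``wordchars`` adjustment described above, assuming a ``shlex`` build that accepts the ``punctuation_chars`` argument. The variable names (``plain``, ``shell_like``, ``custom``) are illustrative only.

```python
import shlex

# Lexers over an empty input; only their attributes are inspected here.
plain = shlex.shlex("")
shell_like = shlex.shlex("", punctuation_chars=True)
custom = shlex.shlex("", punctuation_chars="-=")

# Characters that become word characters once punctuation_chars is set.
added = set(shell_like.wordchars) - set(plain.wordchars)
print(sorted(added))

# Characters named in punctuation_chars are taken back out of wordchars.
print("-" in custom.wordchars, "=" in custom.wordchars)
```

In the last case ``-`` and ``=`` are treated as punctuation, so an argument such as ``--color=auto`` would no longer come back as one token.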
| .. attribute:: shlex.whitespace |
| @@ -217,9 +235,11 @@ |
| .. attribute:: shlex.whitespace_split |
| - If ``True``, tokens will only be split in whitespaces. This is useful, for |
| - example, for parsing command lines with :class:`shlex`, getting tokens in a |
| - similar way to shell arguments. |
| + If ``True``, tokens will only be split on whitespace. While this attribute |
| + is ``True``, :attr:`punctuation_chars` has no effect: punctuation is not |
| + split out into separate tokens. When using :attr:`punctuation_chars`, which |
| + is intended to provide parsing closer to that implemented by shells, it is |
| + advisable to leave ``whitespace_split`` as ``False`` (the default value). |
| .. attribute:: shlex.infile |
| @@ -268,6 +288,16 @@ |
| (``''``), in non-POSIX mode, and to ``None`` in POSIX mode. |
| +.. attribute:: shlex.punctuation_chars |
| + |
| + Characters that will be considered punctuation. Runs of punctuation |
| + characters will be returned as a single token. However, note that no |
| + semantic validity checking will be performed: for example, ``>>>`` could be |
| + returned as a token, even though it may not be recognised as such by shells. |
| + |
| + .. versionadded:: 3.3 |
| + |
| + |
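As a rough illustration of the attribute described above, and assuming a ``shlex`` that accepts ``punctuation_chars``, the sketch below reads the attribute back and shows a run of punctuation arriving as one token. The input string is made up for the example.

```python
import shlex

# With punctuation_chars=True the attribute holds the default punctuation set.
s = shlex.shlex("cat a >>> b", punctuation_chars=True)
chars = s.punctuation_chars
print(chars)

# The run '>>>' is returned whole; no check is made that a shell accepts it.
tokens = list(s)
print(tokens)

# When the argument is left at its default, the attribute is an empty string.
default_chars = shlex.shlex("x").punctuation_chars
```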
| .. _shlex-parsing-rules: |
| Parsing Rules |
| @@ -317,3 +347,62 @@ |
| * EOF is signaled with a :const:`None` value; |
| * Quoted empty strings (``''``) are allowed. |
| + |
| +.. _improved-shell-compatibility: |
| + |
| +Improved Compatibility with Shells |
| +---------------------------------- |
| + |
| +.. versionadded:: 3.3 |
| + |
| +The :class:`shlex` class provides compatibility with the parsing performed by |
| +common Unix shells like ``bash``, ``dash``, and ``sh``. To take advantage of |
| +this compatibility, specify the ``punctuation_chars`` argument in the |
| +constructor. This defaults to ``False``, which preserves pre-3.3 behaviour. |
| +However, if it is set to ``True``, then parsing of the characters ``();<>|&`` |
| +is changed: any run of these characters is returned as a single token. While |
| +this is short of a full parser for shells (which would be out of scope for the |
| +standard library, given the multiplicity of shells out there), it does allow |
| +you to perform processing of command lines more easily than you could |
| +otherwise. To illustrate, you can see the difference in the following snippet:: |
| + |
| +    import shlex |
| + |
| +    for punct in (False, True): |
| +        if punct: |
| +            message = 'New' |
| +        else: |
| +            message = 'Old' |
| +        text = "a && b; c && d || e; f >'abc'; (def \"ghi\")" |
| +        s = shlex.shlex(text, punctuation_chars=punct) |
| +        print('%s: %s' % (message, list(s))) |
| + |
| +which prints out:: |
| + |
| + Old: ['a', '&', '&', 'b', ';', 'c', '&', '&', 'd', '|', '|', 'e', ';', 'f', '>', "'abc'", ';', '(', 'def', '"ghi"', ')'] |
| + New: ['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', "'abc'", ';', '(', 'def', '"ghi"', ')'] |
| + |
| +Of course, tokens that are not valid for shells may be returned, and you'll |
| +need to implement your own error checks on the returned tokens. |
| + |
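One possible shape for such an error check is sketched below. It assumes a ``shlex`` that accepts ``punctuation_chars``. The ``KNOWN_OPERATORS`` allow-list and the ``invalid_operators`` helper are hypothetical names introduced for illustration; they are not part of the module, and the list is far from a complete shell grammar.

```python
import shlex

# Hypothetical allow-list; real shells recognise more operators than these.
KNOWN_OPERATORS = {"&&", "||", "|", "&", ";", "<", ">", ">>", "(", ")"}
PUNCTUATION = set("();<>|&")


def invalid_operators(command):
    """Return punctuation tokens that are not in the allow-list."""
    lexer = shlex.shlex(command, punctuation_chars=True)
    return [
        token
        for token in lexer
        # A token made up only of punctuation characters is an operator run.
        if set(token) <= PUNCTUATION and token not in KNOWN_OPERATORS
    ]


print(invalid_operators("a && b || c > out"))
print(invalid_operators("a &&& b"))
```

Because adjacent punctuation is grouped into one run, input such as ``(a);`` yields the token ``);``, which this naive check reports even though a shell would accept it. A fuller check would need to split such runs itself.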
| +Instead of passing ``True`` as the value for the *punctuation_chars* parameter, |
| +you can pass a string with specific characters, which will be used to determine |
| +which characters constitute punctuation. For example:: |
| + |
| + >>> import shlex |
| + >>> s = shlex.shlex("a && b || c", punctuation_chars="|") |
| + >>> list(s) |
| + ['a', '&', '&', 'b', '||', 'c'] |
| + |
| +.. note:: When ``punctuation_chars`` is specified, the :attr:`~shlex.wordchars` |
| + attribute is augmented with the characters ``~-./*?=``. That is because these |
| + characters can appear in file names (including wildcards) and command-line |
| + arguments (e.g. ``--color=auto``). Hence:: |
| + |
| + >>> import shlex |
| + >>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?', |
| + ... punctuation_chars=True) |
| + >>> list(s) |
| + ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?'] |
| + |
| + |
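To show how the grouped tokens might be used, here is a small sketch that splits a command line into simple commands at control operators. It assumes a ``shlex`` that accepts ``punctuation_chars``. ``SEPARATORS`` and ``split_commands`` are hypothetical names chosen for this example.

```python
import shlex

# Hypothetical set of operators treated as command separators.
SEPARATORS = {"&&", "||", ";", "|"}


def split_commands(line):
    """Group tokens into lists, starting a new list at each separator."""
    lexer = shlex.shlex(line, punctuation_chars=True)
    commands, current = [], []
    for token in lexer:
        if token in SEPARATORS:
            if current:
                commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


print(split_commands('~/a && b-c --color=auto || d *.py?'))
```

The sketch discards the separators themselves. Code that needs to know whether two commands were joined by ``&&`` or ``||`` would have to keep them.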