
classification
Title: Update Regular Expression HOWTO
Type:
Stage: commit review
Components: Documentation
Versions: Python 3.1, Python 3.2, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To: terry.reedy
Nosy List: SilentGhost, akuchling, docs@python, eric.araujo, georg.brandl, terry.reedy
Priority: normal
Keywords: patch

Created on 2011-01-09 20:20 by terry.reedy, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
regex.rst.diff SilentGhost, 2011-01-09 22:29
zregex2.rst.diff terry.reedy, 2011-01-10 22:46 Matching Chars addition
Messages (16)
msg125855 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-09 20:20
0. Does 'Release 0.05' at the top have any useful current meaning?
or could it be deleted?

1. Introduction:

The history paragraph "The re module was added in Python 1.5, and provides Perl-style regular expression patterns. Earlier versions of Python came with the regex module, which provided Emacs-style patterns. The regex module was removed completely in Python 2.5." might be eliminated in 3.x, or at least the irrelevant-for-py3 reference to regex. This is a policy decision.

2. Performing matches:

"If you have Tkinter available, you may also want to look at Tools/scripts/redemo.py,"

Change 'Tkinter' to 'tkinter' and make it a module reference.
In link, change 'scripts' to 'demo' as redemo.py got moved.

"Phil Schwartz’s Kodos is also an interactive tool for developing and testing RE patterns."

Add the url '(http://kodos.sourceforge.net/)' to the text so that Windows help users can copy and paste it into a browser. (This should be a general policy.)

"Python 2.2.2 (#1, Feb 10 2003, 12:57:01)"
delete

<_sre.SRE_Match object at 80c4f68>

This is correctly updated (for late 2.x and 3.x)

"<re.MatchObject instance at 80c9650>" (7 like this)

Globally replace 're.MatchObject instance' with '_sre.SRE_Match object'

3. Footnote

"[1] Introduced in Python 2.2.2."

remove for 3.x here and wherever footnote reference is in the text.

4. "Not Using re.VERBOSE"

This section is about *using* re.VERBOSE and the benefit thereof, not about not using it. I recommend deleting 'Not' as it gives the impression that the section is a warning about not using, the opposite of the intent.
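As a rough illustration of the benefit that section describes, here is a minimal sketch. The pattern and the sample line are made up for this comment and are not taken from the HOWTO; the point is only that the same pattern can carry whitespace and comments under re.VERBOSE and still match identically.

```python
import re

# Hypothetical header-parsing pattern, written on one line.
compact = re.compile(r"\s*(?P<header>[^:]+)\s*:(?P<value>.*?)\s*$")

# The same pattern under re.VERBOSE: whitespace in the pattern is ignored
# and everything after '#' is a comment.
verbose = re.compile(r"""
    \s*                 # skip leading whitespace
    (?P<header>[^:]+)   # header name: everything up to the colon
    \s* :               # optional whitespace, then the colon
    (?P<value>.*?)      # the value, matched lazily
    \s* $               # trailing whitespace up to the end of the line
""", re.VERBOSE)

line = "Subject : hello world  "  # made-up sample input
compact_groups = compact.match(line).groupdict()
verbose_groups = verbose.match(line).groupdict()
print(compact_groups == verbose_groups)
```

Both forms compile to equivalent patterns; the verbose one is simply easier to read and maintain.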

5. Code example output and doctest:

I ran doctest.testfile("C:/programs/PyDev/py32/Doc/howto/regex.rst", module_relative = False)

After the 're...' to '_sre...' substitution above, all 11 failures would be due to 'at 0x#######' address mismatches. I believe changing all 11 addresses to '0x...' (I took this from the doctest doc) would both fix the failures and remove irrelevant detail for human readers.

The other 87 examples all passed ;-!.

Is there any current doctest-related markup that should be added?
msg125858 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-01-09 20:47
Your points 1-5 all sound valid to me.  Would you like to make a patch? I don't know what to do about the release number.  It probably doesn't hurt anyone to keep it.
msg125859 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-09 20:51
Good points overall.

The only subpoint I disagree with is this one: “Add the url '(http://kodos.sourceforge.net/)' to the text so that Windows help users can copy and paste it into a browser. (This should be a general policy.)”  IMO, it’s the job of the Sphinx builder to add URIs in plaintext if the format does not have hyperlinks.  -1 on cluttering the source and HTML output with duplicated links.
msg125861 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2011-01-09 20:55
Oh right, I misread that one.  Can't Windows help users right-click and select "Copy URL"?
msg125862 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2011-01-09 20:56
Here is the patch implementing all but the url suggestion.

Doctest still has 11 failures (changing to '0x...' didn't help).
msg125865 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2011-01-09 21:20
A few bits and pieces fixed compared to the previous patch.

>>> doctest.testfile("/home/mischa/pydev/Doc/howto/regex.rst", module_relative = False, optionflags=doctest.ELLIPSIS)
TestResults(failed=0, attempted=98)
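
For anyone wondering why the '0x...' change alone did not help: as far as I understand doctest, '...' in expected output is only treated as a wildcard when the ELLIPSIS option flag is enabled. A minimal sketch follows, using a throwaway one-example file as a stand-in for regex.rst; the file name and contents are invented for illustration, and object() is used only because its repr contains an address.

```python
import contextlib
import doctest
import io
import os
import tempfile

# Hypothetical one-example document standing in for Doc/howto/regex.rst.
RST_TEXT = """\
Example
=======

>>> object()
<object object at 0x...>
"""


def run_testfile(path, optionflags=0):
    """Run doctest.testfile on path, discarding the printed failure report."""
    with contextlib.redirect_stdout(io.StringIO()):
        return doctest.testfile(path, module_relative=False,
                                optionflags=optionflags)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.rst")
    with open(path, "w", encoding="utf-8") as f:
        f.write(RST_TEXT)
    strict = run_testfile(path)                     # '...' compared literally
    lenient = run_testfile(path, doctest.ELLIPSIS)  # '...' acts as a wildcard

print(strict.failed, lenient.failed)
```

The flag can also be enabled for a single example with a `# doctest: +ELLIPSIS` directive, which may be preferable if most of a file should still be compared exactly.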
msg125866 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2011-01-09 21:35
It seems that the special sequences description in the Matching Characters section needs to be updated to incorporate information on Unicode and bytes. I don't think, however, that it's a good idea just to copy that information from Doc/library/re.rst. Maybe the section could be shortened and linked to the RE Syntax section? Unfortunately, there aren't any deeper links available.
msg125868 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-09 22:17
I agree that the .rst should not have two copies and that any windows.chm specific fixup should be in the tool. Right now, right clicking gives a context menu with one item: Properties. Clicking that brings up a dialog box with a url that can be copied. Good enough for me at the moment but not terribly obvious. A possible separate issue.

Unless A Kuchling says different, I would like to remove the version number. It implies to me that this doc is in pre-alpha condition and it is far beyond that. I see that the patch already does so.

-:file:`Tools/scripts/redemo.py`, a demonstration program included with the
+:file:`Tools/scripts/demo.py`, a demonstration program included with the

should (currently) be
+:file:`Tools/demo/redemo.py`, a demonstration program included with the

Other than that, the patch looks good. Thanks. I am still thinking about Matching Characters. Once the patch is fixed, with the possible addition, a 2.7 version can easily be made by deleting the 3.x-specific deletions.
msg125869 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2011-01-09 22:29
I don't know whether it would be easy to strip down py3k version to 2.7 version.

Seeing how it's just a basic introduction, I would think that a single statement about Unicode support might be sufficient. For an exhaustive description of special sequences, refer to the docs and carry on with ASCII strings.

Attached patch fixes path issue.
msg125874 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-09 23:50
Since I think I know how to do it, easily, I will try to derive the 2.7 patch.

In Matching Characters, I think
"The following predefined special sequences are available:"

should be expanded to 

"The following predefined special sequences are a subset of those available. The equivalent classes are for bytes patterns. For a complete list of sequences and expanded class definitions for Unicode string patterns, see the end of Regular Expression Syntax."
(with section reference markup).
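
To make the bytes versus Unicode distinction concrete, here is a small sketch for Python 3. The sample text is made up; the behavior shown is that \w in a str pattern matches Unicode word characters by default, while in a bytes pattern (or a str pattern with re.ASCII) it is limited to [a-zA-Z0-9_].

```python
import re

text = "naïve café"  # made-up sample containing non-ASCII letters

# str pattern: \w matches Unicode word characters by default in Python 3.
str_words = re.findall(r"\w+", text)

# bytes pattern: \w is limited to ASCII word characters, so the UTF-8
# bytes of the accented letters split the words.
bytes_words = re.findall(rb"\w+", text.encode("utf-8"))

# str pattern with re.ASCII: same ASCII-only class as the bytes pattern.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(str_words)
print(bytes_words)
print(ascii_words)
```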

Note to myself. /bytes/byte string/ for 2.7.

While the changes all look innocuous to me with respect to building the docs, I am curious if you have tried to rebuild the HOWTO (if you have the tool chain, which I do not).
msg125876 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-10 02:20
> I agree that the .rst should not have two copies and that any windows.chm specific fixup should be in the tool. Right now, right clicking gives a context menu with one item: Properties. Clicking that brings up a dialog box with a url that can be copied. Good enough for me at the moment but not terribly obvious. A possible separate issue.

I would argue that this is a bug in the CHM viewers, not Python :)
msg125891 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2011-01-10 10:09
> While the changes all look innocuous to me with respect to building the docs, I am curious if you have tried to rebuild the HOWTO (if you have the tool chain, which I do not).

I did rebuild the docs with 'make html'. Build was clean every time. If you meant something else please let me know.
msg125946 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-10 22:46
I applied patch to 3.2, 3.1 in r87904, r87905. Thanks.
I had to re-edit for 2.7: r87909.

I made a separate small patch for my suggested addition to Matching Characters. Could someone check that it is correct, given that re.rst contains the target directive (or whatever it is called):
.. _re-syntax:
msg125950 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-01-10 23:01
Looks good, builds without warnings.

Note that you can use :ref:`re-syntax` and Sphinx will substitute the heading for you.  The :role:`some special text <real-target>` form is used when you want to control the text of the link.

(That thing is called a hyperlink target: http://docutils.sourceforge.net/docs/user/rst/quickref.html#hyperlink-targets)
msg125962 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-11 00:08
and r87918 for 2.7, with bytes -> byte string
msg126024 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2011-01-11 17:46
Correction: r87912 and r87913 for 3.x
History
Date User Action Args
2022-04-11 14:57:11 admin set github: 55084
2011-01-11 17:46:26 terry.reedy set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg126024
2011-01-11 17:28:15 terry.reedy set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: - msg125954
2011-01-11 00:08:18 terry.reedy set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125962
2011-01-10 23:19:16 terry.reedy set status: open -> closed
    messages: + msg125954
    resolution: fixed
    nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
2011-01-10 23:01:04 eric.araujo set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125950
2011-01-10 22:46:17 terry.reedy set files: + zregex2.rst.diff
    nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125946
    assignee: docs@python -> terry.reedy
    stage: needs patch -> commit review
2011-01-10 10:09:45 SilentGhost set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125891
2011-01-10 02:20:40 eric.araujo set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125876
2011-01-09 23:50:30 terry.reedy set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125874
2011-01-09 22:29:27 SilentGhost set files: - regex.rst.diff
    nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
2011-01-09 22:29:04 SilentGhost set files: + regex.rst.diff
    nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125869
2011-01-09 22:17:47 terry.reedy set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125868
2011-01-09 21:35:28 SilentGhost set nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125866
2011-01-09 21:21:05 SilentGhost set files: - regex.rst.diff
    nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
2011-01-09 21:20:54 SilentGhost set files: + regex.rst.diff
    nosy: akuchling, georg.brandl, terry.reedy, eric.araujo, SilentGhost, docs@python
    messages: + msg125865
2011-01-09 20:56:56 SilentGhost set files: + regex.rst.diff
    nosy: + SilentGhost
    messages: + msg125862
    keywords: + patch
2011-01-09 20:55:37 georg.brandl set messages: + msg125861
2011-01-09 20:51:08 eric.araujo set nosy: + eric.araujo
    messages: + msg125859
2011-01-09 20:47:57 georg.brandl set nosy: + georg.brandl
    messages: + msg125858
2011-01-09 20:20:07 terry.reedy create