Issue 12731: python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/56940

classification

Title:	python lib re uses obsolete sense of \w in full violation of UTS#18 RL1.2a
Type:	behavior	Stage:	needs patch
Components:	Regular Expressions, Unicode	Versions:	Python 3.11

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	Arfrever, HThompson, JustinTArthur, benjamin.peterson, docs@python, ezio.melotti, gvanrossum, lemburg, mrabarnett, pitrou, serhiy.storchaka, tchrist, terry.reedy
Priority:	normal	Keywords:

Created on 2011-08-11 19:18 by tchrist, last changed 2022-04-11 14:57 by admin.

Files
File name	Uploaded	Description
alnum.python	tchrist, 2011-08-11 19:18	test case showing conformance bugs in Python re lib when handling Unicode

Messages (14)
msg141920	Author: Tom Christiansen (tchrist)	Date: 2011-08-11 19:18
You cannot use Python's lib re for handling Unicode regular expressions because it violates the standard set out for the same in UTS#18 on Unicode Regular Expressions in RL1.2a on compatibility properties. What \w is allowed to match is clearly explained there, but Python has its own idea. Because it is in clear violation of the standard, it is misleading and wrong for Python to claim that the re.UNICODE flag makes \w and friends match Unicode. Here are the failed test cases when the attached file is run under v3.2; there are further failures when run under v2.7.

FAIL lib re found non alphanumeric string café
FAIL lib re found non alphanumeric string Ⓚ
FAIL lib re found non alphanumeric string ͅ
FAIL lib re found non alphanumeric string ְ
FAIL lib re found non alphanumeric string 𝟘
FAIL lib re found non alphanumeric string 𐍁
FAIL lib re found non alphanumeric string 𝔘𝔫𝔦𝔠𝔬𝔡𝔢
FAIL lib re found non alphanumeric string 𐐔𐐯𐑅𐐨𐑉𐐯𐐻
FAIL lib re found non alphanumeric string connector‿punctuation
FAIL lib re found non alphanumeric string Ὰͅ_Στο_Διάολο
FAIL lib re found non alphanumeric string 𐌰𐍄𐍄𐌰‿𐌿𐌽𐍃𐌰𐍂‿𐌸𐌿‿𐌹𐌽‿𐌷𐌹𐌼𐌹𐌽𐌰𐌼
FAIL lib re found all alphanumeric string ¹²³
FAIL lib re found all alphanumeric string ₁₂₃
FAIL lib re found all alphanumeric string ¼½¾
FAIL lib re found all alphanumeric string ⑶

Note that Matthew Barnett's regex lib for Python handles all of these cases in conformance with The Unicode Standard.
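The RL1.2a definition referred to above can be roughly approximated with the standard unicodedata module. The sketch below is an illustration only, not a reference implementation: the helper names are hypothetical, it uses the tr18v8-era membership (\p{alpha}, \p{gc=Mark}, \p{digit}, \p{gc=Connector_Punctuation}) discussed further on in this thread, and because unicodedata does not expose the Alphabetic property, \p{alpha} is approximated as General_Category L* plus Nl.

```python
import re
import unicodedata


def uts18_is_word_char(ch: str) -> bool:
    """Approximate the UTS #18 RL1.2a \\w test for a single code point.

    Hypothetical helper. \\p{Alphabetic} is approximated as gc=L* plus gc=Nl,
    because unicodedata does not expose the Other_Alphabetic property.
    """
    gc = unicodedata.category(ch)
    return (
        gc.startswith("L")      # letters (the bulk of \p{alpha})
        or gc == "Nl"           # letter numbers, also Alphabetic
        or gc.startswith("M")   # \p{gc=Mark}: Mn, Mc, Me
        or gc == "Nd"           # \p{digit}, i.e. \p{gc=Decimal_Number}
        or gc == "Pc"           # \p{gc=Connector_Punctuation}
    )


def re_is_word_char(ch: str) -> bool:
    """Report whether the stdlib re module's \\w matches ch (str pattern)."""
    return re.fullmatch(r"\w", ch) is not None


# Code points on which the two definitions disagree.
samples = {
    "\u0301": "COMBINING ACUTE ACCENT (gc=Mn)",
    "\u203f": "UNDERTIE (gc=Pc)",
    "\u00b9": "SUPERSCRIPT ONE (gc=No)",
    "\u00bc": "VULGAR FRACTION ONE QUARTER (gc=No)",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: "
          f"sketch={uts18_is_word_char(ch)}, re={re_is_word_char(ch)}")
```

The mark and connector-punctuation cases come out True for the sketch and False for re, and the superscript and fraction the other way round. Other_Alphabetic symbols such as U+24C0 (Ⓚ, gc=So) remain a blind spot of this approximation.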
msg141993	Author: Terry J. Reedy (terry.reedy) *	Date: 2011-08-12 22:46
However desirable it would be, I do not believe there is any claim in the manual that the re module follows the evolving Unicode consortium r.e. standard. If I understand, you are saying that this statement in the doc, "Matches Unicode word characters;" is not now correct and should be revised. Was it once correct? Could we add "by an older definition of 'word' character"?

There has been some discussion of adding regex to the stdlib, possibly as a replacement for re. Your posts indicate that regex is more improved than some realized, and hence has more incompatibilities than we realized, and hence is less suitable as a strictly backwards-compatible replacement. So I think it needs to be looked at as a parallel addition. I do not know Matthew's current position on the subject.
msg142001	Author: Tom Christiansen (tchrist)	Date: 2011-08-13 00:18
> Terry J. Reedy <tjreedy@udel.edu> added the comment:
>
> However desirable it would be, I do not believe there is any claim in the
> manual that the re module follows the evolving Unicode consortium r.e. standard.

My from-the-hip thought is that if re cannot be fixed to follow the Unicode Standard, it should be deprecated in favor of code that can, if such is available, because you cannot process Unicode text with regular expressions otherwise.

> If I understand, you are saying that this statement in the doc, "Matches
> Unicode word characters;" is not now correct and should be revised. Was it
> once correct? Could we add "by an older definition of 'word' character"?

Yes, your hunch is exactly correct. They once had a lesser definition than they have now. It is very, very old. I had to track this down for Java once. There is some discussion of a "word_character class" at least as far back as tr18v3 from back in 1998.

http://www.unicode.org/reports/tr18/tr18-3.html

By the time tr18v5 rolled around just a year later in 1999, the overall document had changed substantially, and you can clearly see its current shape there. Word characters are supposed to include all code points with the Alphabetic property, for example.

http://www.unicode.org/reports/tr18/tr18-5.html

However, the word "alphabetic" has never been synonymous in Unicode with

    \p{gc=Lu} \p{gc=Ll} \p{gc=Lt} \p{gc=Lm} \p{gc=Lo}

as many people incorrectly assume, nor certainly with

    \p{gc=Lu} \p{gc=Ll} \p{gc=Lt}

let alone with

    \p{gc=Lu} \p{gc=Ll}

Rather, it has since its creation included code points that are not letters, such as all GC=Nl and also certain GC=So code points. And, notoriously, U+0345. Indeed it is here I first noticed that Python had already broken with the Standard, because U+0345 COMBINING GREEK YPOGEGRAMMENI is GC=Mn, but Alphabetic=True, yet I have shown that Python's title method is messing up there.
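The gap between the Alphabetic property and the letter categories is easy to observe from Python, whose documentation notes that str.isalpha() tests for General_Category "Letter" (Lm, Lt, Lu, Ll, Lo), which differs from the Unicode Alphabetic property. This quick check, using only unicodedata and str methods, looks at U+0345 and at one Roman numeral (gc=Nl) as an example of the Nl class mentioned above:

```python
import unicodedata

# Both code points have the Unicode Alphabetic property, but neither is gc=L*.
for ch in ("\u0345", "\u2160"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"gc={unicodedata.category(ch)}, isalpha()={ch.isalpha()}, "
          f"upper()={ch.upper()!r}")
```

Both report isalpha()=False, even though U+0345 still carries an uppercase mapping to U+0399 GREEK CAPITAL LETTER IOTA.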
I wouldn't spend too much time on archaeological digs, though, because lots of stuff has changed since the last millennium. It was in tr18v7 from 2003-05 that we hit paydirt, because this is when the famous Annex C of RL1.2a fame first appeared:

http://www.unicode.org/reports/tr18/tr18-7.html#Compatibility_Properties

Notice how it defines \w to be nothing more than \p{alpha}, \p{digit}, and \p{gc=Pc}. It does not yet contain the requirement that all Marks be counted as part of the word, just the few that are alphas -- which includes U+0345, since it has an uppercase map of a capital iota!

That particular change did not occur until tr18v8 in 2003-08, barely a scant three months later.

http://www.unicode.org/reports/tr18/tr18-8.html#Compatibility_Properties

Now at last we see word characters defined in the modern way that we have become used to. They must match any of:

    \p{alpha}
    \p{gc=Mark}
    \p{digit}
    \p{gc=Connector_Punctuation}

BTW, Python is matching all of \p{GC=N}, meaning

    \p{GC=Nd} \p{GC=Nl} \p{GC=No}

instead of the required \p{GC=Nd}, which is a synonym for \p{digit}. I don't know how that happened, because \w has never included all number code points in Unicode, only the decimal number ones.

That all goes to show why, when citing conformance to some aspect of The Unicode Standard, one must be exceedingly careful just how one does so! The Unicode Consortium recognizes this is an issue, and I am pretty sure I can hear it in your own subtext as well.

Kindly bear with me and forgive me for momentarily sounding like a standards lawyer. I do this to show not just why it is important to get references to the Unicode Standard correct, but indeed, how to do so. After I have given the formal requirements, I will then produce illustrations of various purported claims, some of which meet the citation requirements, and others which do not.
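The numeric part of this complaint can be checked directly. The current Python documentation describes \w for str patterns as the characters for which str.isalnum() is true plus the underscore, and str.isalnum() also accepts isdigit() and isnumeric() characters. This sketch runs the strings from the "found all alphanumeric" failures through re and reports their General_Category:

```python
import re
import unicodedata

# Strings from the "found all alphanumeric" failures in the original report.
numeric_samples = ["\u00b9\u00b2\u00b3", "\u2081\u2082\u2083",
                   "\u00bc\u00bd\u00be", "\u2476"]

for s in numeric_samples:
    matched = re.fullmatch(r"\w+", s) is not None
    cats = sorted({unicodedata.category(c) for c in s})
    print(f"{s!r}: re \\w+ matches={matched}, categories={cats}, "
          f"isdecimal={s.isdecimal()}")
```

Every sample is matched by \w although all of its code points are gc=No and none is a decimal digit, which is the \p{GC=N} versus \p{GC=Nd} discrepancy described above.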
=======================================================================

To begin with, there is an entire technical report on conformance. It includes:

http://unicode.org/reports/tr33/

    The Unicode Standard [Unicode] is a very large and complex standard. Because of this complexity, and because of the nature and role of the standard, it is often rather difficult to determine, in any particular case, just exactly what conformance to the Unicode Standard means.

    ...

    Conformance claims must be specific to versions of the Unicode Standard, but the level of specificity needed for a claim may vary according to the nature of the particular conformance claim. Some standards developed by the Unicode Consortium require separate conformance to a specific version (or later), of the Unicode Standard. This version is sometimes called the base version. In such cases, the version of the standard and the version of the Unicode Standard to which the conformance claim refers must be compatible.

However, you don't need to read tr33, really, because the most important bits about conformance are to be found on pp. 57-58 of Chapter 3 of the published Unicode Standard:

http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf

    References to the Unicode Standard

    The documents associated with the major, minor, and update versions are called the major reference, minor reference, and update reference, respectively. For example, consider Unicode Version 3.1.1. The major reference for that version is The Unicode Standard, Version 3.0 (ISBN 0-201-61633-5). The minor reference is Unicode Standard Annex #27, "The Unicode Standard, Version 3.1." The update reference is Unicode Version 3.1.1. The exact list of contributory files, Unicode Standard Annexes, and Unicode Character Database files can be found at Enumerated Version 3.1.1.

    The reference for this version, Version 6.0.0, of the Unicode Standard, is

        The Unicode Consortium. The Unicode Standard, Version 6.0.0, defined by: The Unicode Standard, Version 6.0 (Mountain View, CA: The Unicode Consortium, 2011. ISBN 978-1-936213-01-6)

    References to an update (or minor version prior to Version 5.2.0) include a reference to both the major version and the documents modifying it. For the standard citation format for other versions of the Unicode Standard, see "Versions" in Section B.6, Other Unicode Online Resources.

    Precision in Version Citation

    Because Unicode has an open repertoire with relatively frequent updates, it is important not to over-specify the version number. Wherever the precise behavior of all Unicode characters needs to be cited, the full three-field version number should be used, as in the first example below. However, trailing zeros are often omitted, as in the second example. In such a case, writing 3.1 is in all respects equivalent to writing 3.1.0.

        1. The Unicode Standard, Version 3.1.1
        2. The Unicode Standard, Version 3.1
        3. The Unicode Standard, Version 3.0 or later
        4. The Unicode Standard

    Where some basic level of content is all that is important, phrasing such as in the third example can be used. Where the important information is simply the overall architecture and semantics of the Unicode Standard, the version can be omitted entirely, as in example 4.

    References to Unicode Character Properties

    Properties and property values have defined names and abbreviations, such as

        Property: General_Category (gc)
        Property Value: Uppercase_Letter (Lu)

    To reference a given property and property value, these aliases are used, as in this example:

        The property value Uppercase_Letter from the General_Category property, as specified in Version 6.0.0 of the Unicode Standard.

    Then cite that version of the standard, using the standard citation format that is provided for each version of the Unicode Standard.

    When referencing multi-word properties or property values, it is permissible to omit the underscores in these aliases or to replace them by spaces.

    When referencing a Unicode character property, it is customary to prepend the word "Unicode" to the name of the property, unless it is clear from context that the Unicode Standard is the source of the specification.

    References to Unicode Algorithms

    A reference to a Unicode algorithm must specify the name of the algorithm or its abbreviation, followed by the version of the Unicode Standard, as in this example:

        The Unicode Bidirectional Algorithm, as specified in Version 6.0.0 of the Unicode Standard.

        See Unicode Standard Annex #9, "Unicode Bidirectional Algorithm," (http://www.unicode.org/reports/tr9/tr9-23.html)

=======================================================================

Now for some concrete citation examples, both correct and dubious. In the JDK7 documentation for the Character class we find:

    Character information is based on the Unicode Standard, version 6.0.0.

That one is a perfectly good conformance citation, even if there seems a bit of wiggle in "is based on", but no matter. It is short and does everything it needs to.

However, in the JDK7 documentation for the Pattern class we somewhat problematically find:

    Unicode support

    This class is in conformance with Level 1 of Unicode Technical Standard #18: Unicode Regular Expression, plus RL2.1 Canonical Equivalents.

And similarly, in the JDK7 documentation for the Normalizer class we find:

    This class provides the method normalize which transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15 — Unicode Normalization Forms.
The problem with those second two Java refs is that, to my reading, they appear to be in technical violation, for they give neither a full version number nor a date of publication. You have to give one or the other, or both. Java got themselves into a heap of trouble (so to speak) over this once before because it turned out that the version of the document they were actually in conformance with was quite literally from the previous millennium!! That's why you need to give versions and publication dates.

Here are some other citations. First, from the perldelta manpage that the Perl 5.14 release ships with:

    Perl comes with the Unicode 6.0 data base updated with Corrigendum #8 <http://www.unicode.org/versions/corrigendum8.html>, with one exception noted below. See <http://unicode.org/versions/Unicode6.0.0/> for details on the new release. Perl does not support any Unicode provisional properties, including the new ones for this release.

That is quite complete, as it even specifies which corrigenda we follow and explains the matter of properties. Or this from the perlunicode manpage of that same release:

    Unicode Regular Expression Support Level

    The following list of Unicode supported features for regular expressions describes all features currently directly supported by core Perl. The references to "Level N" and the section numbers refer to the Unicode Technical Standard #18, "Unicode Regular Expressions", version 13, from August 2008.

See all that? Notice how it gives the name of the document, its revision number, and its publication date. You don't have to do all that for the main Unicode release, but you really ought to when referring to individual technical reports BECAUSE THESE GET UPDATED ASYNCHRONOUSLY.

I would suggest you pick a version of tr18 that you conform to, and state which of its requirements you do and do not meet.
However, I cannot find any version of tr18 that has existed during the present millennium for which Python comes even close to meeting more than one or two requirements. Given that, it may be better to no longer make any claims regarding Unicode at all. That seems like back-pedaling to me, not future-thinking.

Matthew's regex module, however, does almost everything right that re does wrong. It may be that, as with Java's io vs nio classes (and now, heaven forbid, nio2!), you actually can't fix the old module and must create a wholly new namespace. I cannot answer that.

For RL1.2 proper, the first properties requirement, Java was only missing a few, so they went and added the missing properties. I strongly urge you to do so because you cannot handle Unicode without properties. RL1.2 requires only 11 of them, so it isn't too hard. Matthew supports many, many more.

However, because the \w&c issues are bigger, Java addressed the tr18 RL1.2a issues differently, this time by creating a new compilation flag called UNICODE_CHARACTER_CLASSES (with corresponding embedded "(?U)" regex flag.)

Truth be told, even Perl has secret pattern compilation flags to govern this sort of thing (ascii, locale, unicode), but we (well, I) hope you never have to use or even notice them.

That too might be a route forward for Python, although I am not quite sure how much flexibility and control of your lexical scope you have. However, the "from __future__" imports suggest you may have enough to do something slick so that only people who ask for it get it, and also importantly that they get it all over the place so they don't have to add an extra flag or u'...' or whatever every single time. This isn't something I've looked much into, however.

Hope this clarifies things.

--tom
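For comparison, Python 3's re already has per-pattern flags somewhat like the Perl ones mentioned here: re.ASCII (inline "(?a)") restricts \w to [a-zA-Z0-9_], while str patterns use Unicode matching by default. These flags only switch between ASCII and the existing Unicode semantics; none of them selects a UTS#18-conformant \w. A short sketch, where the decomposed spelling shows that even the default Unicode mode rejects a combining mark:

```python
import re

precomposed = "caf\u00e9"    # é as one code point, gc=Ll
decomposed = "cafe\u0301"    # e + COMBINING ACUTE ACCENT (gc=Mn)

default = re.fullmatch(r"\w+", precomposed)               # Unicode \w (default for str)
ascii_flag = re.fullmatch(r"\w+", precomposed, re.ASCII)  # \w limited to [a-zA-Z0-9_]
ascii_inline = re.fullmatch(r"(?a)\w+", precomposed)      # same, via inline flag
marks = re.fullmatch(r"\w+", decomposed)                  # fails: U+0301 is not alnum

print(default is not None, ascii_flag is not None,
      ascii_inline is not None, marks is not None)
```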
msg142030	Author: Antoine Pitrou (pitrou) *	Date: 2011-08-13 17:36
> However, because the \w&c issues are bigger, Java addressed the tr18 RL1.2a
> issues differently, this time by creating a new compilation flag called
> UNICODE_CHARACTER_CLASSES (with corresponding embedded "(?U)" regex flag.)
>
> Truth be told, even Perl has secret pattern compilation flags to govern
> this sort of thing (ascii, locale, unicode), but we (well, I) hope you
> never have to use or even notice them.
>
> That too might be a route forward for Python, although I am not quite sure
> how much flexibility and control of your lexical scope you have. However,
> the "from __future__" imports suggest you may have enough to do something
> slick so that only people who ask for it get it, and also importantly that
> they get it all over the place so they don't have to add an extra flag or u'...'
> or whatever every single time.

If the current behaviour is buggy or sub-optimal, I think we should simply fix it (which might be done by replacing "re" with "regex" if someone wants to shepherd its inclusion in the stdlib).

By the way, thanks for the detailed explanations, Tom.
msg142112	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-08-15 10:30
If the regex module works fine here, I think it's better to leave the re module alone and include the regex module in 3.3.
msg143040	Author: Guido van Rossum (gvanrossum) *	Date: 2011-08-26 21:22
Really? The re module cannot be salvaged and we should add regex but keep the (buggy) re? That does not make a lot of sense to me. I think it should just be fixed in the re module. Or the re module should be replaced by the code from the regex module (but renamed to re, and with certain backwards compatibilities restored, probably). But I really hope the re module (really: the _sre extension module) can be fixed. We should also make a habit in our docs of citing specific versions of the Unicode standard, and specific TR numbers and versions where they apply. (And hopefully we can supply URLs to the Unicode consortium's canonical copies of those documents.)
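As one small aid for the kind of version citation proposed here, the unicodedata module reports the version of the Unicode Character Database compiled into the running interpreter. Note that this is only the UCD version; it says nothing about which revision of UTS #18 the re module follows.

```python
import sys
import unicodedata

# Version of the Unicode Character Database compiled into this interpreter.
ucd_version = unicodedata.unidata_version
print(f"Python {sys.version_info.major}.{sys.version_info.minor}: UCD {ucd_version}")
```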
msg143091	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-08-28 06:26
> Or the re module should be replaced by the code from the regex module
> (but renamed to re, and with certain backwards compatibilities
> restored, probably).

This is what I meant.

> But I really hope the re module (really: the _sre extension module)
> can be fixed.

Starting to fix these issues from scratch doesn't make much sense IMHO. We could "extract" the fixes from regex and merge them into re, but then again it's probably easier to just replace the whole module.

> We should also make a habit in our docs of citing specific versions
> of the Unicode standard, and specific TR numbers and versions where
> they apply.

While this is a good thing, it's not always doable. Usually someone reports a bug related to something specified in some standard and only that part gets fixed. Sometimes everything else is also updated to follow the whole standard, but often this happens incrementally, so we can't say, e.g., "the re module supports Unicode x.y" unless we go through the whole standard and fix/implement everything.
msg143092	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-08-28 06:37
> But I really hope the re module (really: the _sre extension module)
> can be fixed.

If you mean on 2.7/3.2, then I guess we could extract the fixes from regex, but we have to see if it's doable and someone will have to do it. Also consider that the regex module is available for 2.7/3.2, so we could suggest that users use it if they have problems with the re bugs (even if that means having an additional dependency).

ISTM that the current plan is:
* replace re with regex (and rename it) on 3.3 and fix all these bugs;
* leave 2.7 and 3.2 with the old re and its bugs;
* let people use the external regex module on 2.7/3.2 if they need to.

If this is not ok, maybe it should be discussed on python-dev.
msg143109	Author: Guido van Rossum (gvanrossum) *	Date: 2011-08-28 17:22
[me]
>> But I really hope the re module (really: the _sre extension module)
>> can be fixed.

[Ezio]
> Starting to fix these issues from scratch doesn't make much sense IMHO. We could "extract" the fixes from regex and merge them into re, but then again it's probably easier to just replace the whole module.

I have changed my mind at least half-way. I am open to having regex (with some changes, details TBD) replace re in 3.3. (I am not yet 100% convinced, but I'm not rejecting it as strongly as I was when I wrote that comment in this bug. See the ongoing python-dev discussion on this topic.)

>> We should also make a habit in our docs of citing specific versions
>> of the Unicode standard, and specific TR numbers and versions where
>> they apply.
>
> While this is a good thing, it's not always doable. Usually someone reports a bug related to something specified in some standard and only that part gets fixed. Sometimes everything else is also updated to follow the whole standard, but often this happens incrementally, so we can't say, e.g., "the re module supports Unicode x.y" unless we go through the whole standard and fix/implement everything.

Hm. I think that for Unicode it may actually be important enough to be consistent in following the whole standard that we should attempt to be consistent and not just chase bug reports. Now, we may consciously decide not to implement a certain recommendation of the standard. E.g. I'm not going to require that IronPython or Jython have string objects that support O(1) indexing of code points, even though (assuming PEP 393 gets accepted) CPython will have them. But these decisions should be made explicitly, and documented clearly.

Ideally, we need a "Unicode czar" -- a core developer whose job it is to keep track of Python's compliance with various parts and versions of the Unicode standard and who can nudge other developers towards fixing bugs or implementing features, or update the documentation in case things don't get added.
(I like Tom's approach to Java 1.7, where he submitted proposed doc fixes explaining the deviations from the standard. Perhaps a bit passive-aggressive, but it was effective. :-)
msg143113	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-08-28 17:58
> Ideally, we need a "Unicode czar" -- a core developer whose job it is
> to keep track of Python's compliance with various parts and versions
> of the Unicode standard and who can nudge other developers towards
> fixing bugs or implementing features, or update the documentation in
> case things don't get added.

We should first do a full review of the latest Unicode standard and see what's missing. I think there might be parts of older Unicode versions (even < Unicode 5) that are not yet implemented. Chapter 3 is a good place to start, but I'm not sure that's enough -- there are a few TRs that should be considered as well. If we manage to catch up with Unicode 6, then it shouldn't be too difficult to review the changes that every new version introduces and open an issue for each (or a single issue if the changes are limited).

FWIW I'm planning to look at the conformance of the UTF codecs and fix them (if necessary) whenever I have time.
msg144663	Author: Ezio Melotti (ezio.melotti) *	Date: 2011-09-30 04:05
The failing re tests after PEP 393 are:

FAIL lib re found non alphanumeric string 'cafe'
FAIL lib re found non alphanumeric string 'Ⓚ'
FAIL lib re found non alphanumeric string ''
FAIL lib re found non alphanumeric string ''
FAIL lib re found non alphanumeric string 'connector‿punctuation'
FAIL lib re found non alphanumeric string 'Ὰ_Στο_Διάολο'
FAIL lib re found non alphanumeric string '𐌰𐍄𐍄𐌰‿𐌿𐌽𐍃𐌰𐍂‿𐌸𐌿‿𐌹𐌽‿𐌷𐌹𐌼𐌹𐌽𐌰𐌼'
FAIL lib re found all alphanumeric string '¹²³'
FAIL lib re found all alphanumeric string '₁₂₃'
FAIL lib re found all alphanumeric string '¼½¾'
FAIL lib re found all alphanumeric string '⑶'
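Compared with the original report, the entries made up entirely of characters outside the Basic Multilingual Plane (𝟘, 𐍁, 𝔘𝔫𝔦𝔠𝔬𝔡𝔢, 𐐔𐐯𐑅𐐨𐑉𐐯𐐻) no longer fail. A plausible explanation, which is an inference and not something stated in the thread, is that narrow pre-PEP 393 builds stored such characters as UTF-16 surrogate pairs, so \w was tested against surrogates and not the real code points. This sketch checks two of them on a current interpreter:

```python
import re
import unicodedata

WORD = re.compile(r"\w")

# Astral-plane entries from the original FAIL list that no longer fail.
astral = ["\U0001D7D8", "\U00010341"]  # DOUBLE-STRUCK DIGIT ZERO, GOTHIC LETTER NINETY

for ch in astral:
    print(f"U+{ord(ch):05X} {unicodedata.name(ch)}: len={len(ch)}, "
          f"gc={unicodedata.category(ch)}, matches \\w={WORD.fullmatch(ch) is not None}")
```

Each is a single code point (len 1) and matched by \w. The Gothic string still fails only because of its U+203F UNDERTIE separators.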
msg313850	Author: Terry J. Reedy (terry.reedy) *	Date: 2018-03-15 00:33
Whatever I may have said before, I favor supporting the Unicode standard for \w, which is related to the standard for identifiers. This is one of 2 issues about \w being defined too narrowly. I am somewhat arbitrarily closing #1693050 as a duplicate of this (fewer digits ;-). There are 3 issues about tokenize.tokenize failing on valid identifiers, defined as \w sequences whose first char is an identifier itself (and therefore a start char). In msg313814 of #32987, Serhiy indicates which start and continue identifier characters are matched by \W for re and regex. I am leaving #24194 open as the tokenizer name issue.
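The relationship between \w and identifiers mentioned here can be probed with str.isidentifier(), which follows the language definition of identifiers (based on the XID_Start and XID_Continue properties). Identifier-continue characters include nonspacing and spacing marks (Mn, Mc) and connector punctuation (Pc), whereas re's \w follows str.isalnum(), so the two disagree in both directions. A small sketch:

```python
import re

WORD_RUN = re.compile(r"\w+")

cases = [
    "cafe\u0301",  # e + COMBINING ACUTE ACCENT (gc=Mn)
    "a\u203fb",    # UNDERTIE (gc=Pc), as in 'connector‿punctuation'
    "a\u00b9",     # SUPERSCRIPT ONE (gc=No)
]

for s in cases:
    print(f"{s!r}: isidentifier()={s.isidentifier()}, "
          f"re \\w+ matches={WORD_RUN.fullmatch(s) is not None}")
```

The first two are valid identifiers that \w+ rejects; the third is matched by \w+ but is not an identifier.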
msg334508	Author: Henry S. Thompson (HThompson)	Date: 2019-01-29 10:10
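This issue is also implicated in a failure of isalpha and friends. An easy way to see this is to compare:

>>> 'İ'.isalpha()
True
>>> 'İ'.lower().isalpha()
False

This results from Unicode's default (language-independent) full lowercase mapping of U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, which encodes the lowercase dotted i as 'i' followed by the combining character U+0307 COMBINING DOT ABOVE:

>>> import unicodedata
>>> len('İ'.lower())
2
>>> unicodedata.category('İ'.lower()[1])
'Mn'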
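For text known to be Turkish (or Azerbaijani), the Unicode SpecialCasing data defines language-sensitive mappings: U+0130 lowercases to plain 'i' and U+0049 'I' to U+0131 DOTLESS I. str.lower() applies only the language-independent mappings, so one workaround is to pre-map those two capitals. The helper below is a hypothetical sketch, not a stdlib API, and it ignores the rarer conditional rules (such as 'I' followed by U+0307):

```python
def turkish_lower(s: str) -> str:
    """Lowercase s with the Turkish mappings for I and İ (hypothetical helper).

    Simplified: ignores the conditional SpecialCasing rules, e.g. 'I' + U+0307.
    """
    # Pre-map the two capitals whose Turkish lowercase differs from the
    # language-independent default, then let str.lower() do the rest.
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()


for word in ("\u0130STANBUL", "DI\u015e"):
    low = turkish_lower(word)
    print(f"{word!r} -> {low!r} (isalpha={low.isalpha()}); "
          f"default lower -> {word.lower()!r} (isalpha={word.lower().isalpha()})")
```

This keeps the result alphabetic for Turkish input, but it does not change the underlying point: str.isalpha() still rejects the combining mark that the default mapping produces.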
msg361105	Author: Henry S. Thompson (HThompson)	Date: 2020-01-31 13:31
[One year and 2 days later... :-[ Is this fixed in 3.9? If not, the Versions list above should be updated. The failure of lower() to preserve 'alpha-ness' is a serious bug, it causes significant failures in e.g. Turkish NLP, and it's _not_ just a failure of the documentation! Please can we move this to category Unicode and get at least this aspect of the problem fixed? Should I raise a separate issue on isalpha() etc.?

History
Date	User	Action	Args
2022-04-11 14:57:20	admin	set	github: 56940
2021-05-26 18:46:48	pitrou	set	stage: test needed -> needs patch versions: + Python 3.11, - Python 3.6, Python 3.7, Python 3.8
2020-02-03 08:34:10	vstinner	set	nosy: - vstinner
2020-01-31 14:05:49	terry.reedy	set	assignee: docs@python -> components: + Unicode, - Documentation nosy: + lemburg, benjamin.peterson, serhiy.storchaka
2020-01-31 13:31:30	HThompson	set	messages: + msg361105
2019-09-07 17:07:19	JustinTArthur	set	nosy: + JustinTArthur
2019-01-29 10:10:15	HThompson	set	nosy: + HThompson messages: + msg334508
2018-03-15 00:33:15	terry.reedy	set	stage: needs patch -> test needed messages: + msg313850 versions: + Python 3.6, Python 3.7, Python 3.8, - Python 2.7, Python 3.3, Python 3.4
2016-04-25 06:08:17	serhiy.storchaka	link	issue24194 dependencies
2013-07-10 19:11:02	terry.reedy	set	versions: + Python 3.4, - Python 3.2
2011-09-30 04:05:12	ezio.melotti	set	messages: + msg144663
2011-08-28 17:58:12	ezio.melotti	set	messages: + msg143113
2011-08-28 17:22:44	gvanrossum	set	messages: + msg143109
2011-08-28 06:37:44	ezio.melotti	set	messages: + msg143092
2011-08-28 06:26:11	ezio.melotti	set	messages: + msg143091
2011-08-26 21:22:26	gvanrossum	set	nosy: + gvanrossum messages: + msg143040
2011-08-15 10:30:03	ezio.melotti	set	messages: + msg142112
2011-08-13 17:36:36	pitrou	set	messages: + msg142030
2011-08-13 09:40:58	pitrou	set	nosy: + vstinner
2011-08-13 00:57:20	mrabarnett	set	nosy: + mrabarnett
2011-08-13 00:18:22	tchrist	set	messages: + msg142001
2011-08-12 22:46:48	terry.reedy	set	assignee: docs@python components: + Documentation versions: + Python 3.2, Python 3.3 nosy: + terry.reedy, pitrou, docs@python messages: + msg141993 stage: needs patch
2011-08-12 18:03:10	Arfrever	set	nosy: + Arfrever
2011-08-12 00:20:02	ezio.melotti	set	nosy: + ezio.melotti
2011-08-11 19:18:31	tchrist	create