Issue 10254: unicodedata.normalize('NFC', s) regression



This issue has been migrated to GitHub: https://github.com/python/cpython/issues/54463

classification

Title:	unicodedata.normalize('NFC', s) regression
Type:	crash	Stage:	resolved
Components:	Unicode	Versions:	Python 3.1, Python 3.2, Python 2.7, Python 2.6

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	belopolsky	Nosy List:	Arfrever, barry, belopolsky, bictorman, ezio.melotti, jhalcrow, lemburg, loewis, pitrou, valhallasw, vstinner
Priority:	high	Keywords:	patch

Created on 2010-10-30 15:42 by valhallasw, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
crash.py	belopolsky, 2010-12-16 19:35
issue10254.diff	belopolsky, 2010-12-17 05:45
issue10254a.diff	belopolsky, 2010-12-20 19:50
issue10254b.diff	belopolsky, 2010-12-21 19:23
crash2.py	bictorman, 2011-09-22 21:30	reproduce Segmentation fault

Messages (29)
msg119995 - (view)	Author: Merlijn van Deen (valhallasw) *	Date: 2010-10-30 15:42
Summary: Somewhere between 2.6.5 r79063 and 3.1 r79147 a regression in the Unicode NFC normalization has been introduced. This regression leads to bot edit wars on Wikipedia [1]. It is reproducible with a simple script [2]. MediaWiki/PHP [3] and C# [4] test scripts both show the old behaviour, which leads me to believe this is a Python bug. A search for older bugs shows bug #1054943 [5], which has commits in the suspected region.

The regression causes certain NFC-normalized strings to become mangled. Because of the wide range of Unicode strings on Wikipedia, this causes several problems. Details of those can be found at [1]. Example strings include (these strings have been NFC-normalized by MediaWiki):
* u'Li\u030dt-s\u1e73\u0301'
* u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
* u'\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928'

The bug can be shown simply with unicodedata.normalize('NFC', s) == s where s is one of the strings above. This returns True on older Python versions and False on newer versions. There is a script available that does this [2].

The bug has been tested on the following machines and Python versions. OK indicates the bug is not present, FAIL indicates the bug is present.
Host: SunOS willow 5.10 Generic_142910-17 i86pc i386 i86pc Solaris
'2.3.3 (#1, Dec 16 2004, 14:38:56) [C]' OK
'2.6.5 (r265:79063, Jul 10 2010, 17:50:38) [C]' OK
'2.7 (r27:82500, Aug 5 2010, 04:28:45) [C]' FAIL
'3.1.2 (r312:79147, Sep 24 2010, 05:34:04) [C]' FAIL

Host: Linux nightshade 2.6.26-2-amd64 #1 SMP Thu Sep 16 15:56:38 UTC 2010 x86_64 GNU/Linux
'2.4.6 (#2, Jan 24 2010, 12:20:41) \n[GCC 4.3.2]' OK
'2.5.2 (r252:60911, Jan 24 2010, 17:44:40) \n[GCC 4.3.2]' OK
'2.6.4+ (r264:75706, Feb 16 2010, 05:11:28) \n[GCC 4.4.3]' OK

Host: Linux dorthonion 2.6.22.18-co-0.7.4 #1 PREEMPT Wed Apr 15 18:57:39 UTC 2009 i686 GNU/Linux
'2.5.4 (r254:67916, Jan 20 2010, 21:44:03) \n[GCC 4.3.3]' OK
'2.6.2 (release26-maint, Apr 19 2009, 01:56:41) \n[GCC 4.3.3]' OK
'3.0.1+ (r301:69556, Apr 15 2009, 15:59:22) \n[GCC 4.3.3]' OK

[1] https://sourceforge.net/tracker/index.php?func=detail&aid=3081100&group_id=93107&atid=603138# ; http://fr.wikipedia.org/w/index.php?title=Mark_Zuckerberg&action=historysubmit&diff=57753004&oldid=57751674
[2] http://pastebin.ca/1977285 (py2.x), http://pastebin.ca/1977287 (py3.x)
[3] http://pastebin.ca/1977292 (PHP, placed in http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/normal/)
[4] http://pastebin.ca/1977261 (C#)
[5] http://bugs.python.org/issue1054943#
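The check described in msg119995 can be sketched in Python 3 as follows. This is not the pastebin script referenced as [2]. The names SAMPLES, nfc_is_stable and report are invented for this illustration, and the OK/FAIL labels simply mirror the table above.

```python
import unicodedata

# The three example strings from msg119995 (already NFC-normalized by MediaWiki).
SAMPLES = [
    'Li\u030dt-s\u1e73\u0301',
    '\u092e\u093e\u0930\u094d\u0915 '
    '\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917',
    '\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928',
]


def nfc_is_stable(s):
    """Return True if NFC-normalizing s leaves it unchanged."""
    return unicodedata.normalize('NFC', s) == s


def report(samples=SAMPLES):
    """Return 'OK' (unchanged) or 'FAIL' (mangled) for each sample string."""
    return ['OK' if nfc_is_stable(s) else 'FAIL' for s in samples]


if __name__ == '__main__':
    for sample, verdict in zip(SAMPLES, report()):
        print(verdict, ascii(sample))
```

On an interpreter that includes the fix from this issue, every sample should be reported as OK.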
msg119996 - (view)	Author: Merlijn van Deen (valhallasw) *	Date: 2010-10-30 15:44
Please note: The bug might very well be present in python 3.2 and 3.3. However, I do not have these versions installed, so I cannot confirm this.
msg119998 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-10-30 15:52
Confirmed on Python 3.2.
msg120018 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-10-30 21:01
The change from issue1054943 is indeed bogus. As written, the code will happily run over starters, even though a blocked start means that subsequent characters can't possibly be combinable. That way, the code manages to combine, in 'Li\u030dt-s\u1e73\u0301', the final U+0301 with the i - even though there are several starters in-between.

I think the code should work like this:

if comb!=0 and comb1==0:
    #starter after character with higher class:
    # not combinable, and all subsequent characters will be blocked
    # as well
    break
if comb!=0 and comb1==comb:
    # blocked combining character, continue searching
    i1++
    continue
# candidate pair, check whether i and i1 are combinable

It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.
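To make the "several starters in-between" point concrete, the following sketch lists the canonical combining class of every character in the example string. It only uses unicodedata.combining(); the helper name combining_classes is invented for this illustration.

```python
import unicodedata


def combining_classes(s):
    """Return (code point label, canonical combining class) for each character."""
    return [('U+%04X' % ord(ch), unicodedata.combining(ch)) for ch in s]


text = 'Li\u030dt-s\u1e73\u0301'
classes = combining_classes(text)

# Characters strictly between the 'i' (index 1) and the final U+0301 (index 7).
between = classes[2:-1]
# A starter is a character whose combining class is 0.
starters_between = [label for label, ccc in between if ccc == 0]

if __name__ == '__main__':
    for label, ccc in classes:
        print(label, ccc)
    print('starters between i and U+0301:', starters_between)
```

Four starters (t, -, s and U+1E73) sit between the i and the trailing U+0301, so the acute accent cannot legitimately combine with the i.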
msg120026 - (view)	Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-10-30 22:35
Martin v. Löwis wrote:
> It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.

Why not? It looks a lot like a security fix.
msg120027 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-10-30 23:14
>> It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.
>
> Why not ? It looks a lot like a security fix.

Indeed, you could argue that. It's up to the 2.6 release manager, I guess.
msg124073 - (view)	Author: Jonathan Halcrow (jhalcrow)	Date: 2010-12-15 21:29
I think I've come across a related problem. I am experiencing a segfault when NFC-normalizing a certain string [1]. The crash occurs with 2.7.1 in OS X (built from source with homebrew). Here is the backtrace:

#0 0x0025a96e in _PyUnicode_Resize ()
#1 0x00601673 in nfc_nfkc ()
#2 0x00601bb7 in unicodedata_normalize ()
#3 0x0029834b in PyEval_EvalFrameEx ()
#4 0x00299f13 in PyEval_EvalCodeEx ()
#5 0x0029a0fe in PyEval_EvalCode ()
#6 0x002bd5f0 in PyRun_FileExFlags ()
#7 0x002be430 in PyRun_SimpleFileExFlags ()
#8 0x002d5bd6 in Py_Main ()
#9 0x00001f8f in _start ()
#10 0x00001ebd in start ()

[1] http://pastebin.com/cfNd2QEz
msg124074 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-12-15 21:54
I can reproduce the crash under 2.7, but not 2.6 or 3.x here. So it might be a separate issue.
msg124075 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2010-12-15 22:18
After a bit of debugging, the crash is due to the "skipped" array being overflowed in nfc_nfkc() in unicodedata.c. "cskipped" goes up to 21 while the array only has 20 entries. This happens in all branches (but only crashes in 2.7 right now for probably unimportant reasons). And the problem was indeed introduced by Victor's patch in issue1054943. Just before, "cskipped" would only go up to 1.
msg124156 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-16 19:35
Adding an assert as shown in the diff below makes it easy to reproduce the crash in py3k branch:

$ ./python.exe crash.py
Assertion failed: (cskipped < 20), function nfc_nfkc, file Modules/unicodedata.c, line 714.
Abort trap

I am attaching jhalcrow's code as crash.py

===================================================================
--- Modules/unicodedata.c (revision 87322)
+++ Modules/unicodedata.c (working copy)
@@ -711,6 +711,7 @@
               /* Replace the original character. */
               *i = code;
               /* Mark the second character unused. */
+              assert(cskipped < 20);
               skipped[cskipped++] = i1;
               i1++;
               f = find_nfc_index(self, nfc_first, *i);
msg124170 - (view)	Author: STINNER Victor (vstinner) *	Date: 2010-12-17 00:56
"Ooops", sorry. I just applied the patch suggested by Marc-Andre Lemburg in msg22885 (#1054943). As the patch worked for the examples given in Unicode PRI 29 and the test suite passed, it was enough for me. I don't understand the normalization code, so I don't know how to fix it.
msg124173 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-17 01:34
The logic suggested by Martin in msg120018 looks right to me, but the whole code seems to be unnecessarily complex. (And comb1==comb may need to be changed to comb1>=comb.) I don't understand why linear search through "skipped" array is needed. At the very least instead of adding their positions to the "skipped" list, used combining characters can be replaced by a non-character to be later skipped. A better algorithm should be able to avoid the whole issue of "skipping" by properly computing the length of the decomposed character. See internalCompose() at http://www.unicode.org/reports/tr15/Normalizer.java. I'll try to come up with a patch.
msg124186 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-17 05:45
Attached patch, issue10254.diff, is essentially Martin's code from msg120018 and Part3 tests from NormalizationTest.txt. Since this bug exposes a buffer overflow condition, I think it qualifies as a security issue, so I am adding 2.6 to versions. Passing Part3 tests and not crashing on crash.py is probably good enough for a commit, but I don't have a proof that length 20 skipped buffer is always enough. As the next step, I would like to consider an alternative algorithm that would not require a "skipped" buffer.
msg124189 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-12-17 08:22
On 17.12.2010 01:56, STINNER Victor wrote:
>
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
>
> "Ooops", sorry. I just applied the patch suggested by Marc-Andre
> Lemburg in msg22885 (#1054943). As the patch worked for the examples
> given in Unicode PRI 29 and the test suite passed, it was enough for
> me. I don't understand the normalization code, so I don't know how to
> fix it.

So lacking a new patch, I think we should revert the existing change for now.
msg124190 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-12-17 08:29
> So lacking a new patch, I think we should revert the existing change
> for now.

Oops, I missed that Alexander has proposed a patch.
msg124191 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-12-17 08:46
> The logic suggested by Martin in msg120018 looks right to me, but the
> whole code seems to be unnecessarily complex. (And comb1==comb may
> need to be changed to comb1>=comb.) I don't understand why linear
> search through "skipped" array is needed. At the very least instead
> of adding their positions to the "skipped" list, used combining
> characters can be replaced by a non-character to be later skipped.

The skipped array keeps track of what characters have been integrated into a base character, as they must not appear in the output. Assume you have a sequence B,C,N,C,N,B (B: base character, C: combined, N: not combined). You need to remember not to output C, whereas you still need to output N. I don't think replacing them with a non-character can work: which one would you choose (that cannot also appear in the input)?

The worst case (wrt. cskipped) is the maximum number of characters that can get combined into a single base character. It used to be (and I hope still is) 20 (decomposition of U+FDFA).
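A small Python illustration of the B, N, C situation described in msg124191. The choice of characters is an assumption made for this example and does not come from the thread: 'a' is the base, U+031B COMBINING HORN (class 216) does not combine with it, and U+0301 COMBINING ACUTE ACCENT (class 230) does.

```python
import unicodedata

# B = 'a', N = U+031B (stays in the output), C = U+0301 (absorbed into the base).
text = 'a\u031b\u0301'

result = unicodedata.normalize('NFC', text)

# The acute accent is merged into the base (U+00E1) and must not be emitted
# again, while the horn in between still has to appear in the output.
print(ascii(result))  # prints '\xe1\u031b'
```

This is the bookkeeping the "skipped" array does in the C code: it records the position of the consumed U+0301 so that only the horn is copied to the result.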
msg124192 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-12-17 08:48
> Passing Part3 tests and not crashing on crash.py is probably good
> enough for a commit, but I don't have a proof that length 20 skipped
> buffer is always enough.

I would agree with that. I still didn't have time to fully review the patch, but assuming it fixes the cases in msg119995, we should proceed with it.
msg124233 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-17 17:24
On Fri, Dec 17, 2010 at 3:47 AM, Martin v. Löwis <report@bugs.python.org> wrote:
..
> The worst case (wrt. cskipped) is the maximum number of characters that
> can get combined into a single base character. It used to be (and I
> hope still is) 20 (decomposition of U+FDFA).
>

The C forms (NFC and NFKC) do canonical composition and U+FDFA is a compatibility composite. (BTW, makeunicodedata.py checks that maximum decomposed length of a character is < 19, but it would be better if it would compute and define a named constant, say MAXDLENGTH, to be used instead of literal 20.) As far as I (and a two-line script) can tell the maximum length of a canonical decomposition of a character is 4.
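The "two-line script" is not attached to the issue. One possible way to measure the same thing, assuming that "length of a canonical decomposition" means the length of a character's full NFD form, is sketched below. The function names are invented for this illustration, and the scan covers every code point, so it takes a moment to run.

```python
import unicodedata


def decomposed_length(ch, form='NFD'):
    """Length in code points of ch after full decomposition (NFD or NFKD)."""
    return len(unicodedata.normalize(form, ch))


def max_decomposed_length(form='NFD'):
    """Scan all code points; return (longest length, first code point reaching it)."""
    best_len, best_cp = 0, None
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not characters; skip them
        n = decomposed_length(chr(cp), form)
        if n > best_len:
            best_len, best_cp = n, cp
    return best_len, best_cp


if __name__ == '__main__':
    for form in ('NFD', 'NFKD'):
        length, cp = max_decomposed_length(form)
        print('%s: max length %d, first reached at U+%04X' % (form, length, cp))
```

The exact maxima depend on the Unicode database bundled with the interpreter. As a spot check, U+1F82 decomposes canonically into 4 code points and U+FDFA decomposes under NFKD into 18, which is consistent with the "< 19" check mentioned above.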
msg124249 - (view)	Author: Martin v. Löwis (loewis) *	Date: 2010-12-17 19:08
> The C forms (NFC and NFKC) do canonical composition and U+FDFA is a
> compatibility composite. (BTW, makeunicodedata.py checks that maximum
> decomposed length of a character is < 19, but it would be better if it
> would compute and define a named constant, say MAXDLENGTH, to be used
> instead of literal 20.) As far as I (and a two-line script) can tell
> the maximum length of a canonical decomposition of a character is 4.

Even better - so allowing for 20 characters should be safe.
msg124251 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-17 19:17
On Fri, Dec 17, 2010 at 2:08 PM, Martin v. Löwis <report@bugs.python.org> wrote:
..
>> As far as I (and a two-line script) can tell
>> the maximum length of a canonical decomposition of a character is 4.
>
> Even better - so allowing for 20 characters should be safe.

I don't disagree, but the number of "break" and "continue" statements before cskipped++ makes me nervous. This said, I am going to add test cases from the first post to test_unicodedata (I think it is a better place than test_normalise because the latter is skipped by default) and commit. Improving the algorithm is a separate issue.
msg124402 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-20 19:50
Attached patch, issue10254a.diff, adds the OP's cases to test_unicodedata and changes the code as I suggested in msg124173 because ISTM that comb >= comb1 matches the pr-29 definition:

"""
D2'. In any character sequence beginning with a starter S, a character C is blocked from S if and only if there is some character B between S and C, and either B is a starter or it has the same or higher combining class as C.
"""
http://www.unicode.org/review/pr-29.html

Unfortunately, all tests pass with either comb >= comb1 or comb == comb1, so before I commit, I would like to figure out the test case that would properly exercise this code.
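Definition D2' quoted in msg124402 translates almost word for word into Python. The sketch below is only a reading of that definition, not the logic of unicodedata.c; the function name is_blocked and its index-based signature are invented for this illustration.

```python
import unicodedata


def is_blocked(text, s_index, c_index):
    """Return True if text[c_index] is blocked from the starter at text[s_index].

    Per D2': C is blocked from S if some character B strictly between them is
    a starter (class 0) or has the same or a higher combining class than C.
    """
    ccc_c = unicodedata.combining(text[c_index])
    for b in text[s_index + 1:c_index]:
        ccc_b = unicodedata.combining(b)
        if ccc_b == 0 or ccc_b >= ccc_c:
            return True
    return False


# The final U+0301 (index 7) is blocked from the 'i' (index 1): starters intervene.
print(is_blocked('Li\u030dt-s\u1e73\u0301', 1, 7))  # prints True

# U+0327 (class 202) is not blocked from 'C' by U+0338 (class 1).
print(is_blocked('C\u0338\u0327', 0, 2))  # prints False
```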
msg124417 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-21 05:11
On Mon, Dec 20, 2010 at 2:50 PM, Alexander Belopolsky <report@bugs.python.org> wrote:
..
> Unfortunately, all tests pass with either comb >= comb1 or comb == comb1, so before
> I commit, I would like to figure out the test case that would properly exercise this code.
>

After some more thought, I've realized that the comb > comb1 case is impossible if comb1 != 0 (due to canonical reordering step) and if comb1 == 0, the comb1 to comb comparison is not reached. In other words, it does not matter whether comparison is done as Martin suggested in msg120018 or as it is done in the latest patch.

The fact that comb > comb1 case is impossible if comb1 != 0 is actually mentioned in PR 29 itself. See Table 1: Differences at http://www.unicode.org/review/pr-29.html.
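The canonical reordering argument in msg124417 can be observed from Python. In the sketch below the helper name is_canonically_ordered and the sample string are assumptions chosen for this illustration. It checks that, after decomposition, a non-starter never follows a character with a higher combining class.

```python
import unicodedata


def is_canonically_ordered(text):
    """True if no non-starter follows a character with a higher combining class."""
    previous = 0
    for ch in text:
        ccc = unicodedata.combining(ch)
        if ccc != 0 and previous > ccc:
            return False
        previous = ccc
    return True


# U+0301 (class 230) followed by U+0323 (class 220) is out of canonical order.
raw = 'a\u0301\u0323'
decomposed = unicodedata.normalize('NFD', raw)

print(is_canonically_ordered(raw))         # prints False
print(ascii(decomposed))                   # prints 'a\u0323\u0301'
print(is_canonically_ordered(decomposed))  # prints True
```

Because composition runs on the reordered sequence, the composition loop never sees a non-starter preceded by one of a higher class, which is why comb >= comb1 and comb == comb1 behave the same.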
msg124450 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-21 19:23
In the new patch, issue10254b.diff, I've added a test that would crash unpatched code:

>>> unicodedata.normalize('NFC', 'C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸Ç')
Segmentation fault

Martin, I still feel uneasy about the fixed size of the skipped buffer. It is not obvious that skipped combining characters always get removed from the buffer before the next starter is processed. I would really like another pair of eyes to look at this code before it goes in especially to 2.6.

Victor, IIRC, you did some stress testing on random data. I wonder if you could test this code after tightening the assert to cskipped < 4. (The current theory is that this should be enough.)
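The crash input from msg124450 is easier to read with escapes. The sketch below assumes the literal consists of 'C' + U+0338 repeated 20 times followed by 'C' + U+0327; the rendered text above does not show the code points, so this spelling is an assumption. Run it only on an interpreter that contains the fix, since unpatched builds may crash.

```python
import unicodedata

# 20 copies of C + COMBINING LONG SOLIDUS OVERLAY, then C + COMBINING CEDILLA.
decomposed = 'C\u0338' * 20 + 'C\u0327'

# Only the last pair has a precomposed form (U+00C7); 'C' + U+0338 has none.
expected = 'C\u0338' * 20 + '\u00c7'

result = unicodedata.normalize('NFC', decomposed)

if __name__ == '__main__':
    print(result == expected, len(decomposed), len(result))
```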
msg124530 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-23 02:31
Committed to py3k in revision 87442.
msg124800 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-12-28 16:16
Committed backports: r87540 (3.1) r87541 (2.7) r87546 (2.6)
msg144421 - (view)	Author: Victor Ruiz (bictorman)	Date: 2011-09-22 21:30
Hi, I think I've come across what seems to be another flavor of this issue. The following string will cause a crash in some interpreters. text = u"""\u062d\u064e\u064a\u0651\u064b\u0627\u060c\u0648\u064e\u064a\u064e\u062d\u0650\u0642\u0651\u064e \u0627\u0644\u0652\u0642\u064e\u0648\u0652\u0644\u064f \u0648\u064e\u0644\u0651\u064e\u064a\u0652\u062a\u064f\u0643\u064f\u0645\u064e\u0627\u060c \u0648\u064e\u0625\u0650\u0646\u0652 \u0623\u064e\u0628\u064e\u064a\u0652\u062a\u064f\u0645\u064e\u0627 \u0623\u064e\u0646\u0652 \u062a\u064f\u0642\u0650\u0631\u0651\u064e\u0627 \u0628\u0650\u0627\u0644\u0625\u0650\u0633\u0652\u0644\u0627\u064e\u0645\u0650 \u0641\u064e\u0625\u0650\u0646\u0651\u064e \u0648\u064e\u062e\u064e\u064a\u0652\u0644\u0650\u064a \u062a\u064e\u062d\u064f\u0644\u0651\u064f \u0628\u0650\u0633\u064e\u0627\u062d\u064e\u062a\u0650\u0643\u064f\u0645\u064e\u0627\u060c \u0648\u064e\u062a\u064e\u0638\u0652\u0647\u064e\u0631\u064f \u0646\u064f\u0628\u064f\u0648\u0651\u064e\u062a\u0650\u064a \u0645\u064f\u0644\u0652\u0643\u0650\u0643\u064f\u0645\u064e\u0627".\u0648\u0643\u062a\u0628 \u0623\u0628\u064a\u0651\u064f \u0628\u0646 \u0643\u0639\u0628 \u0627\u0644\u0652\u0631\u064e\u0651\u062d\u0650\u064a\u0652\u0645\u060c \u0645\u0650\u0646 \u0645\u064f\u062d\u064e\u0645\u064e\u0651\u062f \u0631\u064e\u0633\u064f\u0648\u0652\u0644 \u0627\u0644\u0652\u0644\u064e\u0651\u0647 \u0625\u0650\u0644\u064e\u0649 \u0627\u0644\u0652\u0645\u064f\u0646\u0652\u0630\u0650\u0631 \u0628\u0652\u0646 \u0633\u064e\u0627\u0648\u0650\u064a \u0633\u064e\u0644\u064e\u0627\u0645 \u0639\u064e\u0644\u064e\u064a\u0652\u0643 \u0641\u064e\u0625\u0650\u0646\u0650\u0651\u064a \u0623\u064e\u062d\u0652\u0645\u064e\u062f \u0627\u0644\u0652\u0644\u064e\u0651\u0647 \u0625\u0650\u0644\u064e\u064a\u0652\u0643 \u0627\u0644\u064e\u0651\u0630\u0650\u064a\u0644\u064e\u0627 \u0625\u0650\u0644\u064e\u0647 \u063a\u064e\u064a\u0652\u0631\u064f\u0647 \u0648\u064e\u0623\u064e\u0634\u0652\u0647\u064e\u062f 
\u0623\u064e\u0646 \u0644\u064e\u0627 \u0625\u0650\u0644\u064e\u0647 \u0625\u0650\u0644\u064e\u0651\u0627 \u0627\u0644\u0652\u0644\u064e\u0651\u0647 """

There is a sample script attached. This issue does not seem to be related to the Python version itself but rather to how it was compiled, since the exact same version crashes on OS X but not on Ubuntu Linux, for example.

ERROR -> Python 2.7.1 (r271:86832, Apr 9 2011, 17:12:59) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
OK -> Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) [GCC 4.5.2] on linux2

The default version 2.6.6 on Debian squeeze should crash too, for example.

This is a trace of the error in 2.7.1 OSX (this interpreter passes the test posted on msg124450):

Process: Python [78170]
Path: /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier: Python
Version: ??? (???)
Code Type: X86-64 (Native)
Parent Process: bash [77126]

Date/Time: 2011-09-22 23:20:48.892 +0200
OS Version: Mac OS X 10.6.8 (10K549)
Report Version: 6

Interval Since Last Report: 88509 sec
Crashes Since Last Report: 135
Per-App Crashes Since Last Report: 134
Anonymous UUID: F5DD44CE-A8F4-474C-BA10-2B21B4C92C1E

Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: 0x000000000000000d, 0x0000000000000000
Crashed Thread: 0 Dispatch queue: com.apple.main-thread

Thread 0 Crashed: Dispatch queue: com.apple.main-thread
0 org.python.python 0x0000000100086b33 _PyUnicode_Resize + 51
1 unicodedata.so 0x0000000100601bff nfc_nfkc + 335
2 unicodedata.so 0x0000000100601f2a unicodedata_normalize + 154
3 org.python.python 0x00000001000bfccd PyEval_EvalFrameEx + 20797
4 org.python.python 0x00000001000c1f16 PyEval_EvalCodeEx + 2118
5 org.python.python 0x00000001000c2036 PyEval_EvalCode + 54
6 org.python.python 0x00000001000e6a5e PyRun_FileExFlags + 174
7 org.python.python 0x00000001000e6d19 PyRun_SimpleFileExFlags + 489
8 org.python.python 0x00000001000fd6fc Py_Main + 2940
9 org.python.python 0x0000000100000f14 0x100000000 + 3860

Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0644062700200627 rbx: 0x0000000100373d9c rcx: 0x000000000000003c rdx: 0x000000000000000a
rdi: 0x00007fff5fbff078 rsi: 0x0000000080169ba9 rbp: 0x00007fff5fbfefa0 rsp: 0x00007fff5fbfef80
r8: 0x000000000000004e r9: 0x000000000000000a r10: 0x0000000100373db8 r11: 0x0000000100373dac
r12: 0x00007fff5fbff078 r13: 0x0000000080169ba9 r14: 0x0000000080169ba9 r15: 0x00000000000000a1
rip: 0x0000000100086b33 rfl: 0x0000000000010206 cr2: 0x000000010066a2f4

Binary Images:
0x100000000 - 0x100000fff +org.python.python 2.7.1 (2.7.1) <751B99F0-4C88-4BF6-C8CD-AE2D57E4254B> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
0x100003000 - 0x100163fff +org.python.python 2.7.1, (c) 2004-2008 Python Software Foundation. (2.7.1) <E8E430DD-D33C-646E-7B3B-0C2A84996ED5> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Python
0x100600000 - 0x100694fff +unicodedata.so ??? (???) <F56B51DE-1895-5149-B1D2-8DC0F84C7447> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/unicodedata.so
0x7fff5fc00000 - 0x7fff5fc3bdef dyld 132.1 (???) <B536F2F1-9DF1-3B6C-1C2C-9075EA219A06> /usr/lib/dyld
0x7fff83094000 - 0x7fff8314aff7 libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <03140531-3B2D-1EBA-DA7F-E12CC8F63969> /usr/lib/libobjc.A.dylib
0x7fff85ddf000 - 0x7fff85de3ff7 libmathCommon.A.dylib 315.0.0 (compatibility 1.0.0) <95718673-FEEE-B6ED-B127-BCDBDB60D4E5> /usr/lib/system/libmathCommon.A.dylib
0x7fff8642d000 - 0x7fff8643eff7 libz.1.dylib 1.2.3 (compatibility 1.0.0) <FB5EE53A-0534-0FFA-B2ED-486609433717> /usr/lib/libz.1.dylib
0x7fff8674c000 - 0x7fff868c3fe7 com.apple.CoreFoundation 6.6.5 (550.43) <31A1C118-AD96-0A11-8BDF-BD55B9940EDC> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
0x7fff86f60000 - 0x7fff8711efff libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <4274FC73-A257-3A56-4293-5968F3428854> /usr/lib/libicucore.A.dylib
0x7fff88cd0000 - 0x7fff88e91fef libSystem.B.dylib 125.2.11 (compatibility 1.0.0) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
0x7fff88eb7000 - 0x7fff88f34fef libstdc++.6.dylib 7.9.0 (compatibility 7.0.0) <35ECA411-2C08-FD7D-11B1-1B7A04921A5C> /usr/lib/libstdc++.6.dylib
0x7fff8a2d8000 - 0x7fff8a324fff libauto.dylib ??? (???) <F7221B46-DC4F-3153-CE61-7F52C8C293CF> /usr/lib/libauto.dylib
0x7fffffe00000 - 0x7fffffe01fff libSystem.B.dylib ??? (???) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib

Model: MacBookPro5,4, BootROM MBP53.00AC.B03, 2 processors, Intel Core 2 Duo, 2.53 GHz, 4 GB, SMC 1.49f2
Graphics: NVIDIA GeForce 9400M, NVIDIA GeForce 9400M, PCI, 256 MB
Memory Module: global_name
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x8D), Broadcom BCM43xx 1.0 (5.10.131.42.4)
Bluetooth: Version 2.4.5f3, 2 service, 12 devices, 1 incoming serial ports
Network Service: AirPort, AirPort, en1
Serial ATA Device: ST9320423ASG, 298,09 GB
Serial ATA Device: MATSHITADVD-R UJ-868
USB Device: Internal Memory Card Reader, 0x05ac (Apple Inc.), 0x8403, 0x26500000 / 2
USB Device: Built-in iSight, 0x05ac (Apple Inc.), 0x8507, 0x24400000 / 2
USB Device: IR Receiver, 0x05ac (Apple Inc.), 0x8242, 0x04500000 / 3
USB Device: Apple Internal Keyboard / Trackpad, 0x05ac (Apple Inc.), 0x0236, 0x04600000 / 2
USB Device: BRCM2046 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0x06100000 / 2
USB Device: Bluetooth USB Host Controller, 0x05ac (Apple Inc.), 0x8213, 0x06110000 / 5
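When checking whether a particular interpreter build is affected, it can help to run the normalization in a child process so that a segmentation fault does not take down the calling script. The following is a minimal Python 3 sketch. It is not the attached crash2.py, and the function name normalize_survives is invented for this illustration.

```python
import subprocess
import sys


def normalize_survives(text, form='NFC', python=sys.executable):
    """Run unicodedata.normalize(form, text) in a child interpreter.

    Returns True if the child exits cleanly and False otherwise, for example
    when it is killed by SIGSEGV. `python` may point at another interpreter
    binary to test a different build.
    """
    # ascii() yields a pure-ASCII literal, so the code survives any locale.
    code = 'import unicodedata; unicodedata.normalize(%r, %s)' % (form, ascii(text))
    completed = subprocess.run([python, '-c', code])
    return completed.returncode == 0


if __name__ == '__main__':
    sample = 'C\u0338' * 20 + 'C\u0327'
    print('survived' if normalize_survives(sample) else 'crashed')
```

The ascii() literal assumes the interpreter under test accepts Python 3 string syntax; a Python 2 build would need a u'' prefix.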
msg144426 - (view)	Author: Alexander Belopolsky (belopolsky) *	Date: 2011-09-22 22:07
This new data does not crash Python 2.7.2, so I assume the issue has been fixed. Re-closing.
msg144428 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-09-22 22:31
"This new data does not crash Python 2.7.2, so I assume the issue has been fixed."

Yes, the bug was already fixed in branch 2.7 by the SVN commit r87541:

changeset: 67185:54f1d5651555
branch: 2.7
parent: 67159:2d09af4c137c
user: Alexander Belopolsky <alexander.belopolsky@gmail.com>
date: Tue Dec 28 15:47:56 2010 +0000
files: Lib/test/test_normalization.py Lib/test/test_unicodedata.py Modules/unicodedata.c
description:
Merged revisions 87442 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
r87442 | alexander.belopolsky | 2010-12-22 21:27:37 -0500 (Wed, 22 Dec 2010) | 1 line

Issue #10254: Fixed a crash and a regression introduced by the implementation of PRI 29.
........

This fix is part of Python 2.7.2, but not of 2.7.2.
msg144429 - (view)	Author: STINNER Victor (vstinner) *	Date: 2011-09-22 22:46
"This fix is part of Python 2.7.2, but not of 2.7.2." ... but not of 2.7.1.

History
Date	User	Action	Args
2022-04-11 14:57:08	admin	set	github: 54463
2011-09-22 22:46:21	vstinner	set	messages: + msg144429
2011-09-22 22:31:22	vstinner	set	messages: + msg144428
2011-09-22 22:07:10	belopolsky	set	status: open -> closed messages: + msg144426
2011-09-22 21:38:21	belopolsky	set	status: closed -> open
2011-09-22 21:30:24	bictorman	set	files: + crash2.py nosy: + bictorman messages: + msg144421
2010-12-28 16:16:25	belopolsky	set	status: open -> closed versions: + Python 3.2 nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124800 resolution: fixed stage: commit review -> resolved
2010-12-23 02:31:58	belopolsky	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124530 versions: - Python 3.2
2010-12-21 19:23:57	belopolsky	set	files: + issue10254b.diff nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124450
2010-12-21 05:11:38	belopolsky	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124417
2010-12-20 19:50:25	belopolsky	set	files: + issue10254a.diff nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124402
2010-12-17 19:17:47	belopolsky	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124251
2010-12-17 19:08:06	loewis	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124249
2010-12-17 17:24:52	belopolsky	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124233
2010-12-17 08:48:37	loewis	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124192
2010-12-17 08:46:58	loewis	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124191
2010-12-17 08:29:37	loewis	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124190
2010-12-17 08:22:56	loewis	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124189
2010-12-17 05:45:39	belopolsky	set	files: + issue10254.diff versions: + Python 2.6 nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124186 keywords: + patch stage: commit review
2010-12-17 01:34:49	belopolsky	set	assignee: belopolsky messages: + msg124173 nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw
2010-12-17 00:56:17	vstinner	set	nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124170
2010-12-16 19:35:39	belopolsky	set	files: + crash.py nosy: + belopolsky messages: + msg124156
2010-12-15 22:18:30	pitrou	set	priority: normal -> high type: behavior -> crash messages: + msg124075 nosy: lemburg, loewis, barry, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw
2010-12-15 21:54:25	pitrou	set	nosy: lemburg, loewis, barry, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw messages: + msg124074
2010-12-15 21:29:49	jhalcrow	set	nosy: + jhalcrow messages: + msg124073
2010-10-31 23:11:18	ezio.melotti	set	nosy: + ezio.melotti
2010-10-31 12:01:03	Arfrever	set	nosy: + Arfrever
2010-10-31 03:50:02	r.david.murray	set	nosy: + barry
2010-10-30 23:14:55	loewis	set	messages: + msg120027
2010-10-30 22:35:32	lemburg	set	nosy: + lemburg messages: + msg120026
2010-10-30 21:01:19	loewis	set	messages: + msg120018
2010-10-30 15:52:43	pitrou	set	nosy: + loewis, vstinner, pitrou messages: + msg119998 versions: + Python 3.2
2010-10-30 15:44:35	valhallasw	set	messages: + msg119996
2010-10-30 15:42:13	valhallasw	create