classification
Title: unicodedata.normalize('NFC', s) regression
Type: crash Stage: resolved
Components: Unicode Versions: Python 3.1, Python 3.2, Python 2.7, Python 2.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: Arfrever, barry, belopolsky, bictorman, ezio.melotti, jhalcrow, lemburg, loewis, pitrou, valhallasw, vstinner
Priority: high Keywords: patch

Created on 2010-10-30 15:42 by valhallasw, last changed 2011-09-22 22:46 by vstinner. This issue is now closed.

Files
File name Uploaded Description
crash.py belopolsky, 2010-12-16 19:35
issue10254.diff belopolsky, 2010-12-17 05:45
issue10254a.diff belopolsky, 2010-12-20 19:50
issue10254b.diff belopolsky, 2010-12-21 19:23
crash2.py bictorman, 2011-09-22 21:30 reproduce Segmentation fault
Messages (29)
msg119995 - (view) Author: Merlijn van Deen (valhallasw) * Date: 2010-10-30 15:42
Summary: Somewhere between 2.6.5 r79063 and 3.1 r79147 a regression in the unicode NFC normalization has been introduced. This regression leads to bot edit wars on wikipedia [1]. It is reproducible with a simple script [2]. Mediawiki/PHP [3] and C# [4] test scripts both show the old behaviour, which leads me to believe this is a python bug.
A search for older bugs shows bug #1054943 [5] which has commits in the suspected region.

The regression causes certain NFC-normalized strings to become mangled. Because of the wide range of unicode strings on wikipedia, this causes several problems. Details of those can be found at [1].

Example strings include: (these strings have been NFC-normalized by mediawiki)
 * u'Li\u030dt-s\u1e73\u0301'
 * u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917'
 * u'\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928'

The bug can be shown simply with
unicodedata.normalize('NFC', s) == s
where s is one of the strings above. This will return True on older python versions, False on newer versions. There is a script available that does this [2].
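
For readers without access to the pastebin scripts, the check above can be sketched as follows. This is a minimal reconstruction from the description in this message, not the original script from [2]; the helper name nfc_is_stable is made up for illustration.

```python
import unicodedata

# The three sample strings from the report (NFC-normalized by MediaWiki).
SAMPLES = [
    u'Li\u030dt-s\u1e73\u0301',
    u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917',
    u'\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928',
]

def nfc_is_stable(s):
    """Hypothetical helper: True if NFC normalization leaves s unchanged."""
    return unicodedata.normalize('NFC', s) == s

# One boolean per sample; an affected interpreter reports False here.
results = [nfc_is_stable(s) for s in SAMPLES]
print(results)
```

On an interpreter without the regression, every entry should be True.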

The bug has been tested on the following machines and python versions. OK indicates the bug is not present, FAIL indicates the bug is present.

Host: SunOS willow 5.10 Generic_142910-17 i86pc i386 i86pc Solaris
'2.3.3 (#1, Dec 16 2004, 14:38:56) [C]' OK
'2.6.5 (r265:79063, Jul 10 2010, 17:50:38) [C]' OK
'2.7 (r27:82500, Aug  5 2010, 04:28:45) [C]' FAIL
'3.1.2 (r312:79147, Sep 24 2010, 05:34:04) [C]' FAIL

Host: Linux nightshade 2.6.26-2-amd64 #1 SMP Thu Sep 16 15:56:38 UTC 2010 x86_64 GNU/Linux
'2.4.6 (#2, Jan 24 2010, 12:20:41) \n[GCC 4.3.2]' OK
'2.5.2 (r252:60911, Jan 24 2010, 17:44:40) \n[GCC 4.3.2]' OK
'2.6.4+ (r264:75706, Feb 16 2010, 05:11:28) \n[GCC 4.4.3]' OK

Host: Linux dorthonion 2.6.22.18-co-0.7.4 #1 PREEMPT Wed Apr 15 18:57:39 UTC 2009 i686 GNU/Linux
'2.5.4 (r254:67916, Jan 20 2010, 21:44:03) \n[GCC 4.3.3]' OK
'2.6.2 (release26-maint, Apr 19 2009, 01:56:41) \n[GCC 4.3.3]' OK
'3.0.1+ (r301:69556, Apr 15 2009, 15:59:22) \n[GCC 4.3.3]' OK

[1] https://sourceforge.net/tracker/index.php?func=detail&aid=3081100&group_id=93107&atid=603138# ; http://fr.wikipedia.org/w/index.php?title=Mark_Zuckerberg&action=historysubmit&diff=57753004&oldid=57751674
[2] http://pastebin.ca/1977285 (py2.x), http://pastebin.ca/1977287 (py3.x)
[3] http://pastebin.ca/1977292 (PHP, placed in http://svn.wikimedia.org/svnroot/mediawiki/trunk/phase3/normal/)
[4] http://pastebin.ca/1977261 (C#)
[5] http://bugs.python.org/issue1054943#
msg119996 - (view) Author: Merlijn van Deen (valhallasw) * Date: 2010-10-30 15:44
Please note: The bug might very well be present in python 3.2 and 3.3. However, I do not have these versions installed, so I cannot confirm this.
msg119998 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-10-30 15:52
Confirmed on Python 3.2.
msg120018 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-30 21:01
The change from issue1054943 is indeed bogus. As written, the code will happily run over starters, even though a blocked start means that subsequent characters can't possibly be combinable. That way, the code manages to combine, in 'Li\u030dt-s\u1e73\u0301', the final U+0301 with the i - even though there are several starters in-between.
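
To make the "several starters in-between" point concrete, here is a small sketch that lists the canonical combining class of each code point in that string using the stdlib. The variable names are illustrative only.

```python
import unicodedata

s = u'Li\u030dt-s\u1e73\u0301'

# Canonical combining class of each code point; 0 marks a starter.
classes = [unicodedata.combining(ch) for ch in s]

# Starters strictly between 'i' (index 1) and the final U+0301 (index 7).
starters_between = [k for k in range(2, 7) if classes[k] == 0]

print(classes)            # [0, 0, 230, 0, 0, 0, 0, 230]
print(starters_between)   # [3, 4, 5, 6]
```

The four starters at indices 3 to 6 ('t', '-', 's', U+1E73) each block the final U+0301 from reaching 'i'.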

I think the code should work like this:

if comb!=0 and comb1==0:
  #starter after character with higher class:
  # not combinable, and all subsequent characters will be blocked
  # as well
  break
if comb!=0 and comb1==comb:
  # blocked combining character, continue searching
  i1++
  continue
# candidate pair, check whether *i and *i1 are combinable

It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.
msg120026 - (view) Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2010-10-30 22:35
Martin v. Löwis wrote:
> It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.

Why not ? It looks a lot like a security fix.
msg120027 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-10-30 23:14
>> It's unfortunate that the patch had been backported to 2.6.6; we can't fix it there anymore.
> 
> Why not ? It looks a lot like a security fix.

Indeed, you could argue that. It's up to the 2.6 release manager, I guess.
msg124073 - (view) Author: Jonathan Halcrow (jhalcrow) Date: 2010-12-15 21:29
I think I've come across a related problem.  I am experiencing a segfault when NFC-normalizing a certain string [1].
The crash occurs with 2.7.1 in OS X (built from source with homebrew).   

Here is the backtrace:
#0  0x0025a96e in _PyUnicode_Resize ()
#1  0x00601673 in nfc_nfkc ()
#2  0x00601bb7 in unicodedata_normalize ()
#3  0x0029834b in PyEval_EvalFrameEx ()
#4  0x00299f13 in PyEval_EvalCodeEx ()
#5  0x0029a0fe in PyEval_EvalCode ()
#6  0x002bd5f0 in PyRun_FileExFlags ()
#7  0x002be430 in PyRun_SimpleFileExFlags ()
#8  0x002d5bd6 in Py_Main ()
#9  0x00001f8f in _start ()
#10 0x00001ebd in start ()


[1] http://pastebin.com/cfNd2QEz
msg124074 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-15 21:54
I can reproduce the crash under 2.7, but not 2.6 or 3.x here. So it might be a separate issue.
msg124075 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2010-12-15 22:18
After a bit of debugging, the crash is due to the "skipped" array being overflowed in nfc_nfkc() in unicodedata.c. "cskipped" goes up to 21 while the array only has 20 entries. This happens in all branches (but only crashes in 2.7 right now for probably unimportant reasons).

And the problem was indeed introduced by Victor's patch in issue1054943. Just before, "cskipped" would only go up to 1.
msg124156 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-16 19:35
Adding an assert, as shown in the diff below, makes it easy to reproduce the crash in the py3k branch:

$ ./python.exe  crash.py
Assertion failed: (cskipped < 20), function nfc_nfkc, file Modules/unicodedata.c, line 714.
Abort trap

I am attaching jhalcrow's code as crash.py 

===================================================================
--- Modules/unicodedata.c	(revision 87322)
+++ Modules/unicodedata.c	(working copy)
@@ -711,6 +711,7 @@
           /* Replace the original character. */
           *i = code;
           /* Mark the second character unused. */
+          assert(cskipped < 20);
           skipped[cskipped++] = i1;
           i1++;
           f = find_nfc_index(self, nfc_first, *i);
msg124170 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-17 00:56
"Ooops", sorry. I just applied the patch suggested by Marc-Andre Lemburg in msg22885 (#1054943). As the patch worked for the examples given in Unicode PRI 29 and the test suite passed, it was enough for me. I don't understand the normalization code, so I don't know how to fix it.
msg124173 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-17 01:34
The logic suggested by Martin in msg120018 looks right to me, but the whole code seems to be unnecessarily complex.  (And comb1==comb may need to be changed to comb1>=comb.) I don't understand why linear search through "skipped" array is needed.  At the very least instead of adding their positions to the "skipped" list, used combining characters can be replaced by a non-character to be later skipped.  A better algorithm should be able to avoid the whole issue of "skipping" by properly computing the length of the decomposed character.  See internalCompose() at http://www.unicode.org/reports/tr15/Normalizer.java.

I'll try to come up with a patch.
msg124186 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-17 05:45
Attached patch, issue10254.diff, is essentially Martin's code from msg120018 and Part3 tests from NormalizationTest.txt.

Since this bug exposes a buffer overflow condition, I think it qualifies as a security issue, so I am adding 2.6 to versions.

Passing Part3 tests and not crashing on crash.py is probably good enough for a commit, but I don't have a proof that length 20 skipped buffer is always enough.  As the next step, I would like to consider an alternative algorithm that would not require a "skipped" buffer.
msg124189 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-17 08:22
On 17.12.2010 01:56, STINNER Victor wrote:
> 
> STINNER Victor <victor.stinner@haypocalc.com> added the comment:
> 
> "Ooops", sorry. I just applied the patch suggested by Marc-Andre
> Lemburg in msg22885 (#1054943). As the patch worked for the examples
> given in Unicode PRI 29 and the test suite passed, it was enough for
> me. I don't understand the normalization code, so I don't know how to
> fix it.

So lacking a new patch, I think we should revert the existing change
for now.
msg124190 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-17 08:29
> So lacking a new patch, I think we should revert the existing change
> for now.

Oops, I missed that Alexander has proposed a patch.
msg124191 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-17 08:46
> The logic suggested by Martin in msg120018 looks right to me, but the
> whole code seems to be unnecessarily complex.  (And comb1==comb may
> need to be changed to comb1>=comb.) I don't understand why linear
> search through "skipped" array is needed.  At the very least instead
> of adding their positions to the "skipped" list, used combining
> characters can be replaced by a non-character to be later skipped.

The skipped array keeps track of what characters have been integrated
into a base character, as they must not appear in the output.
Assume you have a sequence B,C,N,C,N,B (B: base character, C: combined,
N: not combined). You need to remember not to output C, whereas you
still need to output N. I don't think replacing them with a
non-character can work: which one would you choose (that cannot also
appear in the input)?
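
A B,C,N,C,N,B sequence of this shape can be tried directly against the stdlib. The code points below are my own illustrative choice (they do not come from this thread); the sketch only shows that the two combined marks disappear from the output while the two uncombined ones survive.

```python
import unicodedata

# B, C, N, C, N, B with illustrative code points (an assumption, not from the thread).
seq = (u'o'        # B: base character
       u'\u031b'   # C: COMBINING HORN (class 216), composes with o
       u'\u0316'   # N: COMBINING GRAVE ACCENT BELOW (class 220), nothing to compose with
       u'\u0301'   # C: COMBINING ACUTE ACCENT (class 230), composes with o-horn
       u'\u0315'   # N: COMBINING COMMA ABOVE RIGHT (class 232), nothing to compose with
       u'x')       # B: next base character

composed = unicodedata.normalize('NFC', seq)

# Both C marks are folded into U+1EDB; both N marks are still emitted.
print([hex(ord(ch)) for ch in composed])
```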

The worst case (wrt. cskipped) is the maximum number of characters that
can get combined into a single base character. It used to be (and I
hope still is) 20 (decomposition of U+FDFA).
msg124192 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-17 08:48
> Passing Part3 tests and not crashing on crash.py is probably good
> enough for a commit, but I don't have a proof that length 20 skipped
> buffer is always enough.

I would agree with that. I still didn't have time to fully review the
patch, but assuming it fixes the cases in msg119995, we should proceed
with it.
msg124233 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-17 17:24
On Fri, Dec 17, 2010 at 3:47 AM, Martin v. Löwis <report@bugs.python.org> wrote:
..
> The worst case (wrt. cskipped) is the maximum number of characters that
> can get combined into a single base character. It used to be (and I
> hope still is) 20 (decomposition of U+FDFA).
>

The C forms (NFC and NFKC) do canonical composition and U+FDFA is a
compatibility composite. (BTW, makeunicodedata.py checks that maximum
decomposed length of a character is < 19, but it would be better if it
would compute and define a named constant, say MAXDLENGTH, to be used
instead of literal 20.)  As far as I (and a two-line script) can tell
the maximum length of a canonical decomposition of a character is 4.
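
The two-line script is not included in the thread. A possible way to measure the same thing on Python 3 is sketched below; the function name is hypothetical and the approach (fully decomposing every code point with the stdlib) is an assumption about how such a check could be done, not the script that was actually used.

```python
import sys
import unicodedata

def max_decomposed_length(form):
    """Longest decomposition of any single code point under form ('NFD' or 'NFKD')."""
    longest = 0
    for cp in range(sys.maxunicode + 1):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        longest = max(longest, len(unicodedata.normalize(form, chr(cp))))
    return longest

canonical_max = max_decomposed_length('NFD')    # canonical decompositions only
compat_max = max_decomposed_length('NFKD')      # includes compatibility mappings such as U+FDFA
print(canonical_max, compat_max)
```

The exact numbers depend on the Unicode database bundled with the interpreter, so the script prints them rather than hard-coding an expectation.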
msg124249 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-17 19:08
> The C forms (NFC and NFKC) do canonical composition and U+FDFA is a
> compatibility composite. (BTW, makeunicodedata.py checks that maximum
> decomposed length of a character is < 19, but it would be better if it
> would compute and define a named constant, say MAXDLENGTH, to be used
> instead of literal 20.)  As far as I (and a two-line script) can tell
> the maximum length of a canonical decomposition of a character is 4.

Even better - so allowing for 20 characters should be safe.
msg124251 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-17 19:17
On Fri, Dec 17, 2010 at 2:08 PM, Martin v. Löwis <report@bugs.python.org> wrote:
..
>> As far as I (and a two-line script) can tell
>> the maximum length of a canonical decomposition of a character is 4.
>
> Even better - so allowing for 20 characters should be safe.

I don't disagree, but the number of "break" and "continue" statements
before cskipped++ makes me nervous.  This said, I am going to  add
test cases from the first post to test_unicodedata (I think it is a
better place than test_normalization because the latter is skipped by
default) and commit.

Improving the algorithm is a separate issue.
msg124402 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-20 19:50
Attached patch, issue10254a.diff, adds the OP's cases to test_unicodedata and changes the code as I suggested in msg124173 because ISTM that comb >= comb1 matches the pr-29 definition:

"""
D2'. In any character sequence beginning with a starter S, a character C is blocked from S if and only if there is some character B between S and C, and either B is a starter or it has the same or higher combining class as C.
""" http://www.unicode.org/review/pr-29.html

Unfortunately, all tests pass with either comb >= comb1 or comb == comb1, so before I commit, I would like to figure out the test case that would properly exercise this code.
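
The D2' definition quoted above translates almost word for word into Python. The sketch below is only an illustration of the definition; it is not the C code in Modules/unicodedata.c, and the function name is_blocked is made up.

```python
import unicodedata

def is_blocked(seq, s_idx, c_idx):
    """True if seq[c_idx] is blocked from the starter seq[s_idx] per D2'."""
    ccc_c = unicodedata.combining(seq[c_idx])
    for b in seq[s_idx + 1:c_idx]:
        ccc_b = unicodedata.combining(b)
        # B blocks C if B is a starter or has the same or higher combining class.
        if ccc_b == 0 or ccc_b >= ccc_c:
            return True
    return False

# Dot below (220) does not block acute (230) from 'a'.
print(is_blocked(u'a\u0323\u0301', 0, 2))
# Acute (230) blocks grave (230) from 'a'.
print(is_blocked(u'a\u0301\u0300', 0, 2))
# In the OP's string, the final U+0301 is blocked from 'i'.
print(is_blocked(u'Li\u030dt-s\u1e73\u0301', 1, 7))
```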
msg124417 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-21 05:11
On Mon, Dec 20, 2010 at 2:50 PM, Alexander Belopolsky
<report@bugs.python.org> wrote:
..
> Unfortunately, all tests pass with either comb >= comb1 or comb == comb1, so before
> I commit, I would like to figure out the test case that would properly exercise this code.
>

After some more thought, I've realized that the comb > comb1 case is
impossible if comb1 != 0 (due to canonical reordering step) and if
comb1 == 0, the comb1 to comb comparison is not reached.  In other
words, it does not matter whether comparison is done as Martin
suggested in msg120018 or as it is done in the latest patch.  The fact
that comb > comb1 case is impossible if comb1 != 0 is actually
mentioned in PR 29 itself.  See Table 1: Differences at
http://www.unicode.org/review/pr-29.html.
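
The canonical reordering argument can be checked with a short sketch. It assumes that comb stands for the class of an earlier, uncombined mark and comb1 for the class of the current character; the helper name is hypothetical.

```python
import unicodedata

def classes_never_decrease(s):
    """True if no non-starter follows a character of higher combining class."""
    prev = 0
    for ch in s:
        cur = unicodedata.combining(ch)
        if cur != 0 and prev > cur:
            return False  # the "comb > comb1 with comb1 != 0" situation
        prev = cur
    return True

raw = u'a\u0301\u0323'                        # acute (230) before dot below (220)
ordered = unicodedata.normalize('NFD', raw)   # canonical reordering swaps the marks

print(classes_never_decrease(raw))       # False
print(classes_never_decrease(ordered))   # True
```

Since composition runs on the decomposed and reordered string, the loop only ever sees input of the second kind.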
msg124450 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-21 19:23
In the new patch, issue10254b.diff, I've added a test that would crash unpatched code:

>>> unicodedata.normalize('NFC', 'C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸C̸Ç')
Segmentation fault
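
The literal above is hard to read once rendered. Assuming it is 'C' followed by U+0338 repeated twenty times and then 'C' followed by U+0327 (my reading of the string, not confirmed here), the same check can be written with escapes:

```python
import unicodedata

# Assumed spelling of the test string: 20 x (C + COMBINING LONG SOLIDUS OVERLAY),
# then C + COMBINING CEDILLA.
a = u'C\u0338' * 20 + u'C\u0327'
# Expected NFC form: only the final pair composes, to U+00C7.
b = u'C\u0338' * 20 + u'\xc7'

result = unicodedata.normalize('NFC', a)
print(result == b)
```

On a patched interpreter this should print True instead of crashing.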

Martin, I still feel uneasy about the fixed size of the skipped buffer.  It is not obvious that skipped combining characters always get removed from the buffer before the next starter is processed.

I would really like another pair of eyes to look at this code before it goes in, especially to 2.6.

Victor,

IIRC, you did some stress testing on random data.  I wonder if you could test this code after tightening the assert to cskipped < 4.  (The current theory is that this should be enough.)
msg124530 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-23 02:31
Committed to py3k in revision 87442.
msg124800 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-12-28 16:16
Committed backports:

r87540 (3.1)
r87541 (2.7)
r87546 (2.6)
msg144421 - (view) Author: Victor Ruiz (bictorman) Date: 2011-09-22 21:30
Hi,

I think I've come across what seems to be another flavor of this issue. The  following string will cause a crash in some interpreters.

text = u"""\u062d\u064e\u064a\u0651\u064b\u0627\u060c\u0648\u064e\u064a\u064e\u062d\u0650\u0642\u0651\u064e \u0627\u0644\u0652\u0642\u064e\u0648\u0652\u0644\u064f
\u0648\u064e\u0644\u0651\u064e\u064a\u0652\u062a\u064f\u0643\u064f\u0645\u064e\u0627\u060c \u0648\u064e\u0625\u0650\u0646\u0652 \u0623\u064e\u0628\u064e\u064a\u0652\u062a\u064f\u0645\u064e\u0627 
\u0623\u064e\u0646\u0652 \u062a\u064f\u0642\u0650\u0631\u0651\u064e\u0627 \u0628\u0650\u0627\u0644\u0625\u0650\u0633\u0652\u0644\u0627\u064e\u0645\u0650 \u0641\u064e\u0625\u0650\u0646\u0651\u064e
\u0648\u064e\u062e\u064e\u064a\u0652\u0644\u0650\u064a \u062a\u064e\u062d\u064f\u0644\u0651\u064f \u0628\u0650\u0633\u064e\u0627\u062d\u064e\u062a\u0650\u0643\u064f\u0645\u064e\u0627\u060c 
\u0648\u064e\u062a\u064e\u0638\u0652\u0647\u064e\u0631\u064f \u0646\u064f\u0628\u064f\u0648\u0651\u064e\u062a\u0650\u064a 
\u0645\u064f\u0644\u0652\u0643\u0650\u0643\u064f\u0645\u064e\u0627".\u0648\u0643\u062a\u0628 \u0623\u0628\u064a\u0651\u064f \u0628\u0646 \u0643\u0639\u0628 
\u0627\u0644\u0652\u0631\u064e\u0651\u062d\u0650\u064a\u0652\u0645\u060c \u0645\u0650\u0646 \u0645\u064f\u062d\u064e\u0645\u064e\u0651\u062f \u0631\u064e\u0633\u064f\u0648\u0652\u0644 
\u0627\u0644\u0652\u0644\u064e\u0651\u0647 \u0625\u0650\u0644\u064e\u0649 \u0627\u0644\u0652\u0645\u064f\u0646\u0652\u0630\u0650\u0631 \u0628\u0652\u0646 \u0633\u064e\u0627\u0648\u0650\u064a
\u0633\u064e\u0644\u064e\u0627\u0645 \u0639\u064e\u0644\u064e\u064a\u0652\u0643 \u0641\u064e\u0625\u0650\u0646\u0650\u0651\u064a \u0623\u064e\u062d\u0652\u0645\u064e\u062f \u0627\u0644\u0652\u0644\u064e\u0651\u0647
\u0625\u0650\u0644\u064e\u064a\u0652\u0643 \u0627\u0644\u064e\u0651\u0630\u0650\u064a\u0644\u064e\u0627 \u0625\u0650\u0644\u064e\u0647 \u063a\u064e\u064a\u0652\u0631\u064f\u0647 
\u0648\u064e\u0623\u064e\u0634\u0652\u0647\u064e\u062f \u0623\u064e\u0646 \u0644\u064e\u0627 \u0625\u0650\u0644\u064e\u0647 \u0625\u0650\u0644\u064e\u0651\u0627 \u0627\u0644\u0652\u0644\u064e\u0651\u0647
"""

There is a sample script attached. This issue does not seem to be related to the Python version itself but rather to how it was compiled, since the exact same version crashes on OS X but not on Ubuntu Linux, for example.

ERROR -> Python 2.7.1 (r271:86832, Apr 9 2011, 17:12:59) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin
OK -> Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53) [GCC 4.5.2] on linux2

Default version 2.6.6 on Debian squeeze should crash too for example.

This is a trace of the error in 2.7.1 OSX (this interpreter passes the test posted on msg124450):

Process:         Python [78170]
Path:            /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Identifier:      Python
Version:         ??? (???)
Code Type:       X86-64 (Native)
Parent Process:  bash [77126]

Date/Time:       2011-09-22 23:20:48.892 +0200
OS Version:      Mac OS X 10.6.8 (10K549)
Report Version:  6

Interval Since Last Report:          88509 sec
Crashes Since Last Report:           135
Per-App Crashes Since Last Report:   134
Anonymous UUID:                      F5DD44CE-A8F4-474C-BA10-2B21B4C92C1E

Exception Type:  EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: 0x000000000000000d, 0x0000000000000000
Crashed Thread:  0  Dispatch queue: com.apple.main-thread

Thread 0 Crashed:  Dispatch queue: com.apple.main-thread
0   org.python.python             	0x0000000100086b33 _PyUnicode_Resize + 51
1   unicodedata.so                	0x0000000100601bff nfc_nfkc + 335
2   unicodedata.so                	0x0000000100601f2a unicodedata_normalize + 154
3   org.python.python             	0x00000001000bfccd PyEval_EvalFrameEx + 20797
4   org.python.python             	0x00000001000c1f16 PyEval_EvalCodeEx + 2118
5   org.python.python             	0x00000001000c2036 PyEval_EvalCode + 54
6   org.python.python             	0x00000001000e6a5e PyRun_FileExFlags + 174
7   org.python.python             	0x00000001000e6d19 PyRun_SimpleFileExFlags + 489
8   org.python.python             	0x00000001000fd6fc Py_Main + 2940
9   org.python.python             	0x0000000100000f14 0x100000000 + 3860

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0644062700200627  rbx: 0x0000000100373d9c  rcx: 0x000000000000003c  rdx: 0x000000000000000a
  rdi: 0x00007fff5fbff078  rsi: 0x0000000080169ba9  rbp: 0x00007fff5fbfefa0  rsp: 0x00007fff5fbfef80
   r8: 0x000000000000004e   r9: 0x000000000000000a  r10: 0x0000000100373db8  r11: 0x0000000100373dac
  r12: 0x00007fff5fbff078  r13: 0x0000000080169ba9  r14: 0x0000000080169ba9  r15: 0x00000000000000a1
  rip: 0x0000000100086b33  rfl: 0x0000000000010206  cr2: 0x000000010066a2f4

Binary Images:
       0x100000000 -        0x100000fff +org.python.python 2.7.1 (2.7.1) <751B99F0-4C88-4BF6-C8CD-AE2D57E4254B> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
       0x100003000 -        0x100163fff +org.python.python 2.7.1, (c) 2004-2008 Python Software Foundation. (2.7.1) <E8E430DD-D33C-646E-7B3B-0C2A84996ED5> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Python
       0x100600000 -        0x100694fff +unicodedata.so ??? (???) <F56B51DE-1895-5149-B1D2-8DC0F84C7447> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/unicodedata.so
    0x7fff5fc00000 -     0x7fff5fc3bdef  dyld 132.1 (???) <B536F2F1-9DF1-3B6C-1C2C-9075EA219A06> /usr/lib/dyld
    0x7fff83094000 -     0x7fff8314aff7  libobjc.A.dylib 227.0.0 (compatibility 1.0.0) <03140531-3B2D-1EBA-DA7F-E12CC8F63969> /usr/lib/libobjc.A.dylib
    0x7fff85ddf000 -     0x7fff85de3ff7  libmathCommon.A.dylib 315.0.0 (compatibility 1.0.0) <95718673-FEEE-B6ED-B127-BCDBDB60D4E5> /usr/lib/system/libmathCommon.A.dylib
    0x7fff8642d000 -     0x7fff8643eff7  libz.1.dylib 1.2.3 (compatibility 1.0.0) <FB5EE53A-0534-0FFA-B2ED-486609433717> /usr/lib/libz.1.dylib
    0x7fff8674c000 -     0x7fff868c3fe7  com.apple.CoreFoundation 6.6.5 (550.43) <31A1C118-AD96-0A11-8BDF-BD55B9940EDC> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
    0x7fff86f60000 -     0x7fff8711efff  libicucore.A.dylib 40.0.0 (compatibility 1.0.0) <4274FC73-A257-3A56-4293-5968F3428854> /usr/lib/libicucore.A.dylib
    0x7fff88cd0000 -     0x7fff88e91fef  libSystem.B.dylib 125.2.11 (compatibility 1.0.0) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib
    0x7fff88eb7000 -     0x7fff88f34fef  libstdc++.6.dylib 7.9.0 (compatibility 7.0.0) <35ECA411-2C08-FD7D-11B1-1B7A04921A5C> /usr/lib/libstdc++.6.dylib
    0x7fff8a2d8000 -     0x7fff8a324fff  libauto.dylib ??? (???) <F7221B46-DC4F-3153-CE61-7F52C8C293CF> /usr/lib/libauto.dylib
    0x7fffffe00000 -     0x7fffffe01fff  libSystem.B.dylib ??? (???) <9AB4F1D1-89DC-0E8A-DC8E-A4FE4D69DB69> /usr/lib/libSystem.B.dylib

Model: MacBookPro5,4, BootROM MBP53.00AC.B03, 2 processors, Intel Core 2 Duo, 2.53 GHz, 4 GB, SMC 1.49f2
Graphics: NVIDIA GeForce 9400M, NVIDIA GeForce 9400M, PCI, 256 MB
Memory Module: global_name
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x8D), Broadcom BCM43xx 1.0 (5.10.131.42.4)
Bluetooth: Version 2.4.5f3, 2 service, 12 devices, 1 incoming serial ports
Network Service: AirPort, AirPort, en1
Serial ATA Device: ST9320423ASG, 298,09 GB
Serial ATA Device: MATSHITADVD-R   UJ-868
USB Device: Internal Memory Card Reader, 0x05ac  (Apple Inc.), 0x8403, 0x26500000 / 2
USB Device: Built-in iSight, 0x05ac  (Apple Inc.), 0x8507, 0x24400000 / 2
USB Device: IR Receiver, 0x05ac  (Apple Inc.), 0x8242, 0x04500000 / 3
USB Device: Apple Internal Keyboard / Trackpad, 0x05ac  (Apple Inc.), 0x0236, 0x04600000 / 2
USB Device: BRCM2046 Hub, 0x0a5c  (Broadcom Corp.), 0x4500, 0x06100000 / 2
USB Device: Bluetooth USB Host Controller, 0x05ac  (Apple Inc.), 0x8213, 0x06110000 / 5
msg144426 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2011-09-22 22:07
This new data does not crash Python 2.7.2, so I assume the issue has been fixed.  Re-closing.
msg144428 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-22 22:31
"This new data does not crash Python 2.7.2, so I assume the issue has been fixed."

Yes, the bug was already fixed in branch 2.7 by the SVN commit r87541:

changeset:   67185:54f1d5651555
branch:      2.7
parent:      67159:2d09af4c137c
user:        Alexander Belopolsky <alexander.belopolsky@gmail.com>
date:        Tue Dec 28 15:47:56 2010 +0000
files:       Lib/test/test_normalization.py Lib/test/test_unicodedata.py Modules/unicodedata.c
description:
Merged revisions 87442 via svnmerge from
svn+ssh://pythondev@svn.python.org/python/branches/py3k

........
  r87442 | alexander.belopolsky | 2010-12-22 21:27:37 -0500 (Wed, 22 Dec 2010) | 1 line

  Issue #10254: Fixed a crash and a regression introduced by the implementation of PRI 29.
........

This fix is part of Python 2.7.2, but not of 2.7.2.
msg144429 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-09-22 22:46
"This fix is part of Python 2.7.2, but not of 2.7.2."

... but not of 2.7.1.
History
Date User Action Args
2011-09-22 22:46:21 vstinner set messages: + msg144429
2011-09-22 22:31:22 vstinner set messages: + msg144428
2011-09-22 22:07:10 belopolsky set status: open -> closed; messages: + msg144426
2011-09-22 21:38:21 belopolsky set status: closed -> open
2011-09-22 21:30:24 bictorman set files: + crash2.py; nosy: + bictorman; messages: + msg144421
2010-12-28 16:16:25 belopolsky set status: open -> closed; versions: + Python 3.2; nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124800; resolution: fixed; stage: commit review -> resolved
2010-12-23 02:31:58 belopolsky set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124530; versions: - Python 3.2
2010-12-21 19:23:57 belopolsky set files: + issue10254b.diff; nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124450
2010-12-21 05:11:38 belopolsky set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124417
2010-12-20 19:50:25 belopolsky set files: + issue10254a.diff; nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124402
2010-12-17 19:17:47 belopolsky set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124251
2010-12-17 19:08:06 loewis set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124249
2010-12-17 17:24:52 belopolsky set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124233
2010-12-17 08:48:37 loewis set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124192
2010-12-17 08:46:58 loewis set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124191
2010-12-17 08:29:37 loewis set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124190
2010-12-17 08:22:56 loewis set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124189
2010-12-17 05:45:39 belopolsky set files: + issue10254.diff; versions: + Python 2.6; nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124186; keywords: + patch; stage: commit review
2010-12-17 01:34:49 belopolsky set assignee: belopolsky; messages: + msg124173; nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw
2010-12-17 00:56:17 vstinner set nosy: lemburg, loewis, barry, belopolsky, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124170
2010-12-16 19:35:39 belopolsky set files: + crash.py; nosy: + belopolsky; messages: + msg124156
2010-12-15 22:18:30 pitrou set priority: normal -> high; type: behavior -> crash; messages: + msg124075; nosy: lemburg, loewis, barry, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw
2010-12-15 21:54:25 pitrou set nosy: lemburg, loewis, barry, pitrou, vstinner, ezio.melotti, Arfrever, jhalcrow, valhallasw; messages: + msg124074
2010-12-15 21:29:49 jhalcrow set nosy: + jhalcrow; messages: + msg124073
2010-10-31 23:11:18 ezio.melotti set nosy: + ezio.melotti
2010-10-31 12:01:03 Arfrever set nosy: + Arfrever
2010-10-31 03:50:02 r.david.murray set nosy: + barry
2010-10-30 23:14:55 loewis set messages: + msg120027
2010-10-30 22:35:32 lemburg set nosy: + lemburg; messages: + msg120026
2010-10-30 21:01:19 loewis set messages: + msg120018
2010-10-30 15:52:43 pitrou set nosy: + loewis, vstinner, pitrou; messages: + msg119998; versions: + Python 3.2
2010-10-30 15:44:35 valhallasw set messages: + msg119996
2010-10-30 15:42:13 valhallasw create