Issue 1076790: test test_codecs failed


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/41262

classification

Title:	test test_codecs failed
Type:	behavior	Stage:	test needed
Components:	Extension Modules, Unicode	Versions:	Python 2.6

process

Status:	closed	Resolution:	wont fix
Dependencies:		Superseder:	remove --with-wctype-functions configure option (issue 9210)
Assigned To:		Nosy List:	BreamoreBoy, amaury.forgeotdarc, ezio.melotti, lemburg, nijel, pierre42, pitrou
Priority:	normal	Keywords:	patch

Created on 2004-12-01 14:41 by nijel, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
Python-2.4-wctype.patch	nijel, 2004-12-03 11:46

Messages (31)
msg23431 - Author: Michal Čihař (nijel) *	Date: 2004-12-01 14:41
test test_codecs failed -- Traceback (most recent call last):
  File "/usr/src/packages/BUILD/Python-2.4/Lib/test/test_codecs.py", line 446, in test_nameprep
    raise test_support.TestFailed("Test 3.%d: %s" % (pos+1, str(e)))
TestFailed: Test 3.5: u'\u0143 \u03b9' != u'\u0144 \u03b9'
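The failing assertion can be reproduced on its own with a short Python 3 sketch. It assumes that test 3.5 ("case folding multibyte") feeds nameprep the two code points U+0143 U+037A, as the vectors in current CPython's test_codecs appear to do. Treat that input as an assumption, not something stated in this report. A correct build folds U+0143 to U+0144, and NFKC turns U+037A into a space followed by U+03B9:

```python
from encodings.idna import nameprep

# Assumed input of nameprep test 3.5 ("case folding multibyte"):
# U+0143 LATIN CAPITAL LETTER N WITH ACUTE, U+037A GREEK YPOGEGRAMMENI.
label = "\u0143\u037a"
result = nameprep(label)

# With working case folding the expected result is U+0144, a space, U+03B9.
# The broken build in this report produced U+0143 (unfolded) instead.
print(ascii(result))
assert result == "\u0144 \u03b9", ascii(result)
```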
msg23432 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-01 14:53
Logged In: YES user_id=38388 Please make sure that Python is picking up the correct modules. You can do so by running Python in verbose mode (python -vv).
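`python -vv` traces every import. A quicker check, sketched here for Python 3, prints the file each relevant module was loaded from. The module list is an assumption chosen for this test, and `module_origins` is a hypothetical helper. Built-in modules have no `__file__`, so they are reported separately:

```python
import importlib

def module_origins(names=("encodings.idna", "stringprep", "unicodedata")):
    """Map each module name to the file it was loaded from.

    Modules compiled into the interpreter have no __file__ attribute,
    so they are reported as "(built-in)" instead.
    """
    origins = {}
    for name in names:
        mod = importlib.import_module(name)
        origins[name] = getattr(mod, "__file__", None) or "(built-in)"
    return origins

if __name__ == "__main__":
    for name, path in module_origins().items():
        print(f"{name}: {path}")
```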
msg23433 - Author: Michal Čihař (nijel) *	Date: 2004-12-01 14:59
Logged In: YES user_id=192186 It's a clean build root with no other Python installed, so it has no chance to pick up the wrong modules.
msg23434 - Author: Michal Čihař (nijel) *	Date: 2004-12-01 15:26
Logged In: YES user_id=192186 System information: i386 kernel 2.6.8 glibc 2.3.3
msg23435 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-01 16:15
Logged In: YES user_id=38388 The tests pass just fine on my machine. Is it possible that your compiler is broken? gcc 2.3.3 is very old!
msg23436 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-01 16:16
Logged In: YES user_id=38388 Sorry: I misread glibc as gcc. Still, this sounds a lot like a broken compiler. BTW, are you building a UCS4 version?
msg23437 - Author: Michal Čihař (nijel) *	Date: 2004-12-01 16:21
Logged In: YES user_id=192186 gcc (GCC) 3.3.4 (pre 3.3.5 20040809) Yes, I'm building UCS4 version.
msg23438 - Author: Michal Čihař (nijel) *	Date: 2004-12-01 16:32
Logged In: YES user_id=192186 The problem seems to be in glibc: when I remove --with-wctype-functions, it passes. Or could it be in Python's interface to the wctype functions?
msg23439 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-01 17:20
Logged In: YES user_id=38388 Ah, now I understand: it is quite possible that the Unicode database versions differ. Python uses version 3.2. Do you know which version glibc 2.3.3 uses? Note that for portability it is usually better not to use the wctype functions.
msg23440 - Author: Michal Čihař (nijel) *	Date: 2004-12-01 17:29
Logged In: YES user_id=192186 I'm not sure what "uses" means here, but I found several mentions of Unicode 3.2 in the code and in the changelogs.
msg23441 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-01 18:33
Logged In: YES user_id=38388 The wctype functions must have been built using tables from the Unicode code point database. Python's own APIs for this were built using the Unicode DB 3.2. My question is whether you know which version the glibc was built from. It is not surprising that the two tests fail if the underlying Unicode DB versions differ.
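On the Python side, the database version is exposed at runtime. The following Python 3 sketch prints it. The glibc side cannot be queried this way. The sketch also shows that current CPython ships a frozen Unicode 3.2.0 snapshot, `unicodedata.ucd_3_2_0`, which the `stringprep` module and the idna codec import so that nameprep stays pinned to the version RFC 3454 was written against:

```python
import unicodedata

# Unicode database used by str.lower()/str.upper() in this interpreter.
print("interpreter UCD:", unicodedata.unidata_version)

# Frozen Unicode 3.2.0 snapshot that stringprep and encodings.idna import.
print("stringprep/IDNA UCD:", unicodedata.ucd_3_2_0.unidata_version)
```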
msg23442 - Author: Michal Čihař (nijel) *	Date: 2004-12-01 18:37
Logged In: YES user_id=192186 I understand the question, but I have no idea how to find this information inside glibc.
msg23443 - Author: Pierre (pierre42)	Date: 2004-12-01 21:30
Logged In: YES user_id=512388 I have the same problem
msg23444 - Author: Michal Čihař (nijel) *	Date: 2004-12-02 11:07
Logged In: YES user_id=192186 Well, glibc 2.3.3 reportedly uses Unicode DB 3.2, so there must be a bug either in it or in Python; I can't tell which. Any idea how to find out?
msg23445 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-02 15:40
Logged In: YES user_id=38388 Do you get the same error when compiling without --with-wctype-functions? If not, then we'll just have to close this report as "won't fix", because we as Python developers don't have control over what glibc does or does not do. Unfortunately, there's no way to disable the failing tests, since the configure option is not available to the Python program.
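The configure option itself was not visible from Python. A test could instead probe the symptom directly. The following Python 3 sketch shows that idea. The probe `case_mapping_is_sane` and the test class are hypothetical, not part of CPython's test suite, and the nameprep input is the assumed test 3.5 vector:

```python
import unittest

def case_mapping_is_sane() -> bool:
    """Hypothetical probe: does str.lower()/str.upper() handle a non-ASCII
    letter? A build whose case mapping went through towlower()/towupper()
    under the C/POSIX locale would fail this check."""
    return "\u0143".lower() == "\u0144" and "\u0144".upper() == "\u0143"

class NameprepCaseFoldingTest(unittest.TestCase):
    @unittest.skipUnless(case_mapping_is_sane(),
                         "case mapping is not Unicode-aware in this build")
    def test_nameprep_3_5(self):
        from encodings.idna import nameprep
        # Assumed input of test 3.5: U+0143 U+037A.
        self.assertEqual(nameprep("\u0143\u037a"), "\u0144 \u03b9")

if __name__ == "__main__":
    unittest.main()
```

A skip based on the observed behaviour reports the problem as "skipped" rather than failing. It also hides the problem, which is one reason to prefer removing the option outright.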
msg23446 - Author: Michal Čihař (nijel) *	Date: 2004-12-02 16:03
Logged In: YES user_id=192186 Compiling without --with-wctype-functions "fixes" this problem. I still don't see what the wctype functions have to do with this. They are used for checks like is-numeric, is-alphanumeric, is-upper, ... I'd like to find out whether this bug is in Python or in glibc, but I still don't know which glibc functions influence this test.
msg23447 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-02 16:23
Logged In: YES user_id=38388 The punycode codec uses the .upper() method on Unicode objects. Since this method uses Py_UNICODE_TOUPPER(), any difference in case mapping between the Unicode DB used in Python and the one used in glibc will become noticeable when building with --with-wctype-functions.
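For nameprep specifically, current CPython lowercases rather than uppercases. Its `stringprep.map_table_b3` (RFC 3454 table B.3, case folding) first consults a small exceptions table, `stringprep.b3_exceptions`, which is a module-level implementation detail. For anything not in that table, it falls back to `str.lower()`. The following Python 3 sketch shows that U+0345, the part of U+037A that came out right, is in the exceptions table. U+0143 is not, so U+0143 depends entirely on the interpreter's case mapping. That is consistent with only the N being left uncased in the failing output:

```python
import stringprep

# U+0143: folded via the str.lower() fallback.
# U+0345: folded via the explicit exceptions table.
for ch in ("\u0143", "\u0345"):
    folded = stringprep.map_table_b3(ch)
    in_exceptions = ord(ch) in stringprep.b3_exceptions
    print(f"U+{ord(ch):04X} -> {ascii(folded)} (exceptions table: {in_exceptions})")
```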
msg23448 - Author: Michal Čihař (nijel) *	Date: 2004-12-02 16:38
Logged In: YES user_id=192186 I tried the towupper and towlower functions on all characters in the failed test, and I can see no difference compared to the Python ones...
msg23449 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-03 10:43
Logged In: YES user_id=38388 Maybe you should add some hooks to the Py_UNICODE_* macros and recompile (or run the script in a C debugger). The difference in output is minimal (\u0143 vs. \u0144), which I believe hints at a change in the Unicode DB used:

0143;LATIN CAPITAL LETTER N WITH ACUTE;Lu;0;L;004E 0301;;;;N;LATIN CAPITAL LETTER N ACUTE;;;0144;
0144;LATIN SMALL LETTER N WITH ACUTE;Ll;0;L;006E 0301;;;;N;LATIN SMALL LETTER N ACUTE;;0143;;0143

The only difference here is the case.
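In UnicodeData.txt records like these, fields 12, 13 and 14 hold the simple uppercase, lowercase and titlecase mappings. An empty field means the character maps to itself. The following Python 3 sketch parses the two quoted records and cross-checks them against the interpreter's own `str.upper()`/`str.lower()`. The helper name `simple_case_mappings` is hypothetical:

```python
# The two UnicodeData.txt records quoted above (';'-separated fields).
RECORDS = """\
0143;LATIN CAPITAL LETTER N WITH ACUTE;Lu;0;L;004E 0301;;;;N;LATIN CAPITAL LETTER N ACUTE;;;0144;
0144;LATIN SMALL LETTER N WITH ACUTE;Ll;0;L;006E 0301;;;;N;LATIN SMALL LETTER N ACUTE;;0143;;0143
"""

def _mapping(fields, index, cp):
    # An empty mapping field means "maps to itself".
    return int(fields[index], 16) if fields[index] else cp

def simple_case_mappings(text):
    """Return {code point: (uppercase, lowercase, titlecase)} from fields 12-14."""
    table = {}
    for line in text.splitlines():
        fields = line.split(";")
        cp = int(fields[0], 16)
        table[cp] = tuple(_mapping(fields, i, cp) for i in (12, 13, 14))
    return table

mappings = simple_case_mappings(RECORDS)
for cp, (up, low, title) in mappings.items():
    ch = chr(cp)
    # Cross-check against the interpreter's built-in case mapping.
    assert ch.upper() == chr(up) and ch.lower() == chr(low), hex(cp)
    print(f"U+{cp:04X}: upper U+{up:04X}, lower U+{low:04X}, title U+{title:04X}")
```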
msg23450 - Author: Michal Čihař (nijel) *	Date: 2004-12-03 11:03
Logged In: YES user_id=192186 However, when I make a simple C program containing:

    s = 0x143;
    printf("%lc %lc %lc\n", s, towupper(s), towlower(s));
    s = 0x144;
    printf("%lc %lc %lc\n", s, towupper(s), towlower(s));

I get the expected results, and they're the same as from this Python code:

    s = u'\u0143'
    print '%s %s %s' % (s, s.upper(), s.lower())
    s = u'\u0144'
    print '%s %s %s' % (s, s.upper(), s.lower())

I'm starting to think that it might be something to do with locales; I'll investigate it more.
msg23451 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-03 11:19
Logged In: YES user_id=38388 Could you run this test (comparing lower and upper) for all code points in range(sys.maxunicode)? The origin of the problem could be a different code point. I don't think that it has to do with the locale (but you never know...), since Unicode is all about unifying locales. The C functions should not be locale aware (even though the man page says they depend on LC_CTYPE).
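The suggested sweep can be written in Python 3 with `ctypes`. The following sketch makes several assumptions: a POSIX system where `ctypes.CDLL(None)` exposes the C library's `towlower()`, which does not hold on Windows, and a 32-bit `wint_t`. Characters whose Python lowercase is longer than one code point are skipped, because `towlower()` cannot express those. The result depends on the LC_CTYPE locale in effect, so running it once under `LC_ALL=C` and once under a UTF-8 locale is the interesting comparison. The helper name is hypothetical:

```python
import ctypes
import sys

# Assumption: POSIX, with libc symbols reachable through the running process.
libc = ctypes.CDLL(None)
libc.towlower.argtypes = [ctypes.c_uint]  # wint_t, assumed 32-bit
libc.towlower.restype = ctypes.c_uint

def towlower_mismatches(limit=sys.maxunicode + 1):
    """Code points below `limit` where libc towlower() disagrees with
    Python's str.lower(), under whatever LC_CTYPE locale is active."""
    mismatches = []
    for cp in range(limit):
        py_lower = chr(cp).lower()
        if len(py_lower) != 1:
            continue  # e.g. U+0130 lowercases to two code points
        if libc.towlower(cp) != ord(py_lower):
            mismatches.append(cp)
    return mismatches

if __name__ == "__main__":
    found = towlower_mismatches()
    print(len(found), "mismatching code points; first few:",
          [f"U+{cp:04X}" for cp in found[:10]])
```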
msg23452 - Author: Michal Čihař (nijel) *	Date: 2004-12-03 11:46
Logged In: YES user_id=192186 Okay, it IS a locale problem. You should trust the man page :-): calling towupper/towlower without setting a locale (or with the POSIX locale) gives wrong results. After applying the attached patch, all problems in the tests are gone.
msg23453 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-03 12:16
Logged In: YES user_id=38388 Thanks for the patch. I see a few problems with this approach, though:
* We break binary compatibility depending on the configure settings used for building Python; if this is really necessary, we should place the changes into the _PyUnicode_ToLowerCase() et al. APIs defined in unicodectype.c.
* I'm not sure whether there is any performance or memory usage win in using the wctype functions from glibc: the Unicode type mapping DB table has to be included anyway (due to the title case mapping), so the only win I could see is a performance one. Given that towlower et al. do seem to be locale aware, I have strong doubts that these functions are actually faster than the lookup in our own database.
Could you check whether using the wctype functions from glibc has any effect on the size of the interpreter and on the performance of, e.g., .lower() and .upper()? If not, I'm inclined to remove the wctype function support altogether.
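A rough micro-benchmark of `.lower()` and `.upper()` can be sketched with `timeit` in Python 3. It only times whichever build runs it, so comparing the two configurations would mean running it under each. The sample string and the helper name `time_case_methods` are arbitrary, hypothetical choices:

```python
import timeit

# Arbitrary sample mixing ASCII and non-ASCII letters (hypothetical choice).
SAMPLE = "\u0143\u0144 Python UNICODE \u00f1\u00d1 " * 100

def time_case_methods(number=10_000):
    """Return the seconds taken for `number` calls of .lower() and .upper()."""
    return {
        method: timeit.timeit(f"s.{method}()", globals={"s": SAMPLE}, number=number)
        for method in ("lower", "upper")
    }

if __name__ == "__main__":
    for method, secs in time_case_methods().items():
        print(f".{method}(): {secs:.3f} s")
```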
msg23454 - Author: Michal Čihař (nijel) *	Date: 2004-12-03 13:13
Logged In: YES user_id=192186 After talking to a glibc developer: towlower/towupper will never work as expected with the POSIX/C locales (because nothing besides a-z counts as an alphabetic character there). I can give some performance results, but even without tests, it looks to me like a good idea to drop support for this.
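That explanation can be checked against the exact failure message with a Python 3 simulation. The sketch mirrors the mapping and NFKC steps of current CPython's nameprep, with the lowercase function made pluggable. It substitutes a hypothetical `c_locale_lower` that, like `towlower()` in the C/POSIX locale, only lowercases A-Z. Table B.1 deletions and the prohibition/bidi checks are omitted. It reuses the module-level `stringprep.b3_exceptions` table, which is an implementation detail, and assumes U+0143 U+037A as the test 3.5 input:

```python
import stringprep
import unicodedata
from encodings.idna import nameprep

def c_locale_lower(ch: str) -> str:
    """Hypothetical stand-in for towlower() under the C/POSIX locale."""
    return ch.lower() if "A" <= ch <= "Z" else ch

def map_b3(ch, lower):
    # Mirrors stringprep.map_table_b3: exceptions table first, then `lower`.
    r = stringprep.b3_exceptions.get(ord(ch))
    return r if r is not None else lower(ch)

def map_b2(ch, lower):
    # Mirrors stringprep.map_table_b2 (case folding for use with NFKC).
    al = map_b3(ch, lower)
    b = unicodedata.normalize("NFKC", al)
    c = unicodedata.normalize("NFKC", "".join(map_b3(x, lower) for x in b))
    return c if b != c else al

def nameprep_mapping(label, lower=str.lower):
    """Mapping + NFKC steps of nameprep only (B.1 and checks omitted)."""
    return unicodedata.normalize("NFKC", "".join(map_b2(ch, lower) for ch in label))

label = "\u0143\u037a"  # assumed input of test 3.5
print(ascii(nameprep(label)))                         # real nameprep
print(ascii(nameprep_mapping(label)))                 # Unicode-aware lowering
print(ascii(nameprep_mapping(label, c_locale_lower))) # ASCII-only lowering
```

With ASCII-only lowering, the simulation yields U+0143, a space, and U+03B9. That is the same string the failing test reported.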
msg23455 - Author: Michal Čihař (nijel) *	Date: 2004-12-03 13:26
Logged In: YES user_id=192186
without wctype: 100x test_codecs: 10.209 s, libpython size: 1140098
with wctype: 100x test_codecs: 10.120 s (one failing test removed), libpython size: 1140314
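The following Python 3 arithmetic puts these figures in relative terms. It treats the libpython sizes as byte counts, which is an assumption. The with-wctype timing excluded one failing test, so the two timings are not strictly like for like:

```python
# Figures reported above for 100 runs of test_codecs.
without_wctype = {"seconds": 10.209, "libpython_size": 1140098}
with_wctype = {"seconds": 10.120, "libpython_size": 1140314}

time_saving_pct = ((without_wctype["seconds"] - with_wctype["seconds"])
                   / without_wctype["seconds"] * 100)
size_growth = with_wctype["libpython_size"] - without_wctype["libpython_size"]

print(f"wctype build: {time_saving_pct:.2f}% faster, {size_growth} bytes larger")
# -> wctype build: 0.87% faster, 216 bytes larger
```

Both differences are below one percent, which supports the conclusion drawn in the next message.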
msg23456 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2004-12-03 14:50
Logged In: YES user_id=38388 Thanks for the tests. Looks to me as if the trouble of keeping the wctype support and working around quirks with the locales is not worth it. I think it's better to remove the support altogether and stick with the builtin type database.
msg23457 - Author: Michal Čihař (nijel) *	Date: 2004-12-03 14:55
Logged In: YES user_id=192186 I agree with removing it; however, I'm not the one who can decide :-)
msg110029 - Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-07-11 17:36
Is there any point in leaving this open as the patch is against Python 2.4? How many times has test_codecs been successfully run against various Python versions since the patch was produced 6 1/2 years ago?
msg110050 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-07-11 21:26
The OP compiled python with --with-wctype-functions, and the libc wctype functions work differently depending on the locale. I suggest closing this issue as "won't fix", and favor the removal of the "--with-wctype-functions" option proposed in issue9210.
msg110102 - Author: Antoine Pitrou (pitrou) *	Date: 2010-07-12 15:51
Amaury's suggestion sounds good to me.
msg110167 - Author: Marc-Andre Lemburg (lemburg) *	Date: 2010-07-13 09:59
Amaury Forgeot d'Arc wrote:
>
> Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:
>
> The OP compiled python with --with-wctype-functions, and the libc wctype functions work differently depending on the locale.
> I suggest closing this issue as "won't fix", and favor the removal of the "--with-wctype-functions" option proposed in issue9210.

+1

History
Date	User	Action	Args
2022-04-11 14:56:08	admin	set	github: 41262
2010-07-13 09:59:14	lemburg	set	messages: + msg110167
2010-07-12 15:51:03	pitrou	set	status: pending -> closed nosy: + pitrou messages: + msg110102 superseder: remove --with-wctype-functions configure option
2010-07-11 21:26:17	amaury.forgeotdarc	set	status: open -> pending nosy: + amaury.forgeotdarc messages: + msg110050 resolution: wont fix
2010-07-11 17:36:26	BreamoreBoy	set	nosy: + BreamoreBoy messages: + msg110029
2010-07-11 17:19:12	ezio.melotti	set	nosy: + ezio.melotti
2009-02-15 22:34:33	ajaksu2	set	stage: test needed type: behavior components: + Extension Modules, Unicode, - Library (Lib) versions: + Python 2.6, - Python 2.4
2007-09-20 04:55:22	brett.cannon	set	keywords: + patch
2004-12-01 14:41:24	nijel	create