Message 141988 - Python tracker


Author	tchrist
Recipients	Arfrever, ezio.melotti, lemburg, loewis, pitrou, tchrist, terry.reedy
Date	2011-08-12.20:09:30
Message-id	<25262.1313179759@chthon>
In-reply-to	<1313177329.06.0.777738734656.issue12728@psf.upfronthosting.co.za>

Content

> Terry J. Reedy <tjreedy@udel.edu> added the comment:

> I am not sure that everyone will agree that this is a bug, rather than a
> feature request, or that if a bug, that it should be changed in existing
> releases and possibly break running code. The doc just says, somewhat vaguely,
> that IGNORECASE "works for Unicode characters as expected". I have added
> others as nosy for their opinions.

Working as expected for Unicode characters means it must follow Unicode's
rules for casefolding.  Otherwise you don't have Unicode at all; you just 
have ISO 10646.  Unicode is not merely a larger character repertoire; again,
that is merely ISO 10646.  Unicode is all about the rules for processing this
larger repertoire.  This is a very common mistake, so common that it is in the 
Unicode FAQ:

    Q: What is the relation between ISO/IEC 10646 and Unicode?

    A: In 1991, the ISO Working Group responsible for ISO/IEC 10646 (JTC
       1/SC 2/WG 2) and the Unicode Consortium decided to create one
       universal standard for coding multilingual text. Since then, the
       ISO 10646 Working Group (SC 2/WG 2) and the Unicode Consortium
       have worked together very closely to extend the standard and to
       keep their respective versions synchronized. [EH]

    Q: So are they the same thing?

    A: No. Although the character codes and encoding forms are
       synchronized between Unicode and ISO/IEC 10646, the Unicode
       Standard imposes additional constraints on implementations to
       ensure that they treat characters uniformly across platforms and
       applications. To this end, it supplies an extensive set of
       functional character specifications, character data, algorithms
       and substantial background material that is *not* in ISO/IEC 10646.

    http://unicode.org/faq/unicode_iso.html

Part of those functional character specifications can be found in the three
case-mapping fields of the file UnicodeData.txt and also in two auxiliary
files of the Unicode distribution, CaseFolding.txt and SpecialCasing.txt.
The Unicode Character Database is not optional.  If you do not use it, you
do not have Unicode; instead you merely have ISO 10646, which is of zero
practical use to anyone compared with Unicode.  I'm sure that Python would
not want to be stuck having something of no use to anyone when everyone
else actually supports Unicode.
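
As an illustration of the data files named above, here is a minimal sketch of reading entries in the CaseFolding.txt layout (code; status; mapping; # name). The embedded excerpt is a hand-copied sample rather than the real file, and the names SAMPLE, parse_case_folding, and fold are hypothetical helpers introduced only for this example.

```python
# Hand-copied sample in the CaseFolding.txt layout; not the full file.
SAMPLE = """\
# code; status; mapping; # name
0041; C; 0061; # LATIN CAPITAL LETTER A
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
03C2; C; 03C3; # GREEK SMALL LETTER FINAL SIGMA
0130; T; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
"""


def parse_case_folding(text, statuses=("C", "F")):
    """Build a {char: folded string} table from CaseFolding.txt-style text.

    Statuses C (common) and F (full) together give full case folding;
    T (Turkic) entries are skipped unless explicitly requested.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, status, mapping = [field.strip() for field in line.split(";")[:3]]
        if status in statuses:
            folded = "".join(chr(int(cp, 16)) for cp in mapping.split())
            table[chr(int(code, 16))] = folded
    return table


def fold(s, table):
    """Fold a string one character at a time; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in s)


FOLD = parse_case_folding(SAMPLE)
```

Note that the sharp s folds to two characters, which is why a one-to-one lowercase table cannot express full case folding.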

One is not allowed to make up one's own rules that run counter to Unicode's
and still make the claim that one is working on Unicode, since that is in
fact not what one is doing.  Based on all that, Python does not do case
insensitive matching on Unicode, a condition contrary to its documented
claims.  That clearly makes it a bug that needs fixing rather than a 
feature request to be summarily ignored.
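
To make the gap concrete, here is a small sketch that contrasts re.IGNORECASE with a comparison based on full case folding. It assumes Python 3.4 or later (str.casefold arrived in 3.3 and re.fullmatch in 3.4, both after this message was written); caseless_equal is a hypothetical helper, not part of the re module, and it ignores normalization.

```python
import re


def caseless_equal(a, b):
    """Compare two strings under Unicode full case folding (no normalization)."""
    return a.casefold() == b.casefold()


SHARP_S = "stra\u00dfe"   # "strasse" spelled with LATIN SMALL LETTER SHARP S
UPPER = "STRASSE"

# re.IGNORECASE compares character by character, so a one-character
# sharp s cannot match the two-character sequence "SS".
regex_hit = re.fullmatch(SHARP_S, UPPER, re.IGNORECASE) is not None

# Full case folding maps the sharp s to "ss", so the folded strings agree.
folded_hit = caseless_equal(SHARP_S, UPPER)
```

Here regex_hit is False while folded_hit is True, which is the kind of mismatch the test cases in this issue are meant to expose.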

> The test file should have omitted the gratuitous and distracting warnings,
> especially the one that effectively scolds Windows users for running
> Windows. With those omitted, the test cases given would form the basis for an
> added TestCase.

I have absolutely no idea what on earth you could possibly be referring to.
Honestly.  I ran my tests on both releases (2.7 and 3.2), on both builds
(wide and narrow), and on both platforms (Unix and Mac).  The warnings are
in there so I can make sure I have everything set up correctly to run the 
tests, and will understand why I get more failures than expected in the event 
that things are not set up appropriately.
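
For readers unfamiliar with the wide and narrow distinction, a setup check of that general kind can be sketched as follows. This is not the code from the attached test file; describe_build is a hypothetical helper, and it relies only on sys.maxunicode, which is 0xFFFF on narrow builds and 0x10FFFF on wide builds (and on every build from Python 3.3 onward).

```python
import sys


def describe_build(maxunicode=None):
    """Return "narrow" for a UTF-16 build, "wide" otherwise."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    return "narrow" if maxunicode == 0xFFFF else "wide"


if describe_build() == "narrow":
    # On a narrow build, characters beyond the BMP are stored as surrogate
    # pairs, so tests involving them can fail for reasons unrelated to
    # case folding.
    sys.stderr.write("warning: narrow build; expect extra failures\n")
```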

Let me make perfectly clear that I have never in my life come anywhere near a
Microsoft system, let alone touched one, and that I furthermore never shall.  
I have not the foggiest notion what in the world you are complaining about.
If the problem is that you are for some reason unable to create a Python with
full Unicode support under Microsoft, that is hardly my fault.   Render unto
Caesar that which is Caesar's: complain to Microsoft about Microsoft's bugs,
not to me, as I am wholly blameless of their problems.

If you don't like my test cases, you know where to find vi.  

I suppose I could always send you the program that writes these programs
for me, but as I knew you wouldn't like it, I withheld it.  You already have
all that you need to see exactly where the bugs are and how to fix them.

--tom

History
Date	User	Action	Args
2011-08-12 20:09:32	tchrist	set	recipients: + tchrist, lemburg, loewis, terry.reedy, pitrou, ezio.melotti, Arfrever
2011-08-12 20:09:32	tchrist	link	issue12728 messages
2011-08-12 20:09:30	tchrist	create