Message 178451 - Python tracker


Author	terry.reedy
Recipients	William.D..Colburn, ezio.melotti, terry.reedy
Date	2012-12-29.01:04:22
Message-id	<1356743063.48.0.446174480546.issue16783@psf.upfronthosting.co.za>
In-reply-to

Content
Opening a duplicate issue to rant against the developers is not responsible behavior. Since you do not seem to understand Martin's 2.x solution, ask for help on python-list or elsewhere (and read below). The proper fix for multiple Unicode and text coding problems was and is to use Unicode for text, as we did and do in 3.x.

Note that while we link to sqlite3 with a Python interface, and choose that as the database to link to in the stdlib, we do not control sqlite3 itself. As documented and as Martin wrote, sqlite *assumes*, by default, that byte-encoded text handed to it is error-free UTF-8. However, the docs and Martin both say that you can override that assumption by replacing the connection's text_factory. Sqlite should not reject *any* bytes because anything *could* be just what the user intended.
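
For readers who want to see what replacing text_factory looks like, here is a minimal sketch. It is written for Python 3, not the 2.x setup discussed in this issue, and the helper name fetch_text, the in-memory database, and the choice of Latin-1 as the "real" encoding are all assumptions made for illustration. The only library feature it relies on is the documented Connection.text_factory attribute, which accepts any callable that takes the raw bytes of a TEXT value.

```python
import sqlite3


def fetch_text(text_factory):
    """Read one TEXT value that is not valid UTF-8, using the given factory.

    fetch_text is a hypothetical helper for this example only.
    """
    conn = sqlite3.connect(":memory:")
    # text_factory is called with the raw bytes of every TEXT value returned.
    conn.text_factory = text_factory
    try:
        # CAST(X'E9' AS TEXT) produces a TEXT value holding the single byte
        # 0xE9, which is 'é' in Latin-1 but is not valid UTF-8.
        return conn.execute("SELECT CAST(X'E9' AS TEXT)").fetchone()[0]
    finally:
        conn.close()


# Option 1: hand back the bytes untouched and decode them yourself later.
raw = fetch_text(bytes)

# Option 2: decode with the encoding you know (or assume) the data is in.
decoded = fetch_text(lambda b: b.decode("latin-1"))

print(repr(raw), repr(decoded))
```

Either way, the knowledge of which encoding the bytes are really in has to come from the application; sqlite only stores what it was given.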

The problem of multiple byte encodings for text and of encoding info getting separated from encoded bytes is a general one. We constantly get questions on python-list like "how do I determine the real encoding of a web page if the encoding information is missing or wrong?" We are doing our part to solve it by using Unicode for text and pushing UTF-8 as the one, true encoding that everyone should use whenever possible.

If you need more explanation, try python-list, as I said before.

History
Date	User	Action	Args
2012-12-29 01:04:23	terry.reedy	set	recipients: + terry.reedy, ezio.melotti, William.D..Colburn
2012-12-29 01:04:23	terry.reedy	set	messageid: <1356743063.48.0.446174480546.issue16783@psf.upfronthosting.co.za>
2012-12-29 01:04:23	terry.reedy	link	issue16783 messages
2012-12-29 01:04:22	terry.reedy	create