Author terry.reedy
Recipients William.D..Colburn, ezio.melotti, terry.reedy
Date 2012-12-29.01:04:22
Message-id <1356743063.48.0.446174480546.issue16783@psf.upfronthosting.co.za>
Content
Opening a duplicate issue to rant against the developers is not responsible behavior. Since you do not seem to understand Martin's 2.x solution, ask for help on python-list or elsewhere (and read below). The proper fix for the many Unicode and text-encoding problems was, and is, to use Unicode for text, as we did and do in 3.x.

Note that while we link to sqlite3 with a Python interface, and chose it as the database to link to in the stdlib, we do not control sqlite3 itself. As documented, and as Martin wrote, sqlite *assumes* by default that byte-encoded text handed to it is error-free utf-8. However, both the docs and Martin say that you can override that assumption by replacing its text_factory. Sqlite should not reject *any* bytes, because anything *could* be just what the user intended.
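
A minimal Python 3 sketch of overriding text_factory (this is not Martin's 2.x code, which is not quoted here). It assumes a hypothetical table `t` holding a TEXT value whose bytes are Latin-1 rather than utf-8, and a hypothetical helper `read_names`. The default factory fails on that row; replacing it lets you get the raw bytes back or decode them however you choose.

```python
import sqlite3


def read_names(conn, factory):
    """Hypothetical helper: read every row of table ``t`` using ``factory``
    as the connection's text_factory (it receives each TEXT value as bytes)."""
    conn.text_factory = factory
    return [row[0] for row in conn.execute("SELECT name FROM t")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")
# b"caf\xe9" is Latin-1 for "café" and is NOT valid UTF-8.  CAST(... AS TEXT)
# stores those bytes unchanged as a TEXT value, the way another program
# might have written them.
conn.execute("INSERT INTO t VALUES (CAST(? AS TEXT))", (b"caf\xe9",))

# The default text_factory (str) assumes UTF-8 and fails on this row.
try:
    read_names(conn, str)
except sqlite3.OperationalError as exc:
    print("default factory failed:", exc)

# Option 1: get the raw bytes back and decide later.
print(read_names(conn, bytes))                           # [b'caf\xe9']

# Option 2: decode with the encoding you believe was actually used.
print(read_names(conn, lambda b: b.decode("latin-1")))   # ['café']

# Option 3: keep UTF-8, but substitute U+FFFD for undecodable bytes.
print(read_names(conn, lambda b: b.decode("utf-8", errors="replace")))
```

Which option is right depends on what you know about where the bytes came from; sqlite itself cannot know.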

The problem of multiple byte encodings for text, and of encoding information getting separated from the encoded bytes, is a general one. We constantly get questions on python-list like "how do I determine the real encoding of a web page if the encoding information is missing or wrong?" We are doing our part to solve it by using Unicode for text and pushing utf-8 as the one true encoding that everyone should use whenever possible.
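
A small Python 3 sketch of why lost encoding information is a problem: the same characters encode to different bytes, decoding with the wrong codec usually succeeds silently, and without the label a program can only guess. The helper `decode_guess` and the choice of cp1252 as the fallback are illustrative assumptions, not a recommendation from this thread.

```python
def decode_guess(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: return text for bytes whose encoding label was lost.

    Strict UTF-8 is tried first, because legacy-encoded text containing
    non-ASCII bytes rarely happens to be valid UTF-8. If that fails, fall
    back to a single-byte codec, replacing any byte it does not define with
    U+FFFD. This is a guess, not a determination of the "real" encoding.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")


text = "café"
utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# Decoding with the wrong codec does not raise; it silently yields mojibake.
print(utf8_bytes.decode("latin-1"))    # cafÃ©

print(decode_guess(utf8_bytes))        # café
print(decode_guess(latin1_bytes))      # café (via the cp1252 fallback)
```

If everyone wrote utf-8, the first branch would always succeed and no guessing would be needed.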

If you need more explanation, try python-list, as I said before.
History
Date                 User         Action  Args
2012-12-29 01:04:23  terry.reedy  set     recipients: + terry.reedy, ezio.melotti, William.D..Colburn
2012-12-29 01:04:23  terry.reedy  set     messageid: <1356743063.48.0.446174480546.issue16783@psf.upfronthosting.co.za>
2012-12-29 01:04:23  terry.reedy  link    issue16783 messages
2012-12-29 01:04:22  terry.reedy  create