Message 152441 - Python tracker


Author	petri.lehtinen
Recipients	petri.lehtinen, pitrou
Date	2012-02-01.20:57:36
Message-id	<1328129858.32.0.164852943165.issue13921@psf.upfronthosting.co.za>

Content

Connection.text_factory can be used to control what objects are
returned for the TEXT data type. An excerpt from the docs:

    For efficiency reasons, there’s also a way to return str
    objects only for non-ASCII data, and bytes otherwise. To
    activate it, set this attribute to sqlite3.OptimizedUnicode.

However, in Python 3 it always returns str objects, even for
ASCII-only data. There's even a test for this feature that is
clearly wrong:

    def CheckOptimizedUnicode(self):
        self.con.text_factory = sqlite.OptimizedUnicode
        austria = "Österreich"
        germany = "Deutchland"
        a_row = self.con.execute("select ?", (austria,)).fetchone()
        d_row = self.con.execute("select ?", (germany,)).fetchone()
        self.assertTrue(type(a_row[0]) == str, "type of non-ASCII row must be str")
        self.assertTrue(type(d_row[0]) == str, "type of ASCII-only row must be str")

It checks for str in both cases even though it should test for
bytes in the latter case.
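
For illustration, here is a minimal sketch of what the documented
behaviour would look like, and what the test would then have to
assert. It does not use sqlite3.OptimizedUnicode itself, since that
name is not available in every Python version. Instead it installs a
hypothetical callable, ascii_as_bytes_factory, as the text_factory.
Both that name and the fetch_one helper are assumptions made for this
example and are not part of sqlite3.

```python
import sqlite3


def ascii_as_bytes_factory(raw: bytes):
    """Hypothetical text_factory: bytes for ASCII-only TEXT, str otherwise."""
    if raw.isascii():
        return raw  # ASCII-only data: hand back the raw bytes unchanged
    return raw.decode("utf-8")  # non-ASCII data: decode to str


def fetch_one(con, value):
    """Round-trip a single value through SQLite and return the first column."""
    return con.execute("select ?", (value,)).fetchone()[0]


con = sqlite3.connect(":memory:")
# A custom text_factory callable receives each TEXT value as bytes.
con.text_factory = ascii_as_bytes_factory

a_value = fetch_one(con, "Österreich")
d_value = fetch_one(con, "Deutschland")

# These are the checks the documented behaviour implies.
assert type(a_value) is str, "type of non-ASCII row must be str"
assert type(d_value) is bytes, "type of ASCII-only row must be bytes"
```

Note that the caller now has to handle two different return types for
the same column, which is exactly the mixing of bytes and str
discussed below.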

---

Users can get bytes if they want them by asking explicitly (for
example, by setting text_factory to bytes). Having the library
mix bytes and str on its own makes things harder for the user.
Furthermore, I don't really buy the "efficiency" argument here,
so I'd vote for removing OptimizedUnicode entirely. It has never
worked in Py3k, so removing it would be safe.
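
As a small sketch of the explicit alternative, the snippet below
requests bytes by setting text_factory to bytes on the connection.
The in-memory database and the variable names are only illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# With the default text_factory, TEXT values come back as str.
text_value = con.execute("select ?", ("Deutschland",)).fetchone()[0]

# Explicitly asking for bytes: every TEXT value is returned as bytes,
# regardless of whether it is ASCII-only.
con.text_factory = bytes
bytes_value = con.execute("select ?", ("Deutschland",)).fetchone()[0]
non_ascii_value = con.execute("select ?", ("Österreich",)).fetchone()[0]
```

Here the return type depends only on the setting chosen by the user,
not on the contents of each value.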

History
Date	User	Action	Args
2012-02-01 20:57:38	petri.lehtinen	set	recipients: + petri.lehtinen, pitrou
2012-02-01 20:57:38	petri.lehtinen	set	messageid: <1328129858.32.0.164852943165.issue13921@psf.upfronthosting.co.za>
2012-02-01 20:57:37	petri.lehtinen	link	issue13921 messages
2012-02-01 20:57:36	petri.lehtinen	create