classification
Title: unable to retrieve latin-1 encoded data from sqlite3
Type: behavior Stage:
Components: Library (Lib) Versions: Python 2.7
process
Status: closed Resolution: not a bug
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, ezio.melotti, izarf, jcea, loewis
Priority: normal Keywords:

Created on 2009-05-13 10:35 by izarf, last changed 2012-12-31 15:06 by jcea. This issue is now closed.

Messages (4)
msg87672 - (view) Author: (izarf) Date: 2009-05-13 10:35
it is impossible to retrieve a latin-1 encoded string from sqlite3 database.

How to:
1. create a new db.
2. create a new table with text field.
3. insert a row with data like "åäö"

4. select all rows from table
5. write:
     for data in cursor1:
           var = data

you will now get an error explaining something like ascii couldn't decode
msg87682 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2009-05-13 12:21
Confirmed here, tested with python2.6 
on Linux with sys.stding.encoding == 'ISO-8859-1'
on Windows with sys.stdin.encoding == 'cp437'

>>> import sqlite3
>>> db = sqlite3.connect(':memory:')
>>> cur = db.cursor()
>>> cur.execute("create table foo (x)")
>>> cur.execute("insert into foo values ('café')")
>>> cur.execute("select * from foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
sqlite3.OperationalError: Could not decode to UTF-8 column 'x' with text
'café'


It seems that sqlite3 expects strings to be utf-8 encoded.
It works fine if you pass unicode strings, and with python 3.0.
msg87699 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2009-05-13 18:37
This is not a bug. By default, pysqlite decodes all strings to Unicode,
assuming UTF-8 encoding (which SQLite assumes when parsing statements).

To override this default, you need to change the connection's text_factory:

py> import sqlite3
py> db = sqlite3.connect(':memory:')
py> db.text_factory = str
py> cur = db.cursor()
py> cur.execute("create table foo (x)")
<sqlite3.Cursor object at 0xb7cfb500>
py> cur.execute("insert into foo values ('café')")
<sqlite3.Cursor object at 0xb7cfb500>
py> cur.execute("select * from foo")
<sqlite3.Cursor object at 0xb7cfb500>
py> _.fetchall()
[('caf\xe9',)]
msg178584 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-12-30 16:23
While trying to reproduce the issue I noticed this while inserting values:

import sqlite3
db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute("create table foo (x)")
# this works fine
cur.execute(u"insert into foo values ('café')".encode('latin1'))
# this fails
cur.execute(u"insert into foo values (?)", (u'café'.encode('latin1'),))
# this fails too
cur.execute("insert into foo values (?)", (u'café'.encode('latin1'),))

The error is:
 sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Should this be reported in the first case too? (This would be backward-incompatible, but, unless it's expected to work, we can always add a warning.)
History
Date User Action Args
2012-12-31 15:06:28jceasetnosy: + jcea
2012-12-30 16:32:42ezio.melottilinkissue16783 superseder
2012-12-30 16:23:25ezio.melottisetversions: + Python 2.7, - Python 2.5
nosy: + ezio.melotti

messages: + msg178584

components: + Library (Lib), - None
2009-05-13 18:37:32loewissetstatus: open -> closed

nosy: + loewis
messages: + msg87699

resolution: not a bug
2009-05-13 12:21:23amaury.forgeotdarcsetnosy: + amaury.forgeotdarc
messages: + msg87682
2009-05-13 10:35:06izarfcreate