classification
Title: sqlite3: Zero byte truncates string contents
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 3.2, Python 3.3, Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ghaering, jcea, petri.lehtinen, pitrou, python-dev
Priority: normal
Keywords: needs review, patch

Created on 2011-12-29 12:27 by petri.lehtinen, last changed 2012-02-01 20:45 by python-dev. This issue is now closed.

Files
File name                    Uploaded                          Description
sqlite3_zero_byte.patch      petri.lehtinen, 2011-12-29 12:27
sqlite3_zero_byte_v2.patch   petri.lehtinen, 2011-12-30 09:38  Now with a test case!
sqlite3_zero_byte_v3.patch   petri.lehtinen, 2012-01-01 20:21
Messages (8)
msg150329 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2011-12-29 12:27
Inserting a string with an embedded zero byte stores only the part of the string before the first zero byte:

import sqlite3

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()

cursor.execute('CREATE TABLE test (value TEXT)')
cursor.execute('INSERT INTO test (value) VALUES (?)', ('foo\x00bar',))

cursor.execute('SELECT value FROM test')
print(cursor.fetchone())
# expected output: (u'foo\x00bar',)
# actual output: (u'foo',)

Also, if a table already contains data with embedded zero bytes, the conversion from the SQLite API value to a Python string truncates each string just before the first zero byte.
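
One way to exercise only the read path on a fixed build is to let SQLite itself produce a TEXT value that contains a zero byte, so that no parameter binding is involved. The sketch below is not part of the attached patches. It assumes that casting a BLOB literal to TEXT keeps the embedded zero byte, and the helper name is made up for illustration.

```python
import sqlite3


def read_text_with_embedded_zero():
    """Fetch a TEXT value that SQLite itself builds with an embedded zero byte."""
    connection = sqlite3.connect(':memory:')
    try:
        # X'666F6F00626172' is the byte sequence f, o, o, 0x00, b, a, r.
        # The CAST makes SQLite return it as TEXT rather than as a BLOB
        # (assumption: the cast does not drop the zero byte).
        cursor = connection.execute("SELECT CAST(X'666F6F00626172' AS TEXT)")
        return cursor.fetchone()[0]
    finally:
        connection.close()


value = read_text_with_embedded_zero()
print(repr(value))
```

On a build without the fix, the same query would be expected to come back truncated at the zero byte, as described above.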

Attaching a patch against 3.3 that fixes the problem. Basically, it uses PyUnicode_AsStringAndSize and PyUnicode_FromStringAndSize instead of the non-size variants, so the string length is passed explicitly instead of being inferred from a terminating zero byte.
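
The difference between the size and non-size variants can be illustrated from Python with ctypes. This is only an analogy for what happens in the C code, not the patch itself: reading a buffer as a C string stops at the first zero byte, while reading it with an explicit length keeps every byte.

```python
import ctypes

data = b'foo\x00bar'

# create_string_buffer copies the bytes and appends a terminating zero byte,
# which is how a C string is laid out in memory.
buf = ctypes.create_string_buffer(data)

# Reading it as a C string: the first zero byte ends the string.
c_string_view = buf.value

# Reading it with an explicit length: all len(data) bytes are kept.
sized_view = ctypes.string_at(buf, len(data))

print(c_string_view)
print(sized_view)
```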

Please review, as I'm not sure it covers each possible case.
msg150354 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-12-29 22:39
Where are the tests? :)
msg150367 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2011-12-30 09:38
What? Don't you SEE that it works correctly? :)

Attached an updated patch with a test case.

FTR, I also tried to make it possible to have the SQL statement include a zero byte, but it seems that sqlite3_prepare() (and also the newer sqlite3_prepare_v2()) always stops reading at the zero byte. See:

    http://www.sqlite.org/c3ref/prepare.html
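
A small sketch of the practical consequence: a zero byte belongs in a bound parameter, not in the statement text. What happens to a statement that does contain one depends on the Python version. It may be cut off at the zero byte, or execute() may refuse it with an exception, so the helper below (a made-up name) reports either outcome instead of assuming one.

```python
import sqlite3


def try_statement(sql, parameters=()):
    """Run one statement on a fresh in-memory database and report what happened."""
    connection = sqlite3.connect(':memory:')
    try:
        return ('row', connection.execute(sql, parameters).fetchone())
    except (ValueError, sqlite3.Error) as exc:
        # Assumption: a version that rejects the statement raises one of these.
        return ('error', type(exc).__name__)
    finally:
        connection.close()


# Zero byte in the statement text: whatever follows it cannot reach SQLite.
in_statement = try_statement('SELECT 1\x00 + 1')

# Zero byte in a bound parameter: the value travels with an explicit length.
in_parameter = try_statement('SELECT ?', ('foo\x00bar',))

print(in_statement)
print(in_parameter)
```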
msg150389 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2011-12-30 20:23
It would be nice to also have tests for the bytes and bytearray cases.
It also seems the generic case hasn't been fixed ("""PyObject_CallFunction(self->connection->text_factory, "y", val_str)""").
msg150441 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-01-01 20:18
Attached an updated patch. The custom text_factory case is now fixed, and bytes, bytearray and custom factory are all tested.
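
For reference, a check along these lines can be written in a few lines of Python. This is a sketch in the spirit of those tests, not the test case from the patch. The decode_utf8 factory and the helper name are hypothetical, and the code assumes a build that already contains the fix.

```python
import sqlite3


def decode_utf8(raw):
    # Hypothetical custom factory: it receives the raw bytes of the TEXT value.
    return raw.decode('utf-8')


def fetch_with_factory(factory, value='foo\x00bar'):
    """Store value as TEXT and read it back through the given text_factory."""
    connection = sqlite3.connect(':memory:')
    try:
        connection.text_factory = factory
        connection.execute('CREATE TABLE test (value TEXT)')
        connection.execute('INSERT INTO test (value) VALUES (?)', (value,))
        return connection.execute('SELECT value FROM test').fetchone()[0]
    finally:
        connection.close()


factories = [('str', str), ('bytes', bytes),
             ('bytearray', bytearray), ('custom', decode_utf8)]
results = {name: fetch_with_factory(factory) for name, factory in factories}

for name, result in results.items():
    print(name, repr(result))
```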

I also added back the pysqlite_unicode_from_string() function, as this makes the patch a bit smaller. It also seems to me (only by looking at the code) that the sqlite3.OptimizedUnicode factory isn't currently working as documented.

Antoine: Do you happen to know the status of the OptimizedUnicode thingie? Was it changed for a reason, or is it just an error that happened during the py3k transition?
msg150442 - Author: Petri Lehtinen (petri.lehtinen) (Python committer) Date: 2012-01-01 20:21
(Whoops, I didn't mean to change the magic source coding comment. Updating the patch once again.)
msg150444 - Author: Antoine Pitrou (pitrou) (Python committer) Date: 2012-01-01 22:13
> Attached an updated patch. The custom text_factory case is now fixed,
> and bytes, bytearray and custom factory are all tested.

Thanks, looks good to me.

> Antoine: Do you happen to know the status of the
> OptimizedUnicode thingie? Was it changed for a reason, or is it
> just an error that happened during the py3k transition?

It looks obsolete in 3.x to me. If you look at the 2.7 source code, it
had a real meaning there. Probably we could simplify the 3.x source code
by removing that option (but better to do it in a separate patch).
msg152440 - Author: Roundup Robot (python-dev) Date: 2012-02-01 20:45
New changeset 2e13011b3719 by Petri Lehtinen in branch '3.2':
sqlite3: Handle strings with embedded zeros correctly
http://hg.python.org/cpython/rev/2e13011b3719

New changeset 93ac4b12a750 by Petri Lehtinen in branch '2.7':
sqlite3: Handle strings with embedded zeros correctly
http://hg.python.org/cpython/rev/93ac4b12a750

New changeset 6f4044afa600 by Petri Lehtinen in branch 'default':
Merge branch 3.2
http://hg.python.org/cpython/rev/6f4044afa600
History
Date                 User            Action  Args
2012-02-01 20:45:08  python-dev      set     status: open -> closed; nosy: + python-dev; messages: + msg152440; resolution: fixed; stage: resolved
2012-01-01 22:13:36  pitrou          set     messages: + msg150444
2012-01-01 20:21:37  petri.lehtinen  set     files: + sqlite3_zero_byte_v3.patch; messages: + msg150442
2012-01-01 20:21:17  petri.lehtinen  set     files: - sqlite3_zero_byte_v3.patch
2012-01-01 20:18:22  petri.lehtinen  set     files: + sqlite3_zero_byte_v3.patch; messages: + msg150441
2011-12-31 18:27:15  jcea            set     nosy: + jcea
2011-12-30 20:23:11  pitrou          set     messages: + msg150389
2011-12-30 09:38:04  petri.lehtinen  set     files: + sqlite3_zero_byte_v2.patch; messages: + msg150367
2011-12-29 22:39:50  pitrou          set     nosy: + pitrou; messages: + msg150354
2011-12-29 12:27:34  petri.lehtinen  create