classification
Title: sqlite3 segfaults and bus errors when given certain unicode strings as queries
Type: crash Stage: resolved
Components: Library (Lib) Versions: Python 3.1
process
Status: closed Resolution: works for me
Dependencies: Superseder:
Assigned To: Nosy List: amaury.forgeotdarc, ghaering, jeremybanks, ned.deily, vstinner
Priority: normal Keywords:

Created on 2011-07-14 23:23 by jeremybanks, last changed 2011-07-15 09:56 by amaury.forgeotdarc. This issue is now closed.

Messages (11)
msg140381 - Author: Jeremy Banks (jeremybanks) Date: 2011-07-14 23:23
I was experimenting with the sqlite3 library and noticed that using certain strings as identifiers causes bus errors or segfaults. I'm not very familiar with Unicode, but after some Googling I'm pretty sure this happens when I use non-characters or surrogate characters incorrectly.

This causes a bus error:

import sqlite3
c = sqlite3.connect(":memory:")
table_name = '"' + chr(0xD800) + '"'
c.execute("create table " + table_name + " (bar)")

The equivalent Python 2 (replacing chr with unichr) works properly.
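
When a test case may take the interpreter down with a segfault or bus error, one option is to run it in a child process and look at how the child ended. The following is a minimal sketch of that idea, not something taken from this report: the names TEST_CASE and run_test_case are illustrative, and it assumes Python 3.7 or later for subprocess.run(capture_output=...).

```python
import subprocess
import sys

# The test case from this report, run in a child interpreter so that a hard
# crash (segfault or bus error) cannot take down the process doing the check.
TEST_CASE = r'''
import sqlite3
c = sqlite3.connect(":memory:")
table_name = '"' + chr(0xD800) + '"'
c.execute("create table " + table_name + " (bar)")
'''


def run_test_case(python=sys.executable):
    """Run TEST_CASE under `python` and describe how the child ended."""
    proc = subprocess.run([python, "-c", TEST_CASE],
                          capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "killed by signal %d" % -proc.returncode
    if proc.returncode != 0:
        # The last line of the traceback names the exception that was raised.
        lines = proc.stderr.strip().splitlines()
        return "exception: " + (lines[-1] if lines else "(no stderr output)")
    return "no error"


result = run_test_case()
print(result)
```

Passing a different interpreter path as the `python` argument would let the same script be pointed at each installed Python in turn. A "killed by signal" result corresponds to the crash described here; an "exception" result means the interpreter rejected the string instead.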
msg140384 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-07-15 03:24
On what operating system platform and version are you seeing this behavior? Also, can you report the versions of the sqlite3 adapter and the SQLite library by executing the following in the interpreter?

>>> sqlite3.version
'2.6.0'
>>> sqlite3.sqlite_version
'3.6.12'

On Linux and OS X systems I've tested, rather than a segfault your test case causes an exception to be raised.

For Python 3.1.4:
"sqlite3.Warning: SQL is of wrong type. Must be string or unicode."

For Python 3.2.1
"UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 14: surrogates not allowed"
msg140385 - Author: Jeremy Banks (jeremybanks) Date: 2011-07-15 03:43
I'm using OS X 10.6.7.

The bus error is occurring with my Python 3.1 installation:
  path: /Library/Frameworks/Python.framework/Versions/3.1/bin/python3
  sqlite3.version == 2.4.1
  sqlite3.sqlite_version == 3.6.11

But now that you mention it, my MacPorts installations of Python 3.0 and 3.1 just get an exception like you do:
  paths: /opt/local/bin/python3.0 / python3.1
  sqlite3.version == 2.4.1
  sqlite3.sqlite_version == 3.7.7.1

A Python 2.7 installation where it works without any error:
  path: /Library/Frameworks/Python.framework/Versions/2.7/bin/python
  sqlite3.version == 2.6.0
  sqlite3.sqlite_version == 3.6.12

A MacPorts Python 2.6 installation where it works without any error:
  path: /opt/local/bin/python2.6
  sqlite3.version == 2.4.1
  sqlite3.sqlite_version == 3.7.7.1
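
With several Python installations on one machine, it can help to collect the same details from each of them in one go. The sketch below is one way to do that; the function name describe_sqlite_environment and the dictionary keys are made up for illustration. It also records the file the sqlite3 package was imported from, which is an assumption about what is useful for telling a standard-library copy from another module on the path.

```python
import sqlite3
import sys
import warnings


def describe_sqlite_environment():
    """Collect the details that distinguish one installation from another."""
    # sqlite3.version is deprecated in newer Python releases and may be
    # missing entirely, so silence the warning and fall back to a placeholder.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", DeprecationWarning)
        adapter_version = getattr(sqlite3, "version", "(not available)")
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        # File the sqlite3 package was imported from.
        "sqlite3_module": getattr(sqlite3, "__file__", "(built-in)"),
        # Version of the adapter module (what sqlite3.version reports).
        "adapter_version": adapter_version,
        # Version of the underlying SQLite C library.
        "sqlite_version": sqlite3.sqlite_version,
    }


info = describe_sqlite_environment()
for key, value in info.items():
    print("%s: %s" % (key, value))
```

Running this once per interpreter gives output that can be pasted into a report as-is.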
msg140386 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-07-15 04:36
Sorry, I cannot reproduce on Mac OS X 10.6.8 the crash behavior you report using various Python 3.1.x versions installed from the python.org Python OS X installers, in particular 3.1 and 3.1.4 (the first and the most recent 3.1 releases). If this Python instance was not installed from a python.org installer, I suggest contacting the distributor that supplied it. If you built it from source, I suggest checking which ./configure options you used and which copy of the sqlite3 library was used. You might want to take this opportunity to update to Python 3.2.1, since no further bug fixes (other than security fixes) are expected to be released for 3.1.x.
msg140387 - Author: Jeremy Banks (jeremybanks) Date: 2011-07-15 04:40
I'll try that, thank you.

If it works without exception in Python 2, isn't the behaviour in Python 3 a regression bug, even if it doesn't crash? If so, should I create a new/separate issue for the behaviour?
msg140389 - Author: Ned Deily (ned.deily) * (Python committer) Date: 2011-07-15 06:47
0xD800 does not represent a valid Unicode character; it's a surrogate code point (see http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates).  If you use a code point that does represent a Unicode character, say 0xA800, there is no error.  If there is a bug here, it's that the Python 2 version does not report an error for this edge case.
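
The difference between the two code points can be checked without involving a crash-prone interpreter at all. The following is a small sketch under the assumption that a strict UTF-8 encode is a fair stand-in for what the sqlite3 module needs from a statement string; the helper name is_utf8_encodable is illustrative.

```python
import sqlite3


def is_utf8_encodable(text):
    """Return True if `text` can be encoded as strict UTF-8."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


lone_surrogate = chr(0xD800)  # surrogate code point, not a character
valid_char = chr(0xA800)      # code point that does represent a character

print(is_utf8_encodable(lone_surrogate))
print(is_utf8_encodable(valid_char))

# A quoted identifier built from the valid character is accepted by sqlite3.
conn = sqlite3.connect(":memory:")
conn.execute('create table "' + valid_char + '" (bar)')
names = [row[0] for row in conn.execute("select name from sqlite_master")]
print(names == [valid_char])
```

A check like is_utf8_encodable could be applied to an identifier before it is concatenated into a statement, so that a bad string is rejected early with a clear error.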
msg140392 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-15 07:50
I already fixed this issue in Python 3.1, 3.2 and 3.3: issue #6697 (e.g. commit 7ba851d1b46e).

$ ./python 
Python 3.3.0a0 (default:ab162f925761, Jul 15 2011, 09:36:17) 
>>> import sqlite3
>>> c = sqlite3.connect(":memory:")
>>> table_name = '"' + chr(0xD800) + '"'
>>> c.execute("create table " + table_name + " (bar)")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 14: surrogates not allowed

@jeremybanks: I don't think you are using the sqlite3 module that comes with Python 3, but rather the third-party module.
msg140393 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-15 08:02
> I already fixed this issue in Python 3.1, 3.2 and 3.3:
> issue #6697 (e.g. commit 7ba851d1b46e).

Oh, wrong: the bug was only fixed in Python 3.2 and 3.3. There was already a check after _PyUnicode_AsStringAndSize(), but the test was on the wrong variable (operation vs operation_cstr).

Because only security bugs can be fixed in Python 3.1, I think that this issue should be closed. Or do you consider dereferencing a NULL pointer in sqlite3 as a security vulnerability?
msg140395 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-07-15 08:30
It seems that a fix was merged in the 3.1 branch, somewhere between 3.1.2 and 3.1.3.
msg140398 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2011-07-15 08:40
> It seems that a fix was merged in the 3.1 branch,
> somewhere between 3.1.2 and 3.1.3.

Which fix? The code is still wrong in Mercurial (branch 3.1):

    operation_cstr = _PyUnicode_AsStringAndSize(operation, &operation_len);
    if (operation == NULL)
        goto error;

http://hg.python.org/cpython/file/42ec507815d2/Modules/_sqlite/cursor.c
msg140400 - Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2011-07-15 09:56
The fix was c073f3c3276e (thanks to hg bisect).
The variable operation_cstr is not used before the call to pysqlite_cache_get(), which also tries to encode the statement into UTF-8 and correctly raises an exception.
In early 3.1.2, the segfault came from the DECREF of an uninitialized member...
History
Date                 User                Action  Args
2011-07-15 09:56:41  amaury.forgeotdarc  set     messages: + msg140400
2011-07-15 08:40:35  vstinner            set     messages: + msg140398
2011-07-15 08:30:50  amaury.forgeotdarc  set     nosy: + amaury.forgeotdarc; messages: + msg140395
2011-07-15 08:02:16  vstinner            set     messages: + msg140393
2011-07-15 07:50:53  vstinner            set     nosy: + vstinner; messages: + msg140392
2011-07-15 06:47:43  ned.deily           set     messages: + msg140389
2011-07-15 04:40:20  jeremybanks         set     messages: + msg140387
2011-07-15 04:36:06  ned.deily           set     status: open -> closed; resolution: works for me; messages: + msg140386; stage: resolved
2011-07-15 03:43:06  jeremybanks         set     messages: + msg140385
2011-07-15 03:24:09  ned.deily           set     nosy: + ghaering, ned.deily; messages: + msg140384
2011-07-14 23:23:42  jeremybanks         create