This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: sqlite3.Connection.iterdump() dies with encoding exception
Type: behavior
Stage: resolved
Components: Library (Lib)
Versions: Python 2.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: biny, ekontsevoy, eric.smith, petri.lehtinen, python-dev, r.david.murray
Priority: high
Keywords:

Created on 2012-06-19 22:18 by ekontsevoy, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
death.py ekontsevoy, 2012-06-20 01:09
Messages (10)
msg163227 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-19 22:18
When calling connection.iterdump() on a database with non-ASCII string values, the following exception is raised:

----------------------------------------------------
File "/python-2.7.3/lib/python2.7/sqlite3/dump.py", line 56, in _iterdump
    yield("{0};".format(row[0]))

UnicodeEncodeError: 'ascii' codec can't encode characters in position 48-51: ordinal not in range(128)
----------------------------------------------------

Older releases used the following (safer) form at /python-2.7.3/lib/python2.7/sqlite3/dump.py:56:

yield("%s;" % row[0])
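A minimal sketch of the kind of script that exercises this code path, assuming an in-memory database; the table name `notes` and column `body` are made up for illustration and are not taken from the reporter's database. On the affected 2.7.3 release the iteration over iterdump() is where the reported exception would surface; on releases without the regression the dump simply contains the non-ASCII text.

```python
import sqlite3

# Hypothetical schema, used only to get a non-ASCII value into the database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", (u"caf\u00e9 \u2107",))
conn.commit()

# iterdump() yields the SQL statements needed to rebuild the database.
# Joining with a unicode separator keeps the result a text string.
dump = u"\n".join(conn.iterdump())
conn.close()
```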
msg163230 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-19 22:53
Proposed fix:

maybe 
yield(u"%s;" % row[0]) 

or simply

row[0] + ";"?
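The first proposal can be sketched as follows. This is only an illustration: `iter_statements` is a hypothetical stand-in for the loop in sqlite3/dump.py, not the actual library code, and the sample row is invented.

```python
def iter_statements(rows):
    """Hypothetical stand-in for the statement loop in sqlite3/dump.py."""
    for row in rows:
        # A unicode format string keeps the result unicode, so no
        # implicit ASCII encoding of row[0] is attempted.
        yield u"%s;" % row[0]


# Invented sample row containing a non-ASCII character.
rows = [(u"INSERT INTO t VALUES('\u2107')",)]
statements = list(iter_statements(rows))
```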
msg163235 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2012-06-20 00:48
It's not clear to me why the behavior differs.  Hopefully Eric will explain.

For 2.7 we should probably just revert the change to the yield statement to restore the previous behavior, unless format can be fixed.
msg163237 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-20 00:57
If the behavior of str.format() can be fixed to act identically to u"%s" % "" that would be simply wonderful!

At work we currently have a rule never to use str.format(), since encoding exceptions make it unusable for anything but constants.
msg163239 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2012-06-20 01:02
Could you reproduce this in a short script that doesn't use sqlite? I'm looking for something like:

str = 'some-string'
"{0}".format(str)

Also: is that the entire traceback? I don't see how format could be invoking a codec. Maybe the error occurs when writing it to stdout, or some other operation that's encoding?
msg163241 - Author: Ev Kontsevoy (ekontsevoy) Date: 2012-06-20 01:09
I am attaching the file death.py, which dies on str.format().
The stack trace above is complete; Python doesn't print anything from inside format().
msg163243 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2012-06-20 01:49
>>> print('{}'.format(u'\u2107'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2107' in position 0: ordinal not in range(128)
>>> print('%s' % u'\u2107')
ℇ

(You get the exception without the print as well, just in case that isn't clear.)

Ah, and now I see why this is true.  With '%s', the byte-string format string gets implicitly coerced to unicode, whereas '{}'.format() must return a byte string and so tries to encode the unicode argument with the ASCII codec.  So, it is not a bug in format, and the yield statement change should be reverted.

You can use format if you just always make your format input strings unicode strings (which you should be doing anyway, especially now that Python 3.3 will allow the 'u' prefix; that is, such code will be forward-compatible with Python 3).
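A small sketch of that advice, written with the 'u' prefix so it should run on both 2.7 and 3.3+. The variable names are illustrative. The explicit encode call at the end stands in for the implicit ASCII encoding that a byte-string format string triggers on Python 2; it is a demonstration of the failing step, not the library's internal code.

```python
value = u'\u2107'  # same non-ASCII character as in the session above

# With a unicode format string, both spellings produce the same text.
with_percent = u'%s;' % value
with_format = u'{0};'.format(value)

# The step that fails in the traceback: encoding the value with the
# ASCII codec, performed explicitly here.
try:
    value.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```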
msg163244 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2012-06-20 01:50
Or use 'from __future__ import unicode_literals'.
msg163246 - Author: R. David Murray (r.david.murray) (Python committer) Date: 2012-06-20 01:58
Note that this is a regression in 2.7.3 relative to 2.7.2, which is why I'm marking it as high priority.
msg179614 - Author: Roundup Robot (python-dev) (Python triager) Date: 2013-01-11 02:12
New changeset 2a417ad8bfbf by R David Murray in branch '2.7':
#15109: revert '%'->'format' changes in 4b105d328fe7 to fix regression.
http://hg.python.org/cpython/rev/2a417ad8bfbf
History
Date  User  Action  Args
2022-04-11 14:57:31  admin  set  github: 59314
2013-01-11 02:13:00  r.david.murray  set  status: open -> closed
    resolution: fixed
    stage: needs patch -> resolved
2013-01-11 02:12:17  python-dev  set  nosy: + python-dev
    messages: + msg179614
2012-07-17 18:48:35  biny  set  nosy: + biny
2012-06-20 01:58:43  r.david.murray  set  priority: normal -> high
    nosy: + petri.lehtinen
    messages: + msg163246
    stage: needs patch
2012-06-20 01:50:29  r.david.murray  set  messages: + msg163244
2012-06-20 01:49:14  r.david.murray  set  messages: + msg163243
2012-06-20 01:09:38  ekontsevoy  set  files: + death.py
    messages: + msg163241
2012-06-20 01:02:39  eric.smith  set  messages: + msg163239
2012-06-20 00:57:16  ekontsevoy  set  messages: + msg163237
2012-06-20 00:48:59  r.david.murray  set  nosy: + r.david.murray, eric.smith
    messages: + msg163235
2012-06-19 22:53:19  ekontsevoy  set  messages: + msg163230
2012-06-19 22:19:08  ekontsevoy  set  type: behavior
2012-06-19 22:18:47  ekontsevoy  create