classification
Title: namedtuple row factory for sqlite3
Type: enhancement Stage: patch review
Components: Versions: Python 3.5
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: ghaering Nosy List: BreamoreBoy, Russell.Sim, YoSTEALTH, dlenski, eric.araujo, ghaering, ncoghlan, petri.lehtinen, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2011-10-31 00:32 by ncoghlan, last changed 2015-01-21 04:43 by YoSTEALTH.

Files
File name Uploaded Description Edit
issue_13299.patch Russell.Sim, 2012-08-20 03:20 review
issue_13299.1.patch Russell.Sim, 2012-08-21 12:16 review
issue_13299.2.patch Russell.Sim, 2012-08-21 12:49 review
sqlite_namedtuplerow.patch serhiy.storchaka, 2015-01-11 08:05 review
Messages (16)
msg146670 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2011-10-31 00:32
Currently, sqlite3 allows rows to be easily returned as ordinary tuples (default) or sqlite3.Row objects (which allow dict-style access).

collections.namedtuple provides a much nicer interface than sqlite3.Row for accessing ordered data which uses valid Python identifiers for field names, and can also tolerate field names which are *not* valid identifiers.

It would be convenient if sqlite3 provided a row factory along the lines of the one posted here:
http://peter-hoffmann.com/2010/python-sqlite-namedtuple-factory.html

(except with smarter caching on the named tuples)
msg146738 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2011-10-31 19:44
+1
msg147474 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-11-12 11:20
> collections.namedtuple provides a much nicer interface than sqlite3.Row
Definitely!
msg168617 - (view) Author: Russell Sim (Russell.Sim) Date: 2012-08-20 03:20
Hi,

Here is an implementation using lru_cache to prevent regeneration of the named tuple each time.

Cheers,
Russell
msg168748 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2012-08-21 07:20
Caching based on the cursor going to be problematic because a single cursor can be used multiple times with different descriptions:

   c = conn.cursor()
   c.execute('select symbol from stocks')
   print c.description
   c.execute('select price from stocks')
   print c.description     # same cursor, different layout, needs a new named tuple

It might make more sense to cache the namedtuple() factory itself:

   sql_namedtuple = lru_cache(maxsize=20)(namedtuple)

Also, the example in the docs is too lengthy and indirect.  Cut-out the step for creating an populating the database -- just assume db created in the example at the top of the page:

   For example::

    >>> conn.row_factory = sqlite3.NamedTupleRow
    >>> c = conn.cursor()
    >>> for record in c.execute('select * from stocks'):
            print record

    Row(date='2006-01-05', trans='BUY', symbol='RHAT', qty=100.0, price=35.14)
    Row(date='2006-01-05', trans='BUY', symbol='RHAT', qty=100, price=35.14)
    Row(date='2006-03-28', trans='BUY', symbol='IBM', qty=1000, price-45.0)

No need to go into a further lesson on how to use named tuples.


Also, the patch uses star-unpacking:  _namedtuple_row(cursor)(*row)

Instead, it should use _make:  _namedtuple_row(cursor)._make(row)


(u'2006-04-05', u'BUY', u'MSFT', 1000, 72.0)
msg168761 - (view) Author: Russell Sim (Russell.Sim) Date: 2012-08-21 12:16
Raymond, Thanks for the comprehensive feedback! It's fantastic!  I have updated the patch with most of you feedback... but there was one part that I couldn't follow entirely.  I am now using the _make method but I have had to use star unpacking to allow the method to be cached, lru_cache won't allow a key to be a list because they aren't hash-able.
msg168762 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2012-08-21 12:35
You should be able to just use "tuple(col[0] for col in cursor.description)" instead of the current list comprehension in order to make the argument hashable.
msg168763 - (view) Author: Russell Sim (Russell.Sim) Date: 2012-08-21 12:49
Nick, Thanks for the tip.  I have removed the star unpacking.
msg218657 - (view) Author: Glenn Langford (glangford) * Date: 2014-05-16 12:32
In abstract, I like the namedtuple interface for sqlite3 as well. One caution is that the approach suggested at

http://peter-hoffmann.com/2010/python-sqlite-namedtuple-factory.html 

can have a dramatic impact on performance. For one DB-intensive application, I experienced 20+ seconds run time with the row factory (under 3.4), versus sub second without (identified with cProfile). Many thousands of calls to namedtuple_factory were not good. :)
msg218668 - (view) Author: Glenn Langford (glangford) * Date: 2014-05-16 15:19
...if I understand the proposed caching scheme, then repeated executions of the query

SELECT a,b,c FROM table

would result in cache hits, since the column names remain the same. I'm guessing this would resolve the performance problem in the app I saw, but it would be good to verify that performance is broadly similar with/without named tuples.
msg223704 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-22 22:11
I'd like to see this in 3.5 as I often use sqlite so what needs doing here?
msg223725 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-07-23 05:51
There is significant overhead. Microbenchmark results:

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:')"  "con.execute('select 1 as a, 2 as b').fetchall()"
10000 loops, best of 3: 35.8 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.Row"  "con.execute('select 1 as a, 2 as b').fetchall()"
10000 loops, best of 3: 37.3 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow"  "con.execute('select 1 as a, 2 as b').fetchall()"
10000 loops, best of 3: 92.1 usec per loop

It would be easier to add __getattr__ to sqlite3.Row.
msg224702 - (view) Author: Daniel Lenski (dlenski) * Date: 2014-08-04 09:28
Serhiy,
52 usec/loop doesn't seem like much overhead. This is not 52 usec per row fetched, but just 52 usec per cursor.execute(). An example where >1 row is fetched for each cursor would show this more clearly.

The advantage of namedtuple is that it's a very well-known interface to most Python programmers. Other db-api modules have taken a similar approach; psycopg2 has a dict-like cursor similar to Row, but has added NameTupleCursor in recent versions. (http://initd.org/psycopg/docs/extras.html#namedtuple-cursor)
msg224707 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-08-04 10:37
Yes, above microbenchmarks measure the time of execute() + the time of fetching one row. Here is more precise microbenchmarks.

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
1000 loops, best of 3: 624 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.Row; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
1000 loops, best of 3: 915 usec per loop

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
100 loops, best of 3: 6.21 msec per loop

Here sqlite3.Row is about 1.5 times slower than tuple, but sqlite3.NamedTupleRow is about 7 times slower than sqlite3.Row.

With C implementation of lru_cache() (issue14373) the result is much better:

100 loops, best of 3: 3.16 msec per loop

And it will be even more faster (up to 1.7x) when add to the Cursor class a method which returns a tuple of field names.
msg233839 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-11 08:05
Here is faster implementation.

$ ./python -m timeit -s "import sqlite3; con = sqlite3.connect(':memory:'); con.row_factory = sqlite3.NamedTupleRow; con.execute('create table t (a, b)')" -s "for i in range(100): con.execute('insert into t values (1, 2)')" -- "con.execute('select * from t').fetchall()"
100 loops, best of 3: 2.74 msec per loop

But it is still 3 times slower than sqlite3.Row.
msg234424 - (view) Author: (YoSTEALTH) * Date: 2015-01-21 04:43
note: sqlite_namedtuplerow.patch _cache method conflicts with attached database with say common table.column name like "id"

Using namedtuple method over sqlite3.Row was a terrible idea for me. I thought namedtuple is like tuple so should be faster then dict! wrong. I wasted 2 days change my work to namedtuple and back to sqlite3.Row, the speed difference on my working project was:

namedtuple 0.035s/result
sqlite3.Rows 0.0019s/result

for(speed test) range: 10000
namedtuple 17.3s
sqlite3.Rows 0.4s

My solution was to use sqlite3.Row (for speed) but to get named like usage by convert dict keys() with setattr names:

class dict2named(dict):
    def __init__(self, *args, **kwargs):
        super(dict2named, self).__init__(*args, **kwargs)
        self.__dict__ = self

Usage:

for i in con.execute('SELECT * FROM table'):
    yield dict2named(i)

Now i can use:

print(i.title)

and handy dict methods for dash column names:

print(i['my-title'])
print(i.get('my-title', 'boo'))

Now working project speed:
sqlite3.Rows 0.0020s/result

for(speed test) range: 10000
sqlite3.Rows 0.8s with dict2named converting

This i can work with, tiny compromise in speed with better usage.
History
Date User Action Args
2015-01-21 04:43:49YoSTEALTHsetnosy: + YoSTEALTH
messages: + msg234424
2015-01-11 08:05:11serhiy.storchakasetfiles: + sqlite_namedtuplerow.patch

messages: + msg233839
2015-01-11 01:58:33ghaeringsetassignee: ghaering
2014-08-04 10:37:41serhiy.storchakasetmessages: + msg224707
2014-08-04 09:28:05dlenskisetnosy: + dlenski
messages: + msg224702
2014-07-23 05:51:21serhiy.storchakasetnosy: + serhiy.storchaka
messages: + msg223725
2014-07-22 22:11:39BreamoreBoysetnosy: + BreamoreBoy

messages: + msg223704
versions: + Python 3.5, - Python 3.4
2014-07-18 16:45:09glangfordsetnosy: - glangford
2014-05-16 15:19:15glangfordsetmessages: + msg218668
2014-05-16 12:32:42glangfordsetnosy: + glangford
messages: + msg218657
2012-10-02 05:27:19ezio.melottisetstage: needs patch -> patch review
versions: + Python 3.4, - Python 3.3
2012-08-21 12:49:55Russell.Simsetfiles: + issue_13299.2.patch

messages: + msg168763
2012-08-21 12:35:55ncoghlansetmessages: + msg168762
2012-08-21 12:16:26Russell.Simsetfiles: + issue_13299.1.patch

messages: + msg168761
2012-08-21 07:20:33rhettingersetmessages: + msg168748
2012-08-20 03:20:32Russell.Simsetfiles: + issue_13299.patch

nosy: + Russell.Sim
messages: + msg168617

keywords: + patch
2012-02-03 09:19:39petri.lehtinensetnosy: + petri.lehtinen
2011-11-12 11:20:05eric.araujosetnosy: + eric.araujo
messages: + msg147474
2011-10-31 19:44:37rhettingersetnosy: + rhettinger
messages: + msg146738
2011-10-31 05:12:23ned.deilysetnosy: + ghaering
2011-10-31 00:32:25ncoghlancreate