Index: Doc/library/csv.rst
===================================================================
--- Doc/library/csv.rst (revision 70236)
+++ Doc/library/csv.rst (working copy)
@@ -31,7 +31,8 @@
 
 The :mod:`csv` module's :class:`reader` and :class:`writer` objects read and
 write sequences. Programmers can also read and write data in dictionary form
-using the :class:`DictReader` and :class:`DictWriter` classes.
+using the :class:`DictReader` and :class:`DictWriter` classes or in namedtuple
+form using the :class:`NamedTupleReader` and :class:`NamedTupleWriter` classes.
 
 .. note::
 
@@ -185,7 +186,39 @@
    are not ordered, there is not enough information available to deduce the order
    in which the row should be written to the *csvfile*.
 
+.. class:: NamedTupleReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, rename=False[, *args, **kwds]]]]]])
+
+   Create an object which operates like a regular reader but maps the information
+   read into a :func:`namedtuple` whose fields can be accessed using attribute lookup.
+   The contents of *fieldnames* are passed directly to be used as the
+   namedtuple fieldnames. If *fieldnames* is ``None``, the values in the
+   first row of the *csvfile* will be used as the fieldnames.
+   If the row read has fewer fields than the fieldnames sequence, the value of
+   *restval* will be used as the default value. If the row read has more fields
+   than the fieldnames sequence, then the extra fields will be clipped unless
+   *restkey* has been specified. If it has, then the extra fields are stored
+   as a single list in the last field, which is named *restkey*. The value of *rename*
+   is passed to the namedtuple factory function as a keyword argument.
+   If *rename* is true, invalid fieldnames are automatically replaced with
+   positional names (implemented by the :func:`namedtuple` factory function).
+   Any other optional or keyword arguments are passed to the underlying
+   :class:`reader` instance.
+
+.. class:: NamedTupleWriter(csvfile[, fieldnames=None[, restval=''[, extrasaction='raise'[, dialect='excel'[, *args, **kwds]]]]])
+
+   Create an object which operates like a regular writer but maps namedtuples onto
+   output rows. The *fieldnames* parameter identifies the valid fieldnames that will
+   be written from a *namedtuple* passed to the :meth:`writerow` method to the *csvfile*.
+   The optional *restval* parameter specifies the value to be written if the *namedtuple*
+   is missing a field listed in *fieldnames*. If the *namedtuple* passed to the
+   :meth:`writerow` method contains a field not found in *fieldnames*, the optional
+   *extrasaction* parameter indicates what action to take. If it is set to ``'raise'``, a
+   :exc:`ValueError` is raised. If it is set to ``'ignore'``, extra fields in the
+   *namedtuple* are ignored. Any other optional or keyword arguments are passed to
+   the underlying :class:`writer` instance.
+
+
 
 .. class:: Dialect
 
    The :class:`Dialect` class is a container class relied on primarily for its
@@ -348,15 +381,6 @@
 Reader Objects
 --------------
 
-Reader objects (:class:`DictReader` instances and objects returned by the
-:func:`reader` function) have the following public methods:
-
-
-.. method:: csvreader.next()
-
-   Return the next row of the reader's iterable object as a list, parsed according
-   to the current dialect.
-
 Reader objects have the following public attributes:
 
 
@@ -372,7 +396,8 @@
 
 
 
-DictReader objects have the following public attribute:
+:class:`DictReader` and :class:`NamedTupleReader` objects have the following
+public attribute:
 
 
 .. attribute:: csvreader.fieldnames
@@ -386,13 +411,17 @@
 Writer Objects
 --------------
 
-:class:`Writer` objects (:class:`DictWriter` instances and objects returned by
-the :func:`writer` function) have the following public methods. A *row* must be
-a sequence of strings or numbers for :class:`Writer` objects and a dictionary
-mapping fieldnames to strings or numbers (by passing them through :func:`str`
-first) for :class:`DictWriter` objects. Note that complex numbers are written
-out surrounded by parens. This may cause some problems for other programs which
-read CSV files (assuming they support complex numbers at all).
+:class:`Writer` objects (:class:`DictWriter` instances, :class:`NamedTupleWriter`
+instances and objects returned by the :func:`writer` function) have the following
+public methods. A *row* must be a sequence of strings or numbers for
+:class:`Writer` objects, a dictionary mapping fieldnames to strings or numbers
+(by passing them through :func:`str` first) for :class:`DictWriter` objects, or
+a namedtuple for :class:`NamedTupleWriter` objects that define the *fieldnames*
+attribute. :class:`NamedTupleWriter` objects that do not define the *fieldnames*
+attribute behave like a plain :class:`Writer` object.
+Note that complex numbers are written out surrounded by parens.
+This may cause some problems for other programs which read CSV files
+(assuming they support complex numbers at all).
 
 
 .. method:: csvwriter.writerow(row)
Index: Lib/csv.py
===================================================================
--- Lib/csv.py (revision 70236)
+++ Lib/csv.py (working copy)
@@ -10,6 +10,7 @@
                  QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC, QUOTE_NONE, \
                  __doc__
 from _csv import Dialect as _Dialect
+from collections import namedtuple as _namedtuple
 
 from io import StringIO
 
@@ -65,7 +66,133 @@
     delimiter = '\t'
 register_dialect("excel-tab", excel_tab)
 
+class NamedTupleReader:
+    def __init__(self, f, fieldnames=None, restkey=None, restval=None,
+                 dialect="excel", rename=False, *args, **kwds):
+        # list of fieldnames for the namedtuple or a namedtuple.
+        self._fieldnames = fieldnames
+
+        try:
+            # namedtuple subclass name.
+            self._name = self._fieldnames.__name__
+        except AttributeError:
+            self._name = 'Fields'
+
+        self.restkey = restkey  # key to catch long rows
+        self.restval = restval  # default value for short rows
+        self.reader = reader(f, dialect, *args, **kwds)
+        self.dialect = dialect
+        self.line_num = 0
+
+        # Allow namedtuple module to automatically replace invalid fieldnames.
+        self.rename = rename
+
+    def __iter__(self):
+        return self
+
+    @property
+    def fieldnames(self):
+        """Fetch field names from the stored namedtuple subclass, create one if
+        one is not yet stored."""
+        # attempt to short circuit if we already have a namedtuple stored.
+        try:
+            return list(self._fieldnames._fields)
+        except AttributeError:
+            pass
+
+        # if no fieldnames were passed, attempt to read from
+        # first row.
+        if self._fieldnames is None:
+            try:
+                self._fieldnames = next(self.reader)
+            except StopIteration:
+                return
+            finally:
+                self.line_num = self.reader.line_num
+
+        self._fieldnames = _namedtuple(self._name, self._fieldnames,
+                                       rename=self.rename)
+
+        return list(self._fieldnames._fields)
+
+    @fieldnames.setter
+    def fieldnames(self, value):
+        """Set the fieldnames to a new namedtuple.
+        Can be a sequence of fieldnames or a namedtuple subclass."""
+        if hasattr(value, '_fields'):
+            # attempt to keep self._name attribute up to date.
+            self._name = value.__name__
+
+        self._fieldnames = value
+
+    def __next__(self):
+        if self.line_num == 0:
+            # Used only for its side effect.
+            self.fieldnames
+        row = next(self.reader)
+        self.line_num = self.reader.line_num
+
+        # unlike the basic reader, we prefer not to return blanks,
+        # because we will typically wind up with a namedtuple full of None
+        # values
+        while row == []:
+            row = next(self.reader)
+
+        n = len(self.fieldnames)
+
+        # pad missing fields with restval.
+        if len(row) < n:
+            row += [self.restval] * (n - len(row))
+
+        # either clip or assign to restkey.
+        if self.restkey is None:
+            row = row[:n]
+        else:
+            # need to make a new nt with restkey fieldname.
+            fieldnames = self._fieldnames._fields + (self.restkey,)
+            rest_nt = _namedtuple(self._name, fieldnames, rename=self.rename)
+            row[n:] = [row[n:]]
+
+            return rest_nt._make(row)
+
+        return self._fieldnames._make(row)
+
+class NamedTupleWriter:
+    def __init__(self, f, fieldnames=None, restval="", extrasaction="raise",
+                 dialect="excel", *args, **kwds):
+
+        self.fieldnames = fieldnames  # list of fieldnames for the namedtuple.
+        self.restval = restval  # for writing short namedtuples.
+
+        if extrasaction.lower() not in ("raise", "ignore"):
+            raise ValueError("extrasaction (%s) must be 'raise' or 'ignore'"
+                             % extrasaction)
+
+        self.extrasaction = extrasaction
+        self.writer = writer(f, dialect, *args, **kwds)
+
+    def _nt_to_list(self, row_nt):
+        if self.fieldnames is None:
+            return row_nt
+
+        if self.extrasaction == "raise":
+            wrong_fields = [n for n in row_nt._fields if n not in self.fieldnames]
+            if wrong_fields:
+                raise ValueError("namedtuple contains fields not in fieldnames: "
+                                 + ", ".join(wrong_fields))
+        return [getattr(row_nt, name, self.restval) for name in self.fieldnames]
+
+    def writerow(self, row_nt):
+        return self.writer.writerow(self._nt_to_list(row_nt))
+
+    def writerows(self, row_nts):
+        rows = []
+        for row_nt in row_nts:
+            rows.append(self._nt_to_list(row_nt))
+        return self.writer.writerows(rows)
+
+
 
 class DictReader:
     def __init__(self, f, fieldnames=None, restkey=None, restval=None,
                  dialect="excel", *args, **kwds):
Index: Lib/test/test_csv.py
===================================================================
--- Lib/test/test_csv.py (revision 70236)
+++ Lib/test/test_csv.py (working copy)
@@ -9,6 +9,7 @@
 from io import StringIO, BytesIO
 from tempfile import TemporaryFile
 import csv
+import collections
 import gc
 from test import support
 
@@ -516,6 +517,208 @@
     def test_read_escape_fieldsep(self):
         self.readerAssertEqual('"abc\\,def"\r\n', [['abc,def']])
 
+#---------------------------------------------------------------------------------
+
+class TestNamedTupleFields(unittest.TestCase):
+    ### "long" means the row is longer than the number of fieldnames
+    ### "short" means there are fewer elements in the row than fieldnames
+    def test_rename_fields(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("1one,class,_private\r\n1,2,abc\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj, rename=True)
+            self.assertEqual(reader.fieldnames, ["_0", "_1", "_2"])
+
+            Fields = collections.namedtuple('fieldnames', 'f1 f2 f3')
+            values = Fields(f1='1', f2='2', f3='abc')
+
+            self.assertEqual(next(reader), values)
+
+    def test_no_rename_fields(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("valid,2invalid,alsovalid\r\n1,2,abc\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj)
+            # accessing fieldnames causes the namedtuple factory function to be
+            # called. It will be passed (invalid) fieldnames from the first row in the
+            # fileobj.
+            self.assertRaises(ValueError, getattr, reader, 'fieldnames')
+
+    def test_write_simple_nt(self):
+        with TemporaryFile('w+', newline='') as fileobj:
+            writer = csv.NamedTupleWriter(fileobj, fieldnames=["f1", "f2", "f3"])
+
+            Fields = collections.namedtuple('mynt', 'f1 f3')
+            fieldnames = Fields(f1=10, f3='abc')
+
+            writer.writerow(fieldnames)
+            fileobj.seek(0)
+            self.assertEqual(fileobj.read(), "10,,abc\r\n")
+
+    def test_write_nt_no_fields(self):
+        with TemporaryFile('w+', newline='') as fileobj:
+            writer = csv.NamedTupleWriter(fileobj)
+
+            Fields = collections.namedtuple('mynt', 'f1 f2 f3')
+            fieldnames = Fields(f1=10, f2=None, f3='abc')
+
+            writer.writerow(fieldnames)
+            fileobj.seek(0)
+            self.assertEqual(fileobj.read(), "10,,abc\r\n")
+
+    def test_read_namedtuple_fields(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("1,2,abc\r\n")
+            fileobj.seek(0)
+            fieldnames = collections.namedtuple('fieldnames', 'f1 f2 f3')
+            reader = csv.NamedTupleReader(fileobj,
+                                          fieldnames=fieldnames)
+
+            Fields = collections.namedtuple('fieldnames', 'f1 f2 f3')
+            fieldnames = Fields(f1='1', f2='2', f3='abc')
+
+            self.assertEqual(next(reader), fieldnames)
+
+    def test_read_nt_no_fieldnames(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("f1,f2,f3\r\n1,2,abc\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj)
+            self.assertEqual(reader.fieldnames, ["f1", "f2", "f3"])
+
+            Fields = collections.namedtuple('fieldnames', 'f1 f2 f3')
+            fieldnames = Fields(f1='1', f2='2', f3='abc')
+
+            self.assertEqual(next(reader), fieldnames)
+
+    # Two test cases to make sure existing ways of implicitly setting
+    # fieldnames continue to work. Both arise from discussion in issue3436.
+    def test_read_nt_fieldnames_from_file(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("f1,f2,f3\r\n1,2,abc\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj,
+                                          fieldnames=next(csv.reader(fileobj)))
+            self.assertEqual(reader.fieldnames, ["f1", "f2", "f3"])
+
+            Fields = collections.namedtuple('fieldnames', 'f1 f2 f3')
+            fieldnames = Fields(f1='1', f2='2', f3='abc')
+
+            self.assertEqual(next(reader), fieldnames)
+
+    def test_read_nt_fieldnames_chain(self):
+        import itertools
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("f1,f2,f3\r\n1,2,abc\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj)
+            first = next(reader)
+            for row in itertools.chain([first], reader):
+                self.assertEqual(reader.fieldnames, ["f1", "f2", "f3"])
+
+                Fields = collections.namedtuple('fieldnames', 'f1 f2 f3')
+                fieldnames = Fields(f1='1', f2='2', f3='abc')
+
+                self.assertEqual(row, fieldnames)
+
+    def test_read_long_clipped(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("1,2,abc,4,5,6\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj,
+                                          fieldnames=["f1", "f2"])
+
+            Fields = collections.namedtuple('fieldnames', 'f1 f2')
+            fieldnames = Fields(f1='1', f2='2')
+
+            self.assertEqual(next(reader), fieldnames)
+
+    def test_read_long_rest(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("1,2,abc,4,5,6\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj,
+                                          fieldnames=["f1", "f2"], restkey='DEFAULT')
+
+            Fields = collections.namedtuple('fieldnames', 'f1 f2 DEFAULT')
+            fieldnames = Fields(f1='1', f2='2', DEFAULT=['abc', '4', '5', '6'])
+
+            self.assertEqual(next(reader), fieldnames)
+
+    def test_read_long_with_rest(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("1,2,abc,4,5,6\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj,
+                                          fieldnames=["f1", "f2"], restkey="rest")
+
+            Fields = collections.namedtuple('fieldnames', 'f1 f2 rest')
+            fieldnames = Fields(f1='1', f2='2', rest=['abc', '4', '5', '6'])
+
+            self.assertEqual(next(reader), fieldnames)
+
+    def test_read_long_with_rest_no_fieldnames(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("f1,f2\r\n1,2,abc,4,5,6\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj, restkey="rest")
+            self.assertEqual(reader.fieldnames, ["f1", "f2"])
+            self.assertEqual(next(reader)._asdict(), {"f1": '1', "f2": '2',
+                                                      "rest": ["abc", "4", "5", "6"]})
+
+    def test_read_short(self):
+        with TemporaryFile('w+') as fileobj:
+            fileobj.write("1,2,abc,4,5,6\r\n1,2,abc\r\n")
+            fileobj.seek(0)
+            reader = csv.NamedTupleReader(fileobj,
+                                          fieldnames="one two three four five six",
+                                          restval="DEFAULT")
+
+            self.assertEqual(next(reader)._asdict(),
+                             dict(one='1', two='2', three='abc',
+                                  four='4', five='5', six='6'))
+
+            self.assertEqual(next(reader)._asdict(),
+                             dict(one='1', two='2', three='abc',
+                                  four='DEFAULT', five='DEFAULT',
+                                  six='DEFAULT'))
+
+    def test_read_multi(self):
+        sample = [
+            '2147483648,43.0e12,17,abc,def\r\n',
+            '147483648,43.0e2,17,abc,def\r\n',
+            '47483648,43.0,170,abc,def\r\n'
+        ]
+
+        reader = csv.NamedTupleReader(sample,
+                                      fieldnames="i1 float i2 s1 s2".split())
+        self.assertEqual(next(reader)._asdict(), {"i1": '2147483648',
+                                                  "float": '43.0e12',
+                                                  "i2": '17',
+                                                  "s1": 'abc',
+                                                  "s2": 'def'})
+
+    def test_read_with_blanks(self):
+        reader = csv.NamedTupleReader(["1,2,abc,4,5,6\r\n","\r\n",
+                                       "1,2,abc,4,5,6\r\n"],
+                                      fieldnames="one two three four five six")
+
+        Fields = collections.namedtuple('Fields', 'one two three four five six')
+        fieldnames = Fields(one='1', two='2', three='abc', four='4', five='5',
+                            six='6')
+
+        self.assertEqual(next(reader), fieldnames)
+        self.assertEqual(next(reader), fieldnames)
+
+    def test_read_semi_sep(self):
+        reader = csv.NamedTupleReader(["1;2;abc;4;5;6\r\n"],
+                                      fieldnames="one two three four five six",
+                                      delimiter=';')
+        self.assertEqual(next(reader)._asdict(), dict(one='1', two='2',
+                         three='abc', four='4', five='5', six='6'))
+
+#---------------------------------------------------------------------------------
+
 class TestDictFields(unittest.TestCase):
     ### "long" means the row is longer than the number of fieldnames
     ### "short" means there are fewer elements in the row than fieldnames
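
Note for reviewers: the reader half of this patch can be approximated without applying it. The sketch below uses only the existing `csv.reader` and `collections.namedtuple`. The helper name `read_namedtuples` is hypothetical and not part of the patch. It assumes every data row has exactly as many fields as the header row, so it does not cover the `restkey`/`restval` handling that `NamedTupleReader` adds.

```python
import csv
import io
from collections import namedtuple


def read_namedtuples(lines, rename=False):
    """Hypothetical helper (not from the patch): yield each CSV row as a
    namedtuple whose field names are taken from the first row."""
    rows = csv.reader(lines)
    try:
        header = next(rows)
    except StopIteration:
        return  # empty input: nothing to yield
    # rename=True lets namedtuple replace invalid names with _<index>.
    Row = namedtuple('Fields', header, rename=rename)
    for row in rows:
        if row:  # skip blank lines, as the patched reader does
            # Assumption: len(row) == len(header); otherwise _make raises.
            yield Row._make(row)


sample = io.StringIO("f1,f2,f3\r\n1,2,abc\r\n", newline='')
records = list(read_namedtuples(sample))
print(records[0].f1, records[0].f3)
```

With `rename=True`, invalid header names are replaced by the namedtuple factory using their zero-based position, which is the behaviour `test_rename_fields` exercises.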