classification
Title: configparser support for reading from strings and dictionaries
Type: enhancement Stage: patch review
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: accepted
Dependencies: Superseder:
Assigned To: Nosy List: brian.curtin, eric.araujo, ezio.melotti, fdrake, lukasz.langa
Priority: normal Keywords: patch

Created on 2010-08-02 00:31 by lukasz.langa, last changed 2010-08-09 12:53 by fdrake. This issue is now closed.

Files
File name Uploaded Description Edit
issue9452.diff lukasz.langa, 2010-08-08 19:02 Patch that introduces read_string and read_dict + related bugfixes
Messages (19)
msg112409 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-02 00:31
Overview
--------

It's a fairly common need in configuration parsing to take configuration from a string or a Python data structure (most commonly, a dictionary). The attached patch introduces two new methods to RawConfigParser that serve this purpose: readstring and readdict. In the process, two behavioral bugs were detected and fixed.

Detailed information about the patch
------------------------------------
Source changes:
* added readstring and readdict methods
* fixed a bug in SafeConfigParser.set when setting a None value would raise an exception (the code tried to remove interpolation escapes from None)
* added a new exception DuplicateOptionError, raised when during parsing a single file, string or a dictionary the same option occurs twice. This catches mispellings and case-sensitivity errors (parsers are by default case-insensitive in regard to options).
* added checking for option duplicates described above in _read

Test changes:
* self.fromstring is using readstring now
* self.cf removed because it was bad design, now config parser instances are explicit everywhere
  ** changed definition of get_error and parse_error
* split test_basic into two parts: config parser building and the actual test
  ** introduced config parser built from the dictionary
* added a duplicate option checking case for string and dictionary based reading

Documentation changes:
* documented readstring and readdict
* documented DuplicateOptionError
* explicit remark about the case-insensitivity
* corrected remark about leading whitespace being removed from values (also trailing whitespace is, and for keys as well)
msg112410 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-02 00:35
There goes the patch.
msg112413 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-02 00:58
Patch updated after review by Ezio Melotti.

To answer a common question that came up in the review: all atypical names and implementation details are there due to consistency with existing configparser code, e.g.:

 * readstring ~= readfp (no _ between words)
 * DuplicateOptionError ~= DuplicateSectionError (not Duplicated)
 * all exceptions use old style BaseClass.__init__ and not super()

API won't change so this has to remain that way. Exceptions may be refactored in one go at a later stage.
msg112416 - (view) Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-08-02 02:38
Although you say this is fairly common, I haven't heard of anyone using or requesting this type of feature. Do you have any real-world use cases for this? Before we start adding more read methods I think we should know who wants them and why.

I'm not sure duplicates should raise exceptions. To me, the current behavior of using the last read section/option is fine. It's predictable and it works. Halting a program's operation due to duplicate sections/options seems a bit harsh to me.
msg112429 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-02 08:46
Good questions, thanks! The answers will come useful for documentation and later hype :)



READING CONFIGURATION FROM A DATA STRUCTURE
-------------------------------------------

This is all about templating a decent set of default values. The major use case I'm using this for (with a homebrew SafeConfigParser subclass at the moment) is to provide *in one place* a set of defaults for the whole configuration. The so-called `defaults=` that we have at the moment don't fit this space well enough because they provide values that can (and will) jump into every section. This made them useless for me twice:
- when configuring access to external components in a fairly complex system; abstracting out the useless details the template I was looking for was

[name-server]
port=
protocol=
verbose=

[workflow-manager]
port=
protocol=
verbose=

[legacy-integration]
port=
protocol=
verbose=

# there were about 15 of these

- second case was a legacy CSV translation system (don't ask!). An abstract of a config with conflicting keys:

[company1-report]
delimiter=,
amount_column=
amount_type=
description_column=
description_type=
ignore_first_line=True

[company2-report]
delimiter=;
amount_column=
amount_type=
description_column=
description_type=
ignore_first_line=False

# and so on for ~10 entries

As you can see, in both examples `defaults=` couldn't be a good enough template. The reason I wanted these default values to be specified in the program was two-fold:

1. to be able to use the configuration without worrying about NoOptionErrors or fallback values on each get() invocation

2. to handle customers with existing configuration files which didn't include specific sections; if they didn't need customization they could simply use the defaults provided

I personally like the dictionary reading method but this is a matter of taste. Plus, .fromstring() is already used in unit tests :)



DUPLICATE OPTION VALIDATION
---------------------------

Firstly, I'd like to stress that this validation does NOT mean that we cannot update keys once they appear in configuration. Duplicate option detection works only while parsing a single file, string or dictionary. In this case duplicates are a configuration error and should be notified to the user.

You are right that for a programmer accepting the last value provided is acceptable. In this case the impact should be on the user who might not feel the same. If his configuration is ambiguous, it's best to use the Zen: "In the face of ambiguity, refuse the temptation to guess."

This is very much the case for large configuration files (take /etc/ssh/sshd_config or any real life ftpd config, etc.) when users might miss the fact that one option is uncommented in the body or thrown in at the end of the file by another admin or even the user himself.

Users might also be unaware of the case-insensitivity. 

These two problems are even more likely to cause headaches for the dictionary reading algorithm where there actually isn't an order in the keys within a section and you can specify a couple of values that represent the same key because of the case-insensitivity. Plus, this is going to be even more visible once we introduce mapping protocol access when you can add a whole section with keys using the dictionary syntax.

Another argument is that there is already section duplicate validation but it doesn't work when reading from files. This means that the user might add two sections of the same name with contradicting options.



SUMMARY
-------
Reading from strings or dictionaries provides an additional way to feed the parser with values. Judging from the light complexity of both methods I would argue that it's beneficial to configparser users to have well factored unit tested methods for these tasks so they don't have to reimplement them over and over again when the need arises.

In terms of validation, after you remark and thinking about it for a while, I think that the best path may be to let programmers choose during parser initialization whether they want validation or not. This would be also a good place to include section duplicate validation during file reading. Should I provide an updated patch?

After a couple of years of experience with external customers configuring software I find it better for the software to aid me in customer support. This is the best solution when users can help themselves. And customers (and we ourselves, too!) do stupid things all the time. And so, specifying a default set of sane values AND checking for duplicates within the same section helps with that.
msg112510 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-08-02 18:41
Reading from a string is certainly fairly common, though I'm pretty happy with using an io.StringIO seems reasonable and straightforward.

I've never stumbled over the need to "read" from dictionaries as described.
msg112520 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-02 19:25
Corrected a simple mistake in the patch.
msg112614 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-03 14:38
Updated patch after discussion on #python-dev:

- PEP8 compliant names used: read_file, read_string, read_dict. readfp has been PendingDeprecated
- documentation updates
- option validation is now optional with the use of `strict=` argument in the parser's __init__
- new unit tests introduced to check for the new behaviour
msg112617 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-03 14:47
FTR, some people questioned the purpose of read_dict(). Let me summarize this very briefly here:

- the API is using dictionaries similar to those in defaults= but has one level of depth more (sections)
- initializing a parser with a dictionary produces syntax that is more natural in code
- having a single method implementing reading a dictionary with unit tests, support for proper duplicate handling, etc. frees users from writing their own
- we need that anyway for upcoming mapping protocol access discussed in #5412
- more detailed use cases in msg112429
msg112720 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-03 23:04
Rietveld review link: http://codereview.appspot.com/1924042/edit
msg112736 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-08-04 01:44
I agree that the existing defaults={...} should never have been added to the stdlib.  It made sense in the originating application, but should have been implemented differently to keep application-specific behavior out of what eventually was added to the stdlib.

Will think further about the rest of this when I'm on my own computer and can read & play with the patch in a more usable environment.
msg112799 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-04 13:09
Patch updated after review by Ezio Melotti and Éric Araujo. Thanks guys.
msg112949 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-08-05 07:10
(Apparently I don't have the right permissions on Rietveld.)

- Docstrings should be written in the standard PEP-8 way (single line
  summary + additional explanation as needed following a blank line).

- read_sting and read_dict should still take a `filename` argument for
  use in messages, with <string> and something like <data in ...> (with
  the caller's __file__ being filled in if not provided).

- Indentation in the last read_dict of
  test.test_cfgparser.BasicTestCase.test_basic_from_dict is
  incconsistent with the previous read_dict in the same test.

- Lines over 79 characters should be shortened.  Most of these are in
  docstrings, so just re-wrapping should be sufficient for most.

- Changing the test structure to avoid self.cf may have been convenient,
  but is unrelated to the actual functionality changes.  In the future
  such refactorings should be performed in separate patches.  (Ordering
  dependencies are fine, so long as they're noted in the relevant
  issues.)

- DuplicateOptionError is missing from __all__.

- Changing the constructor to use keyword-only arguments carries some
  backward-compatibility risk.  That can be avvoided by removing that
  change and adding strict=False at the end of the parameter list.

  It's unlikely that this is a significant risk, since these parameters
  generally lend themselves to keyword usage.


I think this should have been several separate patches:

- refactoring (the self.cf changes in the tests)

- addition of the DuplicateOptionError

- the read_* methods (including the readfp deprecation)

- the new "strict" option

Don't change that at this point, but please consider smaller chunks in
the future.
msg113057 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-05 22:16
Updated patch after review by Fred Drake. Thanks, it was terrific!

Status:

> Docstrings should be written in the standard PEP-8 way (single line
> summary + additional explanation as needed following a blank line).

Corrected where applicable. Is it OK if the one-sentence summary is occasionally longer than one line? Check out DuplicateSectionError, could it have a summary as complete as this that would fit a single line?

On a similar note, an inconsistency of configparser.py is that sometimes a blank line is placed after the docstring and sometimes there is none. How would you want this to look vi in the end?

> read_sting and read_dict should still take a `filename` argument for
> use in messages, with <string> and something like <data in ...> (with
> the caller's __file__ being filled in if not provided).

Are you sure about that? `read_string` might have some remote use for an argument like that, but I'd call it source= anyway. As for `read_dict`, the implementation does not even use _read(), so I don't know where could this potential `filename=` be used.

> Indentation in the last read_dict of
> test.test_cfgparser.BasicTestCase.test_basic_from_dict is
> incconsistent with the previous read_dict in the same test.

Updated. All in all, some major unit test lipsticking should be done as a separate patch.

> Lines over 79 characters should be shortened.  Most of these are in
> docstrings, so just re-wrapping should be sufficient for most.

Corrected, even made my Vim show me these kinds of formatting problems. I also corrected a couple of these which were there before the change.

> Changing the test structure to avoid self.cf may have been convenient,
> but is unrelated to the actual functionality changes.  In the future
> such refactorings should be performed in separate patches.  (Ordering
> dependencies are fine, so long as they're noted in the relevant
> issues.)

Good point, thanks.

> DuplicateOptionError is missing from __all__.

Corrected.

> Changing the constructor to use keyword-only arguments carries some
> backward-compatibility risk.  That can be avvoided by removing that
> change and adding strict=False at the end of the parameter list.

All of these arguments are new in trunk so there is no backwards compatibility here to think about. The arguments that are not new (defaults and dict_type) are placed before the asterisk.

> Don't change that at this point, but please consider smaller chunks in
> the future.

I will, thanks.
msg113058 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-05 22:16
Ah, forgot to remind you that I don't have commit privileges yet.
msg113059 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-08-05 22:18
>> Docstrings should be written in the standard PEP-8 way (single line
>> summary + additional explanation as needed following a blank line).
> Corrected where applicable. Is it OK if the one-sentence summary is
> occasionally longer than one line?

It’s a one-line summary, not one sentence. PEP 257 has all the details you’re looking for, and more.
msg113075 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-08-06 05:51
- Summmary lines in docstrings are one line, as Éric points out.
  They're summaries, so need not be complete.  Use elaboration text as
  needed, and omit anything that's not relevant in context.  An
  alternate wording to consider:

    """Raised by strict parsers for options repeated in an input source."""

  In particular, there's no mention of what the kinds of sources are,
  since that's not relevant for the exception itself.

- I like the blank line before the ending """, others don't.  More
  important is consistency within the module, so feel free to coerce it
  either way.

- Perhaps read_* should all call the extra argument source= instead of
  filename=.  ``readfp`` should take `filename` and pass it as `source`
  for compatibility.  This still makes sense for ``read_file`` since the
  "file" part of the name refers to the kind of object that's passed in,
  not that the file represents a simple file on the filesystem.

- Since the Duplicate*Error exceptions can now be raised when "reading"
  (_read itself is an implementation detail, so that's irrelevant), they
  should grow filename, lineno, line constructor arguments.  These
  should be filled in as much as possible when those exceptions are
  raised.

- Adding a ``source`` attribute to exceptions that have ``filename``
  attribute is reasonable; they should have the same value.  (One should
  be a property that mirrors the other.)  The ``filename`` parameter and
  attribute name must remain for compatibility.

- There's at least one way to make vim show the too-long text with a
  violent red background.  Appropriate for code, sucks when reading
  logfiles.  But I'm not a vim user.  Whatever tool makes it show up for
  you is fine.

- The constructors in Python 3.2 should be compatible with those from
  Python 2.7 (IMO).  They're currently not: `allow_no_value` should be
  allowed positionally and come right after `dict_type`.  (This problem
  existed before your patch; it should be corrected first.)

- In the docs, "Parse configuration from a dictionary" should probably
  be "Load configuration from a dictionary"; most of the parsing effort
  is skipped in this case, so this reads a little oddly.

- Indentation in the DuplicateOptionError constructor is a bit weird in
  the superclass constructor call.  Something like this would be better:

    def __init__(self, section, option):
        Error.__init__(
            self,
            "Option %r in section %r already exists" % (option, section))
        self.section = section
        self.option = option
        self.args = (section, option)
msg113298 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2010-08-08 19:02
Patch updated.

All docstrings now have a one-line summary.

All multiline docstrings now have a newline character before the closing """.

No method docstrings now include any additional newlines between them and the
code. Most of them were okay, a couple were edited for consistency.

All read_* methods now have a source= attribute. read_file defaults to <???>,
read_string to <string>, read_dict to <dict>. Didn't provide any additional
introspection because it's not trivial and I want this patch to be committed at
last. This might be an additional advantage because a generic <string> or <dict>
name may motivate programmers to specify a more context-specific source name of
their own.

As for Duplicate*Error, I've added source= and lineno= to both. Didn't add line=
because it's useless, the error is very specific on what is wrong. Reading from
a dictionary does not pass the lineno for obvious reasons. Reading from files
and strings does.

The `filename' attribute and __init__ argument in ParsingError were
PendingDeprecated and a `source' attribute was introduced for consistency.

`allow_no_value` was moved to the 3rd place in the argument list for backwards
compatibility with Python 2.7. I didn't notice your change made to Python
2.7 as well, all other new arguments were added by me. This ensures there's no
  backwards compatibility involved in their case. That's why I left all of the
  new arguments as keyword only.

Documentation and unit tests updated.

BTW, if `allow_no_value` made it to 2.7 this means it has a bug in
SafeConfigParser.set. Try to set a valueless option, line 694 will raise an
exception. Should I provide you with a separate patch only to fix this for
2.7.1?

PS. I made Vim show me too long lines with a delicate red background. Way better
than the violent alternative ;) Plus, I only set it for the Python filetype.
msg113411 - (view) Author: Fred Drake (fdrake) (Python committer) Date: 2010-08-09 12:53
Patch committed on py3k branch in r83889.
History
Date User Action Args
2010-08-09 12:53:41fdrakesetstatus: open -> closed
resolution: accepted
messages: + msg113411
2010-08-08 19:02:50lukasz.langasetfiles: + issue9452.diff

messages: + msg113298
2010-08-08 19:01:47lukasz.langasetfiles: - issue9452.diff
2010-08-06 05:51:04fdrakesetmessages: + msg113075
2010-08-05 22:18:44eric.araujosetnosy: + eric.araujo
messages: + msg113059
2010-08-05 22:16:59lukasz.langasetmessages: + msg113058
2010-08-05 22:16:19lukasz.langasetfiles: + issue9452.diff

messages: + msg113057
2010-08-05 21:56:34lukasz.langasetfiles: - issue9452.diff
2010-08-05 07:10:05fdrakesetmessages: + msg112949
2010-08-04 13:09:40lukasz.langasetfiles: + issue9452.diff

messages: + msg112799
2010-08-04 13:08:14lukasz.langasetfiles: - issue9452.diff
2010-08-04 01:44:55fdrakesetmessages: + msg112736
2010-08-03 23:04:38lukasz.langasetmessages: + msg112720
2010-08-03 21:54:10eric.araujolinkissue2204 superseder
2010-08-03 14:47:57lukasz.langasetmessages: + msg112617
2010-08-03 14:38:14lukasz.langasetfiles: + issue9452.diff

messages: + msg112614
2010-08-03 14:22:00lukasz.langasetfiles: - issue9452.diff
2010-08-02 19:25:25lukasz.langasetfiles: + issue9452.diff

messages: + msg112520
2010-08-02 19:23:54lukasz.langasetfiles: - issue9452.diff
2010-08-02 18:41:50fdrakesetnosy: + fdrake
messages: + msg112510
2010-08-02 08:46:07lukasz.langasetmessages: + msg112429
2010-08-02 02:38:42brian.curtinsetnosy: + brian.curtin

messages: + msg112416
stage: patch review
2010-08-02 00:58:50lukasz.langasetfiles: + issue9452.diff

messages: + msg112413
2010-08-02 00:54:53lukasz.langasetfiles: - issue9452.diff
2010-08-02 00:35:30lukasz.langasetfiles: + issue9452.diff

nosy: + ezio.melotti
messages: + msg112410

keywords: + patch
2010-08-02 00:31:15lukasz.langacreate