Message 112429 - Python tracker


Author	lukasz.langa
Recipients	brian.curtin, ezio.melotti, lukasz.langa
Date	2010-08-02.08:46:04
Message-id	<1280738770.7.0.292802440385.issue9452@psf.upfronthosting.co.za>
In-reply-to

Content

Good questions, thanks! The answers will come in handy for documentation and later hype :)

READING CONFIGURATION FROM A DATA STRUCTURE
-------------------------------------------

This is all about templating a decent set of default values. The major use case I'm using this for (with a homebrew SafeConfigParser subclass at the moment) is to provide *in one place* a set of defaults for the whole configuration. The so-called `defaults=` that we have at the moment doesn't fit this space well enough because it provides values that can (and will) jump into every section. This made it useless for me twice:
- when configuring access to external components in a fairly complex system; abstracting away the irrelevant details, the template I was looking for was:

[name-server]
port=
protocol=
verbose=

[workflow-manager]
port=
protocol=
verbose=

[legacy-integration]
port=
protocol=
verbose=

# there were about 15 of these

- the second case was a legacy CSV translation system (don't ask!). An abstract of a config with conflicting keys:

[company1-report]
delimiter=,
amount_column=
amount_type=
description_column=
description_type=
ignore_first_line=True

[company2-report]
delimiter=;
amount_column=
amount_type=
description_column=
description_type=
ignore_first_line=False

# and so on for ~10 entries

As you can see, in both examples `defaults=` couldn't be a good enough template. The reason I wanted these default values to be specified in the program was two-fold:

1. to be able to use the configuration without worrying about NoOptionErrors or fallback values on each get() invocation

2. to handle customers with existing configuration files which didn't include specific sections; if they didn't need customization they could simply use the defaults provided

I personally like the dictionary reading method but this is a matter of taste. Plus, .fromstring() is already used in unit tests :)
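To make the template idea concrete, here is a minimal sketch. It assumes the `read_dict()` and `read_string()` methods that today's standard-library `configparser` provides, which may differ from the names and signatures in the patch discussed here. The option values and the customer file are invented for illustration.

```python
import configparser

# Hypothetical per-section template: section and option names follow the
# abstract above, the values are made up for illustration.
TEMPLATE = {
    "name-server": {"port": "53", "protocol": "udp", "verbose": "no"},
    "workflow-manager": {"port": "8080", "protocol": "http", "verbose": "no"},
}

# A customer file that customizes one option and omits a whole section.
CUSTOMER_INI = """
[name-server]
port = 5353
"""


def load(customer_text):
    parser = configparser.ConfigParser()
    parser.read_dict(TEMPLATE)         # program-side defaults, per section
    parser.read_string(customer_text)  # customer values override them
    return parser


config = load(CUSTOMER_INI)
port = config.getint("name-server", "port")             # customer value
protocol = config.get("workflow-manager", "protocol")   # template value

# For contrast: a value passed via defaults= shows up in every section.
leaky = configparser.ConfigParser(defaults={"delimiter": ","})
leaky.add_section("name-server")
leaked = leaky.get("name-server", "delimiter")
```

With the template loaded first, every `get()` on a templated option succeeds without a fallback argument, and a missing section in the customer file is not an error.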

DUPLICATE OPTION VALIDATION
---------------------------

Firstly, I'd like to stress that this validation does NOT mean that we cannot update keys once they appear in configuration. Duplicate option detection works only while parsing a single file, string or dictionary. In this case duplicates are a configuration error and should be reported to the user.
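The distinction between updating a key and duplicating it can be sketched as follows. This assumes the behaviour of the current standard-library `configparser`, where strict checking is on by default and `DuplicateOptionError` is raised; the patch under discussion may behave differently. The sample contents are invented.

```python
import configparser

FIRST = """
[company1-report]
amount_column = 3
ignore_first_line = True
"""

SECOND = """
[company1-report]
ignore_first_line = False
"""

# The same option given twice inside one source.
BROKEN = """
[company1-report]
ignore_first_line = True
ignore_first_line = False
"""

parser = configparser.ConfigParser()  # strict checking is the default
parser.read_string(FIRST)
parser.read_string(SECOND)            # a later source may update the key
updated = parser.getboolean("company1-report", "ignore_first_line")

try:
    configparser.ConfigParser().read_string(BROKEN)
    duplicate_rejected = False
except configparser.DuplicateOptionError:
    duplicate_rejected = True
```

Here the second source overrides the first without complaint, while the repeated option inside a single source is rejected.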

You are right that, for a programmer, taking the last value provided is acceptable. Here, though, the focus should be on the user, who might not feel the same. If their configuration is ambiguous, it's best to follow the Zen: "In the face of ambiguity, refuse the temptation to guess."

This is very much the case for large configuration files (take /etc/ssh/sshd_config or any real-life ftpd config, etc.) where users might miss the fact that one option is uncommented in the body or thrown in at the end of the file by another admin or even by the users themselves.

Users might also be unaware of the case-insensitivity.

These two problems are even more likely to cause headaches for the dictionary reading algorithm where there actually isn't an order in the keys within a section and you can specify a couple of values that represent the same key because of the case-insensitivity. Plus, this is going to be even more visible once we introduce mapping protocol access when you can add a whole section with keys using the dictionary syntax.
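A small sketch of the case-insensitivity problem when reading from a dictionary, assuming the `read_dict()` method and the default lowercasing `optionxform` of the current standard-library `configparser`. The section content is hypothetical.

```python
import configparser

# Two spellings that the parser folds to the same option name.
SECTION = {"name-server": {"Port": "53", "port": "5353"}}

strict_parser = configparser.ConfigParser()
try:
    strict_parser.read_dict(SECTION)
    clash = None
except configparser.DuplicateOptionError as exc:
    # The exception records which section and (normalized) option clashed.
    clash = (exc.section, exc.option)
```

Both keys are distinct in the Python dictionary, so nothing looks wrong until the parser normalizes them; refusing to guess is what surfaces the mistake.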

Another argument is that there is already section duplicate validation but it doesn't work when reading from files. This means that the user might add two sections of the same name with contradicting options.

SUMMARY
-------
Reading from strings or dictionaries provides an additional way to feed the parser with values. Given how simple both methods are, I would argue that it's beneficial to configparser users to have well-factored, unit-tested methods for these tasks so they don't have to reimplement them over and over again when the need arises.

In terms of validation, after your remark and thinking about it for a while, I think that the best path may be to let programmers choose during parser initialization whether they want validation or not. This would also be a good place to include section duplicate validation during file reading. Should I provide an updated patch?
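As an illustration of choosing validation at initialization time, the sketch below uses the `strict` keyword argument of the current standard-library `ConfigParser`. That keyword is an assumption relative to this message, which only proposes such a switch. The helper `try_parse` and the sample text are hypothetical.

```python
import configparser

# One source containing the same section twice with contradicting options.
DUPLICATED = """
[legacy-integration]
port = 21
verbose = no

[legacy-integration]
port = 2121
"""


def try_parse(strict):
    """Return the section as a dict, or None if the parser rejects the text."""
    parser = configparser.ConfigParser(strict=strict)
    try:
        parser.read_string(DUPLICATED)
    except configparser.DuplicateSectionError:
        return None
    return dict(parser["legacy-integration"])


strict_result = try_parse(strict=True)    # duplicate section is rejected
lenient_result = try_parse(strict=False)  # sections merge, last value wins
```

With validation off, the two sections are silently merged and the later `port` wins, which is exactly the guess the strict mode refuses to make.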

After a couple of years of experience with external customers configuring software, I find it better for the software to aid me in customer support. The best outcome is when users can help themselves. And customers (and we ourselves, too!) do stupid things all the time. And so, specifying a default set of sane values AND checking for duplicates within the same section helps with that.

History
Date	User	Action	Args
2010-08-02 08:46:11	lukasz.langa	set	recipients: + lukasz.langa, ezio.melotti, brian.curtin
2010-08-02 08:46:10	lukasz.langa	set	messageid: <1280738770.7.0.292802440385.issue9452@psf.upfronthosting.co.za>
2010-08-02 08:46:07	lukasz.langa	link	issue9452 messages
2010-08-02 08:46:04	lukasz.langa	create