Title: ConfigParser should never write broken configurations
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.6
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: lukasz.langa, r.david.murray, spaceone, terry.reedy, vstinner
Priority: normal Keywords:

Created on 2015-11-24 15:31 by spaceone, last changed 2015-11-27 00:18 by spaceone.

Messages (12)
msg255270 - (view) Author: SpaceOne (spaceone) Date: 2015-11-24 15:31
>>> from configparser import ConfigParser
>>> from io import StringIO
>>> from configparser import ConfigParser
>>> c = ConfigParser()
>>> c.add_section('foo]\nbar=baz\n[bar')
>>> fd = StringIO()
>>> c.write(fd)
>>> print(fd.getvalue())

User input should always be validated.

At least a ValueError should be raised if add_section() is called with a string containing anything like ']\x00\n[' or any other non-printable string. As this will always create a broken configuration or might lead to ini-injections.

Otherwise ConfigParser cannot be used to write new config files without having deeper knowledge about the implementation.

See also:
msg255272 - (view) Author: SpaceOne (spaceone) Date: 2015-11-24 15:32
I would have expected something like ValueError('A section must not contain any of ...') or at least that the characters are simply stripped.
msg255305 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-11-24 23:42
This strikes me as an overt bug.  .add_section should add one new empty section, not a section with an item and a second section.  Since a section name cannot contain \n, I would agree that passing a string containing \n should raise ValueError("Section names cannot contain newlines.").  Since anything else without ']' is valid and since even that can be accommodated with a custom section name re, that is the only check that should be done.
msg255324 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2015-11-25 07:57
Terry: "Since anything else without ']' is valid (...)"

A Python script can be used to generate a configuration read by another application. This application can more more strict on the configuration format than Python, so I would prefer to deny '\n', '[' and ']' characters in section names.

I'm not sure that it's ok to modify Python < 3.6 since it can break applications relying on this ugly "feature". I propose to only modify Python 3.6.

If you need strict ConfigParser, you can inherit from the class to override add_section() to add checks on the section name.

@SpaceOne: Are you interested to work on a patch?
msg255360 - (view) Author: SpaceOne (spaceone) Date: 2015-11-25 14:42
I added also a rejection of '\r' and '\x00':

It's only my opinion but I would prefer to reject all of these non printable characters:

But as the _parse currently detects them as valid and your opinion is probably a very different one I will not add these characters.
msg255373 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-11-25 18:03
Newline (\n) and possibly \x00, if it necessarily causes an actual problem, are the only characters that we might reject by default.  Rejecting anything else is unwarrented editorializing about what people 'should' use.  As you said, people who want something stricter can replace .add_section.  (This would be most useful in a group or corporate setting which might, for instance, want to severely limit the character set allowed.)

In particular, I showed in #20923 how to replace .SECTCRE to make "[Test[2]_foo]" yield "Test[2]_foo".  The OP for that issue filed it after seeing an application that used such section names and they are not at all unreasonable. Python should be able to continue writing .ini files for that application as well as any others.

Victor, your point that even a minimal change could break working code is a good one.  It suggests to me that we should maybe do nothing.  This issue is motivated by a theoretical principle "User input should always be validated" that is not a Python design principle (what is valid, who decides), not by actual problems in the field.
msg255374 - (view) Author: SpaceOne (spaceone) Date: 2015-11-25 18:14
Isn't is an actual problem in the field? We had a vulnerability in our code due to this as we only sanitized the config values and didn't recognized that add_section() does no validation of input.
msg255387 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-11-25 20:47
Can you explain how passing \n createda vulnerability (to who, doing what) that flagging \n would prevent?  Your opening example (nicely presented, by the way) shows that passing \n allows one to do with one call what would otherwise take (in the case of the example) three calls. But the result is identical, so the shortcut seems harmless.
msg255388 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-25 21:36
The issue would be if the section name came from user input.  Then an attacker could craft a section name that would result in inserting arbitrary names and values into the config file.
msg255400 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2015-11-26 04:56
We all know that blindly inserting external data into a database can be a bad idea.  But raising ValueError if the data contains \n barely scratches the surface of a real defense.  The external data should be checked before passing it to .add_section or as part of a derived method in a subclass.  I already suggested the possibility of allowing only a restricted set of letter characters.  Such a check comes after defending against the possibility of someone submitting 'a'*1000000 as, in this case, a section name.

configparser is permissive by design, not by accident.  The un-abbreviated verbose re for ConfigParser.SECTCRE say so.
  (?P<header>[^]]+) # very permissive!
msg255424 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-26 16:56
I view this as similar to the corresponding issue with email headers, where we fixed a similar security issue.  The special danger of \n is that it allows you to create a *new* header, or in this case section, with an arbitrary value, possibly overriding an existing section and thus changing the behavior of the program in an exploitable way.  This is *far* easier to exploit than the ability to introduce arbitrary data into the section name itself.  Good security involves concentric rings of defense, and one should almost always be more secure by default when it has a small usability impact.  In this case, there is no legitimate use for \n in a section name, so the only usability impact would be if some weird program out there was actually making use of this for some reason, against all reasonable logic :).  Which is why we are suggesting changing it only in 3.6.

\x00 is problematic (though somewhat less so) for the same reason, as many file readers will treat it as equivalent to end of line and allow a similar exploit.  \r, \f, and \x1c-\x1e should also be blocked, but otherwise we should probably ignore non-printables for backward compatibility reasons (there we move further into the usability impact area).
msg255442 - (view) Author: SpaceOne (spaceone) Date: 2015-11-27 00:18
Of course both of you have reasonable arguments.
For compatibility with overridden SECTRE attributes it should not raise ValueError for characters like [ and ]. (too bad that SECTRE is a public attribute otherwise it could also be used to validate the name (SECTRE.match('[%s]')). What if somebody changed SECTRE to multiline? Then even rejecting '\n' would break compatibility.
But: How often does this happen? In open source projects it seems none. A and google search exposed that no project does this.

Terry, I completely agree with your argument "that blindly inserting external input into a database is bad idea". But in the real world it happens that there are many applications out which doesn't validate what they pass into .add_section(). (Do you want me to give you a list of python projects which are either broken or vulnerable?). In my opinion this is dangerous, as well as not validating HTTP/Mail/MIME headers for such characters and so on.
What's the goal of python here? Giving programmers nice utilities which have security considerations in its software design by default or giving everything up to the programmer which is forced to never trust the stdlib and always have to read the source code it uses?

As I understand when I read the documentation is that config parser is loosely based on M$ INI files and as the name says it is for configuration files. Usually(!) configuration files are human readable files editable with an editor. Disallowing non-printable characters would have been the best option in the first release of config parser.
From my experience it is good to restrict things from the beginning and make them overrideable to be more relaxed if this is really needed.

And regarding issue20923: I think it would be a great feature to include the code change instead of changing the documentation. In my research about add_section() I found some projects which uses URI's as section name. As you know the WWW is evolving and actually http://[::1]/ is a valid URI nowadays. If this would be changed these implementations will not have to overwrite SECTRE in the future and they also won't run into that bug one day.

I adapted my commit to only disallow \r \n and \x00. [ ] are allowed for customization of SECTRE.
Date User Action Args
2015-11-27 00:18:56spaceonesetmessages: + msg255442
2015-11-26 16:56:07r.david.murraysetmessages: + msg255424
2015-11-26 04:56:36terry.reedysetmessages: + msg255400
2015-11-25 21:36:15r.david.murraysetnosy: + r.david.murray
messages: + msg255388
2015-11-25 20:47:58terry.reedysetmessages: + msg255387
2015-11-25 18:14:42spaceonesetmessages: + msg255374
2015-11-25 18:03:47terry.reedysetmessages: + msg255373
2015-11-25 14:42:22spaceonesetmessages: + msg255360
2015-11-25 07:57:38vstinnersetnosy: + vstinner

messages: + msg255324
versions: - Python 2.7, Python 3.5
2015-11-24 23:42:09terry.reedysetnosy: + terry.reedy
messages: + msg255305
2015-11-24 15:40:22r.david.murraysetnosy: + lukasz.langa

versions: - Python 3.2, Python 3.3, Python 3.4
2015-11-24 15:32:51spaceonesetmessages: + msg255272
2015-11-24 15:31:44spaceonecreate