Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ConfigParser load speedup #51362

Closed
aioryi mannequin opened this issue Oct 12, 2009 · 8 comments
Closed

ConfigParser load speedup #51362

aioryi mannequin opened this issue Oct 12, 2009 · 8 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@aioryi
Copy link
Mannequin

aioryi mannequin commented Oct 12, 2009

BPO 7113
Nosy @brettcannon, @voidspace, @briancurtin, @ambv
Files
  • speedfix_71564.patch: Patch to speed up loading of multi-line options.
  • issue7113.diff: Patch for Python 3.2
  • issue7113_without_itertools.diff: Patch for Python 3.2 without itertools dependency
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2010-07-26.02:31:21.086>
    created_at = <Date 2009-10-12.12:43:03.413>
    labels = ['library', 'performance']
    title = 'ConfigParser load speedup'
    updated_at = <Date 2010-07-26.02:31:21.080>
    user = 'https://bugs.python.org/aioryi'

    bugs.python.org fields:

    activity = <Date 2010-07-26.02:31:21.080>
    actor = 'brian.curtin'
    assignee = 'none'
    closed = True
    closed_date = <Date 2010-07-26.02:31:21.086>
    closer = 'brian.curtin'
    components = ['Library (Lib)']
    creation = <Date 2009-10-12.12:43:03.413>
    creator = 'aioryi'
    dependencies = []
    files = ['15109', '18166', '18204']
    hgrepos = []
    issue_num = 7113
    keywords = ['patch']
    message_count = 8.0
    messages = ['93895', '111399', '111495', '111497', '111500', '111569', '111578', '111590']
    nosy_count = 6.0
    nosy_names = ['brett.cannon', 'aioryi', 'michael.foord', 'brian.curtin', 'BreamoreBoy', 'lukasz.langa']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue7113'
    versions = ['Python 3.2']

    @aioryi
    Copy link
    Mannequin Author

    aioryi mannequin commented Oct 12, 2009

    Current implementation (r71564) uses "'%s\n%s' % (old_val, new_line)" to
    merge multi-line options into one string.
    For options with many lines, this wastes a lot of CPU power.

    Attached patch against r71564 fixes this problem by first building a
    list of lines for each loaded option, and after reading the whole file,
    merging them with the already loaded data. In that way, the '\n'.join()
    can be performed once.
    Patched ConfigParser.py works against test/test_cfgparser.py (and Python
    2.5)

    We have witnessed a reduction from 4 hours to 3 seconds loading time
    with Python 2.6 and an option of 800000 lines.

    @aioryi aioryi mannequin added stdlib Python modules in the Lib dir performance Performance or resource usage labels Oct 12, 2009
    @ambv
    Copy link
    Contributor

    ambv commented Jul 23, 2010

    Won't happen for Python 2.x but the idea is good. The original patch won't apply for Py3k so I implemented this approach from scratch for Python 3.2. Patch included.

    Some microbenchmarks of mine show:

    • a 3-5% slowdown for very small files (we can ignore this since it's still a blink of an eye)
    • a 10% speedup in usual cases (files the size of a typical Pylons app configuration or a typical buildout)
    • more so in unrealistic ones (many values with > 100 lines each)

    Brett, a quick summary of changes:

    • the idea to introduce a list of lines instead of gluing '%s\n%s' during file reading was introduced
    • this approach needs an additional phase of joining the lists to strings after reading is done.
    • I used itertools.chain to add the default section for iteration purposes without having to create a separate list. I hope the additional dependency on itertools is no problem.
    • I reformatted a bit the changes by fred.drake in r78233 (he added support for valueless keys)
    • some other slight formatting changes apply as well
    • in the tests I've added a testcase for a file that uses multi-line values heavily. The file is generated temporarily since it's 127kB in size. Creation takes a fraction of a second so shouldn't be noticeable.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 24, 2010

    @Łukasz: I patched the unit test file only and ran it and all tests passed, I assume that this behaviour is incorrect.

    @ambv
    Copy link
    Contributor

    ambv commented Jul 24, 2010

    Thanks, Mark. The tests should pass for both the old version and the patched one.

    The change in the code involves performance, not functionality. I didn't want to set assertions that are time based so naturally the new test also runs on the old implementation. The thing is that it runs slower.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 24, 2010

    Łukasz in that case I think we should get this committed as the patch is small and looks clean. I assume that this will be backported as 2.7 and 3.1 are likely to be around for some years yet.

    @ambv
    Copy link
    Contributor

    ambv commented Jul 25, 2010

    In msg111399 I remarked the new itertools dependency. It seems it is in fact problematic because when building Python from scratch setup.py which is used to build C extensions is using configparser. And one of the C-only modules is itertools :)

    Attached a new patch that doesn't include itertools dependencies. The only difference with the last one is:

    96d95
    < import itertools
    549,550c548,550
    < all_sections = itertools.chain([self._defaults],
    < self._sections.values())
    ---

        all_sections = [self.\_defaults]
        all_sections.extend(self.\_sections.values()) 
    

    @ambv
    Copy link
    Contributor

    ambv commented Jul 25, 2010

    Patch from py3k root. Some minor formatting changes suggested by Ezio Melotti included.

    @briancurtin
    Copy link
    Member

    Committed to py3k in r83154 and release27-maint in r83155.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage stdlib Python modules in the Lib dir
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants