Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-process 2to3 #47698

Closed
nedds mannequin opened this issue Jul 25, 2008 · 17 comments
Closed

Multi-process 2to3 #47698

nedds mannequin opened this issue Jul 25, 2008 · 17 comments
Labels
performance Performance or resource usage topic-2to3

Comments

@nedds
Copy link
Mannequin

nedds mannequin commented Jul 25, 2008

BPO 3448
Nosy @amauryfa, @pitrou, @benjaminp
Files
  • multiProcess.diff: Diff for refactor.py changes
  • multiProcess2.diff: Diff for refactor with situational processing import
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-04-01.22:41:23.192>
    created_at = <Date 2008-07-25.21:06:38.064>
    labels = ['expert-2to3', 'performance']
    title = 'Multi-process 2to3'
    updated_at = <Date 2009-04-01.22:41:23.159>
    user = 'https://bugs.python.org/nedds'

    bugs.python.org fields:

    activity = <Date 2009-04-01.22:41:23.159>
    actor = 'benjamin.peterson'
    assignee = 'collinwinter'
    closed = True
    closed_date = <Date 2009-04-01.22:41:23.192>
    closer = 'benjamin.peterson'
    components = ['2to3 (2.x to 3.x conversion tool)']
    creation = <Date 2008-07-25.21:06:38.064>
    creator = 'nedds'
    dependencies = []
    files = ['10985', '10986']
    hgrepos = []
    issue_num = 3448
    keywords = ['patch']
    message_count = 17.0
    messages = ['70277', '70280', '74108', '74109', '74121', '74144', '74145', '74155', '74200', '74226', '74272', '74273', '74429', '74437', '74486', '74615', '85103']
    nosy_count = 5.0
    nosy_names = ['collinwinter', 'amaury.forgeotdarc', 'pitrou', 'benjamin.peterson', 'nedds']
    pr_nums = []
    priority = 'normal'
    resolution = 'accepted'
    stage = None
    status = 'closed'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue3448'
    versions = []

    @nedds
    Copy link
    Mannequin Author

    nedds mannequin commented Jul 25, 2008

    Here is a working, multiprocess version of 2to3 with a few caveats.
    First, you need to already have the processing module installed for this
    to work. If we don't want to include processing in some way, I think I
    can modify this to only import processing and use the Process method if
    the user wants to run it with more than one process. Also, I know that
    logger is supposed to be thread safe, so I am correct in assuming that
    this means it is also multi-process safe? This fix delegates the fixing
    of files to a designated number of processes, so on a multi-core
    machine, it is significantly faster. It may be appropriate to add a test
    suite for it, but I have not yet done so. I've tested it on my dual-core
    laptop running Ubuntu and a quad-core mac and it appears to be working
    fine. I don't know if the use of tempfile was the right choice, but it
    seemed reasonable. Another possibility is to instead using a processing
    Queue and pass it the output, filename, and input rather than calling
    write_file. Also, the join timeout I used is completely arbitrary
    because I don't really have a sense of what a reasonable value for it
    should be.

    @nedds nedds mannequin assigned collinwinter Jul 25, 2008
    @nedds nedds mannequin added topic-2to3 performance Performance or resource usage labels Jul 25, 2008
    @nedds
    Copy link
    Mannequin Author

    nedds mannequin commented Jul 25, 2008

    Here is a version that only imports processing if the multi-process
    option is specified. I don't know if this is the most efficient way it
    can be done, and I think there's a better way to do it, but this works.

    @nedds
    Copy link
    Mannequin Author

    nedds mannequin commented Oct 1, 2008

    Is there still any interest in this Collin? Is there anything else you
    need me to do for it?

    @benjaminp
    Copy link
    Contributor

    Nick, is there a way you could isolate the process functionality in a
    RefactoringTool subclass? It's an interesting idea, but I don't it needs
    to infect the main library.

    @collinwinter
    Copy link
    Mannequin

    collinwinter mannequin commented Oct 1, 2008

    I believe the only issue I recall was that the patch didn't work
    out-of-the-box for Python 2.6 (changed imports, PEP-8 compliance changes
    in the multiprocess module). Has that been fixed?

    I disagree with Benjamin: this is an import speed increase, and I don't
    see the point in adding the needless complexity of a subclass. The user
    won't notice any difference, and anyone wanting to use this as a library
    will want the faster version.

    @benjaminp
    Copy link
    Contributor

    I don't think we should force people using it as a library to go
    multiprocess. Also, it's trivial to just change the name of the class
    used if that is wanted.

    @nedds
    Copy link
    Mannequin Author

    nedds mannequin commented Oct 1, 2008

    The currently attached patch works in Python2.5 not Python2.6, so I will
    update it for 2.6 when I get the chance. But as it is currently written,
    the default behavior is not multiprocess. Instead, if you want
    multiprocess, you specify how many processes you want when you run 2to3
    from the command line. And as long as you've got multiple files to run
    2to3 on and multiple cores, doing multiprocess 2to3 is significantly faster.

    @collinwinter
    Copy link
    Mannequin

    collinwinter mannequin commented Oct 2, 2008

    "I don't think we should force people using it as a library to go
    multiprocess."

    I don't understand this. What downsides do you perceive in
    multiprocessing support? Multiprocessing is a significant speed-up for
    2to3 on multicore systems.

    @benjaminp
    Copy link
    Contributor

    I'm just saying that a client of lib2to3 shouldn't have to use multiple
    processes. Just as long as the multiple processes are optional, I'm
    happy. :)

    @collinwinter
    Copy link
    Mannequin

    collinwinter mannequin commented Oct 3, 2008

    You have yet to articulate a reason for that preference.

    @benjaminp
    Copy link
    Contributor

    Because I would like to use lib2to3 without the complexity of multiple
    processes. That it is a good performance boost is excellent, but I don't
    think it should be a required part of using the library.

    @pitrou
    Copy link
    Member

    pitrou commented Oct 3, 2008

    I think it's reasonable to only enable multiprocessing if the adequate
    command-line option has been set. It's how make already works (next
    time you compile Python, try make -jN where N is your number of CPU
    cores).

    @collinwinter
    Copy link
    Mannequin

    collinwinter mannequin commented Oct 7, 2008

    Benjamin, what "complexity" did you encounter when trying to use lib2to3
    in your own work? Unless there's a concrete use-case where the mere
    existence of multiprocessing support (as opposed to actually enabling
    that support) made a tangible project harder, I say we leave it in. I
    can't imagine someone making the conscious choice to slow down their own
    lib2to3-based refactoring tool by 2x; if those people appear later,
    we'll deal with their concerns at that time.

    Nick: if you don't have time to make this work in 2.6, I can do so. The
    patch will also need doc fixes and tests, though the core functionality
    is fine.

    @amauryfa
    Copy link
    Member

    amauryfa commented Oct 7, 2008

    I suggest that when using lib2to3 as a library, multiprocessing is not
    enabled by default; there may be uses of the library that are
    incompatible with multiprocessing.

    It may be enabled by default when using it from the command line (or the
    lib2to3.main module). But which default number of processes would this use?

    Concerning the patch itself:

    • the line "from processing import Process" seems suspect.
    • Did you consider using something as simple as:
      pool = multiprocessing.Pool(self.options.num_processes)
      pool.map(self.refactor_file, fullnames)
      It should do all the job: start processes, queue tasks, wait for results.

    @benjaminp
    Copy link
    Contributor

    I'm not opposed to having the support available. I just don't what it
    enabled by default.

    @nedds
    Copy link
    Mannequin Author

    nedds mannequin commented Oct 10, 2008

    I had very little experience with the processing module prior to the
    creation of this patch, and because pool objects are listed last in the
    documentation, I did not read about them because I saw a way to achieve
    what I wanted using Process. But having looked at the documentation for
    Pool, I think you are correct that it would be a much cleaner solution
    to the problem, as I was essentially implementing a Pool myself. I will
    post a new patch to reflect this change tomorrow. I will also submit the
    patch to take advantage of the fact that the multiprocessing module is
    included in Python2.6, as opposed to my prior patch which was designed
    for Python2.5 which required the user to get the processing module. And
    in the patch, as before, multiprocess will not be default, but it will
    be something the user can enable via the command line.

    @benjaminp
    Copy link
    Contributor

    I added some support for concurrent running in r70999.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    performance Performance or resource usage topic-2to3
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants