Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use PyUnicode_AsWideCharString() instead of PyUnicode_AsUnicode() #66520

Closed
vstinner opened this issue Sep 1, 2014 · 8 comments
Closed

Use PyUnicode_AsWideCharString() instead of PyUnicode_AsUnicode() #66520

vstinner opened this issue Sep 1, 2014 · 8 comments
Labels
type-feature A feature request or enhancement

Comments

@vstinner
Copy link
Member

vstinner commented Sep 1, 2014

BPO 22324
Nosy @loewis, @vstinner, @serhiy-storchaka
Files
  • wchar.patch
  • wchar_posixmodule.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2015-10-02.21:06:01.782>
    created_at = <Date 2014-09-01.22:39:50.361>
    labels = ['type-feature']
    title = 'Use PyUnicode_AsWideCharString() instead of PyUnicode_AsUnicode()'
    updated_at = <Date 2015-10-02.21:06:01.782>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2015-10-02.21:06:01.782>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2015-10-02.21:06:01.782>
    closer = 'vstinner'
    components = []
    creation = <Date 2014-09-01.22:39:50.361>
    creator = 'vstinner'
    dependencies = []
    files = ['36522', '36523']
    hgrepos = []
    issue_num = 22324
    keywords = ['patch']
    message_count = 8.0
    messages = ['226247', '226248', '226250', '226295', '226298', '226517', '228586', '228588']
    nosy_count = 3.0
    nosy_names = ['loewis', 'vstinner', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'wont fix'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue22324'
    versions = ['Python 3.5']

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 1, 2014

    I would like to deprecate PyUnicode_AsUnicode(), see the issue bpo-22271 for the rationale (hint: memory footprint).

    To deprecate PyUnicode_AsUnicode(), we should stop using it internally.

    The attached patch is a work-in-progress patch, untested on Windows (only tested on Linux). It gives an idea of how many files should be modified.

    TODO:

    • Modify posixmodule.c: I don't understand how the Argument Clinic generates the call to PyUnicode_AsUnicode() when the parameter type is declared as "unicode". What is the "unicode" type? Where is the code generating the call to PyUnicode_AsUnicode()?

    • Modify a few other files.

    @vstinner vstinner added the type-feature A feature request or enhancement label Sep 1, 2014
    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 1, 2014

    Oh, I didn't generated wchar.patch correctly: please ignore changes in the unicodeobject.c files. These changes are part of issues bpo-22271 and bpo-22323.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 1, 2014

    wchar_posixmodule.patch: patch for posixmodule.c. Sorry, the code calling PyUnicode_AsUnicode() was not generated by Argument Clinic in fact.

    @serhiy-storchaka
    Copy link
    Member

    Will not this cause performance regression? When we hardly work with wchar_t-based API, it looks good to cache encoded value.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 3, 2014

    Will not this cause performance regression? When we hardly work with wchar_t-based API, it looks good to cache encoded value.

    Yes, it will be slower. But I prefer slower code with a lower memory footprint. On UNIX, I don't think that anyone will notice the difference.

    My concern is that the cache is never released. If the conversion is only needed once at startup, the memory will stay until Python exits. It's not really efficient.

    On Windows, conversion to wchar_t* is common because Python uses the Windows wide character API ("W" API vs "A" ANSI code page API). For example, most access to the filesystem use wchar_t* type.

    On Python < 3.3, Python was compiled in narrow mode and so Unicode was already using wchar_t* internally to store characters. Since Python 3.3, Python uses a more compact representation. wchar_t* shares Unicode data only if sizeof(wchar_t*) == KIND where KIND is 1, 2 or 4 bytes per character. Examples: "\u20ac" on Windows (16 bits wchar_t) or "\U0010ffff" on Linux (32 bits wchar_t) .

    @serhiy-storchaka
    Copy link
    Member

    The cache is released when the string is released. While the string exists it's wchar_t representation can be needed again. That is for what the cache exists.

    @vstinner
    Copy link
    Member Author

    vstinner commented Oct 5, 2014

    The cache is released when the string is released. While the string exists it's wchar_t representation can be needed again. That is for what the cache exists.

    I know. But I don't want to waste memory for this cache. I want to stop using it. IMO the performance overhead will be null.

    In which use case do you think that the overhead of not using the cache would be important enough?

    @serhiy-storchaka
    Copy link
    Member

    In which use case do you think that the overhead of not using the cache would be important enough?

    I suppose in the same use case that memory overhead of using the cache is important enough.

    We need results of performance and memory consumption effect of these changes in a wide range of programs.

    @vstinner vstinner closed this as completed Oct 2, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants