Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow changing difflib._file_template character encoding. #46328

Closed
josephoenix mannequin opened this issue Feb 8, 2008 · 15 comments
Closed

Allow changing difflib._file_template character encoding. #46328

josephoenix mannequin opened this issue Feb 8, 2008 · 15 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@josephoenix
Copy link
Mannequin

josephoenix mannequin commented Feb 8, 2008

BPO 2052
Nosy @terryjreedy, @ezio-melotti, @bitdancer, @berkerpeksag, @serhiy-storchaka
Files
  • issue2052.diff
  • issue2052_html5.diff
  • issue2052_html5_v2.diff
  • issue2052_v2.diff
  • issue2052_v3.diff
  • issue2052_v4.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/berkerpeksag'
    closed_at = <Date 2015-03-14.23:19:45.915>
    created_at = <Date 2008-02-08.21:21:53.520>
    labels = ['type-feature', 'library']
    title = 'Allow changing difflib._file_template character encoding.'
    updated_at = <Date 2015-03-14.23:19:45.914>
    user = 'https://bugs.python.org/josephoenix'

    bugs.python.org fields:

    activity = <Date 2015-03-14.23:19:45.914>
    actor = 'berker.peksag'
    assignee = 'berker.peksag'
    closed = True
    closed_date = <Date 2015-03-14.23:19:45.915>
    closer = 'berker.peksag'
    components = ['Library (Lib)']
    creation = <Date 2008-02-08.21:21:53.520>
    creator = 'josephoenix'
    dependencies = []
    files = ['35244', '35245', '35273', '38474', '38483', '38484']
    hgrepos = []
    issue_num = 2052
    keywords = ['patch']
    message_count = 15.0
    messages = ['62208', '62209', '62211', '116949', '117234', '184726', '184751', '184755', '218479', '218726', '237383', '238050', '238097', '238104', '238105']
    nosy_count = 8.0
    nosy_names = ['terry.reedy', 'josephoenix', 'ezio.melotti', 'r.david.murray', 'python-dev', 'berker.peksag', 'serhiy.storchaka', 'hashimo']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue2052'
    versions = ['Python 3.5']

    @josephoenix
    Copy link
    Mannequin Author

    josephoenix mannequin commented Feb 8, 2008

    When passed unicode strings, difflib.HtmlDiff.make_file and make_table
    fail with a UnicodeEncodeError. Also, the html outputted by make_file
    seems to be hardcoded to use charset=ISO-8859-1 (line 1584 of difflib.py)

    @josephoenix josephoenix mannequin added type-bug An unexpected behavior, bug, or error stdlib Python modules in the Lib dir labels Feb 8, 2008
    @josephoenix
    Copy link
    Mannequin Author

    josephoenix mannequin commented Feb 8, 2008

    Oops, please close this. Apparently was fixed in 2.5.1, and I'm just
    behind.

    @josephoenix
    Copy link
    Mannequin Author

    josephoenix mannequin commented Feb 8, 2008

    After installing 2.5.1, the UnicodeEncodeError is gone, but the charset is
    still hardcoded in difflib._file_template. So, I guess this is still a
    separate bug.

    @jafo jafo mannequin assigned tim-one Mar 18, 2008
    @jafo jafo mannequin changed the title Lack of difflib.HtmlDiff unicode support Allow changing difflib._file_template character encoding. Mar 18, 2008
    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Sep 20, 2010

    difflib._file_template is still hard-coded in py3k SVN. I'm unsure as to whether this is a feature request, a behaviour issue or not an issue at all, can someone please advise, thanks.

    @bitdancer
    Copy link
    Member

    I believe that charset is the standard default for html, which would make this a feature request.

    @bitdancer bitdancer added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Sep 23, 2010
    @terryjreedy
    Copy link
    Member

    In 3.2, it is line 1629:
    content="text/html; charset=ISO-8859-1" />

    That charset was only standard for Western European documents limited to that charset. Now, even such limited-char docs often use 'utf-8' (python.org does). The result of putting an incorrect charset designation in an html file is that the browser will not display the file correctly.

    For instance, I tried an input sequence containing line 'c\u3333', which displays in IDLE as  'c㌳'. The string from HtmlDill.make_file() must be written to a file opened with encoding='utf-8', not the above or equivalent. Firefox then reads the three bytes of the utf-8 encoding as three separate characters and displays 'c㌳'. To check:
    >>> 'c㌳'.encode().decode(encoding='Latin-1')
    'cã\x8c³'

    To me the clear implication of "returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted." is that the resulting file will display correctly. The current template charset prevents that, changing to 'utf-8' results in a file that displays correctly (tested). So the current behavior and the code that causes it is to me clearly a bug. I would like to fix it before 2.7.4 comes out.

    @terryjreedy
    Copy link
    Member

    After thinking about it more, the real problem is that the charset setting must match the chars used and how they re encoded, so no one setting is right for all uses. An alternative to changing the default in existing versions is to at least document what it is and explain how to work around it with .replace -- for instance output.replace('ISO-8859-1', 'utf-8'). I agree that adding a parameter (charset=xxx) is a new feature.

    @ezio-melotti
    Copy link
    Member

    I haven't looked at the code, but if an HTML page is generated it should probably be updated to use HTML5 and <meta charset="utf-8">.

    @berkerpeksag
    Copy link
    Member

    Attaching two patches:

    bpo-2052.diff adds a "charset" keyword argument to HtmlDiff.make_file().

    issue2052_html5.diff also adds a "charset" keyword argument to HtmlDiff.make_file() and updates the markup of HtmlDiff() to HTML5. I tested it with Firefox 29 and Chrome 34.

    @berkerpeksag
    Copy link
    Member

    Attaching a new version of issue2052_html5.diff. Changes:

    • Switch from px to em in CSS
    • Cleanup markup a bit (e.g. delete redundant colgroup tags)

    @serhiy-storchaka
    Copy link
    Member

    May be updating the markup to HTML5 should be different issue. issue2052_html5_v2.diff not only adds charset in HTML5 format, it totally changes the template. This definitely a separate issue.

    @berkerpeksag
    Copy link
    Member

    Here is an updated patch. Thanks for the review, Serhiy. I will open a new issue for the HTML 5 part of the patch.

    @serhiy-storchaka
    Copy link
    Member

    LGTM

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Mar 14, 2015

    New changeset e058423d3ca4 by Berker Peksag in branch 'default':
    Issue bpo-2052: Add charset parameter to HtmlDiff.make_file().
    https://hg.python.org/cpython/rev/e058423d3ca4

    @berkerpeksag
    Copy link
    Member

    Thanks Serhiy.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    6 participants