
Author karlcow
Recipients ezio.melotti, karlcow, nowasky.jr, vstinner
Date 2021-01-03.08:22:14
Message-id <1609662134.54.0.633908039041.issue41748@roundup.psfhosted.org>
Ezio,

TL;DR: I tested this in browsers and am adding two tests for this issue.
       Should I create a PR just for the tests?

https://github.com/python/cpython/blame/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/test/test_htmlparser.py#L479-L485


A: comma without spaces
-----------------------


Tests for browsers:
data:text/html,<!doctype html><div class=bar,baz=asd>text</div>

Serializations:
* Firefox, Gecko (86.0a1 (2020-12-28) (64-bit))
* Edge, Blink (Version 89.0.752.0 (Official build) Canary (64-bit))
* Safari, WebKit (Release 117 (Safari 14.1, WebKit 16611.1.7.2))

All three rendering engines produce the same serialization:
<div class="bar,baz=asd">text</div>


Adding:

    def test_comma_between_unquoted_attributes(self):
        # bpo 41748
        self._run_check('<div class=bar,baz=asd>',
                        [('starttag', 'div', [('class', 'bar,baz=asd')])])
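To poke at this outside the test harness, here is a minimal sketch that uses only the public `html.parser.HTMLParser` API. The `StartTagRecorder` class and the `parse_events` helper are hypothetical names for illustration; they are not the collector used by `_run_check`.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Hypothetical helper: records start tags and text as event tuples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs in source order.
        self.events.append(('starttag', tag, attrs))

    def handle_data(self, data):
        # A start tag the parser fails to recognize can end up here as text.
        self.events.append(('data', data))


def parse_events(markup):
    """Feed markup to a fresh parser and return the recorded events."""
    parser = StartTagRecorder()
    parser.feed(markup)
    parser.close()  # flush anything still buffered
    return parser.events


print(parse_events('<div class=bar,baz=asd>'))
# [('starttag', 'div', [('class', 'bar,baz=asd')])]
```

Calling `parse_events('<div class=bar ,baz=asd>')` reproduces the second case below; on a build affected by this issue it returns a single `('data', ...)` event instead of a start tag.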


❯ ./python.exe -m test -v test_htmlparser

…
test_comma_between_unquoted_attributes (test.test_htmlparser.HTMLParserTestCase) ... ok
…

Ran 47 tests in 0.168s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 369 ms
Tests result: SUCCESS


So the first test passes: html.parser already matches the browsers here.


B: comma with spaces
--------------------

Tests for browsers:
data:text/html,<!doctype html><div class=bar ,baz=asd>text</div>

Serializations:
* Firefox, Gecko (86.0a1 (2020-12-28) (64-bit))
* Edge, Blink (Version 89.0.752.0 (Official build) Canary (64-bit))
* Safari, WebKit (Release 117 (Safari 14.1, WebKit 16611.1.7.2))

All three rendering engines produce the same serialization:
<div class="bar" ,baz="asd">text</div>
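The browsers' result follows from the WHATWG HTML tokenizer's attribute states, in which a comma has no special meaning: the space ends the unquoted value `bar`, and the comma becomes the first character of the next attribute name. The function below is a hypothetical, heavily simplified sketch of those states, written only for illustration (it is not how html.parser works); it ignores case folding, character references, duplicate-attribute removal and the self-closing flag, and it reports attributes without `=` as `None` to mirror `HTMLParser`'s `attrs` convention rather than the empty string the spec uses.

```python
# Hypothetical, simplified sketch of the WHATWG tokenizer's attribute states.
# Not html.parser's implementation; ignores case folding, character
# references, duplicate removal and the self-closing flag.

# Tab, LF, FF, space; CR is included because this sketch skips the
# input-stream preprocessing that would normally turn CR into LF.
HTML_WHITESPACE = '\t\n\f\r '


def tokenize_attributes(text):
    """Split the text between a tag name and its '>' into (name, value) pairs.

    Attributes written without '=' get None, mirroring HTMLParser's attrs.
    """
    attrs = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        # Before attribute name state: skip whitespace and '/', stop at '>'.
        if ch == '>':
            break
        if ch in HTML_WHITESPACE or ch == '/':
            i += 1
            continue
        # Attribute name state: the first character is always kept (even
        # '=' or ','); the name then runs until whitespace, '/', '>' or '='.
        start = i
        i += 1
        while i < n and text[i] not in HTML_WHITESPACE + '/>=':
            i += 1
        name = text[start:i]
        # After attribute name state: skip whitespace and look for '='.
        j = i
        while j < n and text[j] in HTML_WHITESPACE:
            j += 1
        if j >= n or text[j] != '=':
            attrs.append((name, None))
            i = j  # anything else starts the next attribute
            continue
        # Before attribute value state: skip '=' and any whitespace after it.
        i = j + 1
        while i < n and text[i] in HTML_WHITESPACE:
            i += 1
        if i < n and text[i] in '"\'':
            # Quoted value: everything up to the matching quote.
            quote = text[i]
            end = text.find(quote, i + 1)
            if end == -1:
                end = n  # unterminated quote: take the rest of the input
            attrs.append((name, text[i + 1:end]))
            i = end + 1
        else:
            # Unquoted value: ',' and '=' are ordinary characters here;
            # only whitespace or '>' ends the value.
            start = i
            while i < n and text[i] not in HTML_WHITESPACE + '>':
                i += 1
            attrs.append((name, text[start:i]))
    return attrs


for attr_text in (' class=bar,baz=asd', ' class=bar ,baz=asd'):
    print(tokenize_attributes(attr_text))
# [('class', 'bar,baz=asd')]
# [('class', 'bar'), (',baz', 'asd')]
```

The same walk explains case A: with no space, the unquoted value simply runs on through `,baz=asd`.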


Adding:

    def test_comma_with_space_between_unquoted_attributes(self):
        # bpo 41748
        self._run_check('<div class=bar ,baz=asd>',
                        [('starttag', 'div', [
                            ('class', 'bar'),
                            (',baz', 'asd')])])


❯ ./python.exe -m test -v test_htmlparser


This test fails:

======================================================================
FAIL: test_comma_with_space_between_unquoted_attributes (test.test_htmlparser.HTMLParserTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/karl/code/cpython/Lib/test/test_htmlparser.py", line 493, in test_comma_with_space_between_unquoted_attributes
    self._run_check('<div class=bar ,baz=asd>',
  File "/Users/karl/code/cpython/Lib/test/test_htmlparser.py", line 95, in _run_check
    self.fail("received events did not match expected events" +
AssertionError: received events did not match expected events
Source:
'<div class=bar ,baz=asd>'
Expected:
[('starttag', 'div', [('class', 'bar'), (',baz', 'asd')])]
Received:
[('data', '<div class=bar ,baz=asd>')]

----------------------------------------------------------------------


I started looking into the code of parser.py, which I'm not familiar with (yet).

https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/html/parser.py#L42-L52

Do you have a suggestion for how to fix it?