Message 384259 - Python tracker


Author	ezio.melotti
Recipients	ezio.melotti, karlcow, nowasky.jr, vstinner
Date	2021-01-03.08:49:49
Message-id	<1609663789.45.0.809660769829.issue41748@roundup.psfhosted.org>
In-reply-to

Content

Writing tests that verify the expected behavior is a great first step. The expected output in the tests should match the behavior described by the HTML 5 specs (which should also correspond to the browsers' behavior), and should initially fail. You can start creating a PR with only the tests, clarifying that it's a work in progress, or wait until you have the fix too.
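
A minimal sketch of what such a test could look like, using a small event collector similar in spirit to the one in the stdlib test suite. The names EventCollector, collect_events and EXPECTED are hypothetical, and the expected value is an assumption derived from the HTML5 tokenizer rules (after the unquoted value "bar", a comma starts a new attribute name); it should be checked against the spec and the browsers before it goes into a PR.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Record parser callbacks as tuples (hypothetical helper for illustration)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def collect_events(source):
    """Feed `source` to a fresh collector and return the recorded events."""
    parser = EventCollector()
    parser.feed(source)
    parser.close()
    return parser.events


# Assumed expectation (not taken from the tracker message): the comma
# becomes part of the second attribute's name.
EXPECTED = [("starttag", "div", [("class", "bar"), (",baz", "asd")])]

if __name__ == "__main__":
    actual = collect_events("<div class=bar ,baz=asd>")
    print("actual:  ", actual)
    print("expected:", EXPECTED)
    print("match:   ", actual == EXPECTED)
```

On an interpreter without the fix the last line should report a mismatch, which is the "initially fail" state described above.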

The next step would be tweaking the regex and the code until both the new tests and all the other ones pass (except the existing one with the commas, which you are fixing). You can then commit the fix to the same branch and push it -- GitHub will automatically update the PR.


> Do you have a suggestion to fix it?

If you are familiar enough with regexes, you could try to figure out whether it matches the invalid attributes or not, and if not why (I took a quick look and I didn't see anything immediately wrong in the regexes).
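
One rough way to check this is to apply the attribute regex by hand and see how far it gets. The sketch below assumes that CPython's html.parser keeps that pattern in the module-level name attrfind_tolerant; this is an implementation detail that may differ between versions, so it is looked up defensively. scan_attributes is a hypothetical helper, not part of the module.

```python
import html.parser

# Assumption: the attribute pattern lives in the module-level name
# `attrfind_tolerant`. It is an implementation detail, so fall back to
# None if this interpreter does not have it.
ATTRFIND = getattr(html.parser, "attrfind_tolerant", None)


def scan_attributes(rawdata, start):
    """Apply the attribute pattern repeatedly from `start`.

    Returns a list of (matched_text, (start, end)) pairs.
    """
    matches = []
    if ATTRFIND is None:
        return matches
    pos = start
    while pos < len(rawdata):
        m = ATTRFIND.match(rawdata, pos)
        if m is None or m.end() == pos:
            break  # no match, or an empty match that would loop forever
        matches.append((m.group(0), m.span()))
        pos = m.end()
    return matches


if __name__ == "__main__":
    tag = "<div class=bar ,baz=asd>"
    # Start right after "<div ", where the attributes begin.
    for text, span in scan_attributes(tag, len("<div ")):
        print(span, repr(text))
```

If the scan stops before the closing ">", the leftover text is what the pattern does not match. If it reaches the ">", the attribute regex is probably fine and the problem is more likely somewhere else in the parsing code.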

Since the output of the failing test is [('data', '<div class=bar ,baz=asd>')], it's likely that the parser doesn't know how to handle it and passes it to one of the handle_data() calls in the goahead() method. You can figure out which one is being called and see which if-conditions lead the interpreter down this path rather than the usual path where the attributes are parsed correctly.
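
A quick way to find out which call is involved, sketched under the assumption that overriding handle_data() and inspecting the call stack is acceptable for debugging. TracingParser and trace_data_calls are hypothetical names; the line numbers reported refer to whatever copy of html/parser.py the interpreter is using.

```python
import traceback
from html.parser import HTMLParser


class TracingParser(HTMLParser):
    """Hypothetical debugging helper: note where each handle_data() call comes from."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_data(self, data):
        # extract_stack() lists frames oldest first, so [-1] is this method
        # and [-2] is the code inside html.parser that called it.
        caller = traceback.extract_stack(limit=2)[-2]
        self.calls.append((caller.name, caller.lineno, data))


def trace_data_calls(source):
    """Parse `source` and return (caller_name, line_number, data) tuples."""
    parser = TracingParser()
    parser.feed(source)
    parser.close()
    return parser.calls


if __name__ == "__main__":
    for name, lineno, data in trace_data_calls("<div class=bar ,baz=asd>"):
        print(f"handle_data({data!r}) called from {name}() at line {lineno}")
```

With the line number in hand, you can open html/parser.py at that spot and read back through the enclosing if-conditions, or set a breakpoint there with pdb.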

If you have other questions let me know :)

History
Date	User	Action	Args
2021-01-03 08:49:49	ezio.melotti	set	recipients: + ezio.melotti, vstinner, karlcow, nowasky.jr
2021-01-03 08:49:49	ezio.melotti	set	messageid: <1609663789.45.0.809660769829.issue41748@roundup.psfhosted.org>
2021-01-03 08:49:49	ezio.melotti	link	issue41748 messages
2021-01-03 08:49:49	ezio.melotti	create