This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: `HTMLParser.handle_data` may be invoked although `HTMLParser.reset` was invoked
Type:
Stage:
Components: Library (Lib)
Versions: Python 3.5

process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: Hibou57, ezio.melotti, xiang.zhang
Priority: normal
Keywords:

Created on 2016-01-26 21:10 by Hibou57, last changed 2022-04-11 14:58 by admin.

Messages (7)
msg258973 - Author: Yannick Duchêne (Hibou57) Date: 2016-01-26 21:10
`HTMLParser.handle_data` may be invoked even after `HTMLParser.reset` has been invoked. This happens at least when `HTMLParser.reset` is called from within `HTMLParser.handle_endtag`.

According to the documentation, `HTMLParser.reset` discards all unprocessed data, so it should immediately stop the parser.

As an aside, it is strange that `HTMLParser.reset` is invoked during object creation: this calls a method on an object that may not be fully initialized yet, which matters for derived classes.
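A minimal sketch of the initialization-order pitfall raised in the aside, assuming CPython's `html.parser`, where `HTMLParser.__init__` calls `self.reset()`. The class names and the `log` attribute are hypothetical. An overridden `reset` that touches subclass state fails if that state is only created after `super().__init__()`; creating it first avoids the problem.

```python
from html.parser import HTMLParser


class FragileParser(HTMLParser):
    """Hypothetical subclass whose reset() depends on its own state."""

    def __init__(self):
        super().__init__()  # HTMLParser.__init__ calls self.reset() here...
        self.log = []       # ...before this attribute exists

    def reset(self):
        super().reset()
        self.log.clear()    # AttributeError when reached from HTMLParser.__init__


class SaferParser(HTMLParser):
    """Same idea, but the subclass state is created before the base initializer."""

    def __init__(self):
        self.log = []       # already exists when HTMLParser.__init__ calls reset()
        super().__init__()

    def reset(self):
        super().reset()
        self.log.clear()


try:
    FragileParser()
except AttributeError as exc:
    print("FragileParser failed:", exc)

print("SaferParser log after construction:", SaferParser().log)
```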
msg258992 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-01-27 03:11
`reset` just sets some attributes back to their initial state; it does not control the parsing process. Reading the `goahead` function, even if `reset` is called in `handle_endtag` and all data are discarded, the parsing can still move forward.
msg259000 - Author: Yannick Duchêne (Hibou57) Date: 2016-01-27 08:11
The documentation says:
> Reset the instance. Loses all unprocessed data.

How can parsing go ahead once all unprocessed data is lost? It was the phrase “Loses all unprocessed data” that made me believe it stops the parser.

Maybe the documentation is unclear.

By the way, if `reset` does not stop the parser, then a `stop` method is missing. I searched for one, and since there was nothing else and I could not imagine that the parser cannot be stopped, I thought `reset` was the way to stop it.
msg259002 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-01-27 08:45
Actually it does move forward, because `goahead` first stores its own local reference (effectively a "copy") to the initial `self.rawdata` and uses it to control the flow. If you change `self.rawdata` while parsing, for example by calling `reset`, `goahead` does not notice. The `parse_*` methods do, however, since they read `self.rawdata` directly. So the two views of the data conflict.
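A minimal reproduction of the reported behaviour, as a sketch: the subclass name `ResettingParser` and its `data_seen` attribute are hypothetical. Calling `reset` from `handle_endtag` empties `self.rawdata`, but `goahead` keeps working on its local reference to the original input, so `handle_data` is still called for the text that follows the end tag.

```python
from html.parser import HTMLParser


class ResettingParser(HTMLParser):
    """Hypothetical subclass that calls reset() from handle_endtag()."""

    def __init__(self):
        self.data_seen = []  # created before HTMLParser.__init__ runs reset()
        super().__init__()

    def handle_endtag(self, tag):
        # Empties self.rawdata, but goahead() keeps iterating over its own
        # local reference to the original input string.
        self.reset()

    def handle_data(self, data):
        self.data_seen.append(data)


parser = ResettingParser()
parser.feed("<p>first</p>second")
parser.close()
print(parser.data_seen)  # "second" is still reported after reset()
```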

I don't think it is proper to change `self.rawdata` while parsing; doing so can easily lead to various errors.
msg259003 - Author: Yannick Duchêne (Hibou57) Date: 2016-01-27 09:10
Thanks Xiang, for the clear explanations.

So an error should be raised when `reset` is invoked at a point where it should not be. That leaves the question of how to stop the parser: should an exception be raised and caught at an outer invocation level? Something like raising `StopIteration`? (I don't enjoy using exceptions for flow control, but that seems to be the Python way, cheese.)
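One way the error proposed here could be raised, sketched as a hypothetical subclass: `GuardedParser`, `EagerResetParser` and the `_parsing` flag are illustrative names, not part of `html.parser`. The idea is to wrap `feed` and `close` so they record that parsing is in progress, and to make `reset` refuse to run during that window.

```python
from html.parser import HTMLParser


class GuardedParser(HTMLParser):
    """Hypothetical subclass: refuses reset() while feed()/close() is running."""

    def __init__(self, **kwargs):
        self._parsing = False  # must exist before HTMLParser.__init__ calls reset()
        super().__init__(**kwargs)

    def feed(self, data):
        self._parsing = True
        try:
            super().feed(data)
        finally:
            self._parsing = False  # cleared even if a handler raised

    def close(self):
        self._parsing = True
        try:
            super().close()
        finally:
            self._parsing = False

    def reset(self):
        if self._parsing:
            raise RuntimeError("reset() called while parsing")
        super().reset()


class EagerResetParser(GuardedParser):
    def handle_endtag(self, tag):
        self.reset()  # the misuse described in this issue


try:
    EagerResetParser().feed("<p>text</p>more")
except RuntimeError as exc:
    print("Rejected:", exc)
```

Calling `reset` after `feed` and `close` have returned still works as before; only calls made from inside a handler are rejected.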
msg259009 - Author: Xiang Zhang (xiang.zhang) (Python committer) Date: 2016-01-27 09:26
Hmm, I don't know whether I am right or not. Let's wait for a core member to clarify. If I am wrong, I am quite sorry.

I don't think invoking `reset` while parsing should raise an error (and I don't know how to achieve that). When to call a method is up to the programmer: you can always call a well-written method in the wrong place and cause an error. And I don't see how to stop the process either.
msg259011 - Author: Yannick Duchêne (Hibou57) Date: 2016-01-27 10:06
> And I don't see how to stop the process either.

I just did it with `raise StopIteration`, caught at the appropriate place (in the procedure that invokes `feed` and `close`), and it seems to work: I no longer see any strange behaviour. At least, I cannot see a cleaner way.

Now `reset` is invoked only after parsing has ended (so that the parser can be used for another round).
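A sketch of the approach described above, with one substitution: instead of `StopIteration`, it raises a dedicated exception class (the hypothetical `StopParsing`), which cannot be confused with the iterator protocol. `FirstParagraphParser` and `first_paragraph` are also illustrative names. The exception is raised from a handler, caught around `feed` and `close`, and `reset` is only called once parsing has been abandoned.

```python
from html.parser import HTMLParser


class StopParsing(Exception):
    """Hypothetical exception used only to unwind out of feed()/close()."""


class FirstParagraphParser(HTMLParser):
    """Hypothetical parser that collects text up to the first </p>, then stops."""

    def __init__(self):
        self.text = []
        super().__init__()

    def handle_data(self, data):
        self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            raise StopParsing  # leave the parser instead of calling reset() here


def first_paragraph(html):
    parser = FirstParagraphParser()
    try:
        parser.feed(html)
        parser.close()
    except StopParsing:
        pass  # parsing was stopped early on purpose
    result = "".join(parser.text)
    parser.reset()  # safe now: no feed()/close() call is active
    return result


print(first_paragraph("<p>first</p>second"))
```

With this input, `first_paragraph` should return only the text of the first paragraph; the trailing `second` is never reported because parsing is abandoned at `</p>`.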
History
Date                 User         Action  Args
2022-04-11 14:58:26  admin        set     github: 70398
2016-01-27 10:11:03  Hibou57      set     nosy: + ezio.melotti
2016-01-27 10:06:13  Hibou57      set     messages: + msg259011
2016-01-27 09:26:52  xiang.zhang  set     messages: + msg259009
2016-01-27 09:10:23  Hibou57      set     messages: + msg259003
2016-01-27 08:45:08  xiang.zhang  set     messages: + msg259002
2016-01-27 08:11:28  Hibou57      set     messages: + msg259000
2016-01-27 03:11:13  xiang.zhang  set     nosy: + xiang.zhang
                                          messages: + msg258992
2016-01-26 21:10:08  Hibou57      create