Issue1087808
Created on 2004-12-19 05:42 by titus, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||
---|---|---|
File name | Uploaded | Description |
sgmllib.diff | titus, 2004-12-22 07:14 | diff against latest HEAD |
test-enc.zip | titus, 2004-12-22 07:15 | example code + HTML |
Messages (11) | |||
---|---|---|---|
msg47383 - (view) | Author: Titus Brown (titus) | Date: 2004-12-19 05:42 | |
For example, in a form parsed by htmllib.HTMLParser (based on SGMLParser), given <option value="5&quot; big"> 5&quot; big, the attribute value will not be unescaped to 5" big, while the printed option text will be. Note that this behavior differs from that of HTMLParser.HTMLParser, which does a fine job. See the attached test script and test input for an example, and the attached patch for a fix: essentially I copied the code directly from HTMLParser.HTMLParser. I don't think this patch should break anything; I can't imagine people were relying on this behavior! |
|||
msg47384 - (view) | Author: Martin v. Löwis (loewis) | Date: 2004-12-22 07:06 | |
Logged In: YES user_id=21627 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. In addition, even if you *did* check this checkbox, a bug in SourceForge prevents attaching a file when *creating* an issue. Please try again. (This is a SourceForge annoyance that we can do nothing about. :-( ) |
|||
msg47385 - (view) | Author: Titus Brown (titus) | Date: 2004-12-22 07:13 | |
Logged In: YES user_id=23486 Whoops. Dangitall. Also at http://issola.caltech.edu/~t/transfer/sgmllib.diff |
|||
msg47386 - (view) | Author: Titus Brown (titus) | Date: 2004-12-22 07:15 | |
Logged In: YES user_id=23486 oh, and here's the example. |
|||
msg47387 - (view) | Author: Martin v. Löwis (loewis) | Date: 2004-12-22 07:57 | |
Logged In: YES user_id=21627 Please use unified or context diffs when submitting patches. The patch is incorrect: instead of hard-coding the list of known entities, sgmllib should use self.entitydefs to determine the set of entity names that are supported. As a result, the algorithm should also replace, say, &auml; if it occurs in an HTML attribute. Then the question is what should happen on unknown entity references. One cannot really call unknown_entityref, since implementations of that will expect that the entity reference was in content, not in an attribute. So it would probably be best to leave unknown entity references alone. Notice that in SGML (and HTML) the semicolon after the entity name is not mandatory, but can be omitted if the entity name is not followed by a letter or digit. So you probably should use the regular expression entityref to find references. Please also provide a documentation patch that explains precisely how the attribute value is created from what is in the input document (i.e. some entity references replaced, no character references replaced, etc.). |
|||
msg47388 - (view) | Author: Titus Brown (titus) | Date: 2004-12-22 08:32 | |
Logged In: YES user_id=23486 I'm happy to do so -- note that this will expand the patch to include HTMLParser.py, as well. |
|||
msg47389 - (view) | Author: Martin v. Löwis (loewis) | Date: 2004-12-22 13:16 | |
Logged In: YES user_id=21627 Not necessarily: self.entitydefs is already updated in HTMLParser, so this should work for the base class. One issue might be with encodings, e.g. if the document encoding is not Latin-1. In this case, one might not want to replace &auml; with its Latin-1 equivalent, so you might need to provide a hook where a subclass can choose not to perform entity expansion, or to perform more of it (perhaps also with a possibility to perform character reference expansion). |
|||
msg47390 - (view) | Author: Martin v. Löwis (loewis) | Date: 2005-03-04 19:33 | |
Logged In: YES user_id=21627 titus, are you still working on this? |
|||
msg47391 - (view) | Author: Titus Brown (titus) | Date: 2005-03-04 19:38 | |
Logged In: YES user_id=23486 yes, sorry, will submit patch this weekend. |
|||
msg47392 - (view) | Author: Rares Vernica (rvernica) | Date: 2006-04-01 01:00 | |
Logged In: YES user_id=1491427 This patch is continued in patch #1462498. Patch #1462498 differs from this patch as follows: - it uses self.entitydefs to determine the set of entity names that are supported; - unknown entity references are left alone; - the regular expression entityref is used to find references; - a documentation patch is not needed, as the method is internal. Ray |
|||
msg47393 - (view) | Author: Georg Brandl (georg.brandl) | Date: 2006-04-01 08:40 | |
Logged In: YES user_id=849994 Outdated with commit of patch #1462498. |
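
The approach discussed in the review messages above (look entity names up in a table, leave unknown references alone, allow the trailing semicolon to be omitted, and give subclasses a way to opt out of expansion) might be sketched roughly as follows. This is a Python 3 illustration written for this page, not the code from either patch and not sgmllib's implementation. `AttributeUnescaper`, `resolve_entity`, `unescape_attr`, `NoExpansion`, and the `ENTITYREF` pattern are hypothetical names, and the entity table is assumed to come from `html.entities.name2codepoint` rather than from sgmllib's own `entitydefs`.

```python
import re
from html.entities import name2codepoint

# Hypothetical pattern: '&name' followed by an optional ';'.
# The name part is greedy, so whatever follows it is never a letter or digit.
ENTITYREF = re.compile(r"&([a-zA-Z][a-zA-Z0-9]*)(;?)")


class AttributeUnescaper:
    """Illustrative helper (not part of sgmllib) for attribute values."""

    # Assumed entity table: entity name -> replacement text.
    entitydefs = {name: chr(cp) for name, cp in name2codepoint.items()}

    def resolve_entity(self, name):
        """Hook: return replacement text, or None to leave the reference as-is."""
        return self.entitydefs.get(name)

    def unescape_attr(self, value):
        """Replace known entity references; character references are not touched."""
        def replace(match):
            text = self.resolve_entity(match.group(1))
            if text is None:
                return match.group(0)  # unknown entity: keep the original text
            return text
        return ENTITYREF.sub(replace, value)


class NoExpansion(AttributeUnescaper):
    """Subclass that opts out of entity expansion entirely."""

    def resolve_entity(self, name):
        return None


unescaper = AttributeUnescaper()
print(unescaper.unescape_attr("5&quot; big"))        # known entity is expanded
print(unescaper.unescape_attr("&nosuchentity; x"))   # unknown entity is kept
print(NoExpansion().unescape_attr("5&quot; big"))    # expansion disabled
```

Routing every lookup through one overridable method is one way to address the encoding concern raised in the thread: a subclass that handles a non-Latin-1 document can return None, or substitute its own table, without touching the scanning logic.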
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:08 | admin | set | github: 41342 |
2004-12-19 05:42:20 | titus | create |