Message60894
According to the HTML 4.0 specification, numeric character
references can be hexadecimal as well as decimal (see
http://www.w3.org/TR/REC-html40/charset.html#h-5.3.1).
However, sgmllib.SGMLParser does not recognize the
hexadecimal form.
More and more HTML pages now use entities with a high
codepoint that are not in the official HTML entity list, so
proper handling of these references should be implemented.
A possible solution could be:
- improving the "charref" regular expression so that it
also matches hexadecimal values;
- considering all numeric references valid: those with
n <= 255 should be converted to the corresponding
characters, those above 255 should be left as numerical
charrefs.
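
The two points above could be sketched roughly as follows. This is a standalone Python 3 illustration, not a patch against sgmllib: the names CHARREF and replace_charrefs are hypothetical, the pattern requires a trailing ";" as a simplification, and the value 255 itself is assumed to fall in the "convert" group.

```python
import re

# Matches decimal (&#229;) and hexadecimal (&#xE5; or &#XE5;) references.
# Requiring the trailing ";" is a simplification made for this sketch.
CHARREF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def replace_charrefs(text):
    """Convert references up to 255 to characters; leave the rest untouched."""
    def convert(match):
        hex_digits, dec_digits = match.groups()
        # Exactly one of the two groups is set, depending on the form used.
        n = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Assumption: 255 itself is converted; higher values stay as charrefs.
        return chr(n) if n <= 255 else match.group(0)
    return CHARREF.sub(convert, text)
```

For example, replace_charrefs("&#xE5; &#x4E2D;") converts the first reference to the character U+00E5 and leaves "&#x4E2D;" as it is.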

Date | User | Action | Args
2008-01-20 09:58:32 | admin | link | issue1459279 messages
2008-01-20 09:58:32 | admin | create |