
classification
Title: sgmllib - leading spaces in declaration
Components: Library (Lib)
process
Status: closed
Nosy List: dougfort, loewis
Priority: normal
Keywords: patch

Created on 2001-05-31 22:26 by dougfort, last changed 2022-04-10 16:04 by admin. This issue is now closed.

Files
File name: sgmllib.diff
Uploaded: dougfort, 2001-05-31 22:26
Description: Diff of changes to sgmllib.py
Messages (3)
msg36695 - Author: Doug Fort (dougfort) Date: 2001-05-31 22:26
Some sites sloppily leave a space in their doctype
declaration, e.g. <! doctype...>. The Python 2.1 SGML
parser raises an exception for this. This patch
modifies sgmllib.py to allow leading whitespace in the
declaration. It also adds a little information to the
exception message.

msg36696 - Author: Martin v. Löwis (loewis) (Python committer) Date: 2001-06-07 20:05
I don't have an SGML spec, so I can only check the XML
spec. In XML, such a DOCTYPE declaration is ill-formed; I
expect the same to be true for SGML. Therefore, I
recommend rejecting this patch.

If you need to process such ill-formed documents, I
recommend deriving from SGMLParser and overriding
parse_declaration appropriately. E.g., you could advance i
until after the space, then call the base method.
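
The suggestion above can be sketched roughly as follows. This is only an illustration under stated assumptions: sgmllib was removed in Python 3, so the block does not import it. StrictParser is a simplified, hypothetical stand-in for a parser whose parse_declaration(i) expects a name or string literal directly after "<!"; it is not sgmllib's actual code. LenientParser, DeclarationError, and the declarations list are likewise names invented for this sketch.

```python
import re


class DeclarationError(Exception):
    """Raised when a declaration contains an unexpected character."""


class StrictParser:
    """Hypothetical stand-in for a strict parser (NOT sgmllib's code)."""

    _name = re.compile(r"[a-zA-Z][-_.a-zA-Z0-9]*\s*")
    _string = re.compile(r"""('[^']*'|"[^"]*")\s*""")

    def __init__(self, rawdata):
        self.rawdata = rawdata
        self.declarations = []

    def handle_decl(self, data):
        # Record the declaration body (the text between "<!" and ">").
        self.declarations.append(data)

    def parse_declaration(self, i):
        """Parse a declaration whose "<!" starts at index i.

        Returns the index just past ">", or -1 if the buffer ends first.
        """
        rawdata = self.rawdata
        j = i + 2  # skip the two characters of "<!"
        while j < len(rawdata):
            c = rawdata[j]
            if c == ">":
                self.handle_decl(rawdata[i + 2:j])
                return j + 1
            pattern = self._string if c in "\"'" else self._name
            m = pattern.match(rawdata, j)
            if not m:
                raise DeclarationError(
                    "unexpected char in declaration: %r" % c)
            j = m.end()
        return -1


class LenientParser(StrictParser):
    """Tolerates whitespace between "<!" and the declaration name."""

    def parse_declaration(self, i):
        rawdata = self.rawdata
        j = i + 2
        # Advance past any whitespace that follows "<!".
        while j < len(rawdata) and rawdata[j].isspace():
            j += 1
        # The base method skips two characters from its start index,
        # so hand it an index two before the first non-space character.
        return super().parse_declaration(j - 2)


parser = LenientParser("<! doctype html>")
end = parser.parse_declaration(0)
```

Shifting the start index only works because this stand-in base method never checks that the two characters it skips are actually "<!". A base class that verifies them would need a different approach, such as normalizing the buffer before parsing.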

msg36697 - Author: Doug Fort (dougfort) Date: 2001-06-07 20:41
I have already overloaded parse_declaration. I will withdraw
the patch. However, I would like to make one final comment.

<rant>
A rigid interpretation of the RFCs is correct in servers,
but clients should be as flexible as possible, to handle
real servers.  Our system (http://www.stressmy.com) uses
heavily overloaded versions of sgmllib, httplib, and other
Python library modules because, while they may adhere to
some notion of academic purity, they just don't work very
well against real websites.
</rant>

Whew, I feel better now.  
History
Date User Action Args
2022-04-10 16:04:05 admin set github: 34569
2001-05-31 22:26:31 dougfort create