
Author ridgerat1611
Recipients eric.smith, ridgerat1611
Date 2021-03-17.16:56:50
Message-id <1616000210.8.0.758372363558.issue43483@roundup.psfhosted.org>
Content
Sure...  I'll cut and paste some of the text I was organizing to go into a possible new issue page.

The only relevant documentation I could find was on the "xml.sax.handler" page of the Python 3.9.2 documentation for the Python Standard Library (the wording has been the same through many versions):

-----------
ContentHandler.characters(content) -- The Parser will call this method to report each chunk of character data.  SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks...
-----------

As an example, here is a typical snippet taken from the web page

     https://www.tutorialspoint.com/parsing-xml-with-sax-apis-in-python 

The application example records the tag name "type" in the "CurrentData" member, and shortly thereafter, the "type" tag's content is received:

   # Call when a character is read
   def characters(self, content):
      if self.CurrentData == "type":
         self.type = content

Suppose that the parser receives the following text line from the input file.  

<type>SciFi</type>

Though there seems to be no reason for it, the parser could decide to deliver the content text as "Sc" followed by "iFi".  In that case, the second invocation of the "characters" method would overwrite the characters received in the first invocation, and some of the content text would seem to be "lost."
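
For what it's worth, here is a minimal sketch of a handler that tolerates split content by appending each chunk and joining the pieces when the closing tag arrives.  The names "TypeHandler", "_chunks" and "parse_in_pieces" are my own and are not taken from the tutorial.  The helper feeds the input in two pieces through the incremental feed()/close() interface to try to provoke the "Sc" / "iFi" split; whether the parser actually splits the text at that point is an assumption that may depend on the expat version, but the joined result should be the same either way.

```python
import xml.sax


class TypeHandler(xml.sax.ContentHandler):
    """Collects the text of a <type> element, however it is chunked."""

    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self._chunks = []   # hypothetical buffer for the element being read
        self.type = ""

    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "type":
            self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text:
        # append each chunk instead of assigning it.
        if self.CurrentData == "type":
            self._chunks.append(content)

    def endElement(self, tag):
        if tag == "type":
            # The text is only known to be complete at the closing tag.
            self.type = "".join(self._chunks)
        self.CurrentData = ""


def parse_in_pieces(pieces):
    """Feed the given byte strings to an incremental SAX parser in order."""
    handler = TypeHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)
    parser.close()
    return handler.type


if __name__ == "__main__":
    print(parse_in_pieces([b"<type>Sc", b"iFi</type>"]))
```

The point of the design is that nothing is stored in "self.type" until endElement, so the number of "characters" calls no longer matters.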

Given how rarely it happens, I suspect that when internal processing reaches the end of a block of buffered text from the input file, the easiest thing to do is to report any fragments of text that happen to remain at the end, no matter how tiny, and start fresh with the next internal buffer. Easy for the implementer, but baffling to the application developer.  And rare enough to elude application testing.
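
Since the split is rare with ordinary input, one way to make it show up in application testing might be to feed the document to the parser a few bytes at a time.  The sketch below records every "characters" call for several feed sizes.  "RecordingHandler" and "characters_calls" are hypothetical names of mine, and the assumption that small feeds produce small chunks is just that, an assumption about the parser's internals; the only thing the documentation promises is that the pieces add up to the full text.

```python
import xml.sax


class RecordingHandler(xml.sax.ContentHandler):
    """Records every characters() call so the chunking can be inspected."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def characters(self, content):
        self.calls.append(content)


def characters_calls(document, chunk_size):
    """Feed `document` (bytes) to the parser `chunk_size` bytes at a time."""
    handler = RecordingHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for start in range(0, len(document), chunk_size):
        parser.feed(document[start:start + chunk_size])
    parser.close()
    return handler.calls


if __name__ == "__main__":
    doc = b"<type>SciFi</type>"
    for size in (1, 3, 8, len(doc)):
        # The individual pieces may differ from one feed size to the next;
        # only their concatenation is expected to stay the same.
        print(size, characters_calls(doc, size))
```

An application handler that gives the same final result for every feed size is probably handling chunked content correctly.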