sax.parser considers XML as text rather than bytes #47840
While porting Leo to Python 3.0, I found that passing any byte stream to the SAX parser hangs it. I worked around this by changing:

    while buffer != "":

to:

    while buffer != "" and buffer != b"":

at line 123 of xmlreader.py. Here is the entire function:

    def parse(self, source):
        from . import saxutils
        source = saxutils.prepare_input_source(source)
        self.prepareParser(source)
        file = source.getByteStream()
        buffer = file.read(self._bufsize)
        ### while buffer != "":
        while buffer != "" and buffer != b"": ### EKR
            self.feed(buffer)
            buffer = file.read(self._bufsize)
        self.close()

For reference, here is the code in Leo that was hanging:

    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges,1)
    handler = saxContentHandler(c,inputFileName,silent,inClipboard)
    parser.setContentHandler(handler)
    parser.parse(theFile)

Looking at the test_expat_file function in test_sax.py, it appears that …

HTH.

Edward
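A minimal sketch of why the original loop never ends: in Python 3, b"" != "" is always true, so a byte stream at end of file (which returns b"") never matches the str sentinel. The helper count_reads is hypothetical; it mimics the read loop in xmlreader.parse and adds a max_reads cap so the demonstration itself cannot hang.

```python
import io

def count_reads(stream, bufsize, sentinel, max_reads=10):
    """Hypothetical stand-in for the read loop in xmlreader.parse.

    Returns how many chunks would be fed to the parser. max_reads caps
    the loop so a sentinel that never matches cannot hang this example.
    """
    reads = 0
    buffer = stream.read(bufsize)
    while buffer != sentinel and reads < max_reads:
        reads += 1  # here the real loop would call self.feed(buffer)
        buffer = stream.read(bufsize)
    return reads

data = b"<doc>hello</doc>"  # 16 bytes, so 4 chunks of 4

# At EOF a binary stream returns b"", which never equals the str "".
print(b"" != "")                               # True
print(count_reads(io.BytesIO(data), 4, ""))    # 10 (stopped only by the cap)
print(count_reads(io.BytesIO(data), 4, b""))   # 4
```

With the str sentinel, only the artificial cap stops the loop; the real loop has no cap, which is the hang.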
It should probably be changed to just while buffer != b"", since it …
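The check suggested above (comparing only against b"") can be sketched with a binary stream fed incrementally to the parser that xml.sax.make_parser() returns. This is an illustration, not the library's code: TextCollector and feed_binary are hypothetical names, and iter(callable, b"") is just a compact way to write the while buffer != b"": loop.

```python
import io
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

def feed_binary(stream, handler, bufsize=4):
    """Hypothetical helper: feed a binary stream to a SAX parser in chunks."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    # Calls stream.read(bufsize) until it returns b"", the EOF value of
    # a binary stream; equivalent to `while buffer != b"":`.
    for chunk in iter(lambda: stream.read(bufsize), b""):
        parser.feed(chunk)
    parser.close()

collector = TextCollector()
feed_binary(io.BytesIO(b"<doc>hello</doc>"), collector)
print("".join(collector.text))  # hello
```

Character data can arrive split across chunk boundaries, which is why the handler joins its pieces instead of keeping only the last one.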
On Mon, Aug 18, 2008 at 10:09 AM, Benjamin Peterson wrote:
That was my guess as well. I added the extra test so as not to remove a …

Just to be clear, I am at present totally confused about io streams :-)

Anyway, opening the file passed to parser.parse with 'r' mode looks like the …

Edward
Python 3.0 distinguishes more clearly between unicode strings (called "str" …

Files opened in binary ("rb") mode return byte strings, but files …

What is more worrying is that XML, until decoded, should be considered a …

Bumping this as critical because it needs a decision very soon (ideally …
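To illustrate why XML should stay bytes until the parser decodes it: the encoding is declared inside the document, and expat reads that declaration from the raw bytes. A small sketch, assuming a hypothetical TextCollector handler; the sample document is made up for the example.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

# Latin-1 bytes: 0xE9 is "é", and the declaration says so.
raw = b'<?xml version="1.0" encoding="iso-8859-1"?><doc>caf\xe9</doc>'

# Guessing the wrong codec up front fails before the parser sees the declaration.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Handing the bytes to the parser lets expat apply the declared encoding.
handler = TextCollector()
xml.sax.parseString(raw, handler)
print(utf8_ok)                # False
print("".join(handler.text))  # café
```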
From the discussion on the python-3000 list, it looks like it would be nice …

Edward, does your simple fix make sax.parser work entirely well with …
On Mon, Aug 18, 2008 at 1:51 PM, Antoine Pitrou <report@bugs.python.org> wrote:
No. The sax.parser seems to have other problems. Here is what I *think* I …

    Traceback (most recent call last):
      File "C:\leo.repo\leo-30\leo\core\leoFileCommands.py", line 1283, in
      File "c:\python30\lib\xml\sax\expatreader.py", line 107, in parse
      File "c:\python30\lib\xml\sax\xmlreader.py", line 121, in parse
      File "C:\Python30\lib\io.py", line 1670, in read
      File "C:\Python30\lib\io.py", line 1499, in _read_chunk
      File "C:\Python30\lib\io.py", line 1236, in decode
      File "C:\Python30\lib\encodings\cp1252.py", line 23, in decode
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 74:

The same calls to sax read the file correctly on Python 2.5.

It would be nice to have a message pinpoint the line and character offset of …

My vote would be for the code to work on both kinds of input streams. This …

IMO, now would be the most convenient time to attempt this; there is a …

Edward
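A sketch of accepting both kinds of input stream, assuming a hypothetical helper make_input_source that checks whether read(0) returns str or bytes and stores the stream on an InputSource accordingly. It relies on the parser consulting getCharacterStream() when one is set; the 3.0 parse function quoted at the top of this issue only calls getByteStream(), so treat this as a sketch against a current Python 3, not a patch for 3.0.

```python
import io
import xml.sax
from xml.sax.xmlreader import InputSource

class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

def make_input_source(stream):
    """Hypothetical helper: wrap a text or binary stream in an InputSource."""
    source = InputSource()
    # read(0) consumes nothing but reveals the stream type: "" or b"".
    if isinstance(stream.read(0), str):
        source.setCharacterStream(stream)  # already-decoded text
    else:
        source.setByteStream(stream)  # raw bytes; expat does the decoding
    return source

def collect_text(stream):
    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(make_input_source(stream))
    return "".join(handler.text)

results = [collect_text(io.StringIO("<doc>hi</doc>")),
           collect_text(io.BytesIO(b"<doc>hi</doc>"))]
print(results)  # ['hi', 'hi']
```

The design choice here is to detect the stream type once, up front, rather than comparing each chunk against both "" and b"" inside the read loop.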
On Mon, Aug 18, 2008 at 11:00 AM, Antoine Pitrou <report@bugs.python.org> wrote:

Thanks for these remarks. They confirm what I suspected, but was unsure of, …

Thanks for taking this seriously.

Edward

P.S. I love the new Unicode plans. They are going to cause some pain at …

EKR
What are those calls exactly?
I guess that the file is simply opened in text mode ("r"). This uses the …
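The traceback above passes through io.py and encodings\cp1252.py, which fits the guess that the file was decoded as text with a cp1252 codec before expat ever saw it. One way to get the same error, under stated assumptions: the file is UTF-8 (made up here), where "Á" (U+00C1) encodes as the byte pair C3 81, and cp1252 has no character for 0x81. The codec is passed explicitly so the sketch behaves the same on any platform; TextCollector is again a hypothetical handler.

```python
import os
import tempfile
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that accumulates character data."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)

# Made-up UTF-8 document: "\u00c1" (Á) encodes as the bytes C3 81.
raw = '<?xml version="1.0" encoding="utf-8"?><doc>\u00c1</doc>'.encode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.xml")
    with open(path, "wb") as f:
        f.write(raw)

    # Text mode decodes before the parser runs; cp1252 has no mapping for 0x81.
    reason = None
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.read()
    except UnicodeDecodeError as exc:
        reason = exc.reason

    # Binary mode passes the raw bytes through; expat reads the declaration.
    handler = TextCollector()
    with open(path, "rb") as f:
        xml.sax.parse(f, handler)

print(reason)                 # character maps to <undefined>
print("".join(handler.text))  # Á
```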
On Mon, Aug 18, 2008 at 4:15 PM, Antoine Pitrou <report@bugs.python.org> wrote:

    parser = xml.sax.make_parser()
    parser.setFeature(xml.sax.handler.feature_external_ges,1)
    handler = saxContentHandler(c,inputFileName,silent,inClipboard)
    parser.setContentHandler(handler)
    parser.parse(theFile)

As discussed in http://bugs.python.org/issue3590, theFile is a file opened in 'rb' mode.

Edward
OK, then xml.sax looks rather broken. (By the way, can you avoid sending HTML emails? Each time you send one, …)
This is a duplicate of bpo-2501. |