Message 127148 - Python tracker


Author	vstinner
Recipients	akuchling, georg.brandl, giampaolo.rodola, holdenweb, lregebro, pitrou, r.david.murray, rhettinger, sdaoden, vstinner
Date	2011-01-26.21:49:57
Message-id	<1296078597.99.0.179450140516.issue9124@psf.upfronthosting.co.za>

Content

pitrou> There's a missing conversion in mailbox.patch.
pitrou> Running with -bb shows the issue.
pitrou> Here is an updated patch.

Good catch: test_mailbox now passes on Windows.
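
For readers who have not used the flag: -bb turns a bytes/str comparison, which normally just evaluates to False, into a BytesWarning raised as an error. A minimal sketch of that behaviour follows; the helper name compare_under_bb and the sample snippet are made up for illustration and are not taken from the patch.

```python
import subprocess
import sys


def compare_under_bb(snippet="b'From ' == 'From '"):
    """Run a snippet in a child interpreter started with -bb (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, "-bb", "-c", snippet],
        capture_output=True,
        text=True,
    )


result = compare_under_bb()
# Without -b the comparison silently evaluates to False; with -bb the child
# interpreter raises BytesWarning and exits with a non-zero status.
print(result.returncode != 0, "BytesWarning" in result.stderr)
```

This is why a missing str-to-bytes conversion can go unnoticed in a normal test run and only show up with -bb.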

--

Some remarks on mailbox2.patch.

get_string() returns a bytes object: I propose to rename it to get_bytes(): """Return a *byte* string representation or raise a KeyError."""

The following comment is outdated, target has to be a *binary* file:
    def _dump_message(self, message, target, mangle_from_=False):
        # This assumes the target file is open in *text* mode ...

get_file(): should we specify that the file-like object is a binary file?

MH.get_sequences() and MH.set_sequences() open the .mh_sequences file in text mode using the locale encoding. I don't know if the locale encoding is a good choice. Does this file contain non-ASCII characters? Should we use ASCII or UTF-8 encoding instead, or parse the file in binary, and only decode requested values from ASCII?
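
To make the last option concrete, here is a rough sketch of parsing in binary and decoding only the sequence names from ASCII. The function parse_mh_sequences and its wanted parameter are hypothetical, not part of mailbox.py, and silently skipping non-ASCII names is just one possible policy.

```python
def parse_mh_sequences(data, wanted=None):
    """Parse raw .mh_sequences bytes into {name: sorted message keys}.

    Hypothetical helper: the file is split on bytes, and only the sequence
    names that are kept get decoded, strictly from ASCII.
    """
    results = {}
    for raw_line in data.splitlines():
        name, sep, contents = raw_line.partition(b":")
        if not sep:
            continue  # not a "name: keys" line
        try:
            text_name = name.strip().decode("ascii")
        except UnicodeDecodeError:
            continue  # assumption: ignore names that are not pure ASCII
        if wanted is not None and text_name not in wanted:
            continue
        keys = set()
        for spec in contents.split():
            start, dash, stop = spec.partition(b"-")
            if dash:
                # "3-5" denotes the inclusive range 3, 4, 5
                keys.update(range(int(start), int(stop) + 1))
            else:
                keys.add(int(start))
        results[text_name] = sorted(keys)
    return results


sample = b"unseen: 1 3-5\nflagged: 2\n"
print(parse_mh_sequences(sample))
print(parse_mh_sequences(sample, wanted={"flagged"}))
```

With this approach the locale encoding never enters the picture; whether non-ASCII names should be rejected, skipped, or decoded as UTF-8 remains the open question.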

Since all tests of test_mailbox now pass on Windows, it looks like the "universal newline" thing still works. But how can I be sure?
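
One way to check this directly might be a round trip that inspects the raw bytes on disk. The sketch below assumes the bytes-based API discussed here (mbox.add() accepting bytes, and get_bytes()); the helper name roundtrip_newlines and the sample payload are invented for the example.

```python
import mailbox
import os
import tempfile


def roundtrip_newlines(payload=b"From: a@example.com\n\nline one\nline two\n"):
    """Store payload in a temporary mbox; return (raw file bytes, get_bytes() result)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "box")
        box = mailbox.mbox(path)
        try:
            key = box.add(payload)
            box.flush()
            with open(path, "rb") as f:
                raw = f.read()  # exactly what was written to disk
            body = box.get_bytes(key)
        finally:
            box.close()
    return raw, body


raw, body = roundtrip_newlines()
linesep = os.linesep.encode("ascii")
# On disk the lines should be joined by the platform separator ...
print(b"line one" + linesep + b"line two" in raw)
# ... while the message handed back to the caller should use plain "\n".
print(b"\r" not in body)
```

On Windows the first check only succeeds if "\n" was translated to "\r\n" on write, and the second only if it was translated back on read. On POSIX both checks are trivially true.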

-            from_line = 'From MAILER-DAEMON %s' % time.asctime(time.gmtime())
+            from_line = b'From MAILER-DAEMON ' + time.asctime(time.gmtime()).encode()

Is UTF-8 the right encoding to encode a timestamp? Or should we use something like "=?UTF-8?q?...?=" ?
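
For what it is worth, time.asctime() is documented as not using locale information, so its output is plain ASCII and the choice between ASCII and UTF-8 makes no difference to the resulting bytes. The "=?UTF-8?q?...?=" form is the encoded-word syntax for message headers, whereas this line is the mbox separator, so it does not seem to apply. A small sketch, with make_from_line as a hypothetical helper name:

```python
import time


def make_from_line(sender=b"MAILER-DAEMON", when=None):
    """Build an mbox "From " separator line as bytes (hypothetical helper).

    `when` is seconds since the epoch; None means the current time.
    """
    stamp = time.asctime(time.gmtime(when))
    # Encoding strictly as ASCII would raise if asctime() ever produced
    # anything else, instead of silently writing multi-byte sequences.
    return b"From " + sender + b" " + stamp.encode("ascii")


print(make_from_line(when=0))
```

Using 'ascii' explicitly documents the assumption and fails loudly if it is ever violated.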

MH.set_sequences() does... sometimes... decode the sequence name from UTF-8. I don't understand why I had to add the following if:

-                f.write('%s:' % name)
+                if isinstance(name, bytes):
+                    name = name.decode()
+                f.write(name + ':')

Is it correct to decode the timestamp from UTF-8? And is the following change correct?
***********
-            maybe_date = ' '.join(self.get_from().split()[-5:])
+            maybe_date = b' '.join(self.get_from().split()[-5:])
             try:
+                maybe_date = maybe_date.decode('utf-8')
                 message.set_date(calendar.timegm(time.strptime(maybe_date,
                                                       '%a %b %d %H:%M:%S %Y')))
-            except (ValueError, OverflowError):
+            except (ValueError, OverflowError, UnicodeDecodeError):
                 pass
***********
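
A standalone sketch of the same logic, for experimenting outside mailbox.py. The function name date_from_from_line is made up, and it decodes from ASCII rather than UTF-8 as a stricter assumption. Note that UnicodeDecodeError is a subclass of ValueError, so the original except clause would already have caught it; listing it explicitly is harmless but not required.

```python
import calendar
import time


def date_from_from_line(from_value):
    """Return the UTC timestamp found at the end of a "From " value, or None.

    Hypothetical helper mirroring the patched code; `from_value` is bytes.
    """
    maybe_date = b" ".join(from_value.split()[-5:])
    try:
        text = maybe_date.decode("ascii")
        return calendar.timegm(time.strptime(text, "%a %b %d %H:%M:%S %Y"))
    except (ValueError, OverflowError):
        # UnicodeDecodeError is a ValueError, so undecodable input lands here too.
        return None


print(issubclass(UnicodeDecodeError, ValueError))
print(date_from_from_line(b"MAILER-DAEMON Wed Jan 26 21:49:57 2011"))
print(date_from_from_line(b"MAILER-DAEMON \xff\xfe not a date"))
```

Since strptime() matches day and month names according to the current LC_TIME setting, this sketch assumes the default C locale.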


The following change is just enough to fix mailbox. It might be better to inherit from RawIOBase instead and implement all of its methods. The _PartialFile class could also be moved into the io module. All of this can be done later.
****** 
+    def readable(self):
+        return self._file.readable()
 
+    def writable(self):
+        return self._file.writable()
+
+    def seekable(self):
+        return self._file.seekable()
+
+    def flush(self):
+        return self._file.flush()
+
+    @property
+    def closed(self):
+        return self._file.closed
******
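
As a rough idea of what the RawIOBase variant could look like: the class below is a read-only sketch, not the actual _PartialFile. The name PartialRawFile and the constructor signature are assumptions. Only readinto(), seek() and tell() are written by hand; read(), readline() and readlines() come from the io base classes.

```python
import io


class PartialRawFile(io.RawIOBase):
    """Read-only view of bytes [start, stop) of a binary file (hypothetical sketch)."""

    def __init__(self, f, start, stop):
        super().__init__()
        self._file = f
        self._start = start
        self._stop = stop
        self._pos = start  # absolute position in the underlying file

    def readable(self):
        return True

    def seekable(self):
        return self._file.seekable()

    def tell(self):
        return self._pos - self._start

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = self._start + offset
        elif whence == io.SEEK_CUR:
            target = self._pos + offset
        elif whence == io.SEEK_END:
            target = self._stop + offset
        else:
            raise ValueError("invalid whence: %r" % (whence,))
        self._pos = max(self._start, target)  # never move before the window
        return self._pos - self._start

    def readinto(self, buffer):
        remaining = self._stop - self._pos
        if remaining <= 0:
            return 0  # end of the window
        self._file.seek(self._pos)
        data = self._file.read(min(len(buffer), remaining))
        n = len(data)
        buffer[:n] = data
        self._pos += n
        return n


part = PartialRawFile(io.BytesIO(b"xxline1\nline2\nyy"), 2, 14)
print(part.readlines())
```

The view deliberately does not close the underlying file when it is closed itself; whether it should is a design decision for the real class.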
