This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

Author vstinner
Recipients akuchling, georg.brandl, giampaolo.rodola, holdenweb, lregebro, pitrou, r.david.murray, rhettinger, sdaoden, vstinner
Date 2011-01-26.21:49:57
SpamBayes Score 3.140821e-13
Marked as misclassified No
Message-id <>
pitrou> There's a missing conversion in mailbox.patch.
pitrou> Running with -bb shows the issue.
pitrou> Here is an updated patch.

Good catch: test_mailbox now pass on Windows.


Some remarks on mailbox2.patch.

get_string() returns a bytes object: I propose to rename it to get_bytes(): """Return a *byte* string representation or raise a KeyError."""

The following comment is outdated, target have to be a *binary* file:
    def _dump_message(self, message, target, mangle_from_=False):
        # This assumes the target file is open in *text* mode ...

get_file(): should we specify that the file-like object is a binary file?

MH.get_sequences() and MH.set_sequences() opens .mh_sequences file in text mode from the locale encoding. I don't know if the locale encoding is a good choice. Does this file contain non-ASCII characters? Should we use ASCII or UTF-8 encoding instead, or parse the file in binary, and only decode requested values from ASCII?

Since all tests of test_mailbox now pass on Windows, it looks like the "universal newline" thing still work. But how can I be sure?

-            from_line = 'From MAILER-DAEMON %s' % time.asctime(time.gmtime())
+            from_line = b'From MAILER-DAEMON ' + time.asctime(time.gmtime()).encode()

Is UTF-8 the right encoding to encode a timestamp? Or should we use something like "=?UTF-8?q?...?=" ?

MH.set_sequences() does... sometimes... decode the sequence name from UTF-8. I don't understand why I had to add the following if:

-                f.write('%s:' % name)
+                if isinstance(name, bytes):
+                    name = name.decode()
+                f.write(name + ':')

Is it correct to decode the timestamp from UTF-8? And is the following change correct?
-            maybe_date = ' '.join(self.get_from().split()[-5:])
+            maybe_date = b' '.join(self.get_from().split()[-5:])
+                maybe_date = maybe_date.decode('utf-8')
                                                       '%a %b %d %H:%M:%S %Y')))
-            except (ValueError, OverflowError):
+            except (ValueError, OverflowError, UnicodeDecodeError):

The following change is just enough to fix mailbox. But it would maybe be better to inherit from RawIOBase instead and implement all methods. _PartialFile class might be moved into the io module. All of this can be done later.
+    def readable(self):
+        return self._file.readable()
+    def writable(self):
+        return self._file.writable()
+    def seekable(self):
+        return self._file.seekable()
+    def flush(self):
+        return self._file.flush()
+    @property
+    def closed(self):
+        return self._file.closed
Date User Action Args
2011-01-26 21:49:58vstinnersetrecipients: + vstinner, akuchling, georg.brandl, rhettinger, holdenweb, pitrou, giampaolo.rodola, lregebro, r.david.murray, sdaoden
2011-01-26 21:49:57vstinnersetmessageid: <>
2011-01-26 21:49:57vstinnerlinkissue9124 messages
2011-01-26 21:49:57vstinnercreate