classification
Title: test_email failures on Windows: end of line issue?
Type: behavior
Stage: resolved
Components: Library (Lib), Tests
Versions: Python 3.2
process
Status: closed
Resolution: fixed
Dependencies: 1349106
Superseder:
Assigned To: r.david.murray
Nosy List: barry, brian.curtin, r.david.murray
Priority: high
Keywords: patch

Created on 2010-10-18 03:49 by vstinner, last changed 2010-11-21 16:57 by r.david.murray. This issue is now closed.

Files
File name Uploaded Description
out.txt vstinner, 2010-10-18 03:48
windows_email_fix.patch r.david.murray, 2010-10-18 20:58
Messages (9)
msg118996 - Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-18 03:48
See attached file for the full output. One example:

== CPython 3.2a3+ (py3k:85660, Oct 17 2010, 21:57:48) [MSC v.1500 32 bit (Intel)]
==   Windows-XP-5.1.2600-SP3 little-endian
======================================================================
FAIL: test_MIME_digest (email.test.test_email.TestBytesGeneratorIdempotent)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\victor\py3k\lib\email\test\test_email.py", line 2016, in test_MIME_digest
    self._idempotent(msg, text)
  File "C:\victor\py3k\lib\email\test\test_email.py", line 2947, in _idempotent
    self.assertEqual(data, b.getvalue())
  File "C:\victor\py3k\lib\email\test\test_email.py", line 2952, in assertEqual
    self.assertListEqual(str1.split(b'\n'), str2.split(b'\n'))
AssertionError: Lists differ: [b'MIME-version: 1.0\r', b'Fro... != [b'MIME-version: 1.0', b'From:...

First differing element 0:
b'MIME-version: 1.0\r'
b'MIME-version: 1.0'

- [b'MIME-version: 1.0\r',
?                     --

+ [b'MIME-version: 1.0',
-  b'From: ppp-request@zzz.org\r',
?                             --

+  b'From: ppp-request@zzz.org',
-  b'Sender: ppp-admin@zzz.org\r',
?                             --

+  b'Sender: ppp-admin@zzz.org',
-  b'To: ppp@zzz.org\r',
?                   --

+  b'To: ppp@zzz.org',
-  b'Subject: Ppp digest, Vol 1 #2 - 5 msgs\r',
?                                          --

+  b'Subject: Ppp digest, Vol 1 #2 - 5 msgs',
-  b'Date: Fri, 20 Apr 2001 20:18:00 -0400 (EDT)\r',
?                                               --

+  b'Date: Fri, 20 Apr 2001 20:18:00 -0400 (EDT)',
-  b'X-Mailer: Mailman v2.0.4\r',
?                            --

+  b'X-Mailer: Mailman v2.0.4',
-  b'X-Mailman-Version: 2.0.4\r',
?                            --

+  b'X-Mailman-Version: 2.0.4',
-  b'Content-Type: multipart/mixed; boundary="192.168.1.2.889.32614.987812255.500.21814"\r',
?                                                                                       --

+  b'Content-Type: multipart/mixed; boundary="192.168.1.2.889.32614.987812255.500.21814"',
-  b'\r',
?    --

+  b'',
-  b'--192.168.1.2.889.32614.987812255.500.21814\r',
?                                               --

+  b'--192.168.1.2.889.32614.987812255.500.21814',
-  b'Content-type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-type: text/plain; charset=us-ascii',
-  b'Content-description: Masthead (Ppp digest, Vol 1 #2)\r',
?                                                        --

+  b'Content-description: Masthead (Ppp digest, Vol 1 #2)',
-  b'\r',
?    --

+  b'',
   b'Send Ppp mailing list submissions to\r',
   b'\tppp@zzz.org\r',
   b'\r',
   b'To subscribe or unsubscribe via the World Wide Web, visit\r',
   b'\thttp://www.zzz.org/mailman/listinfo/ppp\r',
   b"or, via email, send a message with subject or body 'help' to\r",
   b'\tppp-request@zzz.org\r',
   b'\r',
   b'You can reach the person managing the list at\r',
   b'\tppp-admin@zzz.org\r',
   b'\r',
   b'When replying, please edit your Subject line so it is more specific\r',
   b'than "Re: Contents of Ppp digest..."\r',
   b'\r',
-  b'\r',
?    --

+  b'',
-  b'--192.168.1.2.889.32614.987812255.500.21814\r',
?                                               --

+  b'--192.168.1.2.889.32614.987812255.500.21814',
-  b'Content-type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-type: text/plain; charset=us-ascii',
-  b"Content-description: Today's Topics (5 msgs)\r",
?                                                --

+  b"Content-description: Today's Topics (5 msgs)",
-  b'\r',
?    --

+  b'',
   b"Today's Topics:\r",
   b'\r',
   b'   1. testing #1 (Barry A. Warsaw)\r',
   b'   2. testing #2 (Barry A. Warsaw)\r',
   b'   3. testing #3 (Barry A. Warsaw)\r',
   b'   4. testing #4 (Barry A. Warsaw)\r',
   b'   5. testing #5 (Barry A. Warsaw)\r',
-  b'\r',
?    --

+  b'',
-  b'--192.168.1.2.889.32614.987812255.500.21814\r',
?                                               --

+  b'--192.168.1.2.889.32614.987812255.500.21814',
-  b'Content-Type: multipart/digest; boundary="__--__--"\r',
?                                                       --

+  b'Content-Type: multipart/digest; boundary="__--__--"',
-  b'\r',
?    --

+  b'',
-  b'--__--__--\r',
?              --

+  b'--__--__--',
-  b'\r',
?    --

+  b'',
-  b'Message: 1\r',
?              --

+  b'Message: 1',
-  b'Content-Type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-Type: text/plain; charset=us-ascii',
-  b'Content-Transfer-Encoding: 7bit\r',
?                                   --

+  b'Content-Transfer-Encoding: 7bit',
-  b'Date: Fri, 20 Apr 2001 20:16:13 -0400\r',
?                                         --

+  b'Date: Fri, 20 Apr 2001 20:16:13 -0400',
-  b'To: ppp@zzz.org\r',
?                   --

+  b'To: ppp@zzz.org',
-  b'From: barry@digicool.com (Barry A. Warsaw)\r',
?                                              --

+  b'From: barry@digicool.com (Barry A. Warsaw)',
-  b'Subject: [Ppp] testing #1\r',
?                             --

+  b'Subject: [Ppp] testing #1',
-  b'Precedence: bulk\r',
?                    --

+  b'Precedence: bulk',
-  b'\r',
?    --

+  b'',
   b'\r',
   b'hello\r',
   b'\r',
-  b'\r',
?    --

+  b'',
-  b'--__--__--\r',
?              --

+  b'--__--__--',
-  b'\r',
?    --

+  b'',
-  b'Message: 2\r',
?              --

+  b'Message: 2',
-  b'Date: Fri, 20 Apr 2001 20:16:21 -0400\r',
?                                         --

+  b'Date: Fri, 20 Apr 2001 20:16:21 -0400',
-  b'Content-Type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-Type: text/plain; charset=us-ascii',
-  b'Content-Transfer-Encoding: 7bit\r',
?                                   --

+  b'Content-Transfer-Encoding: 7bit',
-  b'To: ppp@zzz.org\r',
?                   --

+  b'To: ppp@zzz.org',
-  b'From: barry@digicool.com (Barry A. Warsaw)\r',
?                                              --

+  b'From: barry@digicool.com (Barry A. Warsaw)',
-  b'Precedence: bulk\r',
?                    --

+  b'Precedence: bulk',
-  b'\r',
?    --

+  b'',
   b'\r',
   b'hello\r',
   b'\r',
-  b'\r',
?    --

+  b'',
-  b'--__--__--\r',
?              --

+  b'--__--__--',
-  b'\r',
?    --

+  b'',
-  b'Message: 3\r',
?              --

+  b'Message: 3',
-  b'Date: Fri, 20 Apr 2001 20:16:25 -0400\r',
?                                         --

+  b'Date: Fri, 20 Apr 2001 20:16:25 -0400',
-  b'Content-Type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-Type: text/plain; charset=us-ascii',
-  b'Content-Transfer-Encoding: 7bit\r',
?                                   --

+  b'Content-Transfer-Encoding: 7bit',
-  b'To: ppp@zzz.org\r',
?                   --

+  b'To: ppp@zzz.org',
-  b'From: barry@digicool.com (Barry A. Warsaw)\r',
?                                              --

+  b'From: barry@digicool.com (Barry A. Warsaw)',
-  b'Subject: [Ppp] testing #3\r',
?                             --

+  b'Subject: [Ppp] testing #3',
-  b'Precedence: bulk\r',
?                    --

+  b'Precedence: bulk',
-  b'\r',
?    --

+  b'',
   b'\r',
   b'hello\r',
   b'\r',
-  b'\r',
?    --

+  b'',
-  b'--__--__--\r',
?              --

+  b'--__--__--',
-  b'\r',
?    --

+  b'',
-  b'Message: 4\r',
?              --

+  b'Message: 4',
-  b'Date: Fri, 20 Apr 2001 20:16:28 -0400\r',
?                                         --

+  b'Date: Fri, 20 Apr 2001 20:16:28 -0400',
-  b'Content-Type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-Type: text/plain; charset=us-ascii',
-  b'Content-Transfer-Encoding: 7bit\r',
?                                   --

+  b'Content-Transfer-Encoding: 7bit',
-  b'To: ppp@zzz.org\r',
?                   --

+  b'To: ppp@zzz.org',
-  b'From: barry@digicool.com (Barry A. Warsaw)\r',
?                                              --

+  b'From: barry@digicool.com (Barry A. Warsaw)',
-  b'Subject: [Ppp] testing #4\r',
?                             --

+  b'Subject: [Ppp] testing #4',
-  b'Precedence: bulk\r',
?                    --

+  b'Precedence: bulk',
-  b'\r',
?    --

+  b'',
   b'\r',
   b'hello\r',
   b'\r',
-  b'\r',
?    --

+  b'',
-  b'--__--__--\r',
?              --

+  b'--__--__--',
-  b'\r',
?    --

+  b'',
-  b'Message: 5\r',
?              --

+  b'Message: 5',
-  b'Date: Fri, 20 Apr 2001 20:16:32 -0400\r',
?                                         --

+  b'Date: Fri, 20 Apr 2001 20:16:32 -0400',
-  b'Content-Type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-Type: text/plain; charset=us-ascii',
-  b'Content-Transfer-Encoding: 7bit\r',
?                                   --

+  b'Content-Transfer-Encoding: 7bit',
-  b'To: ppp@zzz.org\r',
?                   --

+  b'To: ppp@zzz.org',
-  b'From: barry@digicool.com (Barry A. Warsaw)\r',
?                                              --

+  b'From: barry@digicool.com (Barry A. Warsaw)',
-  b'Subject: [Ppp] testing #5\r',
?                             --

+  b'Subject: [Ppp] testing #5',
-  b'Precedence: bulk\r',
?                    --

+  b'Precedence: bulk',
-  b'\r',
?    --

+  b'',
   b'\r',
   b'hello\r',
   b'\r',
   b'\r',
   b'\r',
-  b'\r',
?    --

+  b'',
-  b'--__--__----\r',
?                --

+  b'--__--__----',
-  b'--192.168.1.2.889.32614.987812255.500.21814\r',
?                                               --

+  b'--192.168.1.2.889.32614.987812255.500.21814',
-  b'Content-type: text/plain; charset=us-ascii\r',
?                                              --

+  b'Content-type: text/plain; charset=us-ascii',
-  b'Content-description: Digest Footer\r',
?                                      --

+  b'Content-description: Digest Footer',
-  b'\r',
?    --

+  b'',
   b'_______________________________________________\r',
   b'Ppp mailing list\r',
   b'Ppp@zzz.org\r',
   b'http://www.zzz.org/mailman/listinfo/ppp\r',
   b'\r',
-  b'\r',
?    --

+  b'',
-  b'--192.168.1.2.889.32614.987812255.500.21814--\r',
?                                                 --

+  b'--192.168.1.2.889.32614.987812255.500.21814--',
   b'\r',
   b'End of Ppp Digest\r',
   b'\r',
   b'']
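The diff above follows from how the test's assertEqual helper compares the two byte strings: it splits both on b'\n', so every line of CRLF-terminated data keeps a trailing b'\r'. A minimal sketch of that effect, using two header lines from the output above (the variable names are illustrative only):

```python
# Two header lines taken from the failing output, once with CRLF and once with LF.
crlf_data = b"MIME-version: 1.0\r\nFrom: ppp-request@zzz.org\r\n"
lf_data = crlf_data.replace(b"\r\n", b"\n")

# Splitting on b"\n" alone leaves the b"\r" attached to each CRLF line.
crlf_lines = crlf_data.split(b"\n")
lf_lines = lf_data.split(b"\n")

print(crlf_lines)  # [b'MIME-version: 1.0\r', b'From: ppp-request@zzz.org\r', b'']
print(lf_lines)    # [b'MIME-version: 1.0', b'From: ppp-request@zzz.org', b'']

# bytes.splitlines() recognises both conventions, so it hides the difference.
assert crlf_data.splitlines() == lf_data.splitlines()
```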
msg119050 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-18 18:09
The interesting question is: why aren't the buildbots seeing this failure? I can reproduce it in my Windows VM using 3.2a3, and will work on a fix (to the tests; the code under test is doing the "correct" thing, though that thing is somewhat broken by design).
msg119066 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-18 19:00
Drat, there's a real bug here, too. The bytes parsing machinery doesn't correctly translate CRLF on input.
msg119073 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-18 20:12
Here is a patch that adds a test of the underlying problem and fixes it. I don't like this patch, because it tries to detect the line ending style of the input stream and changes behavior based on that; but since email wants to use '\n' as the separator internally, I don't see another way to fix it at the moment. The ugliest part is that I changed the expected result of one existing test. However, that test uses an artificial way of opening an input file in order to test the parser's universal newline handling, and I think the behavior tested is arguably incorrect.

I'm not sure this will fix all the Windows failures, but it should fix most of them.
msg119081 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-18 20:58
Revised patch that seems to fix all the Windows failures.

Still not sure why they were not showing up on the buildbots.  Victor was working from an svn checkout and I from the binary installer, so it's not just a difference in the svn eol handling.
msg119086 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-18 21:36
Clarification of my earlier comment on the patch: I think the behavior *originally* tested for by the changed test is arguably incorrect, given email's internal use of '\n' line endings. So I think the patch improves things, but it is a potential behavior change. Potential only, I hope, because it is unlikely that anyone parsing emails in text mode would use anything other than universal newline handling in Python 3.
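To illustrate the universal newline handling mentioned here: in Python 3, a text-mode stream opened with newline=None translates '\r\n' to '\n' on read, while newline='' leaves the line endings untouched. A small sketch using io.TextIOWrapper over in-memory bytes (the sample data is made up for illustration):

```python
import io

raw = b"Subject: test\r\n\r\nhello\r\n"

# newline=None enables universal newlines: "\r\n" is translated to "\n" on read.
translated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None).read()

# newline="" disables translation, so the original line endings are preserved.
untranslated = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="").read()

print(repr(translated))    # 'Subject: test\n\nhello\n'
print(repr(untranslated))  # 'Subject: test\r\n\r\nhello\r\n'
```

A parser fed the translated text never sees '\r\n', which is why text-mode parsing hides the difference in input line endings.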
msg119135 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-19 12:37
Having looked more carefully at the email4 (Python 2) code, I think I'm wrong. The test I changed appears to codify the behavior expected when parsing a CRLF file in binary mode. This means that email4 code being ported to email5 may depend on that behavior. So my initial statement was in fact correct: the email5 code as it stands is correct, and it is the tests that are broken.

Fixing the tests so that they run on Windows may be hard enough that I end up just skipping that particular test class on Windows. The difficulty arises from the fact that email uses '\n' when serializing headers, but the message bodies in binary mode will have \r\n line endings if and only if the source used \r\n. Since the source files for those tests are text, they have different line endings depending on the platform, and so the output of re-serializing the messages is different.

Which means, in essence, that on Windows those test failures are correct failures: binary parse and binary serialize are *not* inverses. This is then also true for a binary parse of an RFC-valid input stream, since the RFCs require \r\n line separators. In Python 2 this wasn't a problem, because the file or data stream could be read as text using universal newline mode, which hid the difference in the input line-ending convention. In Python 3 this is not possible.

So an alternative, and perhaps better, fix is to add a new feature to email5's BytesGenerator that allows the output line-ending character sequence to be specified. This would have the additional advantage of closing issue 1349106, and would make it easier to interface with an as-yet-nonexistent binary input interface in smtplib.
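A sketch of what the proposed feature looks like in use, assuming a Python version in which BytesGenerator.flatten accepts a linesep argument (released Python 3.2 and later). The sample message and the serialize helper are illustrative only, and the results reflect a current Python 3 rather than the 3.2a3 build discussed in this issue:

```python
from email import message_from_bytes
from email.generator import BytesGenerator
from io import BytesIO


def serialize(msg, linesep=None):
    """Flatten msg to bytes; linesep=None uses the generator's default."""
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg, linesep=linesep)
    return buf.getvalue()


# An RFC-style input stream: every line ends with CRLF.
crlf_source = b"From: a@example.com\r\nSubject: hi\r\n\r\nhello\r\nworld\r\n"
msg = message_from_bytes(crlf_source)

# Asking for CRLF output makes binary parse and binary serialize inverses again.
assert serialize(msg, linesep="\r\n") == crlf_source

# With the default separator the output is not identical to the CRLF source.
assert serialize(msg) != crlf_source
```

Passing the separator explicitly keeps the caller in control of the wire format, instead of having the generator guess from the input.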
msg121256 - Author: Brian Curtin (brian.curtin) * (Python committer) Date: 2010-11-16 00:54
> Still not sure why they were not showing up on the buildbots.  Victor was working from an svn checkout and I from the binary installer, so it's not just a difference in the svn eol handling.

I too had only been seeing this in my checkout, but now that I have set up a build slave, I brought the bad luck there: http://www.python.org/dev/buildbot/all/builders/AMD64%20Windows%20Server%202008%203.x/builds/0 (nothing new, same failures as haypo uploaded).
msg121952 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-11-21 16:57
This is fixed by r86642. The remaining failing tests were pointing to bugs in the implementation of the linesep argument to generator.flatten. I had to add an additional test to catch all the related bugs, though.

The tests now run the inversion tests with the source files terminated both with \n and with \r\n, so if there are bugs in this area in the future they should show up on all platforms.

I've tested this on Windows, so I'm closing the bug. Hopefully Brian's buildbot will agree with me :)
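Running each inversion check once per line-ending convention might look roughly like the following. This is a simplified sketch, not the actual test code from r86642; the template message and the check_inversion helper are hypothetical:

```python
from email import message_from_bytes
from email.generator import BytesGenerator
from io import BytesIO

# Hypothetical LF-terminated template standing in for a test data file.
TEMPLATE = b"From: a@example.com\nTo: b@example.com\nSubject: hi\n\nhello\n\nbye\n"


def check_inversion(template, linesep):
    """Return True if parse -> flatten reproduces the source for this linesep."""
    source = template.replace(b"\n", linesep.encode("ascii"))
    msg = message_from_bytes(source)
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg, linesep=linesep)
    return buf.getvalue() == source


# Run the same check with both conventions, regardless of the host platform.
results = {linesep: check_inversion(TEMPLATE, linesep) for linesep in ("\n", "\r\n")}
print(results)
```

Because the line endings are generated in the test rather than inherited from files on disk, the result no longer depends on how the checkout or installer wrote the data files.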
History
Date User Action Args
2010-11-21 16:57:48 r.david.murray set status: open -> closed
    resolution: fixed
    messages: + msg121952
    stage: needs patch -> resolved
2010-11-16 13:05:08 vstinner set nosy: - vstinner
2010-11-16 00:54:27 brian.curtin set nosy: + brian.curtin
    messages: + msg121256
2010-10-20 01:27:31 r.david.murray set dependencies: + email.Generator does not separate headers with "\r\n"
2010-10-19 12:37:16 r.david.murray set nosy: + barry
    messages: + msg119135
2010-10-18 21:36:51 r.david.murray set messages: + msg119086
2010-10-18 20:58:32 r.david.murray set files: - windows_email_fix.patch
2010-10-18 20:58:19 r.david.murray set files: + windows_email_fix.patch
    messages: + msg119081
2010-10-18 20:12:57 r.david.murray set files: + windows_email_fix.patch
    keywords: + patch
    messages: + msg119073
2010-10-18 19:00:15 r.david.murray set messages: + msg119066
2010-10-18 18:09:35 r.david.murray set assignee: r.david.murray
2010-10-18 18:09:21 r.david.murray set messages: + msg119050
    priority: normal -> high
    components: + Tests
    type: behavior
    stage: needs patch
2010-10-18 03:49:02 vstinner create