This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: doctest _load_testfile function -- newline handling seems incorrect
Type: behavior Stage: resolved
Components: Tests Versions: Python 3.9, Python 3.8, Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: belopolsky, eric.araujo, miss-islington, pdonis, zach.ware
Priority: low Keywords: easy, patch

Created on 2008-01-12 05:57 by pdonis, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
doctest-fixes.diff pdonis, 2008-01-12 05:57 Diff against revision 59907 showing suggested fix
doctest-fixes1.diff pdonis, 2008-01-12 18:32 Revised diff against revision 59932
doctest-fixes2.diff pdonis, 2010-07-19 13:51 Revised diff against revision 82970
doctest-fixes3.diff pdonis, 2010-07-19 18:37 Revised diff against revision 82970, fixes patch apply issue
doctest-fixes-py3k.diff pdonis, 2010-07-19 22:24 Diff against py3k branch, revision 82984
doctest_testfile.txt pdonis, 2010-07-19 22:37 Raw doctest_testfile.txt for python 2.7 trunk
doctest_testfile.txt.py3k pdonis, 2010-07-19 22:38 Raw doctest_testfile.txt for py3k branch
doctest-fixes4.diff pdonis, 2010-07-22 01:44 Diff against trunk with improved test method
doctest-fixes5.diff pdonis, 2010-07-22 02:07 Diff against trunk rev 83044 with improved test method, py3k comments removed
doctest-fixes5-py3k.diff pdonis, 2010-07-22 02:37 Diff against py3k branch rev 83044 with improved test method
doctest-fixes6.diff pdonis, 2010-07-22 21:07 Diff against release27-maint branch rev 83060 with further improvments
doctest-fixes6-py3k.diff pdonis, 2010-07-22 21:16 Diff against py3k branch rev 83060 with further improvements
doctest-fixes7-py3k.diff pdonis, 2012-07-25 02:07 Updated patch against Mercurial default branch review
doctest-fixes8-py3k.diff pdonis, 2012-07-25 03:44 Updated patch to ensure tests pass with -v flag review
Pull Requests
URL Status Linked Edit
PR 17385 merged pdonis, 2019-11-26 00:41
PR 19175 merged miss-islington, 2020-03-26 15:53
PR 19176 merged miss-islington, 2020-03-26 15:53
PR 24359 merged pdonis, 2021-01-28 02:13
Messages (38)
msg59801 - (view) Author: Peter Donis (pdonis) * Date: 2008-01-12 05:57
When running doctest.testfile on a Linux machine, testing a txt file 
saved on a Windows machine, doctest raised a SyntaxError exception for 
each Windows newline in the txt file. On examining the code in the 
_load_testfile function, it looks to me like there are actually two 
issues:

(1) When there is no package keyword argument given, or the package 
argument points to a package without a __loader__.get_data method, the 
txt file data is retrieved by calling open(filename).read(); this is 
the code path that my Windows-saved file triggered. However, since the 
default file mode for open is 'r', not 'rU', there is no universal 
newline conversion done on the file data. This was the issue I 
initially observed.

(2) When there is a package.__loader__.get_data method found, that 
method reads the data (using file mode 'rb'), and then newline 
conversion is performed by this line:

return file_contents.replace(os.linesep, '\n')

This does not seem to match what universal newline conversion does; 
that is supposed to convert both '\r\n' and '\r' to '\n', but running 
on Linux, os.linesep is '\n', so the replace operation in the current 
code is a no-op, even if the txt file was saved with Windows or Mac 
newlines. It seems to me that the correct operation would be:

for linesep in ('\r\n', '\r'):
   file_contents = file_contents.replace(linesep, '\n')

I have attached a diff against the current svn trunk showing my 
suggested fix for both of the above issues.

Peter Donis
msg59802 - (view) Author: Peter Donis (pdonis) * Date: 2008-01-12 06:01
Edit: I should have said that the attached diff also includes changes 
to test_doctest.py to test for the correct newline behavior. Because 
the test setup is a little complex, I added an auxiliary script, 
doctest_testfile.py, and an accompanying text file, 
doctest_testfile.txt, which intentionally contains mismatched newlines 
for use in the test.
msg59834 - (view) Author: Peter Donis (pdonis) * Date: 2008-01-12 18:32
I've uploaded a revised diff with two small improvements:

(1) Removed a redundant os.isfile check in 
PackageLoaderTestImporter.get_data() in doctest_testfile.py. (The 
open() builtin already raises IOError if the file can't be opened.)

(2) Added doctest.master = None in three places in doctest_testfile.py 
so the expected output is now nothing instead of three lines of

*** DocTestRunner.merge: 'doctest_testfile.txt' in both testers; 
summing outcomes.

Changed the expected output in test_doctest.py to correspond.
msg110706 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-19 01:53
Mark L,

This could use some shaking.  Please take a look.
msg110755 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 13:51
I saw that this issue was bumped and re-tested against the current trunk (r82970). A further change in doctest_testfile.py was needed to make the test pass when called from regrtest.py: the test importer for the loader.get_data test case now stores the absolute path where doctest_testfile.txt is located, so that it can always find it. I've uploaded a 2nd revised diff with this change.
msg110767 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-19 15:34
Patched files work fine on Windows against the 2.7 maintainance branch.
doctest (doctest) ... 66 tests with zero failures
doctest (test.test_doctest) ... 440 tests with zero failures

They fail on py3k, it's a unicode problem that's nothing to do with this patch.

Could someone try this on a Linux box to confirm my findings?
msg110779 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 16:49
You'll probably want someone else to confirm, but for the record, my testing was on a Linux box (SuSE 11.2) using Python 2.7 built from the SVN trunk:

peter@Powerspec:~/.local/lib/python2.7/test> uname -a
Linux Powerspec 2.6.31.12-0.2-desktop #1 SMP PREEMPT 2010-03-16 21:25:39 +0100 i686 i686 i386 GNU/Linux
peter@Powerspec:~/.local/lib/python2.7/test> python2.7
Python 2.7 (trunk:82970M, Jul 19 2010, 12:44:35)
[GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
peter@Powerspec:~/.local/lib/python2.7/test> python2.7 test_doctest.py
doctest (doctest) ... 66 tests with zero failures
doctest (test.test_doctest) ... 443 tests with zero failures
msg110795 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 18:37
I realized on comparing doctest-fixes2.diff with doctest-fixes1.diff that doctest-fixes2.diff doesn't capture the different newlines correctly, so patch on my machine wouldn't apply the diff (I had done testing from the modified svn checkout without reverting and then re-applying the patch). Uploaded doctest-fixes3.diff which is generated using the external diff command (svn diff --diff-cmd diff) so that patch will apply it cleanly. Not sure if this is something specific to my machine.
msg110808 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-19 20:50
I've again tried running the test against the 2.7 maintainance release and got:-

ValueError: line 12 of the docstring for doctest_testfile.txt has inconsistent leading whitespace: 'This doctest verifies universal newline support; each command'

Could another patch please be provided that corrects this?  I suspect that I tried to edit the previous version to overcome this, screwed up the line endings and forgot to mention it, sorry about that.
msg110813 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 22:18
Have you tried the doctest-fixes3.diff I uploaded just a little while ago? You may have run up against the same issue I did when I tried to revert and re-apply the previous patch. See my msg110795.
msg110815 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 22:24
Uploaded a diff implementing the fix for the head of the py3k branch. Test passes on my Linux box:

peter@Powerspec:~/.local/lib/python3.2/test> python3.2
Python 3.2a0 (py3k:82984, Jul 19 2010, 16:03:06)
[GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
peter@Powerspec:~/.local/lib/python3.2/test> python3.2 test_doctest.py
doctest (doctest) ... 66 tests with zero failures
doctest (test.test_doctest) ... 415 tests with zero failures
msg110822 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 22:37
Re msg110808, on thinking it over I realized it may not be so simple to get diff and patch to behave properly with a file like doctest_testfile.txt, where we want to intentionally mismatch newlines. We basically need to treat the file as binary rather than text (which also means that, since the file as I've uploaded it contains all Unix newlines except for the Windows and Mac specific lines, on a Windows or Mac system most of the file, including all the comment lines, ends up being part of the test). Maybe there's a way to set the svn:mime-type property to do that.

In the meantime, I've uploaded the raw doctest_testfile.txt file, so if necessary it can just be copied into the appropriate directory for testing.
msg110824 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 22:38
Uploading py3k version of doctest_testfile.txt as well, in case it's needed for testing.
msg110825 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-19 22:41
Also, can someone please clear the spam flag on my msg110813?
msg111143 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-22 01:44
Re my msg110822, I think I have a better solution:
have the test create a temporary txt file with
intentionally mismatched newlines, and use that as
the doctest. That means we can control the exact byte
by byte content of the txt file, without worrying about
how it is stored/transferred, etc. I've attached a
Python 2.7 patch, doctest-fixes4.diff, which implements
this test method (same actual tests as before, just the
setup/cleanup is changed); the test passes on my machine.
The test code in this patch also includes comments
explaining why this approach is taken.
msg111144 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-22 02:07
Uploaded doctest-fixes5.diff with one minor correction:
removed some comments that were reminders for the py3k
version (which I'll upload shortly).
msg111148 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-22 02:37
Uploaded doctest-fixes5-py3k.diff, diff against py3k
branch implementing same improved test method as
doctest-fixes5.diff.
msg111180 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-22 14:51
The py3k stuff is fine on Windows but the 2.7 maintainance branch now fails.

    1 items had failures:
       1 of   6 in doctest_testfile.txt
    ***Test Failed*** 1 failures.
msg111215 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-22 21:07
I don't normally run Windows, so it will take a little time
for me to set up a Windows build environment. However, I've
made a number of other improvements as a result of further testing
on Linux, and I've uploaded the improved patch as doctest-fixes6.diff.
When I apply this patch to a regular Python 2.7 installation on
Windows (Windows 2000 running under VirtualBox on Linux), the tests
pass (as well as on my Linux box when applied against the
release27-maint SVN branch).

Output from testing on Windows:

C:\Python27\Lib\test>python test_doctest.py
doctest (doctest) ... 66 tests with zero failures
doctest (test.test_doctest) ... 443 tests with zero failures

C:\Python27\Lib\test>python
Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.

Hopefully the improved patch will test OK on your box as well. If
not, I'll work on testing it in a Windows build environment against
an SVN checkout.

Improvements in doctest-fixes6.diff:

- Uses with statements to guard all file reads (in earlier patches
  writes were guarded but reads were not);

- Saves and restores sys.path the same way as test_importhooks;

- Checks for byte-compiled files in __pycache__ when deleting
  temporary files (this was in the py3k patch already, but reading
  PEP 3147 it looks like this feature may be backported to 2.7 as
  well);

- Test setup/cleanup is now done in a TestFixture class, for clarity
  and because it's easier that way to store the original sys.path
  in the setup and restore it in the cleanup.
msg111218 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-22 21:16
Uploaded revised diff against py3k branch, doctest-fixes6-py3k.diff,
with same improvements as doctest-fixes6.diff. Tests still pass on
my Linux box.
msg111299 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-23 12:33
@Peter again the py3k patch is fine and the 2.7 fails.  Is this worth all your effort?
msg111304 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-23 12:48
@Mark, I'm probably stubborn, yes. :-) Could you post verbose
output from your testing on Windows? I'd at least like to be
able to duplicate your findings; it's possible there's
something simple I'm missing.
msg111314 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-23 13:26
@Peter apologies hereby offered, the 2.7 patch is fine, I'd screwed up doctest_testfile.txt.  As the patch is ok and has been very thoroughly tested please can this be committed.
msg111351 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-07-23 16:02
I'll take a look.


Mark,

"commit review" is a stage after a patch was reviewed by a committer and is deemed appropriate.  Usually this comes with an "accepted" resolution.  Sometimes "commit review" is skipped for simple patches, but in most cases, at least two committers should approve a patch before it is committed.

See http://python.org/dev/workflow/.
msg111361 - (view) Author: Peter Donis (pdonis) * Date: 2010-07-23 16:41
@Mark, no problem, thanks for keeping up with all my patches. :-)
msg111392 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-07-23 21:50
@Alexander: If I understand this correctly it means that there is effectively no distinction betwen "patch review" and "commit review".  Hence it is perfectly possible that the work that myself and Peter have put in goes down the drain?  This is not acceptable.  Either the patch gets committed accepting the work that Peter and myself have put in or it gets rejected.  Please make your mind up.  Also note that I've several more issues in the pipeline that need a bit of TLC.  I've a note of every issue number, and if they don't get committed within the next few days I will be asking why.
msg112980 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2010-08-05 14:09
can committers take a look please.
msg166348 - (view) Author: Peter Donis (pdonis) * Date: 2012-07-25 02:07
I recently noticed that there has been a minor code change in the _load_testfile function in doctest, so I generated a new patch against the latest pull from Mercurial (cpython). No actual changes to the issue fix, but this patch should apply cleanly against a checkout from Mercurial. This patch also adds my name to the Misc/ACKS file. :-)

I have not re-done patches in Mercurial's format against any other branches except the default. If this fix is still under consideration, I can generate patches against other branches and test them.
msg166352 - (view) Author: Peter Donis (pdonis) * Date: 2012-07-25 03:44
Updated patch to ensure that tests pass when the -v flag is set running the test suite. This is done by having the helper script, doctest_testfile.py, call doctest.testfile with verbose=False to ensure there is no output if the test passes (which is what the master test_doctest script expects) even if the -v flag is set on the command line.
msg357468 - (view) Author: Peter Donis (pdonis) * Date: 2019-11-26 00:43
I have submitted pull request #17385 regarding this issue:

https://github.com/python/cpython/pull/17385
msg360436 - (view) Author: Peter Donis (pdonis) * Date: 2020-01-21 22:24
Pinging as a reminder that there is a pull request for this issue awaiting review.
msg361926 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2020-02-13 01:21
Allow me to ask a maybe very naive question:

Wouldn't it be possible to fix the problem at the source, that is, use `open(filename, "rU")` in the two spots you found rather than doing ad-hoc line ending translation?

(I get that the second one uses `rb` at the moment, so switching from bytes to str is not just a one-liner there)
msg361927 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2020-02-13 01:25
Ah, I got it wrong, get_data itself uses `rb`, not default `r`.
It makes sense given that package data could be any kind of file, not necessarily text.  Too bad there isn’t a `get_text(encoding)` method in the loader API!

With Python 3’s rich IO system, isn’t there a quick and correct way to get a text file from a bytes stream?  TextIOWrapper?
msg365086 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-03-26 15:53
New changeset e0b8101492f6c61dee831425b4d3dae39a953599 by Peter Donis in branch 'master':
bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385)
https://github.com/python/cpython/commit/e0b8101492f6c61dee831425b4d3dae39a953599
msg365090 - (view) Author: miss-islington (miss-islington) Date: 2020-03-26 16:18
New changeset b05fbe9f374bc660af3c589a1146b2026606d442 by Miss Islington (bot) in branch '3.8':
bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385)
https://github.com/python/cpython/commit/b05fbe9f374bc660af3c589a1146b2026606d442
msg365091 - (view) Author: miss-islington (miss-islington) Date: 2020-03-26 16:18
New changeset 9387678f8a580726aca5f836b2c50456f236ecbb by Miss Islington (bot) in branch '3.7':
bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385)
https://github.com/python/cpython/commit/9387678f8a580726aca5f836b2c50456f236ecbb
msg365092 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2020-03-26 16:26
12 years later, we finally landed your fix :).  Thanks for the patch, and for bearing with us (and me in particular for the past couple months).

As mentioned in the PR, there is probably opportunity to clean up the cleanup code in the new test a bit, and as Éric and Ammar mentioned here and in review, it could be interesting to use TextIOWrapper instead (though we'd probably only make that change in 3.9).  If you're interesting in doing a follow-up PR, it can either be attached to this issue (in spite of it being closed) or a new issue opened.
msg387940 - (view) Author: Zachary Ware (zach.ware) * (Python committer) Date: 2021-03-02 17:06
New changeset b36349a647b2bf8174f0e736a4fc347e92ae204e by Peter Donis in branch 'master':
bpo-43049: Use io.IncrementalNewlineDecoder for doctest newline conversion (GH-24359)
https://github.com/python/cpython/commit/b36349a647b2bf8174f0e736a4fc347e92ae204e
History
Date User Action Args
2022-04-11 14:56:29adminsetgithub: 46137
2021-03-02 17:06:28zach.waresetmessages: + msg387940
2021-01-28 02:13:46pdonissetpull_requests: + pull_request23184
2020-03-26 16:26:07zach.waresetstatus: open -> closed
versions: + Python 3.7, Python 3.8
messages: + msg365092

resolution: fixed
stage: patch review -> resolved
2020-03-26 16:18:59miss-islingtonsetmessages: + msg365091
2020-03-26 16:18:56miss-islingtonsetmessages: + msg365090
2020-03-26 15:53:40miss-islingtonsetpull_requests: + pull_request18536
2020-03-26 15:53:32miss-islingtonsetnosy: + miss-islington
pull_requests: + pull_request18535
2020-03-26 15:53:24zach.waresetmessages: + msg365086
2020-02-13 01:25:34eric.araujosetmessages: + msg361927
2020-02-13 01:21:23eric.araujosetnosy: + eric.araujo
messages: + msg361926
2020-02-12 07:02:38zach.waresetnosy: + zach.ware
2020-01-21 22:24:15pdonissetmessages: + msg360436
2019-11-26 00:44:18pdonissetversions: + Python 3.9, - Python 2.7, Python 3.4, Python 3.5
2019-11-26 00:43:38pdonissetmessages: + msg357468
2019-11-26 00:41:20pdonissetpull_requests: + pull_request16867
2014-08-30 04:25:06terry.reedysetversions: + Python 3.4, Python 3.5, - Python 3.1, Python 3.2, Python 3.3
2014-06-29 22:35:20belopolskysetassignee: belopolsky ->
2014-02-03 19:10:16BreamoreBoysetnosy: - BreamoreBoy
2012-07-25 03:44:38pdonissetfiles: + doctest-fixes8-py3k.diff

messages: + msg166352
2012-07-25 02:07:28pdonissetfiles: + doctest-fixes7-py3k.diff

messages: + msg166348
versions: + Python 3.3
2010-08-05 14:09:18BreamoreBoysetmessages: + msg112980
2010-07-23 21:50:39BreamoreBoysetmessages: + msg111392
2010-07-23 16:41:47pdonissetmessages: + msg111361
2010-07-23 16:02:18belopolskysetpriority: normal -> low
assignee: belopolsky
messages: + msg111351

stage: commit review -> patch review
2010-07-23 13:26:34BreamoreBoysetmessages: + msg111314
stage: commit review
2010-07-23 12:48:22pdonissetmessages: + msg111304
2010-07-23 12:33:36BreamoreBoysetmessages: + msg111299
2010-07-22 21:16:53pdonissetfiles: + doctest-fixes6-py3k.diff

messages: + msg111218
2010-07-22 21:07:40pdonissetfiles: + doctest-fixes6.diff

messages: + msg111215
2010-07-22 14:51:53BreamoreBoysetmessages: + msg111180
2010-07-22 02:37:55pdonissetfiles: + doctest-fixes5-py3k.diff

messages: + msg111148
2010-07-22 02:07:39pdonissetfiles: + doctest-fixes5.diff

messages: + msg111144
2010-07-22 01:44:35pdonissetfiles: + doctest-fixes4.diff

messages: + msg111143
2010-07-19 22:41:58pdonissetmessages: + msg110825
2010-07-19 22:38:30pdonissetfiles: + doctest_testfile.txt.py3k

messages: + msg110824
2010-07-19 22:37:58pdonissetfiles: + doctest_testfile.txt

messages: + msg110822
2010-07-19 22:24:13pdonissetfiles: + doctest-fixes-py3k.diff

messages: + msg110815
2010-07-19 22:18:11pdonissetmessages: + msg110813
2010-07-19 20:50:20BreamoreBoysetmessages: + msg110808
2010-07-19 18:37:30pdonissetfiles: + doctest-fixes3.diff
keywords: + patch
messages: + msg110795
2010-07-19 16:49:07pdonissetmessages: + msg110779
2010-07-19 15:34:40BreamoreBoysetkeywords: - patch

messages: + msg110767
versions: + Python 3.1, Python 2.7, Python 3.2, - Python 2.6
2010-07-19 13:51:11pdonissetfiles: + doctest-fixes2.diff
keywords: + patch
messages: + msg110755
2010-07-19 01:53:50belopolskysetnosy: + BreamoreBoy, belopolsky
messages: + msg110706
2008-01-12 18:32:32pdonissetfiles: + doctest-fixes1.diff
messages: + msg59834
2008-01-12 16:58:27christian.heimessetkeywords: + easy
2008-01-12 16:58:19christian.heimessetpriority: normal
2008-01-12 06:01:23pdonissetmessages: + msg59802
2008-01-12 05:57:33pdoniscreate