Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doctest _load_testfile function -- newline handling seems incorrect #46137

Closed
pdonis mannequin opened this issue Jan 12, 2008 · 38 comments
Closed

doctest _load_testfile function -- newline handling seems incorrect #46137

pdonis mannequin opened this issue Jan 12, 2008 · 38 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes easy tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error

Comments

@pdonis
Copy link
Mannequin

pdonis mannequin commented Jan 12, 2008

BPO 1812
Nosy @abalkin, @pdonis, @merwok, @zware, @miss-islington
PRs
  • bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method. #17385
  • [3.8] bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385) #19175
  • [3.7] bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385) #19176
  • bpo-43049: Use io.IncrementalNewlineDecoder for doctest newline conversion #24359
  • Files
  • doctest-fixes.diff: Diff against revision 59907 showing suggested fix
  • doctest-fixes1.diff: Revised diff against revision 59932
  • doctest-fixes2.diff: Revised diff against revision 82970
  • doctest-fixes3.diff: Revised diff against revision 82970, fixes patch apply issue
  • doctest-fixes-py3k.diff: Diff against py3k branch, revision 82984
  • doctest_testfile.txt: Raw doctest_testfile.txt for python 2.7 trunk
  • doctest_testfile.txt.py3k: Raw doctest_testfile.txt for py3k branch
  • doctest-fixes4.diff: Diff against trunk with improved test method
  • doctest-fixes5.diff: Diff against trunk rev 83044 with improved test method, py3k comments removed
  • doctest-fixes5-py3k.diff: Diff against py3k branch rev 83044 with improved test method
  • doctest-fixes6.diff: Diff against release27-maint branch rev 83060 with further improvments
  • doctest-fixes6-py3k.diff: Diff against py3k branch rev 83060 with further improvements
  • doctest-fixes7-py3k.diff: Updated patch against Mercurial default branch
  • doctest-fixes8-py3k.diff: Updated patch to ensure tests pass with -v flag
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2020-03-26.16:26:07.378>
    created_at = <Date 2008-01-12.05:57:33.772>
    labels = ['easy', 'type-bug', '3.8', '3.9', '3.7', 'tests']
    title = 'doctest _load_testfile function -- newline handling seems incorrect'
    updated_at = <Date 2021-03-02.17:06:28.028>
    user = 'https://github.com/pdonis'

    bugs.python.org fields:

    activity = <Date 2021-03-02.17:06:28.028>
    actor = 'zach.ware'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-03-26.16:26:07.378>
    closer = 'zach.ware'
    components = ['Tests']
    creation = <Date 2008-01-12.05:57:33.772>
    creator = 'pdonis'
    dependencies = []
    files = ['9138', '9142', '18063', '18066', '18070', '18071', '18072', '18117', '18118', '18119', '18133', '18134', '26505', '26508']
    hgrepos = []
    issue_num = 1812
    keywords = ['patch', 'easy']
    message_count = 38.0
    messages = ['59801', '59802', '59834', '110706', '110755', '110767', '110779', '110795', '110808', '110813', '110815', '110822', '110824', '110825', '111143', '111144', '111148', '111180', '111215', '111218', '111299', '111304', '111314', '111351', '111361', '111392', '112980', '166348', '166352', '357468', '360436', '361926', '361927', '365086', '365090', '365091', '365092', '387940']
    nosy_count = 5.0
    nosy_names = ['belopolsky', 'pdonis', 'eric.araujo', 'zach.ware', 'miss-islington']
    pr_nums = ['17385', '19175', '19176', '24359']
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue1812'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jan 12, 2008

    When running doctest.testfile on a Linux machine, testing a txt file
    saved on a Windows machine, doctest raised a SyntaxError exception for
    each Windows newline in the txt file. On examining the code in the
    _load_testfile function, it looks to me like there are actually two
    issues:

    (1) When there is no package keyword argument given, or the package
    argument points to a package without a __loader__.get_data method, the
    txt file data is retrieved by calling open(filename).read(); this is
    the code path that my Windows-saved file triggered. However, since the
    default file mode for open is 'r', not 'rU', there is no universal
    newline conversion done on the file data. This was the issue I
    initially observed.

    (2) When there is a package.__loader__.get_data method found, that
    method reads the data (using file mode 'rb'), and then newline
    conversion is performed by this line:

    return file_contents.replace(os.linesep, '\n')

    This does not seem to match what universal newline conversion does;
    that is supposed to convert both '\r\n' and '\r' to '\n', but running
    on Linux, os.linesep is '\n', so the replace operation in the current
    code is a no-op, even if the txt file was saved with Windows or Mac
    newlines. It seems to me that the correct operation would be:

    for linesep in ('\r\n', '\r'):
       file_contents = file_contents.replace(linesep, '\n')

    I have attached a diff against the current svn trunk showing my
    suggested fix for both of the above issues.

    Peter Donis

    @pdonis pdonis mannequin added tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error labels Jan 12, 2008
    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jan 12, 2008

    Edit: I should have said that the attached diff also includes changes
    to test_doctest.py to test for the correct newline behavior. Because
    the test setup is a little complex, I added an auxiliary script,
    doctest_testfile.py, and an accompanying text file,
    doctest_testfile.txt, which intentionally contains mismatched newlines
    for use in the test.

    @tiran tiran added the easy label Jan 12, 2008
    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jan 12, 2008

    I've uploaded a revised diff with two small improvements:

    (1) Removed a redundant os.isfile check in
    PackageLoaderTestImporter.get_data() in doctest_testfile.py. (The
    open() builtin already raises IOError if the file can't be opened.)

    (2) Added doctest.master = None in three places in doctest_testfile.py
    so the expected output is now nothing instead of three lines of

    *** DocTestRunner.merge: 'doctest_testfile.txt' in both testers;
    summing outcomes.

    Changed the expected output in test_doctest.py to correspond.

    @abalkin
    Copy link
    Member

    abalkin commented Jul 19, 2010

    Mark L,

    This could use some shaking. Please take a look.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    I saw that this issue was bumped and re-tested against the current trunk (r82970). A further change in doctest_testfile.py was needed to make the test pass when called from regrtest.py: the test importer for the loader.get_data test case now stores the absolute path where doctest_testfile.txt is located, so that it can always find it. I've uploaded a 2nd revised diff with this change.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 19, 2010

    Patched files work fine on Windows against the 2.7 maintainance branch.
    doctest (doctest) ... 66 tests with zero failures
    doctest (test.test_doctest) ... 440 tests with zero failures

    They fail on py3k, it's a unicode problem that's nothing to do with this patch.

    Could someone try this on a Linux box to confirm my findings?

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    You'll probably want someone else to confirm, but for the record, my testing was on a Linux box (SuSE 11.2) using Python 2.7 built from the SVN trunk:

    peter@Powerspec:/.local/lib/python2.7/test> uname -a
    Linux Powerspec 2.6.31.12-0.2-desktop #1 SMP PREEMPT 2010-03-16 21:25:39 +0100 i686 i686 i386 GNU/Linux
    peter@Powerspec:
    /.local/lib/python2.7/test> python2.7
    Python 2.7 (trunk:82970M, Jul 19 2010, 12:44:35)
    [GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

    >>
    peter@Powerspec:~/.local/lib/python2.7/test> python2.7 test_doctest.py
    doctest (doctest) ... 66 tests with zero failures
    doctest (test.test_doctest) ... 443 tests with zero failures

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    I realized on comparing doctest-fixes2.diff with doctest-fixes1.diff that doctest-fixes2.diff doesn't capture the different newlines correctly, so patch on my machine wouldn't apply the diff (I had done testing from the modified svn checkout without reverting and then re-applying the patch). Uploaded doctest-fixes3.diff which is generated using the external diff command (svn diff --diff-cmd diff) so that patch will apply it cleanly. Not sure if this is something specific to my machine.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 19, 2010

    I've again tried running the test against the 2.7 maintainance release and got:-

    ValueError: line 12 of the docstring for doctest_testfile.txt has inconsistent leading whitespace: 'This doctest verifies universal newline support; each command'

    Could another patch please be provided that corrects this? I suspect that I tried to edit the previous version to overcome this, screwed up the line endings and forgot to mention it, sorry about that.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    Have you tried the doctest-fixes3.diff I uploaded just a little while ago? You may have run up against the same issue I did when I tried to revert and re-apply the previous patch. See my msg110795.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    Uploaded a diff implementing the fix for the head of the py3k branch. Test passes on my Linux box:

    peter@Powerspec:~/.local/lib/python3.2/test> python3.2
    Python 3.2a0 (py3k:82984, Jul 19 2010, 16:03:06)
    [GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

    >>
    peter@Powerspec:~/.local/lib/python3.2/test> python3.2 test_doctest.py
    doctest (doctest) ... 66 tests with zero failures
    doctest (test.test_doctest) ... 415 tests with zero failures

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    Re msg110808, on thinking it over I realized it may not be so simple to get diff and patch to behave properly with a file like doctest_testfile.txt, where we want to intentionally mismatch newlines. We basically need to treat the file as binary rather than text (which also means that, since the file as I've uploaded it contains all Unix newlines except for the Windows and Mac specific lines, on a Windows or Mac system most of the file, including all the comment lines, ends up being part of the test). Maybe there's a way to set the svn:mime-type property to do that.

    In the meantime, I've uploaded the raw doctest_testfile.txt file, so if necessary it can just be copied into the appropriate directory for testing.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    Uploading py3k version of doctest_testfile.txt as well, in case it's needed for testing.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 19, 2010

    Also, can someone please clear the spam flag on my msg110813?

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 22, 2010

    Re my msg110822, I think I have a better solution:
    have the test create a temporary txt file with
    intentionally mismatched newlines, and use that as
    the doctest. That means we can control the exact byte
    by byte content of the txt file, without worrying about
    how it is stored/transferred, etc. I've attached a
    Python 2.7 patch, doctest-fixes4.diff, which implements
    this test method (same actual tests as before, just the
    setup/cleanup is changed); the test passes on my machine.
    The test code in this patch also includes comments
    explaining why this approach is taken.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 22, 2010

    Uploaded doctest-fixes5.diff with one minor correction:
    removed some comments that were reminders for the py3k
    version (which I'll upload shortly).

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 22, 2010

    Uploaded doctest-fixes5-py3k.diff, diff against py3k
    branch implementing same improved test method as
    doctest-fixes5.diff.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 22, 2010

    The py3k stuff is fine on Windows but the 2.7 maintainance branch now fails.

    1 items had failures:
       1 of   6 in doctest_testfile.txt
    ***Test Failed*** 1 failures.
    

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 22, 2010

    I don't normally run Windows, so it will take a little time
    for me to set up a Windows build environment. However, I've
    made a number of other improvements as a result of further testing
    on Linux, and I've uploaded the improved patch as doctest-fixes6.diff.
    When I apply this patch to a regular Python 2.7 installation on
    Windows (Windows 2000 running under VirtualBox on Linux), the tests
    pass (as well as on my Linux box when applied against the
    release27-maint SVN branch).

    Output from testing on Windows:

    C:\Python27\Lib\test>python test_doctest.py
    doctest (doctest) ... 66 tests with zero failures
    doctest (test.test_doctest) ... 443 tests with zero failures

    C:\Python27\Lib\test>python
    Python 2.7 (r27:82525, Jul 4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.

    Hopefully the improved patch will test OK on your box as well. If
    not, I'll work on testing it in a Windows build environment against
    an SVN checkout.

    Improvements in doctest-fixes6.diff:

    • Uses with statements to guard all file reads (in earlier patches
      writes were guarded but reads were not);

    • Saves and restores sys.path the same way as test_importhooks;

    • Checks for byte-compiled files in __pycache__ when deleting
      temporary files (this was in the py3k patch already, but reading
      PEP-3147 it looks like this feature may be backported to 2.7 as
      well);

    • Test setup/cleanup is now done in a TestFixture class, for clarity
      and because it's easier that way to store the original sys.path
      in the setup and restore it in the cleanup.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 22, 2010

    Uploaded revised diff against py3k branch, doctest-fixes6-py3k.diff,
    with same improvements as doctest-fixes6.diff. Tests still pass on
    my Linux box.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 23, 2010

    @peter again the py3k patch is fine and the 2.7 fails. Is this worth all your effort?

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 23, 2010

    @mark, I'm probably stubborn, yes. :-) Could you post verbose
    output from your testing on Windows? I'd at least like to be
    able to duplicate your findings; it's possible there's
    something simple I'm missing.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 23, 2010

    @peter apologies hereby offered, the 2.7 patch is fine, I'd screwed up doctest_testfile.txt. As the patch is ok and has been very thoroughly tested please can this be committed.

    @abalkin
    Copy link
    Member

    abalkin commented Jul 23, 2010

    I'll take a look.

    Mark,

    "commit review" is a stage after a patch was reviewed by a committer and is deemed appropriate. Usually this comes with an "accepted" resolution. Sometimes "commit review" is skipped for simple patches, but in most cases, at least two committers should approve a patch before it is committed.

    See http://python.org/dev/workflow/.

    @abalkin abalkin self-assigned this Jul 23, 2010
    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 23, 2010

    @mark, no problem, thanks for keeping up with all my patches. :-)

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Jul 23, 2010

    @alexander: If I understand this correctly it means that there is effectively no distinction betwen "patch review" and "commit review". Hence it is perfectly possible that the work that myself and Peter have put in goes down the drain? This is not acceptable. Either the patch gets committed accepting the work that Peter and myself have put in or it gets rejected. Please make your mind up. Also note that I've several more issues in the pipeline that need a bit of TLC. I've a note of every issue number, and if they don't get committed within the next few days I will be asking why.

    @BreamoreBoy
    Copy link
    Mannequin

    BreamoreBoy mannequin commented Aug 5, 2010

    can committers take a look please.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 25, 2012

    I recently noticed that there has been a minor code change in the _load_testfile function in doctest, so I generated a new patch against the latest pull from Mercurial (cpython). No actual changes to the issue fix, but this patch should apply cleanly against a checkout from Mercurial. This patch also adds my name to the Misc/ACKS file. :-)

    I have not re-done patches in Mercurial's format against any other branches except the default. If this fix is still under consideration, I can generate patches against other branches and test them.

    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jul 25, 2012

    Updated patch to ensure that tests pass when the -v flag is set running the test suite. This is done by having the helper script, doctest_testfile.py, call doctest.testfile with verbose=False to ensure there is no output if the test passes (which is what the master test_doctest script expects) even if the -v flag is set on the command line.

    @abalkin abalkin removed their assignment Jun 29, 2014
    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Nov 26, 2019

    I have submitted pull request bpo-17385 regarding this issue:

    #17385

    @pdonis pdonis mannequin added the 3.9 only security fixes label Nov 26, 2019
    @pdonis
    Copy link
    Mannequin Author

    pdonis mannequin commented Jan 21, 2020

    Pinging as a reminder that there is a pull request for this issue awaiting review.

    @merwok
    Copy link
    Member

    merwok commented Feb 13, 2020

    Allow me to ask a maybe very naive question:

    Wouldn't it be possible to fix the problem at the source, that is, use open(filename, "rU") in the two spots you found rather than doing ad-hoc line ending translation?

    (I get that the second one uses rb at the moment, so switching from bytes to str is not just a one-liner there)

    @merwok
    Copy link
    Member

    merwok commented Feb 13, 2020

    Ah, I got it wrong, get_data itself uses rb, not default r.
    It makes sense given that package data could be any kind of file, not necessarily text. Too bad there isn’t a get_text(encoding) method in the loader API!

    With Python 3’s rich IO system, isn’t there a quick and correct way to get a text file from a bytes stream? TextIOWrapper?

    @zware
    Copy link
    Member

    zware commented Mar 26, 2020

    New changeset e0b8101 by Peter Donis in branch 'master':
    bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385)
    e0b8101

    @miss-islington
    Copy link
    Contributor

    New changeset b05fbe9 by Miss Islington (bot) in branch '3.8':
    bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385)
    b05fbe9

    @miss-islington
    Copy link
    Contributor

    New changeset 9387678 by Miss Islington (bot) in branch '3.7':
    bpo-1812: Fix newline conversion when doctest.testfile loads from a package whose loader has a get_data method (GH-17385)
    9387678

    @zware
    Copy link
    Member

    zware commented Mar 26, 2020

    12 years later, we finally landed your fix :). Thanks for the patch, and for bearing with us (and me in particular for the past couple months).

    As mentioned in the PR, there is probably opportunity to clean up the cleanup code in the new test a bit, and as Éric and Ammar mentioned here and in review, it could be interesting to use TextIOWrapper instead (though we'd probably only make that change in 3.9). If you're interesting in doing a follow-up PR, it can either be attached to this issue (in spite of it being closed) or a new issue opened.

    @zware zware added 3.7 (EOL) end of life 3.8 only security fixes labels Mar 26, 2020
    @zware zware closed this as completed Mar 26, 2020
    @zware
    Copy link
    Member

    zware commented Mar 2, 2021

    New changeset b36349a by Peter Donis in branch 'master':
    bpo-43049: Use io.IncrementalNewlineDecoder for doctest newline conversion (GH-24359)
    b36349a

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes easy tests Tests in the Lib/test dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants