Issue 32509: doctest syntax ambiguity between continuation line and ellipsis


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/76690

classification

Title:	doctest syntax ambiguity between continuation line and ellipsis
Type:	behavior	Stage:
Components:	Library (Lib)	Versions:	Python 3.8

process

Status:	open	Resolution:
Dependencies:		Superseder:
Assigned To:		Nosy List:	jaraco, mblahay, r.david.murray, steven.daprano, tim.peters
Priority:	normal	Keywords:

Created on 2018-01-07 04:05 by jaraco, last changed 2022-04-11 14:58 by admin.

Messages (15)
msg309599	Author: Jason R. Coombs (jaraco) *	Date: 2018-01-07 04:05
I'm trying to write a doctest that prints the hash and filename of each file in a directory. The input is the test dir, but because file systems list entries in no particular order, the doctest checks for one known file:

    def hash_files(root):
        """
        >>> res = hash_files(Path(__file__).dirname())
        Discovering documents
        Hashing documents
        ...
        >>> print(res)
        ...
        d41d8cd98f00b204e9800998ecf8427e __init__.py
        ...
        """

However, this test fails with:

    ――――――――――――――――――――――――― [doctest] jaraco.financial.records.hash_files ――――――――――――――――――――――――――
    047
    048 >>> res = hash_files(Path(__file__).dirname())
    049 Discovering documents
    050 Hashing documents
    051 ...
    052 >>> print(res)
    Expected:
        d41d8cd98f00b204e9800998ecf8427e __init__.py
        ...
    Got:
        e1f9390d13c90c7ed601afffd1b9a9f9 records.py
        6a116973e8f29c923a08c2be69b11859 ledger.py
        d41d8cd98f00b204e9800998ecf8427e __init__.py
        b83c8a54d6b71e28ccb556a828e3fa5e qif.py
        ac2d598f65b6debe9888aafe51e9570f ofx.py
        9f2572f761342d38239a1394f4337165 msmoney.py
        <BLANKLINE>

The first ellipsis is interpreted as a degenerate continuation of the input line. It seems that an ellipsis cannot appear at the beginning of the expected output. Is there any workaround for this issue?
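To see the ambiguity concretely, here is a small sketch that is not from the thread. It feeds a trimmed, hypothetical version of the docstring above to the standard library's `doctest.DocTestParser.get_examples()`. Nothing is executed, only parsed, so `res` never needs to exist.

```python
import doctest

# Trimmed, hypothetical stand-in for the docstring in the report.
DOCSTRING = """
>>> print(res)
...
d41d8cd98f00b204e9800998ecf8427e __init__.py
...
"""

# get_examples() only parses; nothing in the docstring is executed.
(example,) = doctest.DocTestParser().get_examples(DOCSTRING)

# The bare "..." line was read as a PS2 (continuation) line, so it became an
# empty line of source, and the expected output starts at the hash line.
print(repr(example.source))  # 'print(res)\n'
print(repr(example.want))    # 'd41d8cd98f00b204e9800998ecf8427e __init__.py\n...\n'
```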
msg309600	Author: Jason R. Coombs (jaraco) *	Date: 2018-01-07 04:10
I did find [this ugly workaround](https://github.com/jaraco/jaraco.financial/commit/9b866ab7117d1cfc26d7cdcec10c63a608662b46):

    >>> print('x' + res)
    x...
msg309601	Author: R. David Murray (r.david.murray) *	Date: 2018-01-07 04:11
What happens if you print a placeholder line first, before your test output? I'm not sure it will work. I seem to remember that an ellipsis starting a line just isn't supported, but that was a long time ago... If that doesn't work, maybe do something like

    res = ['x' + l for l in res]

so that you can use x...?
msg309602	Author: R. David Murray (r.david.murray) *	Date: 2018-01-07 04:12
Ah, I see my answer crossed with your post :)
msg309603	Author: Steven D'Aprano (steven.daprano) *	Date: 2018-01-07 04:35
Here's a simple demonstration of the issue:

    # --- cut %< ---
    import doctest

    def hash_files():
        """
        >>> hash_files()  # doctest: +ELLIPSIS
        ...
        d41d8cd98f00b204e9800998ecf8427e __init__.py
        ...
        """
        print("""\
    e1f9390d13c90c7ed601afffd1b9a9f9 records.py
    6a116973e8f29c923a08c2be69b11859 ledger.py
    d41d8cd98f00b204e9800998ecf8427e __init__.py
    b83c8a54d6b71e28ccb556a828e3fa5e qif.py
    ac2d598f65b6debe9888aafe51e9570f ofx.py
    9f2572f761342d38239a1394f4337165 msmoney.py
    """)

    doctest.run_docstring_examples(hash_files, globals())
    # --- cut %< ---

The documentation does say that expected output must follow the final >>> or ... line (https://docs.python.org/3/library/doctest.html#how-are-docstring-examples-recognized). So I believe this is expected behaviour and not a bug.

Here is a workaround. Change the doctest to something like this:

    >>> print('#', end=''); hash_files()  # doctest: +ELLIPSIS
    #...
    d41d8cd98f00b204e9800998ecf8427e __init__.py
    ...

A more elegant solution would be one of two things: a new directive telling doctest to interpret the ... or >>> as output rather than input, or a new symbol similar to <BLANKLINE>. I'm changing this to an enhancement request, as I think this would be useful.
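The demonstration and the `#` workaround can be checked side by side. This is a sketch, not part of the thread. `count_failures` is a hypothetical helper built on the stdlib `doctest.DocTestParser.get_doctest()` and `doctest.DocTestRunner`, and the listing is shortened to four lines.

```python
import doctest

# Shortened, hypothetical listing in the same shape as the report.
LISTING = (
    "e1f9390d13c90c7ed601afffd1b9a9f9 records.py\n"
    "6a116973e8f29c923a08c2be69b11859 ledger.py\n"
    "d41d8cd98f00b204e9800998ecf8427e __init__.py\n"
    "b83c8a54d6b71e28ccb556a828e3fa5e qif.py\n"
)


def hash_files():
    # Stand-in: print the fixed listing (no trailing blank line).
    print(LISTING, end="")


def count_failures(docstring):
    """Run the examples in `docstring` with ELLIPSIS on; return the failure count."""
    test = doctest.DocTestParser().get_doctest(
        docstring, {"hash_files": hash_files}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=doctest.ELLIPSIS)
    # Discard the failure report; only the count matters here.
    return runner.run(test, out=lambda text: None).failed


# The "..." right after the prompt is swallowed as a continuation line.
NAIVE = """
>>> hash_files()
...
d41d8cd98f00b204e9800998ecf8427e __init__.py
...
"""

# The workaround: a literal "#" starts the output, so "#..." is read as output.
WORKAROUND = """
>>> print('#', end=''); hash_files()
#...
d41d8cd98f00b204e9800998ecf8427e __init__.py
...
"""

print(count_failures(NAIVE))       # 1
print(count_failures(WORKAROUND))  # 0
```

As Jason notes later in the thread, the `#...` form still relies on at least one line appearing before the target line.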
msg309604	Author: Steven D'Aprano (steven.daprano) *	Date: 2018-01-07 04:37
Oops, somehow managed to accidentally unsubscribe r.david.murray
msg309605	Author: Tim Peters (tim.peters) *	Date: 2018-01-07 04:41
Right, "..." immediately after a ">>>" line is taken to indicate a code continuation line, and there's no way to stop that short of rewriting the parser.

The workaround you already found could be made more palatable if you weren't determined to make it impenetrable ;-) For example,

    """
    >>> print("not an ellipsis\\n" + res) #doctest:+ELLIPSIS
    not an ellipsis
    ...
    d41d8cd98f00b204e9800998ecf8427e __init__.py
    ...
    """

Or if this is a one-off, some suitable variant of this is simple:

    """
    >>> "d41d8cd98f00b204e9800998ecf8427e __init__.py" in res
    True
    """

I'd prefer that, since it directly says you don't care about anything other than that `res` contains a specific substring (in the original way, that has to be _inferred_ from the pattern of ellipses).
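Here is a sketch, not from the thread, that runs both suggestions against a hypothetical `res`. It also shows a limitation Jason later runs into. When the target line comes first, the sentinel form fails. The first literal piece of the pattern (`not an ellipsis` plus its newline) consumes the only newline ahead of the hash, and the pattern still demands another newline right before it.

```python
import doctest

TARGET = "d41d8cd98f00b204e9800998ecf8427e __init__.py"

SENTINEL = """
>>> print("not an ellipsis\\n" + res)  # doctest: +ELLIPSIS
not an ellipsis
...
d41d8cd98f00b204e9800998ecf8427e __init__.py
...
"""

SUBSTRING = """
>>> "d41d8cd98f00b204e9800998ecf8427e __init__.py" in res
True
"""


def count_failures(docstring, res):
    """Run `docstring` with `res` bound to the given string; return failures."""
    test = doctest.DocTestParser().get_doctest(docstring, {"res": res}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None).failed


# Hypothetical listings: target in the middle, and target first.
middle = "\n".join(["e1f9390d13c90c7ed601afffd1b9a9f9 records.py", TARGET,
                    "6a116973e8f29c923a08c2be69b11859 ledger.py"])
first = "\n".join([TARGET, "e1f9390d13c90c7ed601afffd1b9a9f9 records.py"])

for res in (middle, first):
    print(count_failures(SENTINEL, res), count_failures(SUBSTRING, res))
# prints "0 0", then "1 0"
```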
msg309606	Author: Tim Peters (tim.peters) *	Date: 2018-01-07 04:57
And I somehow managed to unsubscribe Steven :-(
msg309611	Author: Steven D'Aprano (steven.daprano) *	Date: 2018-01-07 08:36
Tim Peters said:

> Right, "..." immediately after a ">>>" line is taken to indicate a code continuation line, and there's no way to stop that short of rewriting the parser.

I haven't gone through the source in detail, but it seems to me that we could change OutputChecker.check_output to support this without touching the parser.

Ignoring issues of backwards compatibility for the moment, suppose we accept either '...' or '<ELLIPSIS>' as the wild card in the output section. Jason's example would then become:

    >>> print(res)  # doctest: +ELLIPSIS
    <ELLIPSIS>
    d41d8cd98f00b204e9800998ecf8427e __init__.py
    ...

check_output could replace the substring '<ELLIPSIS>' with three dots before doing anything else, and Bob's yer uncle. Or in this case, Uncle Timmy's yer uncle :-)

There are probably a million details I haven't thought of, but it seems like a promising approach to me. I did a quick hack of doctest, adding

    want = want.replace('<ELLIPSIS>', '...')

to the start of OutputChecker.check_output, and it seems to work. If this is acceptable, we'll probably need a directive to activate it, for the sake of backwards compatibility. Thoughts?
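The one-line hack can be tried without patching the standard library. Subclass `doctest.OutputChecker` and pass the subclass to `doctest.DocTestRunner(checker=...)`. `PlaceholderChecker` is a hypothetical name. This sketch also skips the opt-in directive mentioned above, so it applies to every example the runner checks.

```python
import doctest


class PlaceholderChecker(doctest.OutputChecker):
    """Hypothetical checker: read '<ELLIPSIS>' in expected output as '...'.

    Mirrors the one-line hack described above; not part of the stdlib, and
    applied unconditionally (no opt-in directive).
    """

    def check_output(self, want, got, optionflags):
        want = want.replace("<ELLIPSIS>", "...")
        return super().check_output(want, got, optionflags)


DOCSTRING = """
>>> print(res)  # doctest: +ELLIPSIS
<ELLIPSIS>
d41d8cd98f00b204e9800998ecf8427e __init__.py
...
"""

# Hypothetical listing with the target line in the middle.
RES = "\n".join([
    "e1f9390d13c90c7ed601afffd1b9a9f9 records.py",
    "d41d8cd98f00b204e9800998ecf8427e __init__.py",
    "6a116973e8f29c923a08c2be69b11859 ledger.py",
])


def count_failures(checker):
    test = doctest.DocTestParser().get_doctest(DOCSTRING, {"res": RES}, "demo", None, 0)
    runner = doctest.DocTestRunner(checker=checker, verbose=False)
    return runner.run(test, out=lambda text: None).failed


print(count_failures(doctest.OutputChecker()))  # 1: '<ELLIPSIS>' is literal text
print(count_failures(PlaceholderChecker()))     # 0
```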
msg309622	Author: Jason R. Coombs (jaraco) *	Date: 2018-01-07 15:27
Thank you, Steven, for creating a reproduction of the issue; I should have done that in the first place. I have +ELLIPSIS enabled elsewhere in the test suite, which is why it didn't appear in my example.

I should clarify: what I thought was a suitable workaround turns out not to be. Part of the reason is that the ellipsis must match _something_ and cannot be a degenerate match, which led to [this failure](https://travis-ci.org/jaraco/jaraco.financial/jobs/325955523). So the workaround I thought I'd devised only worked in some environments (where some content did appear before the target content).

I conclude that ellipsis is not well suited to matching a single line from a list of lines in non-deterministic order. I'll be adapting the test to simply check for the presence of the expected substring. Therefore, the use-case I presented is invalid (at least while ellipsis must match at least one character).

Still, I suspect I'm not the only person to have encountered the reported ambiguity, and I appreciate the progress toward addressing it.

I like Steven's approach, as it's simple and directly addresses the ambiguity. It does have one downside: as documentation, it's a little less elegant, because a literal "<ELLIPSIS>" appears in the docstring. Perhaps instead of "ELLIPSIS", the indicator should be "ANYTHING" or similar, acting more as a first-class feature than as a stand-in for an ellipsis. That would save the human reader the distraction and trouble of translating "<ELLIPSIS>" to "..." before interpreting the value (even if that's what the doctest interpreter does under the hood).

Alternatively, consider "<...>" as the syntax. I like that because it almost looks like its intention, avoiding much of the distraction. As I think about it more, though, I'm pretty sure such an approach is not viable. It's a new syntax (non-alpha in the directive) and is highly likely to conflict with existing doctests in the wild.

Another way to think about this problem is that the literal "..." is only non-viable when it's the first content in the expected output. Perhaps all that's needed is a signal that the output is starting, with something like "<OUTPUT>", "<START>", "<EXPECT>", "<NULL>" or "<EMPTY>". This would be a token like "<BLANKLINE>", except that it's an empty match specifically designed to mark the transition. Such a token would address the issue at the border between the test and the output. It would _also_ address the case where the expected output begins with a _literal_ "...". Consider this case:

    # --- cut %< ---
    import doctest

    def print_3_dot():
        """
        >>> print_3_dot()
        ...
        """
        print('...')

    doctest.run_docstring_examples(print_3_dot, globals())
    # --- cut %< ---

In that case, "<ELLIPSIS>" may also work, but only because a literal substitution is being made. One _might_ be surprised when "<ELLIPSIS>" doesn't match anything (when +ELLIPSIS is not enabled). Overall, I'm now thinking the "<ELLIPSIS>" solution is suitable and clear enough.
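The empty-marker idea can be sketched the same way. `<START>` and `StartMarkerChecker` are hypothetical; the message lists `<START>` only as one candidate spelling. The checker drops a leading `<START>` line from the expected output before the normal comparison. Because `<START>` is not a prompt, the parser keeps the `...` line that follows it as expected output. ELLIPSIS is off here, so that `...` is compared literally.

```python
import doctest


def print_3_dot():
    print("...")


# The expected "..." is parsed as a continuation line, so `want` is empty.
AMBIGUOUS = """
>>> print_3_dot()
...
"""

# Hypothetical marker line: not a prompt, so the "..." after it stays output.
WITH_MARKER = """
>>> print_3_dot()
<START>
...
"""


class StartMarkerChecker(doctest.OutputChecker):
    """Hypothetical: drop a leading '<START>' line from the expected output."""

    MARKER = "<START>\n"

    def check_output(self, want, got, optionflags):
        if want.startswith(self.MARKER):
            want = want[len(self.MARKER):]
        return super().check_output(want, got, optionflags)


def count_failures(docstring, checker):
    test = doctest.DocTestParser().get_doctest(
        docstring, {"print_3_dot": print_3_dot}, "demo", None, 0)
    runner = doctest.DocTestRunner(checker=checker, verbose=False)
    return runner.run(test, out=lambda text: None).failed


print(count_failures(AMBIGUOUS, StartMarkerChecker()))       # 1
print(count_failures(WITH_MARKER, StartMarkerChecker()))     # 0
print(count_failures(WITH_MARKER, doctest.OutputChecker()))  # 1
```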
msg309630	Author: Tim Peters (tim.peters) *	Date: 2018-01-07 17:51
Jason, an ellipsis will match an empty string. But if your expected output is:

    """
    x...
    abcd
    ...
    """

you're asking for output that:

- starts with "x"
- followed by 0 or more of anything
- FOLLOWED BY A NEWLINE (I think you're overlooking this part)
- followed by "abcd" and a newline
- followed by 0 or more of anything
- followed by (and ending with) a newline

So, e.g., "xabcd\n" doesn't match. That's not because of the ellipsis, but because of the newline following the first ellipsis. You can repair that by changing the expected output like so:

    """
    x...abcd
    ...
    """

This still requires that "abcd" is _followed_ by a newline, but puts no constraints on what appears before it.

In your specific context, it seems you want to say that your expected line has to appear _as_ its own line in your output. So it must appear either at the start of the output _or_ immediately following a newline. Neither ellipses nor a simple string search is sufficient to capture that notion. Fancier code can do it, or a regexp search, or, e.g.,

    what_i_want_without_the_trailing_newline in output.splitlines()
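These points can be checked directly with the stdlib `doctest.OutputChecker.check_output()` and the `ELLIPSIS` flag. This is a sketch with illustrative strings; `got` puts trailing content after `abcd` so that only the newline question is being tested.

```python
import doctest

checker = doctest.OutputChecker()
flags = doctest.ELLIPSIS

want_split = "x...\nabcd\n...\n"  # "x", anything, NEWLINE, "abcd\n", anything, "\n"
want_joined = "x...abcd\n...\n"   # "x", anything, "abcd\n", anything, "\n"

got = "xabcd\nzzz\n"  # "abcd" directly follows "x" on the same line

print(checker.check_output(want_split, got, flags))   # False: no newline before "abcd"
print(checker.check_output(want_joined, got, flags))  # True

# Whole-line membership test: only complete lines count as matches.
output = "xqqq\nabcd\nzzz\n"
print("abcd" in output.splitlines())  # True
print("bcd" in output.splitlines())   # False
```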
msg309631	Author: Tim Peters (tim.peters) *	Date: 2018-01-07 18:52
By the way, going back to your original problem: "the usual" solution, when different platforms list directories in different orders, is simply to sort the listing yourself. That's pretty easy in Python ;-) Then your test can verify the hashes and names of _every_ file of interest. It would also be clearer on the face of it than anything you could do to try to ignore every line save one.
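Here is a sketch of the sorting approach. The thread does not show the real `hash_files`, so this version is hypothetical. It uses `pathlib` rather than the `Path(...).dirname()` API from the report, and it only hashes regular files directly inside `root`. It also assumes MD5, which is consistent with `d41d8cd98f00b204e9800998ecf8427e` being the MD5 digest of an empty file.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_files(root):
    """Return "<md5> <name>" lines for the files directly in `root`.

    Hypothetical rewrite: entries are sorted by name, so the result no
    longer depends on the order the file system happens to list them in.
    """
    lines = []
    for path in sorted(Path(root).iterdir(), key=lambda p: p.name):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{digest} {path.name}")
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "qif.py").write_bytes(b"x = 1\n")
    (Path(tmp) / "__init__.py").write_bytes(b"")  # empty file
    # The first line printed is "d41d8cd98f00b204e9800998ecf8427e __init__.py".
    print(hash_files(tmp))
```

With a deterministic order, a doctest can spell out the whole listing exactly, with no ellipsis at all.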
msg342372	Author: Michael Blahay (mblahay) *	Date: 2019-05-13 18:22
At the end of msg309603, it was stated that this issue was being changed to an enhancement. Later on, Tim Peters changed its Type back to behavior, but didn't say why. Should this issue still be considered an enhancement?
msg385154	Author: Jason R. Coombs (jaraco) *	Date: 2021-01-17 03:28
I've encountered this issue again with a different use-case. I'm attempting to add a doctest to a routine that emits the paths of the files it processes. I want to use ellipses to ignore the prefixes of the output because they're not pertinent to the test. Here's the test that might have worked: https://github.com/python/importlib_resources/commit/ca9d014e1b884ff7f8cee63a436832a3e6e809fb. It failed with:

```
_______________________________________ ERROR collecting importlib_resources/tests/update-zips.py _______________________________________
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:939: in find
    self._find(tests, obj, name, module, source_lines, globs, {})
.tox/python/lib/python3.9/site-packages/_pytest/doctest.py:522: in _find
    doctest.DocTestFinder._find(  # type: ignore
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:1001: in _find
    self._find(tests, val, valname, module, source_lines,
.tox/python/lib/python3.9/site-packages/_pytest/doctest.py:522: in _find
    doctest.DocTestFinder._find(  # type: ignore
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:989: in _find
    test = self._get_test(obj, name, module, globs, source_lines)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:1073: in _get_test
    return self._parser.get_doctest(docstring, globs, name,
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:675: in get_doctest
    return DocTest(self.get_examples(string, name), globs,
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:689: in get_examples
    return [x for x in self.parse(string, name)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:651: in parse
    self._parse_example(m, name, lineno)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:709: in _parse_example
    self._check_prompt_blank(source_lines, indent, name, lineno)
/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/doctest.py:793: in _check_prompt_blank
    raise ValueError('line %r of the docstring for %s '
E   ValueError: line 6 of the docstring for importlib_resources.tests.update-zips.main lacks blank after ...: '.../data01/utf-16.file -> ziptestdata/utf-16.file'
```

I was able to work around the issue by injecting a newline into the output (https://github.com/python/importlib_resources/commit/b8d48d5a86a9f5bd391c18e1acb39b5697f7ca40). I also notice that the test still fails in some environments due to the arbitrary ordering of the output, though it does pass in others.
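Here is a sketch of both halves of this report, using only the stdlib parser and runner. The first docstring reproduces the `lacks blank after ...` error. The second shows one way a leading newline can avoid it: the empty first line is matched with `<BLANKLINE>`. `main` is a hypothetical stand-in, and the linked commit may have done this differently.

```python
import doctest

# Reproduces the parse error: ".../data01" has no blank after the "..." prompt.
BROKEN = """
>>> main()
.../data01/utf-16.file -> ziptestdata/utf-16.file
"""

try:
    doctest.DocTestParser().get_examples(BROKEN, name="demo")
    parse_error = None
except ValueError as exc:
    parse_error = str(exc)
print(parse_error)


def main():
    # Hypothetical stand-in: an empty first line, then one processed path.
    print()
    print("/tmp/build/data01/utf-16.file -> ziptestdata/utf-16.file")


# <BLANKLINE> matches the empty first line, so "..." is no longer the first
# thing in the expected output and is parsed as output, not as a prompt.
FIXED = """
>>> main()  # doctest: +ELLIPSIS
<BLANKLINE>
.../data01/utf-16.file -> ziptestdata/utf-16.file
"""

test = doctest.DocTestParser().get_doctest(FIXED, {"main": main}, "demo", None, 0)
runner = doctest.DocTestRunner(verbose=False)
fixed_failures = runner.run(test, out=lambda text: None).failed
print(fixed_failures)  # 0
```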
msg386610	Author: Jason R. Coombs (jaraco) *	Date: 2021-02-08 00:11
Today I encountered another situation where it would be convenient to allow an ellipsis at the beginning of the expected output:

    >>> pathlib.Path('abc')
    ...Path('abc')

Because pathlib.Path resolves to `PosixPath` or `WindowsPath` depending on the platform, it would be nice to match both.
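Here is a sketch of the parse error and of one workaround in the spirit of the earlier messages. The workaround puts fixed text in front of the repr so that the expected output no longer starts with `...`. The `repr:` prefix is an arbitrary, hypothetical choice.

```python
import doctest
import pathlib

# Fails to parse: "...Path" has no blank after the "..." prompt.
BROKEN = """
>>> pathlib.Path('abc')  # doctest: +ELLIPSIS
...Path('abc')
"""

# Workaround sketch: fixed text first, so the ellipsis is not at line start.
PREFIXED = """
>>> print('repr:', repr(pathlib.Path('abc')))  # doctest: +ELLIPSIS
repr: ...Path('abc')
"""

try:
    doctest.DocTestParser().get_examples(BROKEN)
    broken_error = None
except ValueError as exc:
    broken_error = str(exc)
print(broken_error)

test = doctest.DocTestParser().get_doctest(PREFIXED, {"pathlib": pathlib}, "demo", None, 0)
runner = doctest.DocTestRunner(verbose=False)
prefixed_failures = runner.run(test, out=lambda text: None).failed
print(prefixed_failures)  # 0, whether the repr is PosixPath(...) or WindowsPath(...)
```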

History
Date	User	Action	Args
2022-04-11 14:58:56	admin	set	github: 76690
2021-02-08 00:11:09	jaraco	set	messages: + msg386610
2021-01-17 03:28:19	jaraco	set	messages: + msg385154
2019-05-13 18:22:19	mblahay	set	nosy: + mblahay messages: + msg342372
2018-01-07 18:52:45	tim.peters	set	messages: + msg309631
2018-01-07 17:51:14	tim.peters	set	messages: + msg309630
2018-01-07 15:27:09	jaraco	set	messages: + msg309622
2018-01-07 08:36:17	steven.daprano	set	messages: + msg309611
2018-01-07 04:57:44	tim.peters	set	nosy: + steven.daprano messages: + msg309606
2018-01-07 04:41:16	tim.peters	set	nosy: - steven.daprano type: enhancement -> behavior messages: + msg309605
2018-01-07 04:37:04	steven.daprano	set	nosy: + r.david.murray messages: + msg309604
2018-01-07 04:35:34	steven.daprano	set	versions: + Python 3.8 nosy: + tim.peters, steven.daprano, - r.david.murray messages: + msg309603 components: + Library (Lib) type: behavior -> enhancement
2018-01-07 04:12:11	r.david.murray	set	messages: + msg309602
2018-01-07 04:11:16	r.david.murray	set	nosy: + r.david.murray messages: + msg309601
2018-01-07 04:10:21	jaraco	set	messages: + msg309600
2018-01-07 04:05:17	jaraco	create