classification
Title: Inconsistency between dangling '%' handling in time.strftime() and datetime.strftime()
Type: behavior
Stage: patch review
Components: Library (Lib)
Versions: Python 3.8
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: belopolsky, eric.smith, matrixise, miss-islington, mjsaah, p-ganssle, pablogsal, terry.reedy, thatiparthy, vstinner, xtreak
Priority: normal
Keywords: patch

Created on 2018-10-25 14:33 by mjsaah, last changed 2019-01-14 10:41 by vstinner.

Pull Requests
URL Status Linked Edit
PR 10692 merged python-dev, 2018-11-23 22:18
PR 11550 merged miss-islington, 2019-01-14 10:24
Messages (21)
msg328443 - (view) Author: Michael Saah (mjsaah) * Date: 2018-10-25 14:33
A call to
time.strftime('%')
returns
'%'

A similar call to
datetime.utcfromtimestamp(int(time.time())).strftime('%')
raises
ValueError: strftime format ends with raw %

Similar inputs like '%D %' behave similarly.

I might take a crack at fixing this, but first I wanted to see what the official guidance is. It seems to me that consistent error handling between the two functions would be desirable.
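The two calls in this report can be compared with a short script. This is only a sketch: the helper name `probe` is made up for illustration, the format '%Y %' stands in for '%D %' because %D is not available on every platform, and the outcome depends on the Python version and on the platform's C library, so no particular output is assumed.

```python
import time
from datetime import datetime


def probe(fmt):
    """Report what time.strftime and datetime.strftime do with fmt.

    Each value is either the formatted string or the text "ValueError".
    (probe is a hypothetical helper, not part of either module.)
    """
    moment = datetime(2018, 10, 25, 14, 33)
    results = {}
    try:
        results["time"] = time.strftime(fmt, moment.timetuple())
    except ValueError:
        results["time"] = "ValueError"
    try:
        results["datetime"] = moment.strftime(fmt)
    except ValueError:
        results["datetime"] = "ValueError"
    return results


if __name__ == "__main__":
    for fmt in ("%", "%Y %"):
        print(repr(fmt), probe(fmt))
```

If the two entries differ for the same format string, the interpreter being used still shows the inconsistency reported here.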
msg328449 - (view) Author: Stéphane Wirtel (matrixise) * (Python triager) Date: 2018-10-25 15:28
For me, yes: normally we should provide the same behavior.

Now, if you want, you can submit a PR, but before submitting it you have to sign the CLA.

Thanks.
msg328457 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-10-25 17:11
I think it would be a good idea to make this more consistent. We should run through a multi-release deprecation cycle, since it might break existing, working code. And we could only start that in 3.8.
msg328458 - (view) Author: Michael Saah (mjsaah) * Date: 2018-10-25 17:21
Ok, seems reasonable. What branch would I submit a PR against?

msg328459 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-10-25 17:22
I am not sure time.strftime("%") should raise an error. There is an explicit test case for it, and the comment in that test notes that, depending on the platform, the call may either raise a ValueError or succeed. So I don't know whether it should be changed despite the inconsistency, or whether there is a reason behind the current behavior.

The error in the datetime module dates back to the SVN version. I couldn't find the original reason for it, or why the same check was not carried over to the time module.

I agree with Eric: if we go forward with this, we should raise a DeprecationWarning first and only make it an error in later versions, since we would be turning a platform-dependent error into an expected error on all platforms.

In the test case below, "%" doesn't raise ValueError on my macOS and Ubuntu machines.

https://github.com/python/cpython/blob/9e95eb0d609cee23e6c9915c0bef243585b8c14b/Lib/test/test_time.py#L240

def test_strftime_format_check(self):
    # Test that strftime does not crash on invalid format strings
    # that may trigger a buffer overread. When not triggered,
    # strftime may succeed or raise ValueError depending on
    # the platform.
    for x in [ '', 'A', '%A', '%AA' ]:
        for y in range(0x0, 0x10):
            for z in [ '%', 'A%', 'AA%', '%A%', 'A%A%', '%#' ]:
                try:
                    time.strftime(x * y + z)
                except ValueError:
                    pass


I am adding @belopolsky who might have thoughts on the change.

Thanks for the report.
msg328460 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-10-25 17:26
Hmm, if there's a test for this, then that does complicate the decision. Is this behavior documented anywhere? If so, then we shouldn't change it.

If we do decide to go forward with a change, it should be in the master branch, which will become 3.8.
msg328461 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-10-25 17:27
After a little more thinking: maybe we should just document this behavior, make it official, and not change it.
msg328462 - (view) Author: Michael Saah (mjsaah) * Date: 2018-10-25 17:28
From a pure usability standpoint I'd prefer for datetime to match the time
behavior you're demonstrating, that is to not fail on a dangling %.

Of course I defer to the dev team on this, but I want to make clear where
I'm coming from.

msg328466 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-10-25 18:35
Michael: I understand the inconsistency, but since there is a test that says the ValueError is platform dependent, turning it into an intentional error could cause breakage. I am not against changing this, but if it's done, it should be done with a DeprecationWarning in 3.8 and made an error only in later versions.

Some more information:

Further, I looked into timemodule.c in CPython, which says that it supports some common formats and that "Other codes may be available on your platform.  See documentation for the C library strftime function." I also looked at FreeBSD's strftime: there is an explicit comment saying that if the conversion char is undefined then the behavior is also undefined, and that the character is simply printed out. A related issue, which has a patch to an external implementation that refers to the same comment: https://bugs.python.org/issue3173

Meanwhile, datetime strftime uses wrap_strftime, which defines the custom error message raised when the format ends with a raw % and does some more error reporting.

# datetime strftime error : https://github.com/python/cpython/blob/9e95eb0d609cee23e6c9915c0bef243585b8c14b/Modules/_datetimemodule.c#L1518

# Freebsd https://github.com/freebsd/freebsd/blob/277918494930ec3fb0c7fdbd4d35060a3bc6d181/lib/libc/stdtime/strftime.c#L572
# Same comment on Apple's source : https://opensource.apple.com/source/Libc/Libc-166/string.subproj/strftime.c


case '%':
/*
* X311J/88-090 (4.12.3.5): if conversion char is
* undefined, behavior is undefined. Print out the
* character itself as printf(3) also does.
*/
default:
    break;

Initially I thought this was the code that prints the '%', but looking at the loop itself: if the first character is "%" followed by '\0', meaning the format is just '%', then it breaks out of the loop and simply returns '%', which I assume is what is happening on my system. I don't think the case described in the comment above, printing out the character itself (i.e. "%"), is what applies here.

The above is based on my limited knowledge of C, so feel free to correct me if I am wrong or took something out of context. So maybe it can be documented that, for time.strftime, the behavior is undefined when the conversion char is undefined and depends on the underlying operating system's implementation, along with a note that time.strftime with just '%' is system dependent while datetime.strftime with '%' produces a ValueError. I think the test notes the same thing: the result is platform dependent, depending on the strftime implementation (on Windows, for example). So if we make '%' an error in time.strftime too, as in datetime.strftime, therein lies the breakage, since Python would then behave differently from the underlying OS strftime implementation that the time module uses.

Hope it helps, and maybe someone with a better understanding of C has a better explanation.
msg328473 - (view) Author: Michael Saah (mjsaah) * Date: 2018-10-25 19:52
Did a little digging. Seems that there are two versions of the datetime
module, a C version (looks like an accelerator module) and a Py version.

Both define a wrap_strftime function that replaces %z, %Z and %f format
codes before handing off to the timemodule.c code, where the actual
strftime function is called (aliased as format_time).

Here's the strange thing. The C datetime module raises a ValueError on a
dangling %, while the Python version does not. The C code can be seen here:
https://github.com/python/cpython/blob/3df85404d4bf420db3362eeae1345f2cad948a71/Modules/_datetimemodule.c#L1517-L1520
and the python version is here
https://github.com/python/cpython/blob/9e95eb0d609cee23e6c9915c0bef243585b8c14b/Lib/datetime.py#L196

So to summarize, it seems unnecessary to throw an error on a dangling % in
a higher-order module (_datetimemodule.c) when the lower-order module
(timemodule.c) doesn't do the check, and that lower-order module readily
accepts external input. This seems to be further corroborated by the fact
that the equivalent python version of the high-order module (datetime.py)
does not do the check either.

Let me know if I'm off base here, or if this is a fair assessment.
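One way to check this observation without rebuilding Python is to call both implementations side by side. This is a sketch under assumptions: the module names `_datetime` (the C accelerator) and `_pydatetime` (the pure-Python implementation, which only exists as a separate module from Python 3.12 on) are CPython implementation details, and either may be missing on a given interpreter. `load` and `trailing_percent` are illustrative helper names.

```python
import importlib


def load(name):
    """Import a module by name, or return None if it is unavailable."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def trailing_percent(module, fmt="%Y %"):
    """Format a fixed date with fmt using the given datetime implementation.

    Returns the formatted string, "ValueError" if the implementation
    rejects the format, or "unavailable" if the module could not be loaded.
    """
    if module is None:
        return "unavailable"
    try:
        return module.datetime(2018, 10, 25).strftime(fmt)
    except ValueError:
        return "ValueError"


if __name__ == "__main__":
    # _datetime: C implementation; _pydatetime: pure Python (3.12+ only).
    for name in ("_datetime", "_pydatetime"):
        print(name, "->", trailing_percent(load(name)))
```

On interpreters older than 3.12 the pure-Python code lives in Lib/datetime.py itself, so the second line is expected to print "unavailable" there.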

msg328590 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2018-10-26 18:51
Michael Saah, when you reply by email, *please* delete the quoted post you are replying to (except possibly for a relevant line or two.).  The quotation duplicates what is already on the web page and makes it harder to scroll through posts on the web page.
msg328591 - (view) Author: Michael Saah (mjsaah) * Date: 2018-10-26 18:55
Apologies, will do.
msg328594 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-10-26 19:07
Thanks for the details. If I understand your analysis correctly, the C implementation differs from the Python implementation in this case. The two should behave the same, and IIRC there is a PEP (PEP 399, I think) that requires it.
msg330364 - (view) Author: Michael Saah (mjsaah) * Date: 2018-11-23 22:30
Summary to accompany my patch:

Modules/_datetimemodule.c and Lib/datetime.py do not behave identically.
Specifically, the strftime functions do not match when passed a format
string terminated with a '%'. The C function performs an explicit check for
this condition, and raises a ValueError on it. The Py version does not
perform this check. Both pass the format string (after doing substitutions
for %z, %Z, and %f tags) to the system strftime or wcsftime, depending on
platform. These live within the Python time module. The time module wrapper
function does not perform this check.

This situation leads to a scenario in which, for example, "%D %" passed to
datetime.strftime (with the C extension included) raises a ValueError. The
same string passed to time.strftime returns "mm/dd/yy %", at least on OSX.
Furthermore, if Python is built without the C module, "mm/dd/yy %" is
returned when datetime.strftime is called.

To summarise, there are two problems: (1) datetime does not comply with
PEP-399, and (2) a higher-order module raises an exception on a case that
the (exposed) lower-order module has no problem with, causing a mismatch in
behavior between datetime.strftime and time.strftime.

This PR attempts to fix this problem by removing the case check from the
datetime C module. This solves both (1) and (2).

There was much talk on the issue thread about there existing a test case
for time.strftime that documented a platform-dependent failure on a
dangling '%'. I wish to note that my patch does not touch the time module
at all; it only removes a seemingly unnecessary check in the datetime C
module.
msg333331 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-09 16:39
Paul Ganssle asked me to look at PR 10692. This issue is about consistency, so I don't understand this part of the change:

        try:
            _time.strftime('%')
        except ValueError:
            self.skipTest('time module does not support trailing %')

Why would datetime have the same behavior on all platforms, while time.strftime('%') may or may not raise an exception depending on the libc?

Can't we get the same behavior on all platforms, and the same behavior in the time and datetime modules? Honestly, I have no preference between always raising an exception and always succeeding (just copying the trailing "%").

This issue reminds me of the old bpo-16322: time.strftime("%z") fails to format properly the timezone name. I would suggest "preprocessing" the input string passed to the C function strftime() / wcsftime() to replace %z or %Z with the timezone name, and passing only the format substrings to it.

Something similar can be done for the trailing "%": pass a substring (without the trailing %) to strftime() / wcsftime(), and later append "%".
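The trailing "%" idea can be sketched in pure Python on top of time.strftime. This is an illustration only, not the change that was merged: the function name `strftime_keep_trailing_percent` is hypothetical, and the sketch assumes that "%%" is the only way a "%" can legitimately precede the final "%" (a format such as "%E%", where a modifier precedes the last "%", is not handled).

```python
import time


def strftime_keep_trailing_percent(fmt, t=None):
    """Format t with fmt, copying a dangling trailing '%' verbatim.

    Hypothetical helper: a sketch of the idea, not the CPython change.
    """
    if t is None:
        t = time.localtime()
    # Count the run of '%' at the end of the format. "%%" is an escaped
    # percent sign, so only an odd-length run leaves one '%' without a
    # conversion character after it.
    run = len(fmt) - len(fmt.rstrip("%"))
    if run % 2 == 1:
        # Hand everything except the dangling '%' to the C library,
        # then append the '%' ourselves.
        return time.strftime(fmt[:-1], t) + "%"
    return time.strftime(fmt, t)


if __name__ == "__main__":
    print(strftime_keep_trailing_percent("%Y %", time.gmtime(0)))
```

Because the dangling "%" never reaches the C library, the result no longer depends on how the platform's strftime() treats it.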
msg333336 - (view) Author: Michael Saah (mjsaah) * Date: 2019-01-09 17:13
Hi Victor, thanks for taking a look.

> Would why datetime have the same behavior on all platforms, but time.strftime('%') may or may not raise an exception depending on the libc?

If I understand the call stack correctly, datetime does not have the
same behavior on all platforms. datetime does some preprocessing and
then hands the resulting format string down to time.strftime, which in
turn passes it down to the system. The time module does not check for
trailing %.

To be honest, I can't claim to understand the strftime
system-dependence, as I couldn't find good documentation of it nor
could I find error handling code. The C version of datetime.strftime
really just said "There's a lone trailing %; doesn't make sense." when
making the check. The python version of datetime did not make this
check, and neither does any version of the time module's strftime.

> Something similar can be done for the trailing "%": pass a substring (without the trailing %) to strftime() / wcsftime(), and later append "%".

I like this idea, as it gets around the ill-defined parameters of
system-dependence that I'm working with. This change would need to be
made to the time module, and would be in addition to the changes I've
already made.
msg333379 - (view) Author: Paul Ganssle (p-ganssle) * Date: 2019-01-10 14:10
I agree with Victor on this. In the future, I'd really like to see us do our best to add cross-platform uniformity to Python's strftime and strptime support. If there really is a platform out there that doesn't support a trailing `%`, I like the idea of stripping it off before passing it to the system strftime/wcsftime.

That said, I don't think this should be a blocker on Michael's PR. I think that his contribution by itself improves on the current state of things and there's no pressing *need* to solve them both at the same time. Unless I'm misunderstanding, I think the existing PR is a prerequisite for solving the problem on all platforms anyway.

Michael - do you think you can / would you like to add the functionality that Victor mentioned to your existing PR? If not, I recommend we merge the current PR and open a new issue for "Lone trailing % not supported on all platforms".
msg333384 - (view) Author: Michael Saah (mjsaah) * Date: 2019-01-10 14:41
> Michael - do you think you can / would you like to add the functionality that Victor mentioned to your existing PR? If not, I recommend we merge the current PR and open a new issue for "Lone trailing % not supported on all platforms".

I'd be happy to do so, but can't commit to a timeline at the moment.
As long as there's no worry that the branch goes stale in the
meantime, I'd say you can leave it open. Maybe it would be best though
to merge and open a new issue, given the independence of the two
fixes.

I'll leave that as a judgement call to you.
msg333599 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-14 10:21
The behavior of strftime() with non-ASCII is not portable: bpo-34512.

A solution to make time.strftime() more portable would be to split the format string, format each "%xxx" substring separately, and not pass the substrings between the "%xxx" directives to strftime() at all.
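A rough sketch of that splitting approach follows. It rests on a simplifying assumption that is not part of the proposal: a directive is taken to be "%" followed by exactly one ASCII letter, so modifiers and flags such as "%Ey" or "%-d" are not recognised. The names `_DIRECTIVE` and `strftime_split` are made up for the example.

```python
import re
import time

# Assumption: a directive is '%' plus one ASCII letter, or the escape "%%".
_DIRECTIVE = re.compile(r"%([A-Za-z%])")


def strftime_split(fmt, t=None):
    """Format t, sending only individual directives to time.strftime()."""
    if t is None:
        t = time.localtime()
    pieces = []
    pos = 0
    for match in _DIRECTIVE.finditer(fmt):
        # Literal text between directives is copied without ever being
        # passed to the C library, so non-ASCII text cannot be mangled.
        pieces.append(fmt[pos:match.start()])
        if match.group(1) == "%":
            pieces.append("%")
        else:
            pieces.append(time.strftime(match.group(0), t))
        pos = match.end()
    # Whatever is left, including a dangling '%', is copied as-is.
    pieces.append(fmt[pos:])
    return "".join(pieces)


if __name__ == "__main__":
    print(strftime_split("é %Y-%m-%d %", time.gmtime(0)))
```

A side effect of this design is that a trailing "%" is treated as literal text, so both portability concerns raised in this thread are handled by the same loop. Directives unknown to the platform are still passed through to time.strftime(), so their behavior stays platform dependent.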
msg333600 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-14 10:23
New changeset 454b3d4ea246e8751534e105548d141ed7b0b032 by Victor Stinner (MichaelSaah) in branch 'master':
bpo-35066: _dateime.datetime.strftime copies trailing '%' (GH-10692)
https://github.com/python/cpython/commit/454b3d4ea246e8751534e105548d141ed7b0b032
msg333602 - (view) Author: miss-islington (miss-islington) Date: 2019-01-14 10:41
New changeset 26122de1a80d1618ee80862cf3b8f73f8ec7d9cf by Miss Islington (bot) in branch '3.7':
bpo-35066: _dateime.datetime.strftime copies trailing '%' (GH-10692)
https://github.com/python/cpython/commit/26122de1a80d1618ee80862cf3b8f73f8ec7d9cf
msg333603 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2019-01-14 10:41
I proposed two different implementations to make time.strftime() more portable, so it seems like it's more complex than what I expected. I merged the datetime change since this one is self-sufficient, so someone can work on a time change on top of it.
History
Date User Action Args
2019-01-14 10:41:59 vstinner set messages: + msg333603
2019-01-14 10:41:37 miss-islington set nosy: + miss-islington; messages: + msg333602
2019-01-14 10:24:51 miss-islington set pull_requests: + pull_request11177
2019-01-14 10:24:33 miss-islington set pull_requests: + pull_request11176
2019-01-14 10:24:17 miss-islington set pull_requests: + pull_request11175
2019-01-14 10:23:48 vstinner set messages: + msg333600
2019-01-14 10:21:14 vstinner set messages: + msg333599
2019-01-10 14:41:46 mjsaah set messages: + msg333384
2019-01-10 14:10:03 p-ganssle set messages: + msg333379
2019-01-09 17:13:55 mjsaah set messages: + msg333336
2019-01-09 16:39:06 vstinner set nosy: + vstinner; messages: + msg333331
2018-11-23 22:30:11 mjsaah set messages: + msg330364
2018-11-23 22:18:53 python-dev set keywords: + patch; stage: patch review; pull_requests: + pull_request9944
2018-10-30 15:38:56 pablogsal set nosy: + pablogsal
2018-10-30 15:04:23 p-ganssle set nosy: + p-ganssle
2018-10-26 19:07:56 xtreak set messages: + msg328594
2018-10-26 18:55:22 mjsaah set messages: + msg328591
2018-10-26 18:51:21 terry.reedy set nosy: + terry.reedy; messages: + msg328590
2018-10-25 19:52:02 mjsaah set messages: + msg328473
2018-10-25 18:35:15 xtreak set messages: + msg328466
2018-10-25 17:41:27 thatiparthy set nosy: + thatiparthy
2018-10-25 17:28:57 mjsaah set messages: + msg328462
2018-10-25 17:27:19 eric.smith set messages: + msg328461
2018-10-25 17:26:20 eric.smith set messages: + msg328460
2018-10-25 17:22:41 xtreak set nosy: + belopolsky, xtreak; messages: + msg328459
2018-10-25 17:21:53 mjsaah set messages: + msg328458
2018-10-25 17:11:43 eric.smith set nosy: + eric.smith; messages: + msg328457; versions: + Python 3.8, - Python 3.5, Python 3.6, Python 3.7
2018-10-25 15:28:18 matrixise set nosy: + matrixise; messages: + msg328449
2018-10-25 14:33:30 mjsaah create