Title: cookielib/cookiejar cookies' Expires date parse fails with long month names
Type: behavior Stage: patch review
Components: Library (Lib) Versions: Python 3.6, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: alb_moral, lpopil, martin.panter, xtreak
Priority: normal Keywords: patch

Created on 2018-10-10 08:05 by alb_moral, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL Status Linked Edit
PR 19393 open lpopil, 2020-04-06 23:11
Messages (7)
msg327461 - (view) Author: Alberto Moral (alb_moral) Date: 2018-10-10 08:05
http.cookiejar (cookielib, for python2.*) does not parse some cookies' Expires date.

For example, "Friday, 1-August-1997 00:00:00 GMT" does not work, while "Fri, 01 Aug 1997 00:00:00 GMT" works fine.

This is basically due to the long month name: the parsed month is compared against MONTHS_LOWER, a list of 3-letter month names. So I propose a small change in the definition of LOOSE_HTTP_DATE_RE (see the fifth line):

LOOSE_HTTP_DATE_RE = re.compile(
    (\d\d?)            # day
    (\w{3})\w*         # month (3 first letters only)

Instead of:
LOOSE_HTTP_DATE_RE = re.compile(
    (\d\d?)            # day
    (\w+)              # month

I've tested only http.cookiejar (Python 3.6), but I suppose the same change will work on cookielib.
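
To see the difference in isolation, here is a minimal sketch. The two patterns are simplified stand-ins that cover only the day, month, and year (they are not the full LOOSE_HTTP_DATE_RE), and month_number is a hypothetical helper, not part of http.cookiejar.

```python
import re

MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]

# Simplified stand-ins for the day/month/year part only (assumption:
# the real LOOSE_HTTP_DATE_RE also handles the time and the timezone).
CURRENT_RE = re.compile(
    r"^(\d\d?)(?:\s+|[-/])(\w+)(?:\s+|[-/])(\d+)$", re.ASCII)
PROPOSED_RE = re.compile(
    r"^(\d\d?)(?:\s+|[-/])(\w{3})\w*(?:\s+|[-/])(\d+)$", re.ASCII)


def month_number(pattern, text):
    """Hypothetical helper: return 1..12 for the month in text, else None."""
    match = pattern.match(text)
    if match is None:
        return None
    mon = match.group(2).lower()
    if mon not in MONTHS_LOWER:
        return None
    return MONTHS_LOWER.index(mon) + 1


print(month_number(CURRENT_RE, "1-August-1997"))   # "august" is not in MONTHS_LOWER
print(month_number(PROPOSED_RE, "1-August-1997"))  # only "Aug" is captured
print(month_number(PROPOSED_RE, "01 Aug 1997"))    # short names still match
```

With the current-style pattern the whole word "August" is captured and the lookup fails; with the proposed one only the first three letters reach the lookup.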

Thanks in advance
msg327475 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-10 14:01
Thanks for the report. As far as I can see from the RFC, the month seems to follow a three-letter code. Is there a part of the RFC where Python is not compliant? I can't find any related issues or RFC links allowing the month format specified in the report. Can you please add the relevant part of the RFC, or links if any?

See RFC 6265, section 5.1.1 (Dates).
msg327482 - (view) Author: Alberto Moral (alb_moral) Date: 2018-10-10 17:17
Thanks for your answer. I have not found any RFCs with full month names either. I'm afraid I'm not an expert here.

But the fact is that I get them at work. Here is an example of a response header:

  HTTP/1.1 200 OK
  Server: Oracle-iPlanet-Web-Server/7.0
  Date: Tue, 10 Oct 2018 14:29:44 GMT
  Version-auth-credencial: v.3.0.1 Iplanet - Sun Solaris - Contexto Multiple
  Set-cookie: JSESSIONIDE=Del; expires=Friday, 1-August-1997 00:00:00 GMT; domain=...

I do not know if it's an old date format (?)... or if it is a quite rare case...

I have created some previous bash scripts using wget and they work fine, but I have had problems with python3 (and requests module) till I realized this issue. And it was not very easy: I am very new with python :( 

That's the reason for my proposal. It's just to be consistent: if we compare the month against the 3-letter names in MONTHS_LOWER, let's use just the first 3 letters.

Perhaps modifying LOOSE_HTTP_DATE_RE is not a good idea. Another option could be to truncate the month variable (mon).

It could be done inside the _str2time function, for example:

def _str2time(day, mon, yr, hr, min, sec, tz):
    mon = mon[:3]  # assure 3 letters
    yr = int(yr)

Anyway, I'll try to find why those long month names appear.

Thank you
msg327484 - (view) Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-10-10 17:26
No problem. I am not an expert either; I just skimmed through the RFC and could not find anything about full month names. I wanted to check whether there are recent changes I am missing, or whether the server is configured to set the cookie expiration with a full month name, since I found no related issues in the bug tracker. I will wait for others to comment on this.

msg327486 - (view) Author: Alberto Moral (alb_moral) Date: 2018-10-10 17:38
Yes, I was thinking that it could be a matter of configuration of the server (?).

By the way, and just for fun, I've just realized that truncating mon at the beginning of the _str2time function is a very bad idea because mon could also be an int.

A better place is where the MONTHS_LOWER index is looked up (the possible exception is already handled there):
        mon = MONTHS_LOWER.index(mon[:3].lower())+1

(perhaps split into 2 statements for clarity)
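
Written as two statements, the lookup could look like the sketch below. month_to_number is a hypothetical stand-alone helper used only for illustration; the numeric fallback is an assumption about how a token that is not a month name should be treated, not a copy of the library code.

```python
MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]


def month_to_number(mon):
    """Hypothetical helper: map a month token (a string) to 1..12, or None."""
    prefix = mon[:3].lower()                    # statement 1: first 3 letters
    try:
        return MONTHS_LOWER.index(prefix) + 1   # statement 2: look them up
    except ValueError:
        pass
    # Not a month name: accept a numeric month such as "08" (assumption).
    try:
        number = int(mon)
    except ValueError:
        return None
    return number if 1 <= number <= 12 else None


print(month_to_number("August"))  # long name, truncated to "aug"
print(month_to_number("08"))      # numeric token, untouched by the truncation
```

Truncating only for the name lookup leaves the original token intact for the numeric case.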

OK, waiting for experts' comments.

I'm really enjoying Python.
msg327491 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-10-10 20:27
RFC 6265 says that only the first three letters of the month are significant, and the rest of the token should be ignored. See section 5.1.1:

month = ( "jan" / "feb" / "mar" / "apr" /
    "may" / "jun" / "jul" / "aug" /
    "sep" / "oct" / "nov" / "dec" ) *OCTET

I have not heard of an Expires field syntax with a numeric month.
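
As a rough illustration of that grammar (a sketch, not the full RFC 6265 date-parsing algorithm): a token counts as a month if it starts with one of the twelve three-letter names, ignoring case, and whatever follows is ignored. The names MONTH_TOKEN_RE and rfc6265_month are hypothetical.

```python
import re

MONTH_NAMES = ("jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec")

# The month production: one of the twelve names, then any trailing octets.
# ABNF string literals are case-insensitive, hence re.IGNORECASE.
MONTH_TOKEN_RE = re.compile("(" + "|".join(MONTH_NAMES) + ")", re.IGNORECASE)


def rfc6265_month(token):
    """Hypothetical helper: return 1..12 if token is a month token, else None."""
    match = MONTH_TOKEN_RE.match(token)  # match() anchors at the start only
    if match is None:
        return None
    return MONTH_NAMES.index(match.group(1).lower()) + 1


print(rfc6265_month("August"))  # trailing "ust" is ignored
print(rfc6265_month("08"))      # a numeric token is not a month token here
```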
msg365882 - (view) Author: Liubomyr Popil (lpopil) * Date: 2020-04-06 23:11
I found this issue to be the one most closely related to a problem I discovered:
a long day name (such as "Sunday") is not parsed.
According to the HTTP/1.1 specification:

      Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
      Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
      Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format

HTTP/1.1 clients and servers that parse the date value MUST accept
   all three formats (for compatibility with HTTP/1.0), though they MUST
   only generate the RFC 1123 format for representing HTTP-date values
   in header fields.

The month format is correct, but the day name should be accepted in both the short and the long form.
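
For reference, the three formats listed above can be tried one after another with datetime.strptime. This is only a sketch under stated assumptions: parse_http_date is a hypothetical helper, it is not how http.cookiejar parses dates, it assumes the C/English locale, and it returns a naive datetime with "GMT" matched literally.

```python
from datetime import datetime

# One strptime() format per style listed above. Assumes the C/English
# locale, since %a, %A and %b are locale-dependent.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",   # RFC 822, updated by RFC 1123
    "%A, %d-%b-%y %H:%M:%S GMT",   # RFC 850 (long day name, 2-digit year)
    "%a %b %d %H:%M:%S %Y",        # ANSI C asctime()
)


def parse_http_date(text):
    """Hypothetical helper: try each format; return a naive datetime or None."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


for sample in ("Sun, 06 Nov 1994 08:49:37 GMT",
               "Sunday, 06-Nov-94 08:49:37 GMT",
               "Sun Nov  6 08:49:37 1994"):
    print(parse_http_date(sample))
```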

 - Liubomyr
Date                 User           Action  Args
2022-04-11 14:59:07  admin          set     github: 79132
2020-04-06 23:11:22  lpopil         set     nosy: + lpopil
                                            messages: + msg365882
                                            pull_requests: + pull_request18763
                                            keywords: + patch
                                            stage: patch review
2018-10-10 20:27:23  martin.panter  set     nosy: + martin.panter
                                            messages: + msg327491
2018-10-10 17:38:50  alb_moral      set     messages: + msg327486
2018-10-10 17:26:34  xtreak         set     messages: + msg327484
2018-10-10 17:17:17  alb_moral      set     messages: + msg327482
2018-10-10 14:01:11  xtreak         set     messages: + msg327475
2018-10-10 13:35:21  xtreak         set     nosy: + xtreak
2018-10-10 08:05:41  alb_moral      create