Title: cookielib/cookiejar cookies' Expires date parse fails with long month names
Type: behavior Stage:
Components: Library (Lib) Versions: Python 3.6, Python 2.7
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: alb_moral, martin.panter, xtreak
Priority: normal Keywords:

Created on 2018-10-10 08:05 by alb_moral, last changed 2018-10-10 20:27 by martin.panter.

Messages (6)
msg327461 - Author: Alberto Moral (alb_moral) Date: 2018-10-10 08:05
http.cookiejar (cookielib in Python 2.*) fails to parse the Expires date of some cookies.

For example, "Friday, 1-August-1997 00:00:00 GMT" does not work, while "Fri, 01 Aug 1997 00:00:00 GMT" works fine.

This is basically due to long month names: the captured month is compared against MONTHS_LOWER, a list of 3-letter month abbreviations. So I propose a small change to the month group in the definition of LOOSE_HTTP_DATE_RE:

LOOSE_HTTP_DATE_RE = re.compile(
    (\d\d?)            # day
    (\w{3})\w*         # month (3 first letters only)

Instead of:
LOOSE_HTTP_DATE_RE = re.compile(
    (\d\d?)            # day
    (\w+)              # month

I've tested only http.cookiejar (Python 3.6), but I suppose the same change will work for cookielib.
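To illustrate the difference between the two month groups, here is a minimal sketch. It does not use the real LOOSE_HTTP_DATE_RE: CURRENT_RE and PROPOSED_RE are simplified, hypothetical stand-ins that cover only the day and month part, and month_number is an illustrative helper, not a library function.

```python
import re

MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]

# Simplified stand-ins for the day/month part of the pattern (assumption:
# the real regex has more groups for year, clock and timezone).
CURRENT_RE = re.compile(r"(\d\d?)[-\s/](\w+)", re.ASCII)
PROPOSED_RE = re.compile(r"(\d\d?)[-\s/](\w{3})\w*", re.ASCII)


def month_number(pattern, text):
    """Return the 1-based month number found in text, or None."""
    m = pattern.search(text)
    if m is None:
        return None
    try:
        # The captured month is looked up among 3-letter abbreviations.
        return MONTHS_LOWER.index(m.group(2).lower()) + 1
    except ValueError:
        return None


long_date = "Friday, 1-August-1997 00:00:00 GMT"
short_date = "Fri, 01 Aug 1997 00:00:00 GMT"

print(month_number(CURRENT_RE, long_date))    # None: "august" is not in the list
print(month_number(PROPOSED_RE, long_date))   # 8: only "Aug" is captured
print(month_number(CURRENT_RE, short_date))   # 8
print(month_number(PROPOSED_RE, short_date))  # 8
```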

Thanks in advance
msg327475 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-10-10 14:01
Thanks for the report. As far as I can see from the RFC, the month is a three-letter code. Is there a part of the RFC that Python does not comply with? I can't find any related issues or RFC text that allows the month format given in the report. Can you please add the relevant part of the RFC, or links, if any?

Reference: RFC 6265, section 5.1.1 (Dates).
msg327482 - Author: Alberto Moral (alb_moral) Date: 2018-10-10 17:17
Thanks for your answer. I have not found any RFC that allows full month names either. I'm afraid I'm not an expert here.

But I do receive such dates at work. Here is an example of a response header:

  HTTP/1.1 200 OK
  Server: Oracle-iPlanet-Web-Server/7.0
  Date: Tue, 10 Oct 2018 14:29:44 GMT
  Version-auth-credencial: v.3.0.1 Iplanet - Sun Solaris - Contexto Multiple
  Set-cookie: JSESSIONIDE=Del; expires=Friday, 1-August-1997 00:00:00 GMT; domain=...

I do not know whether it's an old date format or just a rare case.

I had previously written some bash scripts using wget and they work fine, but I had problems with python3 (and the requests module) until I tracked down this issue. And it was not very easy: I am very new to Python :(

That's the reason for my proposal. It's just to be consistent: if we compare the month against the 3-letter entries of MONTHS_LOWER, let's use only its first 3 letters.

Perhaps modifying LOOSE_HTTP_DATE_RE is not a good idea. Another option could be to truncate the month variable (mon).

It could be done inside the _str2time function, for example:

def _str2time(day, mon, yr, hr, min, sec, tz):
    mon = mon[:3]  # assure 3 letters
    yr = int(yr)

Anyway, I'll try to find out why those long month names appear.

Thank you
msg327484 - Author: Karthikeyan Singaravelan (xtreak) * (Python triager) Date: 2018-10-10 17:26
No problem. I am not an expert either; I just skimmed through the RFC and could not find anything about full month names. I wanted to check whether I am missing a recent change, or whether the server is configured to set the cookie expiration with a full month name, since I found no related issues in the bug tracker. I will wait for others to comment on this.

msg327486 - Author: Alberto Moral (alb_moral) Date: 2018-10-10 17:38
Yes, I was thinking that it could be a matter of server configuration (?).

By the way, and just for fun, I've just realized that truncating mon at the beginning of the _str2time function is a very bad idea, because mon could also be a number.

A better place is the MONTHS_LOWER index lookup, where the possible exception is already handled:
        mon = MONTHS_LOWER.index(mon[:3].lower())+1

(perhaps split into 2 statements for clarity)
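A minimal sketch of that lookup, split into two statements. month_to_number is a hypothetical helper written for illustration, not the actual body of _str2time, and it assumes mon arrives as a string holding either a month name or digits.

```python
MONTHS_LOWER = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]


def month_to_number(mon):
    """Return the month as 1..12, or None if it cannot be interpreted.

    Hypothetical helper; assumes mon is a string (name or digits).
    """
    # Truncate only for the name lookup, so a numeric month is left intact.
    prefix = mon[:3].lower()
    try:
        return MONTHS_LOWER.index(prefix) + 1
    except ValueError:
        pass
    # Not a known month name: maybe it is already a number.
    try:
        imon = int(mon)
    except ValueError:
        return None
    return imon if 1 <= imon <= 12 else None


print(month_to_number("August"))  # 8
print(month_to_number("08"))      # 8
print(month_to_number("13"))      # None
```

Truncating inside the lookup rather than at the top of the function keeps the numeric fallback working on the full, unmodified value.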

OK, waiting for experts' comments.

I'm really enjoying Python.
msg327491 - Author: Martin Panter (martin.panter) * (Python committer) Date: 2018-10-10 20:27
RFC 6265 says that only the first three letters of the month are significant, and the rest of the token should be ignored. See section 5.1.1:

month = ( "jan" / "feb" / "mar" / "apr" /
    "may" / "jun" / "jul" / "aug" /
    "sep" / "oct" / "nov" / "dec" ) *OCTET
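As a rough illustration of that grammar rule, the sketch below translates it into a regular expression: a case-insensitive three-letter prefix followed by any trailing characters. MONTH_TOKEN_RE and parse_month_token are hypothetical names, and the sketch assumes the month token has already been isolated from the rest of the date string.

```python
import re

_MONTH_NAMES = ("jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec")

# One of the twelve prefixes, then anything at all (the "*OCTET" part).
# ABNF string literals are case-insensitive, hence re.IGNORECASE.
MONTH_TOKEN_RE = re.compile(
    r"(" + "|".join(_MONTH_NAMES) + r").*",
    re.IGNORECASE | re.DOTALL,
)


def parse_month_token(token):
    """Return 1..12 for a token matching the month rule, else None."""
    m = MONTH_TOKEN_RE.fullmatch(token)
    if m is None:
        return None
    return _MONTH_NAMES.index(m.group(1).lower()) + 1


print(parse_month_token("August"))  # 8
print(parse_month_token("AUG"))     # 8
print(parse_month_token("Au"))      # None
```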

I have not heard of an Expires field syntax with a numeric month.
Date                 User           Action  Args
2018-10-10 20:27:23  martin.panter  set     nosy: + martin.panter; messages: + msg327491
2018-10-10 17:38:50  alb_moral      set     messages: + msg327486
2018-10-10 17:26:34  xtreak         set     messages: + msg327484
2018-10-10 17:17:17  alb_moral      set     messages: + msg327482
2018-10-10 14:01:11  xtreak         set     messages: + msg327475
2018-10-10 13:35:21  xtreak         set     nosy: + xtreak
2018-10-10 08:05:41  alb_moral      create