
Author mgdelmonte
Recipients Lukasa, barry, demian.brecht, icordasc, martin.panter, mgdelmonte, piotr.dobrogost, r.david.murray
Date 2015-06-03.14:36:26
Message-id <1433342186.84.0.340444077082.issue24363@psf.upfronthosting.co.za>
Content
Given that obs-fold is technically valid, may I recommend reading the entire header block first (up to the first blank line) and then tokenizing the individual headers with a regular expression, rather than parsing line by line?  That would solve the problem more elegantly and easily than reading lines on the fly and then "unreading" the nonconforming ones.

Here's my recommendation:

    import re  # module-level import, needed for the regular-expression parsing below

    def readheaders(self):
        self.dict = {}
        self.unixfrom = ''
        self.headers = hlist = []
        self.status = ''
        # read entire header (read until first blank line)
        while True:
            line = self.fp.readline(_MAXLINE+1)
            if not line:
                self.status = 'EOF in headers'
                break
            if len(line) > _MAXLINE:
                raise LineTooLong("header line")
            hlist.append(line)
            if line in ('\n', '\r\n'):
                break
            if len(hlist) > _MAXHEADERS:
                raise HTTPException("got more than %d headers" % _MAXHEADERS)
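        # rejoin the raw lines and parse as one string; readline() keeps
        # each line terminator, so join with "" rather than "\n"
        header = "".join(hlist)
        # one match per header: a line that does not start with whitespace,
        # plus any continuation lines (leading spaces or tabs) that follow it
        self.headers = re.findall(r"[^ \t\n][^\n]+\n(?:[ \t]+[^\n]+\n)*", header)
        firstline = True
        for line in self.headers:
            if firstline and line.startswith('From '):
                self.unixfrom = self.unixfrom + line
                continue
            firstline = False
            if ':' in line:
                k, v = line.split(':', 1)
                # unfold: replace each line break plus indentation with one space
                self.addheader(k, re.sub(r"\r?\n[ \t]+", " ", v.strip()))
            else:
                self.status = 'Non-header line where header expected' if self.dict else 'No headers'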


I think this handles everything you're trying to do.  I don't understand the unixfrom bit, but I think I have it right.

As for Cory's concern about smuggling, _MAXLINE and _MAXHEADERS should help by bounding the length of each line and the number of lines read.  The regexp guarantees that every header line plus its continuation lines appears as a single header.
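
To illustrate that grouping outside of httplib, here is a minimal standalone sketch in Python 3. The sample header text and the name HEADER_RE are made up for the example, and the pattern is spelled out in the snippet itself; it accepts tabs as well as spaces before a continuation line, which is an assumption about what is wanted rather than settled behaviour.

```python
import re

# Hypothetical sample header block. Lines keep their terminators,
# the way readline() returns them.
raw = (
    "Host: example.com\r\n"
    "X-Folded: first part\r\n"
    "   second part\r\n"
    "\tthird part\r\n"
    "Accept: */*\r\n"
    "\r\n"
)

# One match per header: a line that does not start with whitespace,
# followed by any continuation lines that start with spaces or tabs.
HEADER_RE = re.compile(r"[^ \t\n][^\n]+\n(?:[ \t]+[^\n]+\n)*")

headers = HEADER_RE.findall(raw)
for h in headers:
    print(repr(h))
```

The folded X-Folded header comes back as one string containing all three physical lines, and the terminating blank line is not matched at all.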

I use the re.sub call on v.strip() to clean the value and collapse each continuation break into a single space.
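
As a small sketch of that unfolding step on its own (Python 3; the helper name unfold and the sample header are hypothetical, and the pattern also tolerates a "\r" before the newline and tabs in the indentation, which is an assumption):

```python
import re

def unfold(header):
    """Split one raw header into (name, value), collapsing folded lines."""
    name, value = header.split(":", 1)
    # Strip the outer whitespace and final line terminator, then replace
    # each interior line break plus its indentation with a single space.
    return name, re.sub(r"\r?\n[ \t]+", " ", value.strip())

name, value = unfold("X-Folded: first part\r\n   second part\r\n")
print(name, "=>", value)  # prints: X-Folded => first part second part
```

Stripping before substituting means the trailing line terminator never reaches the pattern, so only the interior folds are rewritten.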