classification
Title: HTML calendar is broken
Type: behavior Stage: resolved
Components: Library (Lib) Versions: Python 3.2
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: belopolsky Nosy List: belopolsky, catherine, doerwalter, eric.araujo, ezio.melotti, georg.brandl, gpolo, lambacck, orsenthil, pitrou, python-dev, r.david.murray, rhettinger, ron_adam
Priority: normal Keywords: patch

Created on 2010-10-13 15:43 by belopolsky, last changed 2011-08-13 00:36 by orsenthil. This issue is now closed.

Files
File name Uploaded Description Edit
Calendar for 2010.pdf belopolsky, 2010-10-13 15:43
calendar.diff doerwalter, 2010-10-13 16:20 review
calendar_bytes_output.patch lambacck, 2010-11-22 03:25 review
calendar_bytes_output_v2.patch lambacck, 2010-11-22 16:55 review
test_calendar_byteoutput.patch catherine, 2011-07-31 03:34 patch to test_calendar.py review
Messages (31)
msg118532 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-13 15:43
$ python3 -m calendar --type=html 2010

Produces HTML which renders into the attached PDF file.  It looks like bytes vs. strings issue.
msg118533 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-13 16:00
The issue is clearly with using print() for bytes' output: 

        if len(args) == 1:
            print(cal.formatyearpage(datetime.date.today().year, **optdict))

I am not sure, however how this can be fixed because calendar interface allows user to specify the encoding to be used for HTML and that encoding may not be the same or even compatible with sys.stdout encoding.

I would suggest adding -o/--output option to calendar CLI to specify the output file and make it required when specified encoding is not "compatible" with that of sys.stdout.  What constitutes "compatible" still needs to be defined.
msg118536 - (view) Author: Walter Dörwald (doerwalter) * (Python committer) Date: 2010-10-13 16:20
Does the following patch fix your problems?
msg118542 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-13 16:42
On Wed, Oct 13, 2010 at 12:20 PM, Walter Dörwald <report@bugs.python.org> wrote:
..
> Does the following patch fix your problems?
>

Probably not because with your patch print() will encode HTML stream
with the system default encoding which may be different from that
specified by -e/--encode argument to calendar.  The result would be an
ill-formed XML stream with encoding specified by the meta-tag not
matching the actual encoding.
msg118555 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-13 18:13
There is a similar issue with printing text calendar with specified encoding:

$ python3 -m calendar --encoding=utf8
b'                                  2010\n\n ..
msg118576 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-10-13 20:13
Note that .decode(encoding) calls have been deliberately removed in py3k: see issue 3059.
msg122082 - (view) Author: Chris Lambacher (lambacck) * Date: 2010-11-22 03:25
I have attached a patch that conditionally uses sys.stdout.buffer.write to output binary (encoded) data to stdout.
msg122084 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-22 03:35
It may be acceptable, but your patch elides the line breaks in output.  Can you add unit tests?
msg122131 - (view) Author: Chris Lambacher (lambacck) * Date: 2010-11-22 14:48
I don't understand what you mean by "elides the line breaks in output". They are still there, you just don't see them as \n because print() is no longer implicitly converting the bytes to a string (which appears to actually result in a repr).
msg122136 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-22 15:27
On Mon, Nov 22, 2010 at 9:48 AM, Chris Lambacher <report@bugs.python.org> wrote:
..
> I don't understand what you mean by "elides the line breaks in output".

It is actually not that bad:

$ ./python.exe -m calendar -t html| wc -l
     121
$ python2.7 -m calendar -t html| wc -l
     122

At first, I thought that html was printed one line at a time, but now
I realize that it is prepared in-memory and printed in one shot.  The
extra '\n' in python2.7 is probably a bug in 2.7.

I think this is a reasonable approach.    Just a few comments on the patch:

1. Unit tests needed.
2. It may be appropriate to add a warning to the documentation stating
that using -e option may mess up the terminal.
3.   In the following snippet, p is an unconditional shortcut to
sys.stdout.buffer.write.  I would just call it "write"

         optdict = dict(encoding=encoding, css=options.css)
+        p = sys.stdout.buffer.write

4.  While white space consistency with 2.7 is not very important, the
following should be fixed, IMO:

$ ./python.exe -m calendar -e ascii| wc -l
      35
$ ./python.exe -m calendar| wc -l
      36

5. I wonder how ./python.exe -m calendar -e ascii will look on
Windows.  I don't think cmd window is smart about displaying unix line
endings.
msg122147 - (view) Author: Chris Lambacher (lambacck) * Date: 2010-11-22 16:55
I am attaching a new patch which fixes the majority of the comments raised.

1. Any suggestions about how to test the output of the console program (the case that this bug affects) would be appreciated.

2. Agreed, included in the output for --help

3. Agreed, included in new patch

4. Okay. That is because of using write vs print. Changed to always use write

5. Doesn't seem to cause any issues on windows terminal.
msg122148 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-22 17:14
On Mon, Nov 22, 2010 at 11:55 AM, Chris Lambacher
<report@bugs.python.org> wrote:
..
> 1. Any suggestions about how to test the output of the console program (the case that this bug affects)
> would be appreciated.

For month displays, a doctest may be most appropriate. See
Lib/test/test_syntax.py for one example of using doctests in regrtest
suite.  Note that sys.stdout being StringIO may present a challenge to
your approach.  You can also take a look at test_pydoc which also
tests console-oriented text and html output.
msg122149 - (view) Author: Chris Lambacher (lambacck) * Date: 2010-11-22 17:24
The test_pydoc method looks workable, but I'll need to come back to it later because I don't have any more time to work on it today.
msg122191 - (view) Author: Alexander Belopolsky (belopolsky) * (Python committer) Date: 2010-11-23 05:03
I wonder: do we really need "encoding" option to the calendar output?  It is really not the job of the calendar module to deal with encodings.  The calendar module should produce text calendars as unicode strings and HTML calendars as UTF-8 bytes.  If someone wants to save them in a specific encoding, there are plenty of ways to achieve that.
msg122219 - (view) Author: Chris Lambacher (lambacck) * Date: 2010-11-23 15:11
I don't think we *need* to have the encoding in the HTML calendar, but I doubt we could remove it at this point, deprecate maybe, but since I don't use the module I don't have a sense for how often the need for encoding comes up.

The one thing that the encoding is doing in the HTML calendar is ensuring that Unicode characters that are not directly translatable in the target encoding are replaced with html character references (the "xmlcharrefreplace" option to the errors argument in the encode method).

The encoding may be a holdover from Python 2x where the expected result was a string. The docs say HTMLCalendar was added in Python 2.5 at the same time that the Local versions were added.
msg124409 - (view) Author: Ron Adam (ron_adam) * Date: 2010-12-20 22:13
The problem is in the following line...

    return ''.join(v).encode(encoding, "xmlcharrefreplace")

The .encode(encoding, "xmlcharrefreplace") is returning a bytes object.

Here is the simplest change to resolve the problem.

    return str(''.join(v).encode(encoding, "xmlcharrefreplace"),
                encoding=encoding)

But maybe a different way to implement "xmlcharrefreplace" is what is needed.  Would it be better if it were a function or method in the xml (or html) module?

Just because it can be implemented as an encoding doesn't mean it should be.
msg124410 - (view) Author: Chris Lambacher (lambacck) * Date: 2010-12-20 22:38
Sorry in advance for the long winded response.

Ron, have you looked at my patch?

The underlying issue is that the semantics for print() between Python 1 and 3. print() does not accept a bytes type in Python 3. In Python 2 str was a "bytes" type and so print happily sent encoded strings to stdout. 

This presents an issue for both --type=html and the text version if an encoding is asked for. Just using print() will result in repr being called on the byte string and you get either an invalid HTML file or a text file with extra junk in it (same junk in both).

If you ask for an encoding, you are going to get bytes. Changing it back into a string to mask that effect does not actually fix things for you because once you do print() you are back to a default encoding and therefore more broken because you are not doing what the user asked for (which is a particular encoding).

In order for:
    return str(''.join(v).encode(encoding, "xmlcharrefreplace"),
                encoding=encoding)

to solve the issue, you would also need to take away the ability for the user to specify an encoding (at the command line and via the API). It's already a string, why make it a byte and then a string again? If you don't want to deal with encoding, then return a string and leave it up to the consumer of the API to handle the desired encoding (and the "xmlcharrefreplace", maybe with a note in the docs).

If you do want to deal with encoding (which I think we are stuck with), then solve the real issue by not using print() (see my patch).

I think the only reason that my patch was not accepted, and why this is still languishing is that I said I would provide tests and have not had time to do so. 

Please feel free to correct me if I am wrong about any of the above.
msg124414 - (view) Author: Ron Adam (ron_adam) * Date: 2010-12-21 02:06
Oops.  You're right.  

I miss understood how the encode method works in this particular case. ;-/

I agree with your comments as well.
msg141457 - (view) Author: Catherine Devlin (catherine) Date: 2011-07-31 03:34
Very simplistic patch to test_calendar.py that adds a test for this fix.  This fails on the unpatched trunk b/c the unpatched output begins with b"b'<?xm" instead of b"<?xml ".  Test needs a quick review... I'm not familiar with the conventions yet!
msg141518 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-08-01 14:37
Patch looks good.  I made a few very minor style comments on Rietveld (follow the “review” link).
msg141732 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-08-07 01:00
FWIW, I think removing the encoding facility (more so because it can interfere with the terminal encoding) and just dealing with strings to output the html version of calendar seems a neater way than definiing write = sys.stdout.buffer.write and writing the bytes to the STDOUT.

Is this not feasible at the moment?
msg141734 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2011-08-07 01:23
BTW, accessing sys.stdout.buffer is not really safe, because if sys.stdout has been replaced with some other object, the buffer attribute might not be available (IIRC StringIO doesn't have it).
msg141736 - (view) Author: Chris Lambacher (lambacck) * Date: 2011-08-07 01:40
Senthil: I think that would fundamentally make things worse. The HTML calendar apparently always provides a bytes type, but lets assume it provided you with unicode and we used sys.stdout.write on the output. Fundamentally you would get the same behavior as the patch provides since if you don't provide an encoding it uses sys.getdefaultencoding() which, IIUC is fundamentally what happens if you sys.stdout.write a unicode string.

Ezio: I think it is highly unlikely that someone would be fiddling around with sys.stdout and then calling the main() function on calendar.main() (after setting up sys.args so that option parser gets the right parameters).
msg141738 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-08-07 06:06
Chris, as you already suggested in msg124410, I would prefer the removal of encoding specification from command line. Providing the encoding argument via API with deprecating warning is a good idea.

The output should be string and it should be just be printed. It should not be converted to bytes and then decoded back to string.
msg141744 - (view) Author: Chris Lambacher (lambacck) * Date: 2011-08-07 15:13
Senthil: I wasn't advocating the removal of the ability to specify encoding, only stating that avoiding the use of sys.stdout.buffer.write could only sensibly be done if we also removed the ability to specify an encoding (I can see how my wording might have been confusing). Earlier (msg122219) I said that we probably can't remove the encoding option because that would break backwards compatibility.

FWIW, I think that the encoding parameter to the console program is probably sensible because there are no guarantees that the sys.getdefaultencoding() is actually set to the console encoding and I would guess that most people using the html option are not sending output to the console anyway. They are likely redirecting the output somewhere else because there is no "output this to a file" option.
msg141746 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-07 19:31
I don't think "WARNING: this may corrupt your terminal" is warranted. Also, I think we can safely remove the --encoding option. Calling a module from the command-line is intended as a demonstration of the module's functionality, it's not an official API that we have to keep compatible.

Apart from that, Chris' latest patch and Catherine's minimal test look ok to me.
msg141765 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-08-08 10:46
I agree with Chris's point that people using html format in console are probably redirecting to a file and encoding parameter should be helpful.

What do we settle upon? Leave the encoding argument as such?

Antoine - I was specifically interested to know if sys.stdout.buffer.write as in the patch was good(/proper) way to go forward writing the bytes?
msg141839 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-09 19:17
Le Mon, 08 Aug 2011 10:46:20 +0000,
Senthil Kumaran <report@bugs.python.org> a écrit :
> 
> Antoine - I was specifically interested to know if
> sys.stdout.buffer.write as in the patch was good(/proper) way to go
> forward writing the bytes?

Yes, it is.
msg141889 - (view) Author: Roundup Robot (python-dev) Date: 2011-08-11 01:25
New changeset 9fc7ef60ea06 by Senthil Kumaran in branch '3.2':
Fix closes Issue10087 -  fixing the output of calendar display in the html format. Patch by Chris Lambacher. Test Contributed by catherine.
http://hg.python.org/cpython/rev/9fc7ef60ea06

New changeset 23316468ed4f by Senthil Kumaran in branch '3.2':
News item for Issue10087.
http://hg.python.org/cpython/rev/23316468ed4f
msg141973 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2011-08-12 16:55
There were comments by Ezio and me on Rietveld.

Also, the commit adds a period after the help text for --encoding, but all other help text aren’t capitalized and don’t use periods, as is usual in help messages.
msg142002 - (view) Author: Senthil Kumaran (orsenthil) * (Python committer) Date: 2011-08-13 00:36
Hello Éric,

I might have ignored some minor stylistic comments. The '.' in the help text
and , after the last TestName, I am not sure if it is of concern.

I think, to update the stylistic comments, if the submitters (if they
care) could have updated the patch, or a separate commit on style
changes can be done (if they are worth it).

And yeah, it is bad to ignore substantiative comments. I hope, I did
not overlook any. Ezio's main point was he was worried about using
sys.stdout.buffer (as I was too) and we got that clarified. 

Thanks,
Senthil
History
Date User Action Args
2011-08-13 00:36:55orsenthilsetmessages: + msg142002
2011-08-12 16:55:23eric.araujosetmessages: + msg141973
2011-08-11 01:25:32python-devsetstatus: open -> closed

nosy: + python-dev
messages: + msg141889

resolution: fixed
stage: test needed -> resolved
2011-08-09 19:17:00pitrousetmessages: + msg141839
2011-08-08 11:19:25hayposetnosy: - haypo
2011-08-08 10:46:19orsenthilsetmessages: + msg141765
2011-08-07 19:31:28pitrousetmessages: + msg141746
2011-08-07 15:13:36lambaccksetmessages: + msg141744
2011-08-07 06:06:42orsenthilsetnosy: + pitrou
messages: + msg141738
2011-08-07 01:40:11lambaccksetmessages: + msg141736
2011-08-07 01:23:33ezio.melottisetmessages: + msg141734
2011-08-07 01:00:59orsenthilsetnosy: + orsenthil
messages: + msg141732
2011-08-01 14:37:35eric.araujosetmessages: + msg141518
2011-07-31 13:56:43r.david.murraysetnosy: + r.david.murray
2011-07-31 03:34:11catherinesetfiles: + test_calendar_byteoutput.patch
nosy: + catherine
messages: + msg141457

2010-12-21 02:06:11ron_adamsetnosy: doerwalter, georg.brandl, rhettinger, belopolsky, haypo, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck
messages: + msg124414
2010-12-20 22:38:29lambaccksetnosy: doerwalter, georg.brandl, rhettinger, belopolsky, haypo, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck
messages: + msg124410
2010-12-20 22:13:20ron_adamsetnosy: doerwalter, georg.brandl, rhettinger, belopolsky, haypo, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck
messages: + msg124409
2010-12-20 16:55:24ron_adamsetnosy: + ron_adam
2010-11-23 15:11:16lambaccksetmessages: + msg122219
2010-11-23 05:03:14belopolskysetmessages: + msg122191
2010-11-22 17:24:27lambaccksetmessages: + msg122149
2010-11-22 17:14:33belopolskysetmessages: + msg122148
2010-11-22 16:55:43lambaccksetfiles: + calendar_bytes_output_v2.patch

messages: + msg122147
2010-11-22 15:27:45belopolskysetmessages: + msg122136
2010-11-22 14:48:54ezio.melottisetnosy: + ezio.melotti
2010-11-22 14:48:17lambaccksetmessages: + msg122131
2010-11-22 03:35:58belopolskysetmessages: + msg122084
2010-11-22 03:25:39lambaccksetfiles: + calendar_bytes_output.patch
nosy: + lambacck
messages: + msg122082

2010-11-22 01:54:52belopolskysetnosy: + rhettinger
2010-10-18 16:22:48eric.araujosetnosy: + eric.araujo
2010-10-13 20:13:17belopolskysetnosy: + georg.brandl, gpolo
messages: + msg118576
2010-10-13 18:13:49belopolskysetmessages: + msg118555
2010-10-13 16:42:36belopolskysetmessages: + msg118542
2010-10-13 16:20:35doerwaltersetfiles: + calendar.diff

nosy: + doerwalter
messages: + msg118536

keywords: + patch
2010-10-13 16:00:24belopolskysetnosy: + haypo
messages: + msg118533
2010-10-13 15:43:26belopolskycreate