msg118532 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-10-13 15:43 |
$ python3 -m calendar --type=html 2010
Produces HTML which renders into the attached PDF file. It looks like bytes vs. strings issue.
|
msg118533 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-10-13 16:00 |
The issue is clearly with using print() for bytes' output:
if len(args) == 1:
print(cal.formatyearpage(datetime.date.today().year, **optdict))
I am not sure, however how this can be fixed because calendar interface allows user to specify the encoding to be used for HTML and that encoding may not be the same or even compatible with sys.stdout encoding.
I would suggest adding -o/--output option to calendar CLI to specify the output file and make it required when specified encoding is not "compatible" with that of sys.stdout. What constitutes "compatible" still needs to be defined.
|
msg118536 - (view) |
Author: Walter Dörwald (doerwalter) * |
Date: 2010-10-13 16:20 |
Does the following patch fix your problems?
|
msg118542 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-10-13 16:42 |
On Wed, Oct 13, 2010 at 12:20 PM, Walter Dörwald <report@bugs.python.org> wrote:
..
> Does the following patch fix your problems?
>
Probably not because with your patch print() will encode HTML stream
with the system default encoding which may be different from that
specified by -e/--encode argument to calendar. The result would be an
ill-formed XML stream with encoding specified by the meta-tag not
matching the actual encoding.
|
msg118555 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-10-13 18:13 |
There is a similar issue with printing text calendar with specified encoding:
$ python3 -m calendar --encoding=utf8
b' 2010\n\n ..
|
msg118576 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-10-13 20:13 |
Note that .decode(encoding) calls have been deliberately removed in py3k: see issue 3059.
|
msg122082 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2010-11-22 03:25 |
I have attached a patch that conditionally uses sys.stdout.buffer.write to output binary (encoded) data to stdout.
|
msg122084 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-22 03:35 |
It may be acceptable, but your patch elides the line breaks in output. Can you add unit tests?
|
msg122131 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2010-11-22 14:48 |
I don't understand what you mean by "elides the line breaks in output". They are still there, you just don't see them as \n because print() is no longer implicitly converting the bytes to a string (which appears to actually result in a repr).
|
msg122136 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-22 15:27 |
On Mon, Nov 22, 2010 at 9:48 AM, Chris Lambacher <report@bugs.python.org> wrote:
..
> I don't understand what you mean by "elides the line breaks in output".
It is actually not that bad:
$ ./python.exe -m calendar -t html| wc -l
121
$ python2.7 -m calendar -t html| wc -l
122
At first, I thought that html was printed one line at a time, but now
I realize that it is prepared in-memory and printed in one shot. The
extra '\n' in python2.7 is probably a bug in 2.7.
I think this is a reasonable approach. Just a few comments on the patch:
1. Unit tests needed.
2. It may be appropriate to add a warning to the documentation stating
that using -e option may mess up the terminal.
3. In the following snippet, p is an unconditional shortcut to
sys.stdout.buffer.write. I would just call it "write"
optdict = dict(encoding=encoding, css=options.css)
+ p = sys.stdout.buffer.write
4. While white space consistency with 2.7 is not very important, the
following should be fixed, IMO:
$ ./python.exe -m calendar -e ascii| wc -l
35
$ ./python.exe -m calendar| wc -l
36
5. I wonder how ./python.exe -m calendar -e ascii will look on
Windows. I don't think cmd window is smart about displaying unix line
endings.
|
msg122147 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2010-11-22 16:55 |
I am attaching a new patch which fixes the majority of the comments raised.
1. Any suggestions about how to test the output of the console program (the case that this bug affects) would be appreciated.
2. Agreed, included in the output for --help
3. Agreed, included in new patch
4. Okay. That is because of using write vs print. Changed to always use write
5. Doesn't seem to cause any issues on windows terminal.
|
msg122148 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-22 17:14 |
On Mon, Nov 22, 2010 at 11:55 AM, Chris Lambacher
<report@bugs.python.org> wrote:
..
> 1. Any suggestions about how to test the output of the console program (the case that this bug affects)
> would be appreciated.
For month displays, a doctest may be most appropriate. See
Lib/test/test_syntax.py for one example of using doctests in regrtest
suite. Note that sys.stdout being StringIO may present a challenge to
your approach. You can also take a look at test_pydoc which also
tests console-oriented text and html output.
|
msg122149 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2010-11-22 17:24 |
The test_pydoc method looks workable, but I'll need to come back to it later because I don't have any more time to work on it today.
|
msg122191 - (view) |
Author: Alexander Belopolsky (belopolsky) * |
Date: 2010-11-23 05:03 |
I wonder: do we really need "encoding" option to the calendar output? It is really not the job of the calendar module to deal with encodings. The calendar module should produce text calendars as unicode strings and HTML calendars as UTF-8 bytes. If someone wants to save them in a specific encoding, there are plenty of ways to achieve that.
|
msg122219 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2010-11-23 15:11 |
I don't think we *need* to have the encoding in the HTML calendar, but I doubt we could remove it at this point, deprecate maybe, but since I don't use the module I don't have a sense for how often the need for encoding comes up.
The one thing that the encoding is doing in the HTML calendar is ensuring that Unicode characters that are not directly translatable in the target encoding are replaced with html character references (the "xmlcharrefreplace" option to the errors argument in the encode method).
The encoding may be a holdover from Python 2x where the expected result was a string. The docs say HTMLCalendar was added in Python 2.5 at the same time that the Local versions were added.
|
msg124409 - (view) |
Author: Ron Adam (ron_adam) * |
Date: 2010-12-20 22:13 |
The problem is in the following line...
return ''.join(v).encode(encoding, "xmlcharrefreplace")
The .encode(encoding, "xmlcharrefreplace") is returning a bytes object.
Here is the simplest change to resolve the problem.
return str(''.join(v).encode(encoding, "xmlcharrefreplace"),
encoding=encoding)
But maybe a different way to implement "xmlcharrefreplace" is what is needed. Would it be better if it were a function or method in the xml (or html) module?
Just because it can be implemented as an encoding doesn't mean it should be.
|
msg124410 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2010-12-20 22:38 |
Sorry in advance for the long winded response.
Ron, have you looked at my patch?
The underlying issue is that the semantics for print() between Python 1 and 3. print() does not accept a bytes type in Python 3. In Python 2 str was a "bytes" type and so print happily sent encoded strings to stdout.
This presents an issue for both --type=html and the text version if an encoding is asked for. Just using print() will result in repr being called on the byte string and you get either an invalid HTML file or a text file with extra junk in it (same junk in both).
If you ask for an encoding, you are going to get bytes. Changing it back into a string to mask that effect does not actually fix things for you because once you do print() you are back to a default encoding and therefore more broken because you are not doing what the user asked for (which is a particular encoding).
In order for:
return str(''.join(v).encode(encoding, "xmlcharrefreplace"),
encoding=encoding)
to solve the issue, you would also need to take away the ability for the user to specify an encoding (at the command line and via the API). It's already a string, why make it a byte and then a string again? If you don't want to deal with encoding, then return a string and leave it up to the consumer of the API to handle the desired encoding (and the "xmlcharrefreplace", maybe with a note in the docs).
If you do want to deal with encoding (which I think we are stuck with), then solve the real issue by not using print() (see my patch).
I think the only reason that my patch was not accepted, and why this is still languishing is that I said I would provide tests and have not had time to do so.
Please feel free to correct me if I am wrong about any of the above.
|
msg124414 - (view) |
Author: Ron Adam (ron_adam) * |
Date: 2010-12-21 02:06 |
Oops. You're right.
I miss understood how the encode method works in this particular case. ;-/
I agree with your comments as well.
|
msg141457 - (view) |
Author: Catherine Devlin (catherine) |
Date: 2011-07-31 03:34 |
Very simplistic patch to test_calendar.py that adds a test for this fix. This fails on the unpatched trunk b/c the unpatched output begins with b"b'<?xm" instead of b"<?xml ". Test needs a quick review... I'm not familiar with the conventions yet!
|
msg141518 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-08-01 14:37 |
Patch looks good. I made a few very minor style comments on Rietveld (follow the “review” link).
|
msg141732 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2011-08-07 01:00 |
FWIW, I think removing the encoding facility (more so because it can interfere with the terminal encoding) and just dealing with strings to output the html version of calendar seems a neater way than definiing write = sys.stdout.buffer.write and writing the bytes to the STDOUT.
Is this not feasible at the moment?
|
msg141734 - (view) |
Author: Ezio Melotti (ezio.melotti) * |
Date: 2011-08-07 01:23 |
BTW, accessing sys.stdout.buffer is not really safe, because if sys.stdout has been replaced with some other object, the buffer attribute might not be available (IIRC StringIO doesn't have it).
|
msg141736 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2011-08-07 01:40 |
Senthil: I think that would fundamentally make things worse. The HTML calendar apparently always provides a bytes type, but lets assume it provided you with unicode and we used sys.stdout.write on the output. Fundamentally you would get the same behavior as the patch provides since if you don't provide an encoding it uses sys.getdefaultencoding() which, IIUC is fundamentally what happens if you sys.stdout.write a unicode string.
Ezio: I think it is highly unlikely that someone would be fiddling around with sys.stdout and then calling the main() function on calendar.main() (after setting up sys.args so that option parser gets the right parameters).
|
msg141738 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2011-08-07 06:06 |
Chris, as you already suggested in msg124410, I would prefer the removal of encoding specification from command line. Providing the encoding argument via API with deprecating warning is a good idea.
The output should be string and it should be just be printed. It should not be converted to bytes and then decoded back to string.
|
msg141744 - (view) |
Author: Chris Lambacher (lambacck) * |
Date: 2011-08-07 15:13 |
Senthil: I wasn't advocating the removal of the ability to specify encoding, only stating that avoiding the use of sys.stdout.buffer.write could only sensibly be done if we also removed the ability to specify an encoding (I can see how my wording might have been confusing). Earlier (msg122219) I said that we probably can't remove the encoding option because that would break backwards compatibility.
FWIW, I think that the encoding parameter to the console program is probably sensible because there are no guarantees that the sys.getdefaultencoding() is actually set to the console encoding and I would guess that most people using the html option are not sending output to the console anyway. They are likely redirecting the output somewhere else because there is no "output this to a file" option.
|
msg141746 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-08-07 19:31 |
I don't think "WARNING: this may corrupt your terminal" is warranted. Also, I think we can safely remove the --encoding option. Calling a module from the command-line is intended as a demonstration of the module's functionality, it's not an official API that we have to keep compatible.
Apart from that, Chris' latest patch and Catherine's minimal test look ok to me.
|
msg141765 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2011-08-08 10:46 |
I agree with Chris's point that people using html format in console are probably redirecting to a file and encoding parameter should be helpful.
What do we settle upon? Leave the encoding argument as such?
Antoine - I was specifically interested to know if sys.stdout.buffer.write as in the patch was good(/proper) way to go forward writing the bytes?
|
msg141839 - (view) |
Author: Antoine Pitrou (pitrou) * |
Date: 2011-08-09 19:17 |
Le Mon, 08 Aug 2011 10:46:20 +0000,
Senthil Kumaran <report@bugs.python.org> a écrit :
>
> Antoine - I was specifically interested to know if
> sys.stdout.buffer.write as in the patch was good(/proper) way to go
> forward writing the bytes?
Yes, it is.
|
msg141889 - (view) |
Author: Roundup Robot (python-dev) |
Date: 2011-08-11 01:25 |
New changeset 9fc7ef60ea06 by Senthil Kumaran in branch '3.2':
Fix closes Issue10087 - fixing the output of calendar display in the html format. Patch by Chris Lambacher. Test Contributed by catherine.
http://hg.python.org/cpython/rev/9fc7ef60ea06
New changeset 23316468ed4f by Senthil Kumaran in branch '3.2':
News item for Issue10087.
http://hg.python.org/cpython/rev/23316468ed4f
|
msg141973 - (view) |
Author: Éric Araujo (eric.araujo) * |
Date: 2011-08-12 16:55 |
There were comments by Ezio and me on Rietveld.
Also, the commit adds a period after the help text for --encoding, but all other help text aren’t capitalized and don’t use periods, as is usual in help messages.
|
msg142002 - (view) |
Author: Senthil Kumaran (orsenthil) * |
Date: 2011-08-13 00:36 |
Hello Éric,
I might have ignored some minor stylistic comments. The '.' in the help text
and , after the last TestName, I am not sure if it is of concern.
I think, to update the stylistic comments, if the submitters (if they
care) could have updated the patch, or a separate commit on style
changes can be done (if they are worth it).
And yeah, it is bad to ignore substantiative comments. I hope, I did
not overlook any. Ezio's main point was he was worried about using
sys.stdout.buffer (as I was too) and we got that clarified.
Thanks,
Senthil
|
|
Date |
User |
Action |
Args |
2022-04-11 14:57:07 | admin | set | github: 54296 |
2011-08-13 00:36:55 | orsenthil | set | messages:
+ msg142002 |
2011-08-12 16:55:23 | eric.araujo | set | messages:
+ msg141973 |
2011-08-11 01:25:32 | python-dev | set | status: open -> closed
nosy:
+ python-dev messages:
+ msg141889
resolution: fixed stage: test needed -> resolved |
2011-08-09 19:17:00 | pitrou | set | messages:
+ msg141839 |
2011-08-08 11:19:25 | vstinner | set | nosy:
- vstinner
|
2011-08-08 10:46:19 | orsenthil | set | messages:
+ msg141765 |
2011-08-07 19:31:28 | pitrou | set | messages:
+ msg141746 |
2011-08-07 15:13:36 | lambacck | set | messages:
+ msg141744 |
2011-08-07 06:06:42 | orsenthil | set | nosy:
+ pitrou messages:
+ msg141738
|
2011-08-07 01:40:11 | lambacck | set | messages:
+ msg141736 |
2011-08-07 01:23:33 | ezio.melotti | set | messages:
+ msg141734 |
2011-08-07 01:00:59 | orsenthil | set | nosy:
+ orsenthil messages:
+ msg141732
|
2011-08-01 14:37:35 | eric.araujo | set | messages:
+ msg141518 |
2011-07-31 13:56:43 | r.david.murray | set | nosy:
+ r.david.murray
|
2011-07-31 03:34:11 | catherine | set | files:
+ test_calendar_byteoutput.patch nosy:
+ catherine messages:
+ msg141457
|
2010-12-21 02:06:11 | ron_adam | set | nosy:
doerwalter, georg.brandl, rhettinger, belopolsky, vstinner, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck messages:
+ msg124414 |
2010-12-20 22:38:29 | lambacck | set | nosy:
doerwalter, georg.brandl, rhettinger, belopolsky, vstinner, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck messages:
+ msg124410 |
2010-12-20 22:13:20 | ron_adam | set | nosy:
doerwalter, georg.brandl, rhettinger, belopolsky, vstinner, ron_adam, gpolo, ezio.melotti, eric.araujo, lambacck messages:
+ msg124409 |
2010-12-20 16:55:24 | ron_adam | set | nosy:
+ ron_adam
|
2010-11-23 15:11:16 | lambacck | set | messages:
+ msg122219 |
2010-11-23 05:03:14 | belopolsky | set | messages:
+ msg122191 |
2010-11-22 17:24:27 | lambacck | set | messages:
+ msg122149 |
2010-11-22 17:14:33 | belopolsky | set | messages:
+ msg122148 |
2010-11-22 16:55:43 | lambacck | set | files:
+ calendar_bytes_output_v2.patch
messages:
+ msg122147 |
2010-11-22 15:27:45 | belopolsky | set | messages:
+ msg122136 |
2010-11-22 14:48:54 | ezio.melotti | set | nosy:
+ ezio.melotti
|
2010-11-22 14:48:17 | lambacck | set | messages:
+ msg122131 |
2010-11-22 03:35:58 | belopolsky | set | messages:
+ msg122084 |
2010-11-22 03:25:39 | lambacck | set | files:
+ calendar_bytes_output.patch nosy:
+ lambacck messages:
+ msg122082
|
2010-11-22 01:54:52 | belopolsky | set | nosy:
+ rhettinger
|
2010-10-18 16:22:48 | eric.araujo | set | nosy:
+ eric.araujo
|
2010-10-13 20:13:17 | belopolsky | set | nosy:
+ georg.brandl, gpolo messages:
+ msg118576
|
2010-10-13 18:13:49 | belopolsky | set | messages:
+ msg118555 |
2010-10-13 16:42:36 | belopolsky | set | messages:
+ msg118542 |
2010-10-13 16:20:35 | doerwalter | set | files:
+ calendar.diff
nosy:
+ doerwalter messages:
+ msg118536
keywords:
+ patch |
2010-10-13 16:00:24 | belopolsky | set | nosy:
+ vstinner messages:
+ msg118533
|
2010-10-13 15:43:26 | belopolsky | create | |