Message 144803 - Python tracker

Author	ezio.melotti
Recipients	ezio.melotti, gvanrossum, lemburg, loewis, mrabarnett, tchrist, terry.reedy
Date	2011-10-03.04:15:49
Message-id	<1317615350.8.0.846192018045.issue12753@psf.upfronthosting.co.za>
In-reply-to

Content

> But it still has to happen at compile time, of course, so I don't know
> what you could do in Python.  Is there any way to change how the compiler
> behaves even vaguely along these lines?

I think things like "from __future__ import ..." do something similar, but I'm not sure it will work in this case (also because you will have to provide the list of aliases somehow).
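
To illustrate why "from __future__ import ..." is a compile-time matter, here is a minimal sketch. It assumes Python 3.7 or later and uses the `annotations` feature purely as a stand-in; it does not implement the alias list discussed above. The compiler flag that the import statement would set is passed to `compile()` by hand, and the same source then compiles differently:

```python
import __future__

SOURCE = "def f(x: int) -> int:\n    return x\n"


def compile_and_get_annotations(flags):
    # dont_inherit=True keeps any future statements of the calling
    # module from leaking into the code compiled here.
    code = compile(SOURCE, "<sketch>", "exec", flags=flags, dont_inherit=True)
    namespace = {}
    exec(code, namespace)
    return namespace["f"].__annotations__


# Same source text, compiled without and with the future flag.
plain = compile_and_get_annotations(0)
future = compile_and_get_annotations(__future__.annotations.compiler_flag)

print(plain)   # annotations evaluated to real objects
print(future)  # annotations kept as strings by the compiler
```

The switch only selects a behaviour the compiler already knows about, which is why passing extra data such as a list of aliases this way is not obviously possible.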

>> Really?  White space makes things harder to read?  I thought Pythonistas
>> believed the opposite of that.  Whitespace is very useful for cognitive
>> chunking: you see how things logically group together.

> I was surprised at that too ;-). One person's opinion in a specific
> context. Don't generalize.

Also don't generalize my opinion regarding *where* whitespace makes things less readable: I was just talking about regex.
What I was trying to say here is best summarized by a quote from Paul Graham's article "Succinctness is Power":
"""
If you're used to reading novels and newspaper articles, your first experience of reading a math paper can be dismaying. It could take half an hour to read a single page. And yet, I am pretty sure that the notation is not the problem, even though it may feel like it is. The math paper is hard to read because the ideas are hard. If you expressed the same ideas in prose (as mathematicians had to do before they evolved succinct notations), they wouldn't be any easier to read, because the paper would grow to the size of a book.
"""
Try replacing
  s/novels and newspaper articles\|prose/Python code/g
  s/single page/single regex/
  s/math paper/regex/g.
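
For readers who would rather see the substitutions run than apply them by eye, here is a small sketch using Python's `re` module on the first three sentences of the quote. The variable names are illustrative only.

```python
import re

quote = (
    "If you're used to reading novels and newspaper articles, your first "
    "experience of reading a math paper can be dismaying. It could take "
    "half an hour to read a single page."
)

# Python's re spells alternation as a bare |, while GNU sed's basic
# syntax spells it \|.
text = re.sub(r"novels and newspaper articles|prose", "Python code", quote)
# No /g flag on the second command, so only the first match is replaced.
text = re.sub(r"single page", "single regex", text, count=1)
text = re.sub(r"math paper", "regex", text)

print(text)
```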

To provide an example, I find:

# define a function to capitalize s
def my_capitalize(s):
    """This function capitalizes the argument s and returns it"""
    the_first_letter = s[0]  # 0 means the first char
    the_rest_of_s = s[1:]  # 1: means from the second till the end
    the_first_letter_uppercased = the_first_letter.upper()  # upper makes the string uppercase
    the_rest_of_s_lowercased = the_rest_of_s.lower()  # lower makes the string lowercase
    s_capitalized = the_first_letter_uppercased + the_rest_of_s_lowercased  # + concatenates
    return s_capitalized

less readable than:

def my_capitalize(s):
    return s[0].upper() + s[1:].lower()

You could argue that the first is much more explicit and in a way clearer, but overall I think you'll agree with me that it is less readable.  Also, this clearly depends on how well you know the notation you are reading: if you don't know it very well, you might still prefer the commented/verbose/extended/redundant version.

Another important thing to mention is that the notation of regular expressions is fairly simple (especially if you leave out look-arounds and Unicode-related things that are not used too often), but having a similarly succinct notation for a whole programming language (like Perl) might not work as well.  I'm not picking on Perl here: as you said, you can write readable programs if you don't abuse the notation, and the succinctness offered by the language has some advantages, but with Python we prefer more readable code, even if we have to be a little more verbose.

Another example of a trade-off between verbosity and succinctness is the new string formatting mini-language.
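
As a small illustration of that last trade-off, the sketch below formats one number twice: once with a format spec, and once with the same steps spelled out. The verbose version is only an approximation written for this example. It matches here because the rounded value happens to have exactly two decimals, and it is not how `str.format` works internally.

```python
value = 3.14159

# Succinct: right-align in a 10-character field, 2 digits after the point.
succinct = "{:>10.2f}".format(value)

# Verbose: the same result built step by step.
rounded = round(value, 2)            # 3.14
digits = str(rounded)                # "3.14"
padding = " " * (10 - len(digits))   # spaces needed to reach width 10
verbose = padding + digits

print(repr(succinct))
print(repr(verbose))
```

Once you know the notation, `>10.2f` reads at a glance. Until then, the longer version is probably easier to follow.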

> That really isn't right.  A cased character is one with the Unicode "Cased"
> property, and a lowercase character is one with the Unicode "Lowercase"
> property.  The General Category is actually immaterial here.

You might want to take a look and possibly add a comment on #12204 about this.
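
A quick way to look at the difference between General Category and the case predicates is to print both side by side. The sample characters below are an arbitrary choice made for this sketch. What `islower()` and `isupper()` return for U+2170 depends on the Python version and its bundled Unicode database, so the sketch prints the values rather than claiming a result.

```python
import unicodedata

# U+2170 is SMALL ROMAN NUMERAL ONE; its General Category is Nl, not Ll.
samples = ["a", "A", "\u2170", "1"]

rows = []
for ch in samples:
    rows.append((
        "U+%04X" % ord(ch),
        unicodedata.category(ch),  # General Category
        ch.islower(),              # Python's own notion of "lowercase"
        ch.isupper(),              # Python's own notion of "uppercase"
    ))

for row in rows:
    print(row)
```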

> I've spent all bloody day trying to model Python's islower, isupper, and istitle
> functions, but I get all kinds of errors, both in the definitions and in the
> models of the definitions.

If by "model" you mean "trying to figure out how they work", it's probably easier to look at the implementation (I assume you know enough C to understand what they do).  You can find the code for str.istitle() at http://hg.python.org/cpython/file/default/Objects/unicodeobject.c#l10358 and the actual implementation of some macros like Py_UNICODE_ISTITLE at http://hg.python.org/cpython/file/default/Objects/unicodectype.c.
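
For those who prefer Python to C, the following is a rough sketch of the kind of loop found there, written from a reading of the C code rather than copied from it. The function name `istitle_sketch` is made up, and the per-character `str` methods stand in for the C macros, so treat it as an approximation to check against the real source.

```python
def istitle_sketch(s):
    """Approximate str.istitle() with an explicit loop."""
    cased = False              # have we seen any cased character at all?
    previous_is_cased = False  # was the previous character cased?
    for ch in s:
        # For a single character, istitle() is true for both titlecase
        # and uppercase characters.
        if ch.isupper() or ch.istitle():
            if previous_is_cased:
                return False   # uppercase/titlecase in the middle of a word
            previous_is_cased = True
            cased = True
        elif ch.islower():
            if not previous_is_cased:
                return False   # lowercase at the start of a word
            previous_is_cased = True
            cased = True
        else:
            previous_is_cased = False  # uncased characters separate words
    return cased


samples = ["Hello World", "Hello world", "HELLO", "hello", "", "123", "A1b", "A1B"]
for text in samples:
    print(repr(text), istitle_sketch(text), text.istitle())
```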

> I really don't understand any of these functions.  I'm very sad.  I think they are
> wrong, but maybe I am.  It is extremely confusing.

> Shall I file a separate bug report?

If after reading the code and/or the documentation you still think they are broken and/or that they can be improved, then you can open another issue.

BTW, instead of writing custom scripts to test things, it might be better to use unittest (see http://docs.python.org/py3k/library/unittest.html#basic-example), or even better write a patch for Lib/test/test_unicode.py.
Using unittest has the advantage that it is then easy to integrate those tests within our test suite, but on the other hand, as soon as something fails, the failure is reported without evaluating the following assertions in the method.
This has the advantage that
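
A minimal sketch of what such a test could look like follows. The class and method names are made up for illustration and are not taken from Lib/test/test_unicode.py. It also uses `subTest`, which was added in Python 3.4, after this message was written, as one possible way to keep checking the remaining samples when one of them fails.

```python
import unittest


class CaseMethodsTest(unittest.TestCase):
    def test_islower(self):
        self.assertTrue("abc".islower())
        # If the line above failed, this one would not be evaluated.
        self.assertFalse("Abc".islower())

    def test_isupper(self):
        self.assertTrue("ABC".isupper())
        self.assertFalse("ABc".isupper())

    def test_istitle_samples(self):
        samples = {"Hello World": True, "Hello world": False, "HELLO": False}
        for text, expected in samples.items():
            # Each sample is reported separately; a failure in one does
            # not stop the loop.
            with self.subTest(text=text):
                self.assertEqual(text.istitle(), expected)


# Run the tests programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CaseMethodsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

In a standalone test file the last three lines would normally be replaced by a call to `unittest.main()` under an `if __name__ == "__main__":` guard.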

History
Date	User	Action	Args
2011-10-03 04:15:51	ezio.melotti	set	recipients: + ezio.melotti, lemburg, gvanrossum, loewis, terry.reedy, mrabarnett, tchrist
2011-10-03 04:15:50	ezio.melotti	set	messageid: <1317615350.8.0.846192018045.issue12753@psf.upfronthosting.co.za>
2011-10-03 04:15:50	ezio.melotti	link	issue12753 messages
2011-10-03 04:15:49	ezio.melotti	create