Message 143199 - Python tracker

Author	Matthew.Boehm
Recipients	Matthew.Boehm, docs@python, vstinner
Date	2011-08-30.04:45:18
Message-id	<1314679519.7.0.470795314113.issue12855@psf.upfronthosting.co.za>

Content

I've attached a patch for Python 2.7 that adds a small note to library/stdtypes.html#str.splitlines explaining which sequences are treated as line breaks:

"""
Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.

In addition to these, Unicode strings can have line boundaries of u"\x0b", u"\x0c", u"\x85", u"\u2028", and u"\u2029".
"""
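A minimal sketch (not part of the attached patch) that checks the sequences named in the note above. It assumes Python 3, where str plays the role of Python 2.7's unicode type and bytes the role of Python 2.7's str type; the helper names splits_str and splits_bytes are hypothetical.

```python
# Minimal sketch, assuming Python 3: str stands in for Python 2.7's unicode
# type and bytes stands in for Python 2.7's (byte) str type.

# The separators named in the proposed note.
SEQUENCES = ["\r", "\n", "\r\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"]


def splits_str(sep):
    """True if str.splitlines() treats sep as exactly one line boundary."""
    return ("a" + sep + "b").splitlines() == ["a", "b"]


def splits_bytes(sep):
    """True if bytes.splitlines() treats the UTF-8 encoding of sep as one boundary."""
    return ("a" + sep + "b").encode("utf-8").splitlines() == [b"a", b"b"]


for sep in SEQUENCES:
    print(repr(sep), "str:", splits_str(sep), "bytes:", splits_bytes(sep))
```

Under CPython 3 every sequence splits a str, while only "\r", "\n" and "\r\n" split the bytes, which mirrors the str/unicode distinction the note draws for 2.7.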

Additional thoughts:

* Would it be better to put this note in a different place?

* It looks like \x0b and \x0c (vertical tab and form feed) were first considered line breaks in Python 2.7, probably related to this note from "What's New in 2.7": "The Unicode database provided by the unicodedata module is now used internally to determine which characters are numeric, whitespace, or represent line breaks." It might be worth putting a "changed in 2.7" note somewhere in the docs.
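One way to check version-dependent behavior like this is to enumerate, by brute force, every code point that splitlines() treats as a boundary in the running interpreter and compare the output across versions. This sketch is Python 3 only (a 2.7 run would need unichr in place of chr); the function name line_boundary_code_points is hypothetical.

```python
# Minimal sketch, assuming Python 3: list every code point cp for which
# str.splitlines() splits "a" + chr(cp) + "b" into two lines.
import sys


def line_boundary_code_points():
    """Return the code points the running interpreter treats as line boundaries."""
    found = []
    for cp in range(sys.maxunicode + 1):
        # chr() also accepts lone surrogates in Python 3, so no code point is skipped.
        if len(("a" + chr(cp) + "b").splitlines()) == 2:
            found.append(cp)
    return found


if __name__ == "__main__":
    print(sys.version.split()[0])
    print([hex(cp) for cp in line_boundary_code_points()])
```

Under CPython 3 the list contains 0xa, 0xb, 0xc, 0xd, 0x1c, 0x1d, 0x1e, 0x85, 0x2028 and 0x2029. Note that 0x1c to 0x1e (the file, group and record separators) also count as line boundaries there, although the proposed note does not list them.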

Please let me know of any thoughts you have and I'll be glad to make any desired changes and submit a new patch.

History
Date	User	Action	Args
2011-08-30 04:45:19	Matthew.Boehm	set	recipients: + Matthew.Boehm, vstinner, docs@python
2011-08-30 04:45:19	Matthew.Boehm	set	messageid: <1314679519.7.0.470795314113.issue12855@psf.upfronthosting.co.za>
2011-08-30 04:45:19	Matthew.Boehm	link	issue12855 messages
2011-08-30 04:45:18	Matthew.Boehm	create