Author: Matthew.Boehm
Recipients: Matthew.Boehm, docs@python, vstinner
Date: 2011-08-30.04:45:18
Message-id: <1314679519.7.0.470795314113.issue12855@psf.upfronthosting.co.za>
I've attached a patch for Python 2.7 that adds a small note to library/stdtypes.html#str.splitlines explaining which sequences are treated as line breaks:

"""
Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.

In addition to these, Unicode strings also treat u"\x0b", u"\x0c", u"\x85", u"\u2028", and u"\u2029" as line boundaries.
"""

Additional thoughts:

* Would it be better to put this note in a different place?

* It looks like \x0b and \x0c (vertical tab and form feed) were first considered line breaks in Python 2.7, probably related to this note from "What's New in 2.7": "The Unicode database provided by the unicodedata module is now used internally to determine which characters are numeric, whitespace, or represent line breaks." It might be worth putting a "changed in 2.7" note somewhere in the docs.

Please let me know of any thoughts you have and I'll be glad to make any desired changes and submit a new patch.

History
Date                 User           Action   Args
2011-08-30 04:45:19  Matthew.Boehm  set      recipients: + Matthew.Boehm, vstinner, docs@python
2011-08-30 04:45:19  Matthew.Boehm  set      messageid: <1314679519.7.0.470795314113.issue12855@psf.upfronthosting.co.za>
2011-08-30 04:45:19  Matthew.Boehm  link     issue12855 messages
2011-08-30 04:45:18  Matthew.Boehm  create