Author Matthew.Boehm
Recipients Matthew.Boehm, docs@python, vstinner
Date 2011-08-30.04:45:18
I've attached a patch for Python 2.7 that adds a small note to library/stdtypes.html#str.splitlines explaining which sequences are treated as line breaks:

Note: Python recognizes "\r", "\n", and "\r\n" as line boundaries for strings.

In addition to these, Unicode strings can have line boundaries of u"\x0b", u"\x0c", u"\x85", u"\u2028", and u"\u2029".
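
To illustrate the difference the note describes, here is a minimal sketch. It is written for Python 3, which is an assumption on my part and not what the patch targets: there, bytes plays the role of the 2.7 byte string (str) and str plays the role of the 2.7 unicode type. The sample string and variable names are made up for the example.

```python
# Python 3 sketch: bytes stands in for the 2.7 byte string (str),
# and str stands in for the 2.7 unicode type.
text = "a\rb\nc\r\nd\x0be\x0cf\x85g\u2028h\u2029i"

# Text strings split on every separator listed in the note.
unicode_parts = text.splitlines()

# Byte strings split only on \r, \n, and \r\n; the remaining
# separators stay embedded in the last chunk.
byte_parts = text.encode("utf-8").splitlines()

print(unicode_parts)
print(byte_parts)
```

The text version yields nine single-letter pieces, while the bytes version yields only four, with everything from "d" onward kept together.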

Additional thoughts:

* Would it be better to put this note in a different place?

* It looks like \x0b and \x0c (vertical tab and form feed) were first considered line breaks in Python 2.7, probably related to this note from "What's New in 2.7": "The Unicode database provided by the unicodedata module is now used internally to determine which characters are numeric, whitespace, or represent line breaks." It might be worth putting a "changed in 2.7" note somewhere in the docs.

Please let me know of any thoughts you have and I'll be glad to make any desired changes and submit a new patch.
Date                 User           Action  Args
2011-08-30 04:45:19  Matthew.Boehm  set     recipients: + Matthew.Boehm, vstinner, docs@python
2011-08-30 04:45:19  Matthew.Boehm  set     messageid: <>
2011-08-30 04:45:19  Matthew.Boehm  link    issue12855 messages
2011-08-30 04:45:18  Matthew.Boehm  create