Index: Doc/library/unicodedata.rst
===================================================================
--- Doc/library/unicodedata.rst	(revision 87144)
+++ Doc/library/unicodedata.rst	(working copy)
@@ -13,56 +13,164 @@
    single: character
    pair: Unicode; database
 
-This module provides access to the Unicode Character Database which defines
-character properties for all Unicode characters. The data in this database is
-based on the :file:`UnicodeData.txt` file version 5.2.0 which is publicly
-available from ftp://ftp.unicode.org/.
+This module provides access to the Unicode Character Database (UCD), which
+defines character properties for all Unicode characters. The data contained in
+this database is compiled from the `UCD version 6.0.0
+<http://www.unicode.org/Public/6.0.0/ucd>`_.
 
-The module uses the same names and symbols as defined by the UnicodeData File
-Format 5.2.0 (see http://www.unicode.org/reports/tr44/tr44-4.html).
-It defines the following functions:
+The module uses the same names and symbols as defined by Unicode Standard Annex
+#44, `"Unicode Character Database (UCD)"
+<http://www.unicode.org/reports/tr44/tr44-6.html>`_.  It defines the following
+functions:
 
 
 .. function:: lookup(name)
 
-   Look up character by name.  If a character with the given name is found, return
-   the corresponding character.  If not found, :exc:`KeyError` is raised.
+   Look up character by name.  If a character with the given name is found,
+   return the corresponding character.  If not found, :exc:`KeyError` is raised.
+   For example::
 
+      >>> unicodedata.lookup('PILCROW SIGN')
+      '¶'
 
+   The characters returned by this function are the same as those produced by
+   the ``\N{...}`` escape sequence in string literals::
+
+      >>> unicodedata.lookup('MIDDLE DOT') == '\N{MIDDLE DOT}'
+      True
+
 .. function:: name(chr[, default])
 
    Returns the name assigned to the character *chr* as a string. If no
    name is defined, *default* is returned, or, if not given, :exc:`ValueError` is
-   raised.
+   raised.  For example::
 
+      >>> unicodedata.name('Ӝ')
+      'CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS'
 
+      >>> unicodedata.name('\uFFFF', 'no name')
+      'no name'
+
 .. function:: decimal(chr[, default])
 
    Returns the decimal value assigned to the character *chr* as integer.
    If no such value is defined, *default* is returned, or, if not given,
-   :exc:`ValueError` is raised.
+   :exc:`ValueError` is raised.  For example::
 
+      >>> unicodedata.decimal('\N{ARABIC-INDIC DIGIT NINE}')
+      9
 
+      >>> unicodedata.decimal('\N{SUPERSCRIPT NINE}', -1)
+      -1
+
+
 .. function:: digit(chr[, default])
 
    Returns the digit value assigned to the character *chr* as integer.
    If no such value is defined, *default* is returned, or, if not given,
-   :exc:`ValueError` is raised.
+   :exc:`ValueError` is raised.  For example::
 
+      >>> unicodedata.digit('\N{SUPERSCRIPT NINE}')
+      9
 
+      >>> unicodedata.digit('\N{ROMAN NUMERAL NINE}', -1)
+      -1
+
+
 .. function:: numeric(chr[, default])
 
    Returns the numeric value assigned to the character *chr* as float.
    If no such value is defined, *default* is returned, or, if not given,
-   :exc:`ValueError` is raised.
+   :exc:`ValueError` is raised.  For example::
 
+      >>> unicodedata.numeric('½')
+      0.5
 
+      >>> unicodedata.numeric('\N{ROMAN NUMERAL TEN THOUSAND}')
+      10000.0
+
+
 .. function:: category(chr)
 
-   Returns the general category assigned to the character *chr* as
-   string.
+   Returns the general category assigned to the character *chr* as a string.
+   General category names consist of two letters.  The first letter is always
+   uppercase and denotes one of seven major categories: Letter (L), Mark (M),
+   Number (N), Punctuation (P), Symbol (S), Separator (Z), and Other (C).  The
+   second letter is always lowercase and further subdivides major categories
+   into minor subcategories.
 
+   +--------------------------------------------------------------------------+
+   | **General Categories**                                                   |
+   +----+-------------+------------------+------------------------------------+
+   |Name|Major        |Minor             |Examples                            |
+   +====+=============+==================+====================================+
+   |Lu  | Letter      | uppercase        |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Ll  | Letter      | lowercase        |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Lt  | Letter      | titlecase        |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Lm  | Letter      | modifier         |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Lo  | Letter      | other            |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Mn  | Mark        | nonspacing       |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Mc  | Mark        | spacing combining|                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Me  | Mark        | enclosing        |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Nd  | Number      | decimal digit    |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Nl  | Number      | letter           |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |No  | Number      | other            |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Pc  | Punctuation | connector        |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Pd  | Punctuation | dash             |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Ps  | Punctuation | open             |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Pe  | Punctuation | close            |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Pi  | Punctuation | initial quote    |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Pf  | Punctuation | final quote      |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Po  | Punctuation | other            |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Sm  | Symbol      | math             |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Sc  | Symbol      | currency         |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Sk  | Symbol      | modifier         |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |So  | Symbol      | other            |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Zs  | Separator   | space            |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Zl  | Separator   | line             |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Zp  | Separator   | paragraph        |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Cc  | Other       | control          |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Cf  | Other       | format           |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Cs  | Other       | surrogate        |                                    |
+   +----+-------------+------------------+------------------------------------+
+   |Co  | Other       | private use      |                                    |
+   +----+-------------+------------------+------------------------------------+
 
+   The following example program produces code point counts by major category:
+
+   .. literalinclude:: ../includes/unistat.py
+
+   ::
+
+      Counter({'C': 1004868, 'L': 100520, 'S': 5508, 'M': 1498, 'N': 1100, 'P': 598, 'Z': 20})
+
 .. function:: bidirectional(chr)
 
    Returns the bidirectional category assigned to the character *chr* as
@@ -158,4 +266,3 @@
    'Lu'
    >>> unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber
    'AN'
-
Index: Doc/includes/unistat.py
===================================================================
--- Doc/includes/unistat.py	(revision 0)
+++ Doc/includes/unistat.py	(revision 0)
@@ -0,0 +1,9 @@
+import unicodedata
+from collections import Counter
+
+catcount = Counter()
+for i in range(0x110000):
+    cat = unicodedata.category(chr(i))[0]
+    catcount[cat] += 1
+
+print(catcount)

Property changes on: Doc/includes/unistat.py
___________________________________________________________________
Added: svn:keywords
   + Id
Added: svn:eol-style
   + native
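
The decimal(), digit() and numeric() examples added by this patch can be cross-checked with a short script. What follows is only a sketch and is not part of the patch: the helper name numeric_profile and the SAMPLES list are hypothetical, and passing None as the default is just one convenient way to mark an undefined value. It shows that the three functions are progressively more permissive: a decimal digit also has a digit and a numeric value, while a Roman numeral or a fraction has only a numeric value.

```python
import unicodedata


def numeric_profile(ch):
    """Return (decimal, digit, numeric) for ch; None marks an undefined value."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


# Character names taken from the examples in the patch (hypothetical list name).
SAMPLES = [
    "ARABIC-INDIC DIGIT NINE",
    "SUPERSCRIPT NINE",
    "ROMAN NUMERAL NINE",
    "VULGAR FRACTION ONE HALF",
]

for char_name in SAMPLES:
    ch = unicodedata.lookup(char_name)
    print(char_name, numeric_profile(ch))
```

Run against the interpreter built from the patched tree, the printed tuples should agree with the doctest-style examples in the hunks above; a mismatch would point at an example that needs correcting.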