Index: Doc/library/unicodedata.rst
===================================================================
--- Doc/library/unicodedata.rst	(revision 87160)
+++ Doc/library/unicodedata.rst	(working copy)
@@ -18,58 +18,196 @@
 this database is compiled from the `UCD version 6.0.0
 <http://www.unicode.org/Public/6.0.0/ucd>`_.
 
-The module uses the same names and symbols as defined by Unicode
-Standard Annex #44, `"Unicode Character Database"
-<http://www.unicode.org/reports/tr44/tr44-6.html>`_.  It defines the
-following functions:
+The module uses the same names and symbols as defined by Unicode Standard Annex
+#44, `"Unicode Character Database (UCD)"
+<http://www.unicode.org/reports/tr44/tr44-6.html>`_.  It defines the following
+functions:
 
 
 .. function:: lookup(name)
 
-   Look up character by name.  If a character with the given name is found, return
-   the corresponding character.  If not found, :exc:`KeyError` is raised.
+   Look up character by name.  If a character with the given name is found,
+   return the corresponding character.  If not found, :exc:`KeyError` is raised.
+   For example::
 
+      >>> unicodedata.lookup('PILCROW SIGN')
+      '¶'
 
+   The characters returned by this function are the same as those produced by
+   the ``\N{...}`` escape sequence in string literals::
+
+      >>> unicodedata.lookup('MIDDLE DOT') == '\N{MIDDLE DOT}'
+      True
+
 .. function:: name(chr[, default])
 
    Returns the name assigned to the character *chr* as a string. If no
    name is defined, *default* is returned, or, if not given, :exc:`ValueError` is
-   raised.
+   raised.  For example::
 
+      >>> unicodedata.name('Ӝ')
+      'CYRILLIC CAPITAL LETTER ZHE WITH DIAERESIS'
 
+      >>> unicodedata.name('\uFFFF', 'no name')
+      'no name'
+
 .. function:: decimal(chr[, default])
 
    Returns the decimal value assigned to the character *chr* as integer.
    If no such value is defined, *default* is returned, or, if not given,
-   :exc:`ValueError` is raised.
+   :exc:`ValueError` is raised.  For example::
 
+      >>> unicodedata.decimal('\N{ARABIC-INDIC DIGIT NINE}')
+      9
 
+      >>> unicodedata.decimal('\N{SUPERSCRIPT NINE}', -1)
+      -1
+
+
 .. function:: digit(chr[, default])
 
    Returns the digit value assigned to the character *chr* as integer.
    If no such value is defined, *default* is returned, or, if not given,
-   :exc:`ValueError` is raised.
+   :exc:`ValueError` is raised.  For example::
 
+      >>> unicodedata.digit('\N{SUPERSCRIPT NINE}')
+      9
 
+      >>> unicodedata.digit('\N{ROMAN NUMERAL NINE}', -1)
+      -1
+
+
 .. function:: numeric(chr[, default])
 
    Returns the numeric value assigned to the character *chr* as float.
    If no such value is defined, *default* is returned, or, if not given,
-   :exc:`ValueError` is raised.
+   :exc:`ValueError` is raised.  For example::
 
+      >>> unicodedata.numeric('½')
+      0.5
 
+      >>> unicodedata.numeric('\N{ROMAN NUMERAL TEN THOUSAND}')
+      10000.0
+
+
 .. function:: category(chr)
 
-   Returns the general category assigned to the character *chr* as
-   string.
+   Returns the general category assigned to the character *chr* as a string.
+   General category names consist of two letters.  The first letter is always
+   uppercase and denotes one of seven major categories: Letter (L), Mark (M),
+   Number (N), Punctuation (P), Symbol (S), Separator (Z), and Other (C).  The
+   second letter is always lowercase and further subdivides major categories
+   into minor subcategories.
 
+   +--------------------------------------------------------------------------+
+   | **General Categories**                                                   |
+   +----+-------------+------------------+------------------------------------+
+   |Name|Major        |Minor             |Examples                            |
+   +====+=============+==================+====================================+
+   |Lu  | Letter      | uppercase        | 'A', 'Z', 'Ω'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Ll  | Letter      | lowercase        | 'a', 'z', 'ω'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Lt  | Letter      | titlecase        | 'ǅ', 'ǈ', 'ῼ'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Lm  | Letter      | modifier         | 'ʰ', 'ʲ', 'ʶ'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Lo  | Letter      | other            | 'ƻ', 'א', 'ث'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Mn  | Mark        | nonspacing       | '\\u0300' (COMBINING GRAVE ACCENT) |
+   +----+-------------+------------------+------------------------------------+
+   |Mc  | Mark        | spacing combining| 'ः' (DEVANAGARI SIGN VISARGA)      |
+   +----+-------------+------------------+------------------------------------+
+   |Me  | Mark        | enclosing        | '\\u20DD' (ENCLOSING CIRCLE)       |
+   +----+-------------+------------------+------------------------------------+
+   |Nd  | Number      | decimal digit    | '1', '١', '१'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Nl  | Number      | letter           | 'Ⅸ' (ROMAN NUMERAL NINE)           |
+   +----+-------------+------------------+------------------------------------+
+   |No  | Number      | other            | '²' (SUPERSCRIPT TWO)              |
+   +----+-------------+------------------+------------------------------------+
+   |Pc  | Punctuation | connector        | '_' (ASCII UNDERSCORE)             |
+   +----+-------------+------------------+------------------------------------+
+   |Pd  | Punctuation | dash             | '-' (ASCII HYPHEN-MINUS)           |
+   +----+-------------+------------------+------------------------------------+
+   |Ps  | Punctuation | open             | '(', '[', '{'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Pe  | Punctuation | close            | ')', ']', '}'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Pi  | Punctuation | initial quote    | '«', '‘', '⸠'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Pf  | Punctuation | final quote      | '»', '’', '⸡'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Po  | Punctuation | other            | '!', '"', '¿'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Sm  | Symbol      | math             | '+', '=', '±'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Sc  | Symbol      | currency         | '$', '£', '¥'                      |
+   +----+-------------+------------------+------------------------------------+
+   |Sk  | Symbol      | modifier         | '\\u00B8' (CEDILLA)                |
+   +----+-------------+------------------+------------------------------------+
+   |So  | Symbol      | other            | '☹' (FACE), '�' (REPLACEMENT CHAR) |
+   +----+-------------+------------------+------------------------------------+
+   |Zs  | Separator   | space            | ' ' (ASCII SPACE)                  |
+   +----+-------------+------------------+------------------------------------+
+   |Zl  | Separator   | line             | '\\u2028' (LINE SEPARATOR)         |
+   +----+-------------+------------------+------------------------------------+
+   |Zp  | Separator   | paragraph        | '\\u2029' (PARAGRAPH SEPARATOR)    |
+   +----+-------------+------------------+------------------------------------+
+   |Cc  | Other       | control          | '\\0' (NULL), '\\t' (TAB)          |
+   +----+-------------+------------------+------------------------------------+
+   |Cf  | Other       | format           | '\\u00AD' (SOFT HYPHEN)            |
+   +----+-------------+------------------+------------------------------------+
+   |Cs  | Other       | surrogate        |  '\\uD800' - '\\uDFFF'             |
+   +----+-------------+------------------+------------------------------------+
+   |Co  | Other       | private use      |  '\\uE000' - '\\uF8FF'             |
+   +----+-------------+------------------+------------------------------------+
+   |Cn  | Other       | not assigned     |  '\\uFFFF'                         |
+   +----+-------------+------------------+------------------------------------+
 
+   The following example program produces code point counts by major category:
+
+   .. literalinclude:: ../includes/unistat.py
+
+   ::
+
+      Counter({'C': 1004868, 'L': 100520, 'S': 5508, 'M': 1498, 'N': 1100, 'P': 598, 'Z': 20})
+
 .. function:: bidirectional(chr)
 
-   Returns the bidirectional category assigned to the character *chr* as
-   string. If no such value is defined, an empty string is returned.
+   Returns the bidirectional class assigned to the character *chr* as a string.
+   If no such value is defined, an empty string is returned.  For example::
 
+      >>> unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber
+      'AN'
 
+   Bidirectional class names returned by this function have the following meanings:
+
+   =====     =========================
+   Class     Description
+   =====     =========================
+   AL        Arabic Letter
+   AN        Arabic Number
+   B         Paragraph Separator
+   BN        Boundary Neutral
+   CS        Common Separator
+   EN        European Number
+   ES        European Separator
+   ET        European Terminator
+   L         Left To Right
+   LRE       Left To Right Embedding
+   LRO       Left To Right Override
+   NSM       Nonspacing Mark
+   ON        Other Neutral
+   PDF       Pop Directional Format
+   R         Right To Left
+   RLE       Right To Left Embedding
+   RLO       Right To Left Override
+   S         Segment Separator
+   WS        White Space
+   =====     =========================
+
+
 .. function:: combining(chr)
 
    Returns the canonical combining class assigned to the character *chr*
@@ -81,21 +219,37 @@
    Returns the east asian width assigned to the character *chr* as
    string.
 
+   ====      ============
+   Code      Description
+   ====      ============
+   A         Ambiguous
+   F         Fullwidth
+   H         Halfwidth
+   N         Neutral
+   Na        Narrow
+   W         Wide
+   ====      ============
 
 .. function:: mirrored(chr)
 
    Returns the mirrored property assigned to the character *chr* as
    integer. Returns ``1`` if the character has been identified as a "mirrored"
-   character in bidirectional text, ``0`` otherwise.
+   character in bidirectional text, ``0`` otherwise.  For example::
 
+      >>> unicodedata.mirrored('>')
+      1
 
+
 .. function:: decomposition(chr)
 
    Returns the character decomposition mapping assigned to the character
    *chr* as string. An empty string is returned in case no such mapping is
-   defined.
+   defined.  For example::
 
+      >>> unicodedata.decomposition('è')
+      '0065 0300'
 
+
 .. function:: normalize(form, unistr)
 
    Return the normal form *form* for the Unicode string *unistr*. Valid values for
@@ -157,6 +311,3 @@
    ValueError: not a decimal
    >>> unicodedata.category('A')  # 'L'etter, 'u'ppercase
    'Lu'
-   >>> unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber
-   'AN'
-