Message 115339 - Python tracker


Author	vstinner
Recipients	docs@python, vstinner
Date	2010-09-01.22:41:28
Message-id	<1283380895.91.0.777071955411.issue9738@psf.upfronthosting.co.za>

Content
Many C functions take a bytes argument (char* type), but the expected encoding is not documented. It would not be a problem if the encoding were always the same, but it is not. Examples:
 - format of PyUnicode_FromFormat() should be encoded as ISO-8859-1
 - filename of PyParser_ASTFromString() should be encoded as utf-8
 - filename of PyErr_SetFromErrnoWithFilename() should be encoded to the filesystem encoding (with strict error handler, and not surrogateescape)
 - 's' argument of PyParser_ASTFromString() should be encoded as utf-8 if PyPARSE_IGNORE_COOKIE flag is set, otherwise the parser checks for #coding:xxx cookie (if there is no cookie, utf-8 is used)
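To illustrate why the encoding of each char* argument matters, here is a minimal Python-level sketch. It does not call the C functions listed above; it only shows how the same bytes are read differently under ISO-8859-1 and utf-8, how the strict and surrogateescape error handlers differ, and how a #coding cookie is detected. The sample byte strings are made up, utf-8 stands in for the filesystem encoding (an assumption, the real one depends on the platform), and tokenize.detect_encoding() is the Python-level counterpart of the cookie check, not the C parser itself.

```python
import io
import tokenize

# The same two bytes give different text depending on the assumed encoding.
raw = b"\xc3\xa9"
as_utf8 = raw.decode("utf-8")         # one character: U+00E9
as_latin1 = raw.decode("iso-8859-1")  # two characters: U+00C3 U+00A9

# Assumption: utf-8 stands in for the filesystem encoding here.
# This made-up filename is not valid utf-8.
bad_name = b"caf\xe9.txt"
try:
    bad_name.decode("utf-8", "strict")
    strict_ok = True
except UnicodeDecodeError:
    # The strict handler rejects undecodable bytes.
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate,
# so the original bytes can be recovered later.
escaped = bad_name.decode("utf-8", "surrogateescape")
roundtrip = escaped.encode("utf-8", "surrogateescape")

# Cookie detection: with a cookie the declared encoding is used,
# without one the default is utf-8.
with_cookie = b"# coding: latin-1\nx = 1\n"
without_cookie = b"x = 1\n"
enc_cookie, _ = tokenize.detect_encoding(io.BytesIO(with_cookie).readline)
enc_default, _ = tokenize.detect_encoding(io.BytesIO(without_cookie).readline)
```

A caller that passes the bytes of `raw` to a function expecting ISO-8859-1 gets different text than one expecting utf-8, which is why each argument needs its encoding documented.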

The attached patch is an attempt to document most low-level functions. I chose to add the names of function arguments to the headers because I consider that a header can serve as quick documentation. I only touched .c files to change argument names.

It is hard to determine the right encoding, so I cannot guarantee that my patch is correct. My patch is just a draft.

I don't know if "encoded to utf-8" is the right expression. Or should it be "decoded as utf-8"?
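One way to look at the wording question is the direction of the conversion. The sketch below is only a Python-level illustration with a made-up sample string, not a statement about what the documentation should say: text is encoded to bytes, and bytes are decoded to text.

```python
# str -> bytes: the text is *encoded to* utf-8.
text = "na\u00efve"
data = text.encode("utf-8")

# bytes -> str: the bytes are *decoded from* (or "as") utf-8.
back = data.decode("utf-8")
```

Under that reading, a char* argument holds bytes that the caller has encoded to utf-8, and the function decodes them from utf-8.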

History
Date	User	Action	Args
2010-09-01 22:41:36	vstinner	set	recipients: + vstinner, docs@python
2010-09-01 22:41:35	vstinner	set	messageid: <1283380895.91.0.777071955411.issue9738@psf.upfronthosting.co.za>
2010-09-01 22:41:34	vstinner	link	issue9738 messages
2010-09-01 22:41:34	vstinner	create