Message 115527 - Python tracker



Author	belopolsky
Recipients	Claudiu.Popa, belopolsky, eric.araujo, pitrou
Date	2010-09-03.22:47:04
Message-id	<1283554026.29.0.926824675179.issue9598@psf.upfronthosting.co.za>
In-reply-to

Content

> If untabify fails because a file has an incorrect encoding, is it really
> a problem in untabify? This is a developer’s tool, so getting a
> traceback here seems okay to me.

I disagree.  I think we should use this opportunity to clarify the preferred encoding for C language source files in Python and make untabify produce a meaningful diagnostic in case of encoding errors.
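
To illustrate what a "meaningful diagnostic" could look like, here is a minimal sketch.  It is not the actual untabify code: the helper name read_source and the message format are hypothetical, and UTF-8 is only assumed as the default.  The idea is to catch the decoding error and report the file, line and offending byte instead of letting a traceback through.

```python
import sys


def read_source(path, encoding="utf-8"):
    """Return the decoded text of path, or None after printing a diagnostic.

    Hypothetical helper for illustration; not part of untabify.
    """
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode;
        # counting the newlines before it gives a 1-based line number.
        line = data.count(b"\n", 0, err.start) + 1
        print("%s:%d: cannot decode byte 0x%02x as %s"
              % (path, line, data[err.start], encoding), file=sys.stderr)
        return None
```

A caller could skip the file (or exit with a non-zero status) when None is returned.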

As a matter of policy, I see two possibilities:

1. Restrict C sources to 7-bit ASCII.  (A pedantic reading of the ANSI C standard would probably suggest an even more restricted character set, but practically, I don't think 7-bit ASCII in C comments is likely to cause problems for any tools.)

2. Require UTF-8 encoding for non-ASCII characters.  Given that this is the default for Python source code, it is likely that tools that are used for Python development can handle UTF-8.

My vote is for #1.  Display of non-ASCII characters is still not universally supported, and they are likely to be clobbered when diffs are copied into e-mails etc.
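
Either policy can be checked mechanically.  The sketch below is only an illustration under my own assumptions (the function names classify_source and non_ascii_lines are hypothetical and do not exist in any current tool): it classifies the raw bytes of a file as pure 7-bit ASCII (policy 1), valid UTF-8 (policy 2), or neither, and lists the lines that would need attention under policy 1.

```python
def classify_source(data):
    """Classify raw file bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"      # acceptable under both policies
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"      # acceptable under policy 2 only
    except UnicodeDecodeError:
        return "other"      # acceptable under neither policy


def non_ascii_lines(data):
    """Return the 1-based numbers of lines containing bytes >= 0x80."""
    return [n for n, line in enumerate(data.splitlines(), 1)
            if any(b >= 0x80 for b in line)]
```

Under policy 1 a checker would reject anything that is not 'ascii' and report non_ascii_lines(data); under policy 2 it would reject only 'other'.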
