Message 299848 - Python tracker



Author	Guillaume Sanchez
Recipients	Guillaume Sanchez, Socob, benjamin.peterson, ezio.melotti, lemburg, loewis, mrabarnett, r.david.murray, scoder, serhiy.storchaka, steven.daprano, terry.reedy, vstinner
Date	2017-08-07.13:47:48
Message-id	<1502113669.02.0.13367737221.issue30717@psf.upfronthosting.co.za>

Content

> I don't think unicodedata is the right place

I do agree with that. A new module sounds good. Would it be a problem if that module contained very few functions at first?

> Can we mark this as having a Provisional API to give us time to decide on the best API before locking it in permanently?

I'm not sure it's my call to make, but I would gladly consider that option.

> we should go through a PEP.

Why not. I may need a bit of guidance though.

> If you want state keeping for iterating over multiple <indextype> parts of the string, you can use an iterator.

Sure thing. It just wasn't specified like this in the proto-PEP.

> The APIs were inspired by the standard string.find() APIs, that's why they work on indexes and don't return Unicode strings. As such, they serve a different use case than an iterator.

I personally like having a generator returning slice objects, as suggested above. What would be some good objections to this?
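
To make the idea concrete, here is a minimal sketch of what a generator returning slice objects could look like. The name iter_cluster_slices is hypothetical, and the boundary rule is a deliberate simplification (it only attaches combining marks to the preceding code point); it is not the full grapheme cluster algorithm and not the API from the proto-PEP.

```python
import unicodedata


def iter_cluster_slices(text):
    """Yield one slice object per (approximate) user-perceived character.

    Assumption: a cluster is a code point followed by any number of
    combining marks. Real grapheme segmentation has many more rules.
    """
    start = 0
    for i in range(1, len(text)):
        # A code point with combining class 0 starts a new cluster.
        if unicodedata.combining(text[i]) == 0:
            yield slice(start, i)
            start = i
    if text:
        yield slice(start, len(text))


s = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT, then 'a'
slices = list(iter_cluster_slices(s))
clusters = [s[sl] for sl in slices]
```

Because the generator yields slices rather than strings, the caller can get either the index range (sl.start, sl.stop) or the substring (s[sl]) from the same object.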

> Wouldn't this be a typical case where we'd expect a module to evolve and gain usage on PyPI first, before adding it to the stdlib? [...] they might give inspiration for a suitable API design

I'll give it a look.

> The well known library for Unicode support in C++ and Java is ICU

Yes. I clearly don't want this PR to be interpreted as "we need ICU". ICU provides much, much more than what I'm willing to provide. My goal here is just to "fix" how Python's standard library iterates over characters. Nothing more, nothing less.
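
As a small illustration of the behaviour in question (my own example, not taken from the PR): iterating over a str visits code points, so a letter followed by a combining accent comes out as two items.

```python
s = "e\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT

# Iteration and len() both work on code points, not on what a
# reader would perceive as a single accented character.
code_points = list(s)
length = len(s)

# Reversing by code point moves the accent in front of its base letter.
reversed_s = s[::-1]
```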

One might think that splitlines() should be "fixed" too, and there is clearly something to discuss here. The same goes for word splitting. However, I do not intend to bring in normalization (which you already have), collation, locale-dependent upcasing or lowercasing, etc. We might need a wheel, but we don't have to take the whole truck.

How do we discuss all of this? Who's in charge of making decisions? How long should we debate? This is my first time contributing to Python and I'm new to all of this.

Thanks for your time.

History
Date	User	Action	Args
2017-08-07 13:47:49	Guillaume Sanchez	set	recipients: + Guillaume Sanchez, lemburg, loewis, terry.reedy, scoder, vstinner, benjamin.peterson, ezio.melotti, mrabarnett, steven.daprano, r.david.murray, serhiy.storchaka, Socob
2017-08-07 13:47:49	Guillaume Sanchez	set	messageid: <1502113669.02.0.13367737221.issue30717@psf.upfronthosting.co.za>
2017-08-07 13:47:49	Guillaume Sanchez	link	issue30717 messages
2017-08-07 13:47:48	Guillaume Sanchez	create