Issue17878
This issue tracker has been migrated to GitHub and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.
Created on 2013-04-30 09:51 by paul.moore, last changed 2022-04-11 14:57 by admin.
Files
| File name | Uploaded | Description |
|---|---|---|
| codecs_searchers.py | dmi.baranov, 2013-05-02 13:45 | |
Messages (13)
msg188147 | Author: Paul Moore (paul.moore) * | Date: 2013-04-30 09:51
The codecs module allows the user to register additional codecs, but does not offer a means of getting a list of registered codecs. This is important, for example, in a tool to re-encode files: it is reasonable to expect such a tool to offer a list of supported encodings to assist the user, as the -l option of the iconv command does.
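For context, the codecs that ship with CPython can already be enumerated indirectly, because they live in the stdlib `encodings` package. The sketch below is a workaround under that assumption, not a codecs API: it gathers the package's module names (via `pkgutil.iter_modules`) and the values of `encodings.aliases.aliases`, keeps whatever `codecs.lookup()` accepts, and reports the canonical `CodecInfo.name` values. The function name `list_builtin_encodings` is illustrative. Anything added through `codecs.register()` remains invisible to it, which is precisely the gap described in this message.

```python
import codecs
import encodings
import encodings.aliases
import pkgutil


def list_builtin_encodings():
    """Return sorted canonical names of codecs found via the encodings package.

    Workaround sketch only: it inspects the stdlib ``encodings`` package and
    cannot see codecs registered through codecs.register().
    """
    candidates = set(encodings.aliases.aliases.values())
    candidates.update(m.name for m in pkgutil.iter_modules(encodings.__path__))
    names = set()
    for candidate in candidates:
        try:
            info = codecs.lookup(candidate)
        except LookupError:
            # Helper modules (e.g. "aliases") and platform-specific codecs
            # such as "mbcs" on non-Windows systems are not usable here.
            continue
        names.add(info.name)
    return sorted(names)


if __name__ == "__main__":
    # Roughly what `iconv -l` offers, restricted to built-in codecs.
    for name in list_builtin_encodings():
        print(name)
```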
msg188247 | Author: Dmi Baranov (dmi.baranov) * | Date: 2013-05-01 23:59
I think it's not possible as long as the codecs registry contains search callbacks (a stateless registry).
msg188252 | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-05-02 06:46
On 02.05.2013 01:59, Dmi Baranov wrote:
> I think its not possible while codecs registry contains search callbacks (stateless-registry)

It is possible: we'd just need to invent a way to ask search functions for the list of available codecs, e.g. by moving from plain function objects to CodecSearchFunction objects.
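A minimal sketch of the "search object" idea, assuming nothing beyond today's `codecs.register()`, which accepts any callable, not only functions. The class name `CodecSearchFunction` comes from the message; the `available()` method and the `_normalize()` helper are hypothetical illustrations. Because the object keeps its codec table as state, it can answer "what do you provide?" as well as "do you provide X?".

```python
import codecs


def _normalize(name):
    # Search functions may receive names already lowercased and, on newer
    # Pythons, with hyphens/spaces turned into underscores; map both forms.
    return name.lower().replace("-", "_").replace(" ", "_")


class CodecSearchFunction:
    """Hypothetical stateful search object: callable like a plain search
    function, but it also knows which codecs it can provide."""

    def __init__(self, codec_infos):
        self._codecs = {_normalize(info.name): info for info in codec_infos}

    def __call__(self, encoding):
        # Same contract as a plain search function: CodecInfo or None.
        return self._codecs.get(_normalize(encoding))

    def available(self):
        # Hypothetical listing hook; not part of the codecs module.
        return sorted(info.name for info in self._codecs.values())


# Example: expose UTF-8 under an extra, made-up name.
_utf8 = codecs.lookup("utf-8")
demo = CodecSearchFunction([
    codecs.CodecInfo(_utf8.encode, _utf8.decode, name="demo-utf8"),
])
codecs.register(demo)  # register() accepts any callable, not just functions

print("héllo".encode("demo-utf8"))   # b'h\xc3\xa9llo'
print(demo.available())              # ['demo-utf8']
```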
msg188267 | Author: Dmi Baranov (dmi.baranov) * | Date: 2013-05-02 13:45
I think the word "function" is a bit misleading. I suggest something like CodecsSearcher; please look at the attached implementation (dirty code, just to start a discussion about interfaces, lazy caches, etc.).
msg188268 | Author: Alyssa Coghlan (ncoghlan) * | Date: 2013-05-02 14:41
This is actually similar to the problem of getting the list of modules an importer provides (that is, we don't currently have an officially defined method in the importer protocol for that, although pkgutil.iter_importer_modules implicitly looks for an "iter_modules" method, due to the old import emulation used until Python 3.2).

I see three possibilities:

1. Use independent, purpose-specific protocols to get a list of entries out of these objects.
2. Create a new, common protocol for extracting lists of entries from search hooks like importers and codec search functions.
3. Use the existing __iter__ protocol.

I'm currently thinking option 3 might be a reasonable way forward. That is, if a codec search hook wants to provide a listing of available codecs, it can just define __iter__ in addition to __call__. Importers could define __iter__ in addition to the other methods in the importer API.

Thoughts?
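Option 3 can be sketched with today's registry, since `codecs.register()` only requires a callable. In this illustration (all class and function names are hypothetical), a hook opts in to listing by also defining `__iter__`, and a caller distinguishes listable hooks with `isinstance(hook, collections.abc.Iterable)`; a plain search function fails that check and is reported as not listable.

```python
import codecs
from collections.abc import Iterable


class IterableSearcher:
    """Hypothetical search hook for option 3: __call__ performs lookups,
    __iter__ lists the CodecInfo objects the hook can provide."""

    def __init__(self, *codec_infos):
        self._by_name = {self._key(info.name): info for info in codec_infos}

    @staticmethod
    def _key(name):
        # Lookups may arrive lowercased and with "-" turned into "_".
        return name.lower().replace("-", "_")

    def __call__(self, encoding):
        return self._by_name.get(self._key(encoding))

    def __iter__(self):
        return iter(self._by_name.values())


def plain_search(encoding):
    """An ordinary search function: callable but not iterable."""
    return None


latin = codecs.lookup("latin-1")
searcher = IterableSearcher(
    codecs.CodecInfo(latin.encode, latin.decode, name="demo-latin"),
)
codecs.register(searcher)          # still usable as a normal search hook
print("é".encode("demo-latin"))    # b'\xe9'

for hook in (searcher, plain_search):
    if isinstance(hook, Iterable):
        print([info.name for info in hook])    # ['demo-latin']
    else:
        print("not listable:", hook.__name__)  # not listable: plain_search
```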
msg188269 | Author: Walter Dörwald (doerwalter) * | Date: 2013-05-02 14:45
The point of using a function is to allow the function special handling of the encoding name that goes beyond a simple map lookup, e.g. you could do the following (saved as appendcodec.py):

```python
import codecs

def search_function(encoding):
    # Newer Python versions may pass the name normalized (for example
    # "append_bar" instead of "append-bar"), so accept both spellings.
    if not encoding.startswith(("append-", "append_")):
        return None
    suffix = encoding[7:]

    def encode(s, errors="strict"):
        data = (s + suffix).encode("utf-8", errors)
        return (data, len(s))  # (output, input characters consumed)

    def decode(b, errors="strict"):
        s = bytes(b).decode("utf-8", errors)
        if suffix and s.endswith(suffix):
            s = s[:-len(suffix)]
        return (s, len(b))  # (output, input bytes consumed)

    return codecs.CodecInfo(encode, decode, name=encoding)

codecs.register(search_function)
```

```
$ python
Python 3.3.1 (default, Apr 29 2013, 15:35:47)
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.24)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import appendcodec
>>> 'foo'.encode('append-bar')
b'foobar'
>>> b'foobar'.decode('append-bar')
'foo'
```

The search function can't return a list of codec names in this case, as the list is infinite.
msg188270 | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-05-02 14:47
On 02.05.2013 16:41, Nick Coghlan wrote:
> [...]
> 3. Use the existing __iter__ protocol
>
> I'm currently thinking option 3 might be a reasonable way forward. That is, if a codec search hook wants to provide a listing of available codecs, it can just define __iter__ in addition to __call__. Importers could define __iter__ in addition to the other methods in the importer API.
>
> Thoughts?

Too obscure :-)

Let the object expose a method:

.list_codecs() -> returns a list of supported codecs as CodecInfo objects.

We may also deprecate the .__call__() in favor of:

.find_codec(encoding) -> return codec implementing encoding.
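A sketch of the explicit-method alternative, assuming the method names from the message (`list_codecs()`, `find_codec()`); neither exists in the codecs module, and the class name `CodecSearchObject` is hypothetical. Aliasing `__call__` to `find_codec` is one possible way to keep such objects working with the current `codecs.register()` during a deprecation period.

```python
import codecs


class CodecSearchObject:
    """Sketch of the explicit-method variant. list_codecs() and find_codec()
    are the names proposed in the message, not an existing codecs API."""

    def __init__(self, codec_infos):
        self._codecs = {self._key(info.name): info for info in codec_infos}

    @staticmethod
    def _key(name):
        # Lookups may arrive lowercased and with "-"/" " turned into "_".
        return name.lower().replace("-", "_").replace(" ", "_")

    def list_codecs(self):
        # Supported codecs as CodecInfo objects, not bare names.
        return list(self._codecs.values())

    def find_codec(self, encoding):
        # CodecInfo implementing `encoding`, or None if unsupported.
        return self._codecs.get(self._key(encoding))

    # Keeps the object usable with today's codecs.register(), which only
    # ever calls the search hook.
    __call__ = find_codec


ascii_info = codecs.lookup("ascii")
searcher = CodecSearchObject([
    codecs.CodecInfo(ascii_info.encode, ascii_info.decode, name="demo-ascii"),
])
codecs.register(searcher)

print([info.name for info in searcher.list_codecs()])  # ['demo-ascii']
print("abc".encode("demo-ascii"))                      # b'abc'
```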
msg188271 | Author: Paul Moore (paul.moore) * | Date: 2013-05-02 14:51
@doerwalter In that case, I'd take the view that such a codec should simply not return anything. The discovery mechanism can be limited to returning only statically discoverable codec names (and it can be documented as such). The original use case was to support functionality like iconv -l. Omitting edge cases like this is probably reasonable in that context.
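Paul's compromise can be illustrated with a hypothetical searcher (class and helper names are illustrative) that serves a fixed table of codecs plus an open-ended family modelled on Walter's `append-<suffix>` example. Its `list_codecs()` (the method name proposed earlier in the thread) reports only the fixed table, so a tool like `iconv -l` would show `demo-static` but never the unbounded `append-*` names.

```python
import codecs


def _key(name):
    # Lookups may arrive lowercased and with "-" turned into "_".
    return name.lower().replace("-", "_")


class PartlyListableSearcher:
    """Hypothetical searcher: a fixed table of codecs plus Walter's
    open-ended "append-<suffix>" family. list_codecs() reports only the
    fixed table."""

    def __init__(self, static_infos):
        self._static = {_key(info.name): info for info in static_infos}

    def list_codecs(self):
        return list(self._static.values())

    def __call__(self, encoding):
        key = _key(encoding)
        if key in self._static:
            return self._static[key]
        if key.startswith("append_"):
            return self._append_codec(key[len("append_"):], encoding)
        return None  # not one of ours

    @staticmethod
    def _append_codec(suffix, name):
        def encode(text, errors="strict"):
            return (text + suffix).encode("utf-8", errors), len(text)

        def decode(data, errors="strict"):
            text = bytes(data).decode("utf-8", errors)
            if suffix and text.endswith(suffix):
                text = text[:-len(suffix)]
            return text, len(data)

        return codecs.CodecInfo(encode, decode, name=name)


utf8 = codecs.lookup("utf-8")
searcher = PartlyListableSearcher(
    [codecs.CodecInfo(utf8.encode, utf8.decode, name="demo-static")]
)
codecs.register(searcher)

print("foo".encode("append-bar"))                # b'foobar'
print(b"foobar".decode("append-bar"))            # foo
print([i.name for i in searcher.list_codecs()])  # ['demo-static']
```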
msg188272 | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-05-02 14:53
On 02.05.2013 16:45, Walter Dörwald wrote:
> ...
> The search function can't return a list of codec names in this case, as the list is infinite.

True.

The search object will have to be allowed to raise a NotImplementedError or some other error/return value to signal that the list of supported codecs is not available.

Note that the search object should only return a list of supported canonical encoding names with .list_codecs(), not all possible ones :-)
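The difference between "this searcher has no listable codecs" and "this searcher cannot list its codecs" can be made explicit along the lines suggested here. The sketch assumes the hypothetical `list_codecs()` method and uses `NotImplementedError`, one of the signals mentioned in the message; `ListableSearcher`, `OpenEndedSearcher`, and `collect_codecs()` are illustrative names.

```python
import codecs


def _key(name):
    # Normalize the way lookups may arrive (lowercase, "-" -> "_").
    return name.lower().replace("-", "_")


class ListableSearcher:
    """Hypothetical searcher with a finite, known set of codecs."""

    def __init__(self, *infos):
        self._infos = {_key(info.name): info for info in infos}

    def __call__(self, encoding):
        return self._infos.get(_key(encoding))

    def list_codecs(self):
        return list(self._infos.values())


class OpenEndedSearcher:
    """Hypothetical searcher whose set of codecs is unbounded."""

    def __call__(self, encoding):
        return None  # real lookup logic omitted from this sketch

    def list_codecs(self):
        # "Cannot enumerate", which is different from "no codecs".
        raise NotImplementedError("codec set is not enumerable")


def collect_codecs(searchers):
    """Hypothetical helper: gather CodecInfo objects and, separately, the
    searchers that could not list theirs."""
    found, unlistable = [], []
    for searcher in searchers:
        try:
            found.extend(searcher.list_codecs())
        except NotImplementedError:
            unlistable.append(searcher)
    return found, unlistable


utf8 = codecs.lookup("utf-8")
found, unlistable = collect_codecs([
    ListableSearcher(codecs.CodecInfo(utf8.encode, utf8.decode, name="demo-utf8")),
    OpenEndedSearcher(),
])
print([info.name for info in found])  # ['demo-utf8']
print(len(unlistable))                # 1
```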
msg188273 | Author: Dmi Baranov (dmi.baranov) * | Date: 2013-05-02 15:35
My +1 for __iter__ with a default `raise StopIteration`; it is a more elegant solution than declaring and guaranteeing the interfaces (based on collections.abc.Callable and collections.abc.Iterator).

Paul, a result that is an iterable of CodecInfo objects gives much more flexibility than the names of codecs (what if you have several codecs with the same name in different SearchObjects?). As I see it, you would like to use this as:

```python
import codecs

encoded_data = b'abc'
# codecs.registered_codecs() is the proposed API; it does not exist yet.
for codec in codecs.registered_codecs():
    try:
        decoded_data, _ = codec.decode(encoded_data)
    except (ValueError, TypeError):
        continue  # this codec cannot decode the data
    if decoded_data == 'cba':  # cracked
        break
```

What about backward compatibility with the Lib/encodings modules (the initial item in interp->codec_search_path)? Can we skip any entry in the search path that does not support iteration?
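One caveat about a default that literally does `raise StopIteration`: if `__iter__` is a plain method, `iter(obj)` itself raises `StopIteration` instead of returning an iterator, and if it is a generator, PEP 479 (the default behavior since Python 3.7) turns the explicit `StopIteration` into a `RuntimeError`. Returning an empty iterator avoids both problems. The class names below are hypothetical.

```python
from collections.abc import Iterable


class SearcherBase:
    """Hypothetical base class for codec search objects: subclasses
    override __call__ and, when they can, __iter__."""

    def __call__(self, encoding):
        return None

    def __iter__(self):
        # Default listing: an empty iterator.
        return iter(())


class NaiveSearcher(SearcherBase):
    def __iter__(self):
        # A generator that ends with an explicit StopIteration. Under
        # PEP 479 (always on since Python 3.7) this becomes RuntimeError.
        raise StopIteration
        yield  # unreachable, but makes this method a generator


print(list(SearcherBase()))                  # []
print(isinstance(SearcherBase(), Iterable))  # True
try:
    list(NaiveSearcher())
except RuntimeError as exc:
    print("RuntimeError:", exc)  # RuntimeError: generator raised StopIteration
```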
msg188274 | Author: Marc-Andre Lemburg (lemburg) * | Date: 2013-05-02 15:40
On 02.05.2013 16:53, Marc-Andre Lemburg wrote:
> [...]
> Note that the search object should only return a list of
> supported canonical encoding names with .list_codecs(),
> not all possible ones :-)

Scratch that last sentence. Returning CodecInfo instances, as I originally wrote, is a better way to go.
msg188275 | Author: Paul Moore (paul.moore) * | Date: 2013-05-02 15:43
On 2 May 2013 16:35, Dmi Baranov <report@bugs.python.org> wrote:
> Paul, result as iterable of CodecInfo objects is gives much more
> flexibility than the names of codecs (whats if you will have a few codecs
> with the same name in different SearchObjects?)

Works for me. My usage would be:

```python
import codecs

def list_supported_codecs():
    # codecs.registered_codecs() is the proposed API under discussion.
    for codec in codecs.registered_codecs():
        print(codec.name)
```
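A pure-Python prototype of the proposed `registered_codecs()` is possible only by keeping a separate list of hooks, because the interpreter's search path is not exposed. This sketch (the module-level `register()` wrapper, `registered_codecs()`, and `TableSearcher` are all hypothetical) answers Dmi's compatibility question one way: hooks that do not support iteration, such as plain functions, are simply skipped. It then runs Paul's `list_supported_codecs()` against it.

```python
import codecs
from collections.abc import Iterable

# Hypothetical: the real registry (interp->codec_search_path) is not
# introspectable from Python, so this sketch keeps its own list of hooks.
_searchers = []


def register(search_function):
    """Register with codecs and remember the hook for listing."""
    codecs.register(search_function)
    _searchers.append(search_function)


def registered_codecs():
    """Yield CodecInfo objects from every hook that supports iteration;
    plain search functions are skipped."""
    for searcher in _searchers:
        if isinstance(searcher, Iterable):
            yield from searcher


class TableSearcher:
    """Hypothetical iterable search hook backed by a dict."""

    def __init__(self, *infos):
        self._infos = {i.name.lower().replace("-", "_"): i for i in infos}

    def __call__(self, encoding):
        return self._infos.get(encoding.lower().replace("-", "_"))

    def __iter__(self):
        return iter(self._infos.values())


def list_supported_codecs():
    for codec in registered_codecs():
        print(codec.name)


utf8 = codecs.lookup("utf-8")
register(TableSearcher(codecs.CodecInfo(utf8.encode, utf8.decode, name="demo-utf8")))
register(lambda encoding: None)  # plain function: not listable, skipped
list_supported_codecs()          # prints: demo-utf8
```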
msg188277 | Author: Dmi Baranov (dmi.baranov) * | Date: 2013-05-02 16:00
Sorry for the additional noise: currently there is no way to change the codecs_search_path. Something similar to sys.path_hooks would be a great way to increase the level of customization (maybe I have a faster codec?).
History
| Date | User | Action | Args |
|---|---|---|---|
| 2022-04-11 14:57:45 | admin | set | github: 62078 |
| 2013-05-02 16:00:48 | dmi.baranov | set | messages: + msg188277 |
| 2013-05-02 15:43:58 | paul.moore | set | messages: + msg188275 |
| 2013-05-02 15:40:54 | lemburg | set | messages: + msg188274 |
| 2013-05-02 15:35:50 | dmi.baranov | set | messages: + msg188273 |
| 2013-05-02 14:53:14 | lemburg | set | messages: + msg188272 |
| 2013-05-02 14:51:01 | paul.moore | set | messages: + msg188271 |
| 2013-05-02 14:47:27 | lemburg | set | messages: + msg188270 |
| 2013-05-02 14:45:58 | doerwalter | set | nosy: + doerwalter; messages: + msg188269 |
| 2013-05-02 14:41:44 | ncoghlan | set | messages: + msg188268 |
| 2013-05-02 13:45:45 | dmi.baranov | set | files: + codecs_searchers.py; messages: + msg188267 |
| 2013-05-02 06:46:15 | lemburg | set | messages: + msg188252 |
| 2013-05-01 23:59:18 | dmi.baranov | set | nosy: + dmi.baranov; messages: + msg188247; components: + Library (Lib) |
| 2013-04-30 10:21:39 | vstinner | set | nosy: + vstinner |
| 2013-04-30 09:56:17 | ezio.melotti | set | nosy: + lemburg, ncoghlan |
| 2013-04-30 09:51:57 | paul.moore | create | |