
Author phr
Recipients lemburg, phr
Date 2008-05-15.14:39:29
Message-id <1210862385.72.0.789210868978.issue2857@psf.upfronthosting.co.za>
Content
I'm not sure what you mean by "ditto for Lucene indexes".  I wasn't
planning to use C code.  I was hoping to write Python code to parse
those indexes, then found that they use this weird encoding.  Python's
codec set is fairly inclusive already, so this codec sounded like a
reasonably useful addition.  It probably shows up in other places as
well.  It might even be a reasonable internal representation for Python,
which as I understand it currently can't represent code points outside
the BMP.  Also, it is used in Java serialization, which I think of as a
somewhat weird and wacky thing, but it's conceivable that somebody someday might
want to write a Python program that speaks the Java serialization
protocol (I don't have a good sense of whether that's feasible).

Writing an application-specific codec with the C API is doable in
principle, but it seems like an awful lot of effort for just one quickie
program.  These indexes are very large, so a codec written in pure
Python would probably be painfully slow.
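
For illustration, here is a rough sketch of what such a pure-Python decoder could look like.  It assumes the encoding in question is Java's "modified UTF-8" (the message above does not name it): U+0000 is written as the two bytes C0 80, and characters outside the BMP are written as a surrogate pair with each half encoded in three bytes.  The function name decode_modified_utf8 is made up for this example, and the code targets current Python 3 rather than the interpreter that was current when this message was written.

```python
def decode_modified_utf8(data: bytes) -> str:
    """Decode Java-style modified UTF-8 (assumed format) into a str."""
    units = []              # UTF-16 code units, the way Java sees the string
    i, n = 0, len(data)

    def cont(pos: int) -> int:
        # Return the 6 payload bits of the continuation byte at pos.
        if pos >= n or data[pos] & 0xC0 != 0x80:
            raise ValueError(f"bad continuation byte at offset {pos}")
        return data[pos] & 0x3F

    while i < n:
        b0 = data[i]
        if b0 < 0x80:
            # One byte: U+0001..U+007F (a raw 0x00 is accepted leniently).
            units.append(b0)
            i += 1
        elif b0 & 0xE0 == 0xC0:
            # Two bytes: U+0080..U+07FF, plus C0 80 standing for U+0000.
            units.append((b0 & 0x1F) << 6 | cont(i + 1))
            i += 2
        elif b0 & 0xF0 == 0xE0:
            # Three bytes: U+0800..U+FFFF, including surrogate halves.
            units.append((b0 & 0x0F) << 12 | cont(i + 1) << 6 | cont(i + 2))
            i += 3
        else:
            # Four-byte sequences are not part of this encoding.
            raise ValueError(f"unexpected lead byte 0x{b0:02x} at offset {i}")

    # Let the UTF-16 codec join surrogate pairs into single code points;
    # "surrogatepass" keeps any unpaired half instead of raising.
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le", "surrogatepass")
```

The loop handles one byte at a time in the interpreter, which is the kind of per-byte overhead that makes a pure-Python codec unattractive for very large indexes.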
History
Date                 User  Action  Args
2008-05-15 14:39:46  phr   set     spambayes_score: 0.0411054 -> 0.041105404; recipients: + phr, lemburg
2008-05-15 14:39:45  phr   set     spambayes_score: 0.0411054 -> 0.0411054; messageid: <1210862385.72.0.789210868978.issue2857@psf.upfronthosting.co.za>
2008-05-15 14:39:30  phr   link    issue2857 messages
2008-05-15 14:39:29  phr   create