Issue923643
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2004-03-26 03:17 by trevp, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
base256.diff | trevp, 2004-09-15 09:27 |
Messages (19) | |||
---|---|---|---|
msg45671 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-03-26 03:17 | |
Sometimes you want to turn a long into a byte-string, or vice-versa. This is useful in cryptographic protocols, and probably other network protocols where you need to exchange large integers. In 2.4, you can handle unsigned longs with this: def stringToLong(s): return long(binascii.hexlify(s), 16) def longToString(n): return binascii.unhexlify("%x" % n) However, these functions are slower than they need to be, they're kinda kludgey, and they don't handle negative values. So here's a proposal: def stringToLong(s): return long(s, 256) def longToString(n): return n.tostring() These functions operate on big-endian, 2's-complement byte-strings. If the value is positive but has its most- significant-bit set, an extra zero-byte will be prepended. This is the same way OpenSSL and (I think) GMP handle signed numbers. These functions are ~5x faster than the earlier ones, they're cleaner, and they work with negative numbers. If you only want to deal with unsigned positive numbers, you'll have to do some adjustments: def stringToLong(s): return long('\0'+s, 256) def longToString(n): s = n.tostring() if s[0] == '\0' and s != '\0': s = s[1:] return s That's not ideal, but it seems better than any interface change I could think of. Anyways, the patch adds this to longs. It should be added to ints too, and I guess it needs tests etc.. I can help with that, if the basic idea is acceptable. Trevor |
|||
msg45672 - (view) | Author: paul rubin (phr) | Date: 2004-03-26 03:53 | |
Logged In: YES user_id=72053 I think those funcs should take an optional extra arg to say you want unsigned. That's cleaner than prepending '0'. In cryptography you usually do want unsigned. |
|||
msg45673 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-03-27 06:45 | |
Logged In: YES user_id=973611 You're right, we should support unsigned strings somehow. Adding another argument to the int() and long() constructors would be messy, though. How about: n = long(s, 256) #unsigned n = long(s, -256) #signed n.tounsignedstring() n.tosignedstring() The "-256" thing is a hack, I admit.. but it kinda grows on you, if you stare it at awhile :-)... |
|||
msg45674 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-03-27 06:45 | |
Logged In: YES user_id=973611 You're right, we should support unsigned strings somehow. Adding another argument to the int() and long() constructors would be messy, though. How about: n = long(s, 256) #unsigned n = long(s, -256) #signed n.tounsignedstring() n.tosignedstring() The "-256" thing is a hack, I admit.. but it kinda grows on you, if you stare it at awhile :-)... |
|||
msg45675 - (view) | Author: paul rubin (phr) | Date: 2004-03-27 06:57 | |
Logged In: YES user_id=72053 How about just punting signed conversion. I don't think two's complement makes much sense for arbitrary precision longs. Have some separate representation for negative longs if needed. If you call hex() on a large negative number, you get a hex string with a leading minus sign. For base 256, you can't reserve a char like that, so I guess you have to just throw an error if someone tries to convert a negative long to a string. If you want a representation for signed longs, ASN1 DER is probably an ok choice. I agree with Guido that the binascii module is a good place to put such a function. Twos complement can work if you specify a fixed precision, but that sure complicates what this started out as. |
|||
msg45676 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-03-27 07:51 | |
Logged In: YES user_id=973611 I think 2's complement makes good sense for arbitrary precision longs. This is how OpenSSL and GMP handle them. It's also how the ASN.1 BER/DER encodings handle integers: these encodings just prepend tag and length fields to the big- endian 2's complement value. I.e.: If you want to extract RSA public values from an X.509 certificate, they'll be in 2's complement (well, they'll always be positive... but they'll have an extra zero byte if necessary). Since the functionality for 2's complement is already in the C code it's easy to expose through a patch. So I'm still in favor of presenting it. |
|||
msg45677 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-03-29 00:54 | |
Logged In: YES user_id=973611 My last comment was wrong: GMP's raw input/output format uses big-endian positive values, with the sign bit stored separately. |
|||
msg45678 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-03-29 00:55 | |
Logged In: YES user_id=973611 My last comment was wrong: GMP's raw input/output format uses big-endian positive values, with the sign bit stored separately. |
|||
msg45679 - (view) | Author: Josiah Carlson (josiahcarlson) * | Date: 2004-03-30 07:10 | |
Logged In: YES user_id=341410 I'm curious to know if anyone would object to optional minimum or maximum or both arguments, or even some additional methods that would result in a potentially constrained string output from long.tostring()? If I were to split the functionality into three methods, they would be as follows... def atleast(long, atl): if atl < 0: raise TypeError("atleast requires a positive integer for a minimum length") a = long.tostring() la = len(a) return (atl-la)*'\o' + a def atmost(long, atm): if atm < 0: raise TypeError("atleast requires a positive integer for a minimum length") a = long.tostring() la = len(a) return a[:atm] def constrained(long, atl, atm): if atm < atl: raise TypeError("constrained requires that the maximum length be larger than the minimum length") if atl < 0 or atm < 0: raise TypeError("constrained requires that both arguments are positive") a = long.tostring() la = len(a) return ((atl-la)*'\o' + a)[:atm] I personally would find use for the above, would anyone else have use for it? |
|||
msg45680 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-09-15 09:27 | |
Logged In: YES user_id=973611 Uploading a new patch (base256.diff). This implements only the string-> long (or int) conversion. It adds support for radix 256 (unsigned) or -256 (2's-complement signed) to the int() and long() built-ins: int("\xFF\xFF\xFF", 256) -> 0xFFFFFF int("\xFF\xFF\xFF", -256) -> -1 long(os.urandom(128), 256) -> 1024-bit integer I left out the long -> string conversion. If python adds a bytes() type, then that conversion could be done as bytes(long). This patch has docs and tests. |
|||
msg45681 - (view) | Author: Josiah Carlson (josiahcarlson) * | Date: 2004-09-15 16:49 | |
Logged In: YES user_id=341410 As an aside, I've requested similar functionality be added to the struct module, which handles signed, unsigned, big-endian, little-endian, and integers of arbitrary length. The request is available here: https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1023290&group_id=5470 Would you prefer two functions that live as methods of ints and longs, or would you prefer the functionality be placed inside of the struct module (which already does numeric packing and upacking), and had its own type code? |
|||
msg45682 - (view) | Author: Trevor Perrin (trevp) | Date: 2004-09-16 06:56 | |
Logged In: YES user_id=973611 Thanks for the pointer. I'm actually not smitten with any of the approaches for long<->byte-string conversion (mine included). Now that I think about it, the best interface might be if Python had a 'bytes' type, then both conversions could be done by constructors: long(bytes) bytes(long) If Python is going to grow such a type in the future, maybe we can defer this question, and live with the hexlifying and whatnot, till then. |
|||
msg45683 - (view) | Author: paul rubin (phr) | Date: 2004-09-16 07:53 | |
Logged In: YES user_id=72053 There shouldn't be a bytes type. Those are just strings. Instead I think it's better to add a conversion function to the strings module or the binascii module. The hex thing is a crock and should be exterminated. |
|||
msg45684 - (view) | Author: Josiah Carlson (josiahcarlson) * | Date: 2004-09-16 19:23 | |
Logged In: YES user_id=341410 Paul, it seems as though there exists a Python function in pickle that does conversions: encode_long() and decode_long(), though depending on the size of your longs, a custom version may be faster (I wrote one that doesn't use hex, and manages to be 20% faster on integers less than 32 bits). It also appears that Raymond is going to be placing such a pair of functions in binascii. I'm curious as to whether you prefer a variable-width return as provided by pickle.encode_long, or a fixed-width return as provided by the struct method I propose. |
|||
msg59827 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2008-01-12 17:07 | |
Here is another integer related optimization patch from Trev. I've no opinion on the matter but you might be interested in it as well. |
|||
msg59868 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2008-01-13 22:32 | |
Unless I'm mistaken, the patch provides only half of the solution: the stringToLong part, but not longToString. I agree this feature is interesting, not for optimization but becomes it avoids using clunky ways (long -> hex -> bin) to implement something simple. However, perhaps making it part of either the binascii or the struct module would allow more formatting choices (e.g. big-endian or little-endian, fixed-length or not). For example in cryptography you would want a byte string of a fixed size even if your 512-bit number happens to have its 8 most significant bits set to zero. |
|||
msg60134 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2008-01-19 03:25 | |
It seems to me that this issue is almost entirely subsumed by issue #1023290. I'm quite tempted to close this and direct further discussion there---personally, I'd support Josiah's proposed struct addition. Paul: if you're still listening after all this time, is it true that between them, Josiah's proposal and pickle.{encode_long, decode_long} would cover your needs? |
|||
msg61809 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2008-01-29 01:19 | |
Closing this due to little-or-no activity for over three years. Antoine's use case seems to me to be covered by issue #1023290. Trevor, please speak up if you want to keep this alive. |
|||
msg62878 - (view) | Author: Mark Dickinson (mark.dickinson) * | Date: 2008-02-24 05:41 | |
pending -> closed. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:03 | admin | set | github: 40083 |
2008-02-24 05:41:33 | mark.dickinson | set | status: pending -> closed messages: + msg62878 |
2008-01-29 01:19:07 | mark.dickinson | set | status: open -> pending resolution: rejected superseder: Conversion of longs to bytes and vice-versa. messages: + msg61809 |
2008-01-19 03:25:19 | mark.dickinson | set | messages: + msg60134 |
2008-01-13 22:32:29 | pitrou | set | nosy:
+ pitrou messages: + msg59868 |
2008-01-12 17:07:12 | christian.heimes | set | assignee: mark.dickinson versions: + Python 2.6, - Python 2.4 messages: + msg59827 nosy: + christian.heimes, mark.dickinson |
2004-03-26 03:17:02 | trevp | create |