classification
Title: long <-> byte-string conversion
Type: Stage:
Components: Interpreter Core Versions: Python 2.6
process
Status: closed Resolution: rejected
Dependencies: Superseder: Conversion of longs to bytes and vice-versa.
View: 1023290
Assigned To: mark.dickinson Nosy List: christian.heimes, josiahcarlson, mark.dickinson, phr, pitrou, trevp
Priority: normal Keywords: patch

Created on 2004-03-26 03:17 by trevp, last changed 2008-02-24 05:41 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
base256.diff trevp, 2004-09-15 09:27
Messages (19)
msg45671 - (view) Author: Trevor Perrin (trevp) Date: 2004-03-26 03:17
Sometimes you want to turn a long into a byte-string, 
or vice-versa.  This is useful in cryptographic protocols, 
and probably other network protocols where you need 
to exchange large integers.

In 2.4, you can handle unsigned longs with this:

def stringToLong(s):
    return long(binascii.hexlify(s), 16)

def longToString(n):
    return binascii.unhexlify("%x" % n)

However, these functions are slower than they need to 
be, they're kinda kludgey, and they don't handle 
negative values.

So here's a proposal:

def stringToLong(s):
    return long(s, 256)

def longToString(n):
    return n.tostring()

These functions operate on big-endian, 2's-complement 
byte-strings.  If the value is positive but has its most-
significant-bit set, an extra zero-byte will be 
prepended.  This is the same way OpenSSL and (I think) 
GMP handle signed numbers.

These functions are ~5x faster than the earlier ones, 
they're cleaner, and they work with negative numbers.  

If you only want to deal with unsigned positive numbers, 
you'll have to do some adjustments:

def stringToLong(s):
    return long('\0'+s, 256)

def longToString(n):
    s = n.tostring()
    if s[0] == '\0' and s != '\0':
        s = s[1:]
    return s

That's not ideal, but it seems better than any interface 
change I could think of.

Anyways, the patch adds this to longs.  It should be 
added to ints too, and I guess it needs tests etc..  I 
can help with that, if the basic idea is acceptable.

Trevor
msg45672 - (view) Author: paul rubin (phr) Date: 2004-03-26 03:53
Logged In: YES 
user_id=72053

I think those funcs should take an optional extra arg to say
you want unsigned.  That's cleaner than prepending '0'.  In
cryptography you usually do want unsigned.
msg45673 - (view) Author: Trevor Perrin (trevp) Date: 2004-03-27 06:45
Logged In: YES 
user_id=973611

You're right, we should support unsigned strings somehow.  
Adding another argument to the int() and long() constructors 
would be messy, though.  How about:

n = long(s, 256) #unsigned
n = long(s, -256) #signed

n.tounsignedstring()
n.tosignedstring()

The "-256" thing is a hack, I admit..  but it kinda grows on 
you, if you stare it at awhile :-)...




msg45674 - (view) Author: Trevor Perrin (trevp) Date: 2004-03-27 06:45
Logged In: YES 
user_id=973611

You're right, we should support unsigned strings somehow.  
Adding another argument to the int() and long() constructors 
would be messy, though.  How about:

n = long(s, 256) #unsigned
n = long(s, -256) #signed

n.tounsignedstring()
n.tosignedstring()

The "-256" thing is a hack, I admit..  but it kinda grows on 
you, if you stare it at awhile :-)...




msg45675 - (view) Author: paul rubin (phr) Date: 2004-03-27 06:57
Logged In: YES 
user_id=72053

How about just punting signed conversion.  I don't think
two's complement makes much sense for arbitrary precision
longs.  Have some separate representation for negative longs
if needed.  If you call hex() on a large negative number,
you get a hex string with a leading minus sign.  For base
256, you can't reserve a char like that, so I guess you have
to just throw an error if someone tries to convert a
negative long to a string.  If you want a representation for
signed longs, ASN1 DER is probably an ok choice.  I agree
with Guido that the binascii module is a good place to put
such a function.  Twos complement can work if you specify a
fixed precision, but that sure complicates what this started
out as.
msg45676 - (view) Author: Trevor Perrin (trevp) Date: 2004-03-27 07:51
Logged In: YES 
user_id=973611

I think 2's complement makes good sense for arbitrary 
precision longs.  This is how OpenSSL and GMP handle them.  
It's also how the ASN.1 BER/DER encodings handle integers: 
these encodings just prepend tag and length fields to the big-
endian 2's complement value.  I.e.: If you want to extract 
RSA public values from an X.509 certificate, they'll be in 2's 
complement (well, they'll always be positive... but they'll 
have an extra zero byte if necessary).

Since the functionality for 2's complement is already in the C 
code it's easy to expose through a patch.  So I'm still in favor 
of presenting it.
msg45677 - (view) Author: Trevor Perrin (trevp) Date: 2004-03-29 00:54
Logged In: YES 
user_id=973611

My last comment was wrong: GMP's raw input/output format 
uses big-endian positive values, with the sign bit stored 
separately.
msg45678 - (view) Author: Trevor Perrin (trevp) Date: 2004-03-29 00:55
Logged In: YES 
user_id=973611

My last comment was wrong: GMP's raw input/output format 
uses big-endian positive values, with the sign bit stored 
separately.
msg45679 - (view) Author: Josiah Carlson (josiahcarlson) * Date: 2004-03-30 07:10
Logged In: YES 
user_id=341410

I'm curious to know if anyone would object to optional
minimum or maximum or both arguments, or even some
additional methods that would result in a potentially
constrained string output from long.tostring()?

If I were to split the functionality into three methods,
they would be as follows...

def atleast(long, atl):
    if atl < 0:
        raise TypeError("atleast requires a positive integer
for a minimum length")
    a = long.tostring()
    la = len(a)
    return (atl-la)*'\o' + a

def atmost(long, atm):
    if atm < 0:
        raise TypeError("atleast requires a positive integer
for a minimum length")
    a = long.tostring()
    la = len(a)
    return a[:atm]

def constrained(long, atl, atm):
    if atm < atl:
        raise TypeError("constrained requires that the
maximum length be larger than the minimum length")
    if atl < 0 or atm < 0:
        raise TypeError("constrained requires that both
arguments are positive")
    a = long.tostring()
    la = len(a)
    return ((atl-la)*'\o' + a)[:atm]


I personally would find use for the above, would anyone else
have use for it?
msg45680 - (view) Author: Trevor Perrin (trevp) Date: 2004-09-15 09:27
Logged In: YES 
user_id=973611


Uploading a new patch (base256.diff).  This implements only
the string-> long (or int) conversion.  It adds support for
radix 256 (unsigned) or -256 (2's-complement signed) to the
int() and long() built-ins:
  int("\xFF\xFF\xFF", 256) -> 0xFFFFFF
  int("\xFF\xFF\xFF", -256) -> -1
  long(os.urandom(128), 256) -> 1024-bit integer

I left out the long -> string conversion.  If python adds a
bytes() type, then that conversion could be done as
bytes(long).  This patch has docs and tests.
msg45681 - (view) Author: Josiah Carlson (josiahcarlson) * Date: 2004-09-15 16:49
Logged In: YES 
user_id=341410

As an aside, I've requested similar functionality be added
to the struct module, which handles signed, unsigned,
big-endian, little-endian, and integers of arbitrary length.
 The request is available here:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1023290&group_id=5470

Would you prefer two functions that live as methods of ints
and longs, or would you prefer the functionality be placed
inside of the struct module (which already does numeric
packing and upacking), and had its own type code?
msg45682 - (view) Author: Trevor Perrin (trevp) Date: 2004-09-16 06:56
Logged In: YES 
user_id=973611


Thanks for the pointer.  I'm actually not smitten with any
of the approaches for long<->byte-string conversion (mine
included).  Now that I think about it, the best interface
might be if Python had a 'bytes' type, then both conversions
could be done by constructors:
  long(bytes)
  bytes(long)

If Python is going to grow such a type in the future, maybe
we can defer this question, and live with the hexlifying and
whatnot, till then.
msg45683 - (view) Author: paul rubin (phr) Date: 2004-09-16 07:53
Logged In: YES 
user_id=72053

There shouldn't be a bytes type.  Those are just strings. 
Instead I think it's better to add a conversion function to
the strings module or the binascii module.  The hex thing is
a crock and should be exterminated.
msg45684 - (view) Author: Josiah Carlson (josiahcarlson) * Date: 2004-09-16 19:23
Logged In: YES 
user_id=341410

Paul, it seems as though there exists a Python function in
pickle that does conversions: encode_long() and
decode_long(), though depending on the size of your longs, a
custom version may be faster (I wrote one that doesn't use
hex, and manages to be 20% faster on integers less than 32
bits).

It also appears that Raymond is going to be placing such a
pair of functions in binascii.

I'm curious as to whether you prefer a variable-width return
as provided by pickle.encode_long, or a fixed-width return
as provided by the struct method I propose.
msg59827 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2008-01-12 17:07
Here is another integer related optimization patch from Trev. I've no
opinion on the matter but you might be interested in it as well.
msg59868 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-01-13 22:32
Unless I'm mistaken, the patch provides only half of the solution: the
stringToLong part, but not longToString.
I agree this feature is interesting, not for optimization but becomes it
avoids using clunky ways (long -> hex -> bin) to implement something simple.
However, perhaps making it part of either the binascii or the struct
module would allow more formatting choices (e.g. big-endian or
little-endian, fixed-length or not). For example in cryptography you
would want a byte string of a fixed size even if your 512-bit number
happens to have its 8 most significant bits set to zero.
msg60134 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-01-19 03:25
It seems to me that this issue is almost entirely subsumed by issue #1023290.  I'm quite tempted 
to close this and direct further discussion there---personally, I'd support Josiah's proposed 
struct addition.

Paul:  if you're still listening after all this time, is it true that between them,  Josiah's 
proposal and pickle.{encode_long, decode_long} would cover your needs?
msg61809 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-01-29 01:19
Closing this due to little-or-no activity for over three years.  Antoine's 
use case seems to me to be covered by issue #1023290.

Trevor, please speak up if you want to keep this alive.
msg62878 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2008-02-24 05:41
pending -> closed.
History
Date User Action Args
2008-02-24 05:41:33mark.dickinsonsetstatus: pending -> closed
messages: + msg62878
2008-01-29 01:19:07mark.dickinsonsetstatus: open -> pending
resolution: rejected
superseder: Conversion of longs to bytes and vice-versa.
messages: + msg61809
2008-01-19 03:25:19mark.dickinsonsetmessages: + msg60134
2008-01-13 22:32:29pitrousetnosy: + pitrou
messages: + msg59868
2008-01-12 17:07:12christian.heimessetassignee: mark.dickinson
versions: + Python 2.6, - Python 2.4
messages: + msg59827
nosy: + christian.heimes, mark.dickinson
2004-03-26 03:17:02trevpcreate