Author mark.dickinson
Recipients Francois Schneider, mark.dickinson
Date 2018-06-07.06:58:40
Message-id <1528354720.83.0.592728768989.issue33784@psf.upfronthosting.co.za>
Content
This shouldn't be a problem: there's no rule that says that different objects should have different hashes. Indeed, with a countably infinite set of possible hashable inputs, a deterministic hashing algorithm, and only finitely many outputs, such a rule would be a mathematical impossibility. For example, in CPython:

>>> hash(-1) == hash(-2)
True

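The same counting argument can be seen with larger integers. A minimal sketch, assuming CPython's documented numeric hashing, in which integer hashes are reduced modulo the prime exposed as `sys.hash_info.modulus` (the variable names here are only illustrative):

```python
import sys

# CPython reduces integer hashes modulo this prime.
P = sys.hash_info.modulus

small = 1
large = 1 + P  # a different integer with the same residue modulo P

# Distinct values, equal hashes: a collision by construction.
collide = (small != large) and (hash(small) == hash(large))
print(collide)
```
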
Are these hash collisions causing real issues in your code? While a single hash collision like this shouldn't be an issue, if there are many collisions within a single (non-artificial) dataset, that _can_ lead to performance issues.
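
To make the performance point concrete, here is a rough sketch that counts equality comparisons while building a dict. The `Key` class and `count_eq_calls` helper are hypothetical, written only for this illustration; the constant hash is a deliberately artificial worst case, not something the ipaddress code does.

```python
class Key:
    """Hypothetical key type whose hash quality can be switched off."""

    eq_calls = 0  # running count of __eq__ invocations

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # With constant_hash set, every key collides with every other key.
        return 0 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return self.value == other.value


def count_eq_calls(n, constant_hash):
    """Build a dict of n distinct keys and report how often __eq__ ran."""
    Key.eq_calls = 0
    table = {Key(i, constant_hash): i for i in range(n)}
    assert len(table) == n
    return Key.eq_calls


colliding = count_eq_calls(200, constant_hash=True)
spread = count_eq_calls(200, constant_hash=False)
print(colliding, spread)
```

When every key shares one hash, each insertion has to be compared against keys already in the table, so the comparison count grows much faster than the number of keys.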

Looking at the code, we could probably do a better job of making the hash collisions less predictable. The current code looks like:

    def __hash__(self):
        return hash(int(self.network_address) ^ int(self.netmask))

I'd propose hashing a tuple instead of using the xor. For example:

    def __hash__(self):
        return hash((int(self.network_address), int(self.netmask)))

Hash collisions would almost certainly still occur with this scheme, but they'd be a tiny bit less obvious and harder to find.
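
As a small sketch of the difference, the two formulas can be compared as standalone functions. The names `xor_hash` and `tuple_hash` are hypothetical helpers that mirror the two `__hash__` bodies quoted in this message; they do not call the library's own `__hash__`, whose implementation may differ between Python versions.

```python
import ipaddress


def xor_hash(net):
    # Mirrors the xor-based formula quoted above.
    return hash(int(net.network_address) ^ int(net.netmask))


def tuple_hash(net):
    # Mirrors the proposed tuple-based formula.
    return hash((int(net.network_address), int(net.netmask)))


# For each of these networks the address and the netmask are the same
# integer, so the xor is 0 and the xor-based hashes all coincide.
nets = [
    ipaddress.ip_network(s)
    for s in ("0.0.0.0/0", "128.0.0.0/1", "192.0.0.0/2")
]

xor_hashes = [xor_hash(n) for n in nets]
tuple_hashes = [tuple_hash(n) for n in nets]

print(xor_hashes)
print(tuple_hashes)
```

The xor-based values are all zero for this family of networks, which makes the collisions easy to predict; the tuple-based values depend on both components separately.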