Author neologix
Recipients Eric.Wolf, neologix, niemeyer, wrobell
Date 2011-03-01.22:11:25
> Stupid questions are always worth asking. I did check the MD5 sum earlier
> and just checked it again (since I copied the file from one machine to
> another):
> ebwolf@ubuntu:/opt$ md5sum /host/full-planet-110115-1800.osm.bz2
> 0e3f81ef0dd415d8f90f1378666a400c  /host/full-planet-110115-1800.osm.bz2
> ebwolf@ubuntu:/opt$ cat full-planet-110115-1800.osm.bz2.md5
> 0e3f81ef0dd415d8f90f1378666a400c  full-planet-110115-1800.osm.bz2

Well, that only proves that the file wasn't corrupted during the download.
But this doesn't prove that the file on the remote server isn't
corrupt (see for example the link I gave you, the guy used rsync and
had a correct checksum but was still unable to extract the file).
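For reference, the same comparison can be done from Python. This is only a minimal sketch: `md5_of` and `matches_md5_file` are hypothetical helper names, and the `.md5` file is assumed to be in the usual md5sum layout (digest first, then the file name). As said above, a match only shows that the local copy equals whatever was hashed on the server, not that the archive itself is a valid bz2 file.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a multi-gigabyte file never sits in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5_file(path, md5_path):
    """Compare `path` against the first field of an md5sum-style file.

    Hypothetical helper: assumes the digest is the first
    whitespace-separated token in `md5_path`.
    """
    with open(md5_path) as f:
        expected = f.read().split()[0]
    return md5_of(path) == expected.lower()
```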

> There you have it. I was able to convert the bz2 to gzip with no errors:
> bzcat full-planet-110115-1800.osm.bz2 | gzip > full-planet.osm.gz

How big is full-planet.osm.gz?
bzip2 uses bzlib too, so it could very well return after having
uncompressed only half the file.
A more interesting test would be
$ bzip2 -cd full-planet-110115-1800.osm.bz2 | bzip2 -c >
$ md5sum full-planet.*.bz2
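A rough way to check from Python how much data actually comes out of the archive is to drive the decompressor by hand and count bytes. The sketch below assumes a Python 3 interpreter (it relies on `BZ2Decompressor.eof` and `unused_data`); `scan_bz2` is a hypothetical name. It also counts how many separate bz2 streams the file contains. Whether that matters for this particular file is an assumption on my side, not something established here.

```python
import bz2


def scan_bz2(path, chunk_size=1 << 20):
    """Return (decompressed_bytes, stream_count, incomplete) for a .bz2 file.

    `incomplete` is True when the file ends in the middle of a stream.
    Corrupt data makes the underlying decompressor raise OSError.
    """
    total = 0
    streams = 0
    incomplete = False
    decomp = bz2.BZ2Decompressor()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            while chunk:
                total += len(decomp.decompress(chunk))
                if decomp.eof:
                    # End of one stream: anything left over belongs to
                    # the next one, so restart with a fresh decompressor.
                    streams += 1
                    chunk = decomp.unused_data
                    decomp = bz2.BZ2Decompressor()
                    incomplete = False
                else:
                    # Input consumed, stream still open.
                    incomplete = True
                    chunk = b""
    return total, streams, incomplete
```

If the byte count is far below what the recompression test produces, or `incomplete` comes back True, the reader stopped early or the file is truncated.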

> FYI: This problem came up last year with no resolution:

Yeah, and it was also on an OSM file.
Now, I know that OSM are probably one of the biggest providers of huge
archives, but it's surprising that every time there's a problem with
bz2, it's with an OSM file, no?

Look at what I just found, a message from an OSM admin dating from late 2010:

On 26 October 2010 13:47, Anthony <osm <at>> wrote:
> a <at> A-PC:/media/usbdrive$ cat full-planet-101022.osm.bz2.md5
> 0a90fec8ce66bdd82984c2ee8c6bb6ac  full-planet-101022.osm.bz2
> a <at> A-PC:/media/usbdrive$ md5sum full-planet-101022.osm.bz2
> c652430b00668c30bb04816ff16cbfbe  full-planet-101022.osm.bz2
> Just me?

We had problems with the network card in that machine last night
causing some corruption, try
rsync:// the file
into a good state.

Although best to wait a few hours, currently packet loss issues on
server's upstream network.


> In general, is it best to always read the same number of bytes?

In that case, it doesn't matter.

> And what is the best value to pass for buffering in BZ2File? I just made up
> something hoping it would work.

The default one ;-) (don't provide any)
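To illustrate both answers, here is a minimal sketch that reads through `BZ2File` without passing any buffering argument and counts the decompressed bytes. It assumes a Python 3 interpreter; `decompressed_size` is a hypothetical helper name and the 64 KiB default chunk size is arbitrary. The total should come out the same whatever chunk size is used.

```python
import bz2


def decompressed_size(path, chunk_size=64 * 1024):
    """Count decompressed bytes, reading `chunk_size` bytes at a time."""
    total = 0
    # No buffering argument: rely on the module's default.
    with bz2.BZ2File(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means end of file.
                break
            total += len(chunk)
    return total
```

Calling it with, say, `chunk_size=4096` and `chunk_size=1000` on the same file should give identical results; if it does not, that would point at the reader rather than at the read size.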

> Colin was using an OSM planet file from some time last year and it quit at exactly 900000 bytes.

OSM again :-)
900,000 bytes is exactly the default bz2 block size...
Date                 User      Action  Args
2011-03-01 22:11:33  neologix  set     recipients: + neologix, niemeyer, wrobell, Eric.Wolf
2011-03-01 22:11:26  neologix  link    issue10900 messages
2011-03-01 22:11:25  neologix  create