Author edulix
Recipients christian.heimes, edulix, lars.gustaebel, loewis, serhiy.storchaka
Date 2014-04-13.21:00:16
Message-id <1397422817.01.0.0536325908073.issue18321@psf.upfronthosting.co.za>
In-reply-to
Content
>> [...] but remember, we split a volume only in the middle of a big file, not in any other case (AFAIK). Hopefully you don't get huge pax headers or anything strange. [...] 
> Hopefully? Sorry, but have you tested this? I did. I let GNU tar create a two volume archive that is split exactly between the two blocks of an XHDTYPE pax header.
>
> The result is terrifying. At the beginning of the second volume GNU tar creates an XGLTYPE header as the pax replacement for a GNUTYPE_MULTIVOL header, followed by an XHDTYPE header ("GNUFileParts") that somehow decorates the following REGTYPE(!) tar header that contains the continuation of the split XHDTYPE header data from the previous volume. After that comes the REGTYPE file that the split XHDTYPE header was actually meant for as decoration.
>
> I attached the archive to this issue.
>
> What happens if a GNUTYPE_LONGNAME header is split in two? I don't wanna know...
>
>> write() will need to take into account blocks (BLOCKSIZE), just to be able to split the volumes correctly.
>
> It is mandatory to do the split on a block boundary (a multiple of 512).
>>> BTW, my version of GNU tar refuses to create compressed multiple-volume archives which is why I doubt the usefulness of this feature overall.
>> But it has multivolume support right? Which is what I am proposing here. Also, you can gzip (or encrypt or anything) the volumes after creating the volumes..
>
> Yeah, it has multivolume support, but a very limited one that is not only weird but isn't even usable together with compression. And sure, I can compress and encrypt the volumes afterward, but I can also create a compressed archive and pipe it through split(1) to split it into parts. Both ways create tar archives that are not readable by GNU tar because they're non-standard. So what?
>
> Please tell me, what is your actual personal use-case for this feature?

I'm willing to modify the patch to remove the "weirdness" you refer to. I disagree that it's not usable: it might not be useful to you, but it's certainly a feature that covers part of the functionality of GNU tar. In fact, some of the unit tests work exactly like this: create the archive with GNU tar, then extract it with tarfile, and vice versa.

Of course you can use split. And I could use Ruby or Perl, but I'm using Python and tarfile. This is a GNU tar feature that is simply not supported in Python's tarfile upstream, and I'm just trying to contribute it, if possible :-).

BTW, if I create a multivolume tar file and then compress the volumes, that does not make it "non-standard", in the same way that creating a PNG file, compressing it, and storing it in EXTFS doesn't make it non-standard. I'm just using multiple layers of standards.

I'm a contractor, and a client asked me to develop a Python-based backup tool. The client is technical and already had an idea of what he wanted to do: use Python's tarfile, adding support for multivolume archives and some other goodies. The client also wanted to try to push the changes upstream, as we believe that is the correct thing to do.

BTW, when we designed the backup tool, we ruled out split because it wouldn't allow us to correctly list the files in each slice separately. We wanted to be able to recover all the files of each "volume", so that if we lose other volumes we can still recover all the data from the volumes we have.
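
To illustrate the point about split, here is a minimal sketch (not code from the patch). It assumes an in-memory archive, and a plain byte slice stands in for split(1). The helper names build_archive and list_members, the file names, and the cut offset are all made up for the example. The second slice starts in the middle of a member's data, so tarfile finds no valid header there and cannot list anything from it.

```python
import io
import tarfile


def build_archive(names, size=2000):
    """Return an uncompressed tar archive (bytes) holding one file per name."""
    buf = io.BytesIO()
    # GNU_FORMAT keeps the layout simple: one 512-byte header per member.
    with tarfile.open(fileobj=buf, mode="w:", format=tarfile.GNU_FORMAT) as tar:
        for name in names:
            info = tarfile.TarInfo(name)
            info.size = size
            tar.addfile(info, io.BytesIO(b"x" * size))
    return buf.getvalue()


def list_members(data):
    """Return the member names, or None if the bytes are not a readable tar."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:") as tar:
            return tar.getnames()
    except tarfile.ReadError:
        return None


archive = build_archive(["a.bin", "b.bin", "c.bin"])

# Hypothetical cut point: block-aligned, but inside the data of "b.bin",
# which is what split(1) does when it cuts purely by byte count.
cut = 7 * tarfile.BLOCKSIZE
second_slice = archive[cut:]

print(list_members(archive))       # ['a.bin', 'b.bin', 'c.bin']
print(list_members(second_slice))  # None: the slice does not start with a header
```

A multivolume archive avoids this because each volume begins with its own header, so a reader that understands the format can tell what the volume contains without the others.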

Anyway, if you are the maintainer of tarfile and you think it's not possible to push multivolume tar support upstream into Python's tarfile, for whatever reason, please tell me.
History
Date                 User    Action  Args
2014-04-13 21:00:17  edulix  set     recipients: + edulix, loewis, lars.gustaebel, christian.heimes, serhiy.storchaka
2014-04-13 21:00:17  edulix  set     messageid: <1397422817.01.0.0536325908073.issue18321@psf.upfronthosting.co.za>
2014-04-13 21:00:16  edulix  link    issue18321 messages
2014-04-13 21:00:16  edulix  create