This utility is a very powerful one and lets you copy any leaf, group or complete subtree into another file. During the copy process you are allowed to change the filter properties if you wish. Also, in the case of duplicated pathnames, you can decide whether you want to overwrite already existing nodes in the destination file. Generally speaking, ptrepack can be useful in many situations, like replicating a subtree in another file, changing the filters of objects to see how this affects the compression ratio or I/O performance, consolidating specific data in repositories, or even importing generic HDF5 files and creating true PyTables counterparts.
For instructions on how to use it, just pass the -h flag to the command:

$ ptrepack -h

to see the usage message:
usage: ptrepack [-h] [-v] [-o] [-R start,stop,step] [--non-recursive]
                [--dest-title=title] [--dont-copy-userattrs]
                [--overwrite-nodes] [--complevel=(0-9)] [--complib=lib]
                [--shuffle=(0|1)] [--fletcher32=(0|1)]
                [--keep-source-filters] sourcefile:sourcegroup destfile:destgroup

 -h -- Print usage message.
 -v -- Show more information.
 -o -- Overwrite destination file.
 -R RANGE -- Select a RANGE of rows (in the form "start,stop,step")
     during the copy of *all* the leaves.
 --non-recursive -- Do not do a recursive copy. Default is to do it.
 --dest-title=title -- Title for the new file (if not specified,
     the source is copied).
 --dont-copy-userattrs -- Do not copy the user attrs (default is to do it).
 --overwrite-nodes -- Overwrite destination nodes if they exist.
     Default is to not overwrite them.
 --complevel=(0-9) -- Set a compression level (0 for no compression,
     which is the default).
 --complib=lib -- Set the compression library to be used during the copy.
     lib can be set to "zlib", "lzo", "ucl" or "bzip2". Defaults to "zlib".
 --shuffle=(0|1) -- Activate or not the shuffle filter
     (default is active if complevel > 0).
 --fletcher32=(0|1) -- Whether to activate or not the fletcher32 filter
     (not active by default).
 --keep-source-filters -- Use the original filters in source files.
     The default is not doing that if any of the --complevel, --complib,
     --shuffle or --fletcher32 options is specified.
Imagine that we have finished tutorial 1 (see the output of examples/tutorial1-1.py), and we want to copy our reduced data (i.e. those datasets that hang from the /columns group) to another file. First, let's remember the contents of examples/tutorial1.h5:
$ ptdump tutorial1.h5
Filename: 'tutorial1.h5' Title: 'Test file' , Last modif.: 'Fri Feb  6 19:33:28 2004' , rootUEP='/', filters=Filters(), Format version: 1.2
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10L,)) 'Readout example'

Now, let's copy the /columns group to another, non-existing file. That's easy:
$ ptrepack tutorial1.h5:/columns reduced.h5

That's all. Let's see the contents of the newly created reduced.h5 file:
$ ptdump reduced.h5
Filename: 'reduced.h5' Title: '' , Last modif.: 'Fri Feb 20 15:26:47 2004' , rootUEP='/', filters=Filters(), Format version: 1.2
/ (Group) ''
/name (Array(3,)) 'Name column selection'
/pressure (Array(3,)) 'Pressure column selection'

So, you have copied the children of the /columns group into the root of the reduced.h5 file.
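Under the hood, ptrepack relies on PyTables' own node-copying machinery, so roughly the same effect can be obtained programmatically. The following is only a sketch, assuming a recent PyTables version (with the tables.open_file API); the data used to populate the source file is purely illustrative, not the actual tutorial data.

```python
# Rough programmatic equivalent of "ptrepack tutorial1.h5:/columns reduced.h5",
# assuming a recent PyTables (tables.open_file API). The sample data below is
# illustrative only.
import tables

# Build a small source file playing the role of tutorial1.h5.
with tables.open_file("tutorial1.h5", mode="w", title="Test file") as src:
    columns = src.create_group("/", "columns", "Pressure and Name")
    src.create_array(columns, "name", [b"Particle: 5", b"Particle: 6"],
                     "Name column selection")
    src.create_array(columns, "pressure", [25.0, 36.0],
                     "Pressure column selection")

# Copy the children of /columns into the root of a new file, which is what
# ptrepack does when no destination group is specified.
with tables.open_file("tutorial1.h5") as src, \
        tables.open_file("reduced.h5", mode="w") as dst:
    src.copy_children(src.root.columns, dst.root, recursive=True)

with tables.open_file("reduced.h5") as dst:
    names = sorted(node._v_name for node in dst.root)
print(names)
```

copy_children also takes an overwrite argument, mirroring the --overwrite-nodes flag shown later in this section.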
Now, suppose you suddenly realize that what you actually intended was to copy the whole hierarchy, including the /columns group itself. You can do that by simply specifying the destination group:
$ ptrepack tutorial1.h5:/columns reduced.h5:/columns
$ ptdump reduced.h5
Filename: 'reduced.h5' Title: '' , Last modif.: 'Fri Feb 20 15:39:15 2004' , rootUEP='/', filters=Filters(), Format version: 1.2
/ (Group) ''
/name (Array(3,)) 'Name column selection'
/pressure (Array(3,)) 'Pressure column selection'
/columns (Group) ''
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'

OK, much better. But now you want to get rid of the pre-existing nodes in the new file. You can achieve this by adding the -o flag:
$ ptrepack -o tutorial1.h5:/columns reduced.h5:/columns
$ ptdump reduced.h5
Filename: 'reduced.h5' Title: '' , Last modif.: 'Fri Feb 20 15:41:57 2004' , rootUEP='/', filters=Filters(), Format version: 1.2
/ (Group) ''
/columns (Group) ''
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'

where you can see how the old contents of the reduced.h5 file have been overwritten.
You can also copy just one single node in the repacking operation and change its name in the destination:
$ ptrepack tutorial1.h5:/detector/readout reduced.h5:/rawdata
$ ptdump reduced.h5
Filename: 'reduced.h5' Title: '' , Last modif.: 'Fri Feb 20 15:52:22 2004', rootUEP='/', filters=Filters(), Format version: 1.2
/ (Group) ''
/rawdata (Table(10L,)) 'Readout example'
/columns (Group) ''
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'

where /detector/readout has been copied to /rawdata in the destination.
We can change the filter properties as well:
$ ptrepack --complevel=1 tutorial1.h5:/detector/readout reduced.h5:/rawdata
Problems doing the copy from 'tutorial1.h5:/detector/readout' to 'reduced.h5:/rawdata'
The error was --> exceptions.ValueError: The destination (/rawdata (Table(10L,)) 'Readout example') already exists. Assert the overwrite parameter if you really want to overwrite it.
The destination file looks like:
Filename: 'reduced.h5' Title: ''; Last modif.: 'Fri Feb 20 15:52:22 2004'; rootUEP='/'; filters=Filters(), Format version: 1.2
/ (Group) ''
/rawdata (Table(10L,)) 'Readout example'
/columns (Group) ''
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
Traceback (most recent call last):
  File "../utils/ptrepack", line 358, in ?
    start=start, stop=stop, step=step)
  File "../utils/ptrepack", line 111, in copyLeaf
    raise RuntimeError, "Please, check that the node names are not duplicated in destination, and if so, add the --overwrite-nodes flag if desired."
RuntimeError: Please, check that the node names are not duplicated in destination, and if so, add the --overwrite-nodes flag if desired.

Oops! We ran into problems: we forgot that the /rawdata pathname already existed in the destination file. Let's add the --overwrite-nodes flag, as the verbose error suggested:
$ ptrepack --overwrite-nodes --complevel=1 tutorial1.h5:/detector/readout reduced.h5:/rawdata
$ ptdump reduced.h5
Filename: 'reduced.h5' Title: ''; Last modif.: 'Fri Feb 20 16:02:20 2004'; rootUEP='/'; filters=Filters(), Format version: 1.2
/ (Group) ''
/rawdata (Table(10L,), shuffle, zlib(1)) 'Readout example'
/columns (Group) ''
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'

You can check how the filter properties have been changed for the /rawdata table. Note that the other nodes still exist.
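The same kind of filter change can be made from Python. The sketch below copies a table under a new name while applying zlib level 1 plus shuffle via a tables.Filters instance; it assumes a recent PyTables, and the file name filters_demo.h5 and node names are made up for the example.

```python
# Sketch: recompress a table while copying it, similar in spirit to
# "ptrepack --overwrite-nodes --complevel=1 ...". Assumes a recent PyTables;
# the file and node names are illustrative.
import tables

class Readout(tables.IsDescription):
    adc = tables.Int32Col()  # a minimal one-column table description

with tables.open_file("filters_demo.h5", mode="w") as f:
    table = f.create_table("/", "readout", Readout, "Readout example")
    table.append([(i,) for i in range(10)])
    table.flush()

    # Copy the leaf under a new name, with new filter properties.
    filters = tables.Filters(complevel=1, complib="zlib", shuffle=True)
    f.copy_node("/readout", newname="rawdata", filters=filters)
    complevel = f.root.rawdata.filters.complevel
    nrows = int(f.root.rawdata.nrows)
```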
Finally, let's copy a slice of the readout table in the origin to the destination, under a new group called /slices and with the name aslice, for example:
$ ptrepack -R1,8,3 tutorial1.h5:/detector/readout reduced.h5:/slices/aslice
$ ptdump reduced.h5
Filename: 'reduced.h5' Title: ''; Last modif.: 'Fri Feb 20 16:17:13 2004'; rootUEP='/'; filters=Filters(); Format version: 1.2
/ (Group) ''
/rawdata (Table(10L,), shuffle, zlib(1)) 'Readout example'
/columns (Group) ''
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/slices (Group) ''
/slices/aslice (Table(3L,)) 'Readout example'

Note how only 3 rows of the original readout table have been copied to the new aslice destination. Note as well how the previously nonexistent slices group has been created in the same operation.
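A row range can be selected during a programmatic copy as well. The sketch below (again with made-up file names, assuming a recent PyTables) passes start/stop/step keywords through File.copy_node, and uses createparents to build the missing destination group, much as ptrepack creates /slices on the fly:

```python
# Sketch: copy only a slice of a table, like "ptrepack -R1,8,3 ...".
# Assumes a recent PyTables; the file and node names are illustrative.
import tables

class Readout(tables.IsDescription):
    adc = tables.Int32Col()

with tables.open_file("slice_demo.h5", mode="w") as f:
    table = f.create_table("/", "readout", Readout, "Readout example")
    table.append([(i,) for i in range(10)])
    table.flush()

    # start/stop/step select rows 1, 4 and 7; createparents builds /slices.
    f.copy_node("/readout", newparent="/slices", newname="aslice",
                createparents=True, start=1, stop=8, step=3)
    copied = f.root.slices.aslice.read()["adc"].tolist()
```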