PyTables comes with a couple of utilities that make life easier for the user. One, called ptdump, lets you see the contents of a PyTables file (or a generic HDF5 file, if supported). The other, named ptrepack, lets you (recursively) copy sub-hierarchies of objects from one file into another, optionally changing some of the filters applied to the leaves during the copy.
Normally, these utilities are installed somewhere in your PATH when the PyTables package is installed, so you can invoke them from anywhere in your file system once the installation has finished successfully.
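If you want to confirm that both utilities really ended up on your PATH, you can look them up from Python. The following is a minimal sketch that uses only the standard library; the helper name find_utilities is hypothetical and not part of PyTables.

```python
import shutil


def find_utilities(names=("ptdump", "ptrepack")):
    """Map each utility name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_utilities().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A value of None for either name usually means that the directory where the scripts were installed is not listed in PATH.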
As mentioned above, the ptdump utility lets you look into the contents of your PyTables files. It lets you see not only the data but also the metadata (that is, the structure and additional information in the form of attributes).
For instructions on how to use it, just pass the -h flag to the command:
$ ptdump -h

to see the usage message:

usage: ptdump [-R start,stop,step] [-a] [-h] [-d] [-v] file[:nodepath]
  -R RANGE -- Select a RANGE of rows in the form "start,stop,step"
  -a -- Show attributes in nodes (only useful when -v or -d are active)
  -c -- Show info of columns in tables (only useful when -v or -d are active)
  -i -- Show info of indexed columns (only useful when -v or -d are active)
  -d -- Dump data information on leaves
  -h -- Print help on usage
  -v -- Dump more meta-information on nodes
Let's suppose that we want to know only the structure of a file. To do that, pass no flags at all, just the file name as the parameter:
$ ptdump vlarray1.h5
Filename: 'vlarray1.h5' Title: '' , Last modif.: 'Fri Feb 6 19:33:28 2004' , rootUEP='/', filters=Filters(), Format version: 1.2
/ (Group) ''
/vlarray1 (VLArray(4,), shuffle, zlib(1)) 'ragged array of ints'

We can see that the file contains just one leaf object called vlarray1, which is an instance of VLArray, has 4 rows, and was created with two filters: shuffle and zlib (with a compression level of 1).
Let's say we want more meta-information. Just add the -v (verbose) flag:
$ ptdump -v vlarray1.h5
/ (Group) ''
  children := ['vlarray1' (VLArray)]
/vlarray1 (VLArray(4,), shuffle, zlib(1)) 'ragged array of ints'
  atom = Atom(type=Int32, shape=1, flavor='Numeric')
  nrows = 4
  flavor = 'Numeric'
  byteorder = 'little'

Now we can see more information about the atoms that make up the vlarray1 dataset: they are scalars of type Int32 with the Numeric flavor.
If we want information about the attributes on the nodes, we must add the -a flag:
$ ptdump -va vlarray1.h5
/ (Group) ''
  children := ['vlarray1' (VLArray)]
  /._v_attrs (AttributeSet), 5 attributes:
   [CLASS := 'GROUP',
    FILTERS := None,
    PYTABLES_FORMAT_VERSION := '1.2',
    TITLE := '',
    VERSION := '1.0']
/vlarray1 (VLArray(4,), shuffle, zlib(1)) 'ragged array of ints'
  atom = Atom(type=Int32, shape=1, flavor='Numeric')
  nrows = 4
  flavor = 'Numeric'
  byteorder = 'little'
  /vlarray1.attrs (AttributeSet), 4 attributes:
   [CLASS := 'VLARRAY',
    FLAVOR := 'Numeric',
    TITLE := 'ragged array of ints',
    VERSION := '1.0']
Let's have a look at the real data:
$ ptdump -d vlarray1.h5
/ (Group) ''
/vlarray1 (VLArray(4,), shuffle, zlib(1)) 'ragged array of ints'
  Data dump:
[array([5, 6]), array([5, 6, 7]), array([5, 6, 9, 8]), array([ 5, 6, 9, 10, 12])]

Here we see a data dump of the 4 rows of the vlarray1 object, in the form of a list. Because the object is a VLArray (a variable-length array), each row holds a different number of integers.
Say that we are interested only in a specific row range of the /vlarray1 object:
$ ptdump -R2,4 -d vlarray1.h5:/vlarray1
/vlarray1 (VLArray(4,), shuffle, zlib(1)) 'ragged array of ints'
  Data dump:
[array([5, 6, 9, 8]), array([ 5, 6, 9, 10, 12])]

Here, we have specified the range of rows from 2 to 4 (with the upper limit excluded, as usual in Python), so rows 2 and 3 are dumped. Note how we have selected only the /vlarray1 object for the dump by appending its path to the file name (vlarray1.h5:/vlarray1).
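The row-range semantics can be reproduced with ordinary Python slicing. The sketch below is only an illustration: parse_range is a hypothetical helper, not ptdump's actual parser, and treating missing fields as None is an assumption made here. The rows are copied from the data dump shown earlier.

```python
def parse_range(spec):
    """Turn a "start,stop,step" string into a Python slice.

    Missing or empty fields become None, as in ordinary slicing
    (an assumption of this sketch, not documented ptdump behavior).
    """
    fields = [int(f) if f.strip() else None for f in spec.split(",")]
    if len(fields) > 3:
        raise ValueError(f"expected at most 3 fields, got {len(fields)}")
    fields += [None] * (3 - len(fields))  # pad to (start, stop, step)
    return slice(*fields)


# The four rows shown in the data dump of /vlarray1.
rows = [[5, 6], [5, 6, 7], [5, 6, 9, 8], [5, 6, 9, 10, 12]]

selected = rows[parse_range("2,4")]
print(selected)  # [[5, 6, 9, 8], [5, 6, 9, 10, 12]]
```

Rows are numbered from 0, so the range 2,4 picks the third and fourth rows and leaves out index 4, matching the two arrays in the dump above.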
Finally, you can combine several kinds of information in a single call:
$ ptdump -R2,4 -vad vlarray1.h5:/vlarray1
/vlarray1 (VLArray(4,), shuffle, zlib(1)) 'ragged array of ints'
  atom = Atom(type=Int32, shape=1, flavor='Numeric')
  nrows = 4
  flavor = 'Numeric'
  byteorder = 'little'
  /vlarray1.attrs (AttributeSet), 4 attributes:
   [CLASS := 'VLARRAY',
    FLAVOR := 'Numeric',
    TITLE := 'ragged array of ints',
    VERSION := '1.0']
  Data dump:
[array([5, 6, 9, 8]), array([ 5, 6, 9, 10, 12])]
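If you call ptdump from scripts, it can help to assemble the command line programmatically. The following sketch builds an argument list using only the flags listed in the usage message above. The function build_ptdump_command and its parameter names are hypothetical, and it assumes that passing -v, -a and -d separately is equivalent to the combined -vad form.

```python
import shlex


def build_ptdump_command(filename, nodepath=None, row_range=None,
                         verbose=False, attrs=False, data=False):
    """Build an argument list for ptdump from keyword options."""
    args = ["ptdump"]
    if row_range is not None:
        # Emitted in the attached form used in this section (e.g. -R2,4).
        args.append("-R" + ",".join(str(n) for n in row_range))
    if verbose:
        args.append("-v")
    if attrs:
        args.append("-a")
    if data:
        args.append("-d")
    # Restrict the dump to one node with the file[:nodepath] syntax.
    target = filename if nodepath is None else f"{filename}:{nodepath}"
    args.append(target)
    return args


cmd = build_ptdump_command("vlarray1.h5", nodepath="/vlarray1",
                           row_range=(2, 4), verbose=True,
                           attrs=True, data=True)
print(shlex.join(cmd))
```

The resulting list can be handed directly to subprocess.run(cmd) once ptdump is available on your PATH.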