PyTables can read native HDF5 files created with other tools. However, there are situations where you may want to use those tools to create truly native PyTables files, retaining full compatibility with the PyTables format. That is perfectly possible, and this appendix presents the format that you should give to your own generated files in order to make them fully PyTables compatible.
We are going to describe version 1.6 of the PyTables file format (introduced in PyTables version 1.3). At this stage, the file format is considered stable enough that no significant changes are expected for a reasonable amount of time. As time goes by, some changes will be introduced (and documented here) in order to cope with new needs. However, the changes will be carefully weighed so as to ensure backward compatibility whenever possible.
A PyTables file is composed of an arbitrarily large number of HDF5 groups (Groups in the PyTables naming scheme) and datasets (Leaves in the PyTables naming scheme). For groups, the only requirement is that they must have some system attributes available. By convention, system attributes in PyTables are written in upper case and user attributes in lower case, but this is not enforced by the software. Datasets, besides the mandatory system attributes, must also meet some conditions on their storage layout and on the datatypes they use, as we will see shortly.
As a final remark, you can use any filter you want to create a PyTables file, provided that the filter is a standard one in HDF5, like zlib, shuffle or szip. Although szip cannot be used from within PyTables to create a new file, datasets compressed with szip can be read, because it is the HDF5 library that does the decompression transparently.
The File object is, in fact, a special HDF5 group structure that is the root of the rest of the objects in the object tree. The following attributes are mandatory for the HDF5 root group structure in PyTables files:
CLASS: This attribute should always be set to 'GROUP' for group structures.
PYTABLES_FORMAT_VERSION: It represents the internal format version, and currently should be set to the '1.6' string.
TITLE: A string where the user can put some description of what this group is used for.
VERSION: Should contain the string '1.0'.
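The root group requirements above can be sketched as a small checker. This is only an illustrative sketch in plain Python: it works on any mapping of attribute names to values, it assumes the four attributes are named CLASS, PYTABLES_FORMAT_VERSION, TITLE and VERSION (the names PyTables uses for these entries), and the names REQUIRED_ROOT_ATTRS and check_root_attrs are hypothetical helpers, not part of PyTables.

```python
# Hypothetical helper, not part of PyTables: checks a mapping of root group
# attributes against the mandatory system attributes described above.
# An expected value of None means "any string is acceptable".
REQUIRED_ROOT_ATTRS = {
    "CLASS": "GROUP",
    "PYTABLES_FORMAT_VERSION": "1.6",
    "TITLE": None,
    "VERSION": "1.0",
}


def _as_text(value):
    """Return value as str; HDF5 bindings often hand back bytes."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def check_root_attrs(attrs):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, expected in REQUIRED_ROOT_ATTRS.items():
        if name not in attrs:
            problems.append(f"missing attribute {name}")
            continue
        actual = _as_text(attrs[name])
        if expected is not None and actual != expected:
            problems.append(f"{name} is {actual!r}, expected {expected!r}")
    return problems


# A root group that satisfies the requirements (values may be str or bytes).
good = {
    "CLASS": b"GROUP",
    "PYTABLES_FORMAT_VERSION": "1.6",
    "TITLE": "My experiment data",
    "VERSION": "1.0",
}
print(check_root_attrs(good))  # prints []

# A root group with a wrong CLASS and three attributes missing.
for problem in check_root_attrs({"CLASS": "TABLE"}):
    print(problem)
```

If the attributes come from an HDF5 binding that exposes them as a mapping, the same function could be applied to that mapping directly. Note that this only checks the root group attributes; it says nothing about the dataset layout or datatype conditions.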