D.3. Mandatory attributes, storage layout and supported data types for Leaves

This depends on the kind of Leaf. The format for each type follows.

D.3.1. Table format

D.3.1.1. Mandatory attributes

The following attributes are mandatory for table structures:

CLASS

Must be set to 'TABLE'.

TITLE

A string where the user can describe what this dataset is used for.

VERSION

Should contain the string '2.6'.

FLAVOR

This is meant to provide the information about the kind of object kept in the Table, i.e. when the dataset is read, it will be converted to the indicated flavor. It can take one of the following string values:

"numarray"

The read operations will return a numarray object.

"numpy"

The read operations will return a NumPy object.

FIELD_X_NAME

It contains the names of the different fields. The X means the number of the field, zero-based (beware, order does matter). You should add as many attributes of this kind as fields you have in your records.

FIELD_X_FILL

It contains the default values of the different fields. All the datatypes are supported natively, except for complex types, which are currently serialized using Pickle. The X means the number of the field, zero-based (beware, order does matter). You should add as many attributes of this kind as fields you have in your records. These attributes are meant for saving the default values persistently and their existence is optional.

NROWS

This should contain the number of compound data type entries in the dataset. It must be an int data type.
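
As a rough illustration of the attribute set described above, the following sketch collects the attributes in a plain Python dictionary. The helper name table_attributes and the sample field names are hypothetical. The sketch does not call HDF5 or PyTables, and it does not show how FILL values are actually encoded on disk.

```python
def table_attributes(title, field_names, nrows, fills=None, flavor="numpy"):
    """Collect the attribute names and values listed for a TABLE leaf.

    Hypothetical helper for illustration only: it returns a plain dict
    and does not write anything to an HDF5 file.
    """
    attrs = {
        "CLASS": "TABLE",
        "TITLE": title,
        "VERSION": "2.6",
        "FLAVOR": flavor,
        "NROWS": int(nrows),
    }
    # One FIELD_X_NAME attribute per field; X is zero-based and order matters.
    for index, name in enumerate(field_names):
        attrs["FIELD_%d_NAME" % index] = name
    # FIELD_X_FILL attributes are optional.
    if fills is not None:
        for index, value in enumerate(fills):
            attrs["FIELD_%d_FILL" % index] = value
    return attrs


attrs = table_attributes("Particle readings", ["energy", "pressure"],
                         nrows=0, fills=[0.0, 0.0])
```

For a record with two fields this yields FIELD_0_NAME and FIELD_1_NAME (plus the optional FIELD_0_FILL and FIELD_1_FILL) next to CLASS, TITLE, VERSION, FLAVOR and NROWS.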

D.3.1.2. Storage Layout

A Table has a dataspace with a 1-dimensional chunked layout.

D.3.1.3. Datatypes supported

The datatype of the elements (rows) of a Table must be the H5T_COMPOUND compound data type, and each of its components must be built using only the following HDF5 data type classes:

H5T_BITFIELD

This class is used to represent the Bool type. Such a type must be built using an H5T_NATIVE_B8 datatype, followed by an HDF5 H5Tset_precision call to set its precision to just 1 bit.

H5T_INTEGER

This includes the following data types:

H5T_NATIVE_SCHAR

This represents a signed char C type, but it is effectively used to represent an Int8 type.

H5T_NATIVE_UCHAR

This represents an unsigned char C type, but it is effectively used to represent a UInt8 type.

H5T_NATIVE_SHORT

This represents a short C type, and it is effectively used to represent an Int16 type.

H5T_NATIVE_USHORT

This represents an unsigned short C type, and it is effectively used to represent a UInt16 type.

H5T_NATIVE_INT

This represents an int C type, and it is effectively used to represent an Int32 type.

H5T_NATIVE_UINT

This represents an unsigned int C type, and it is effectively used to represent a UInt32 type.

H5T_NATIVE_LONG

This represents a long C type, and it is effectively used to represent an Int32 or an Int64, depending on whether you are running a 32-bit or 64-bit architecture.

H5T_NATIVE_ULONG

This represents an unsigned long C type, and it is effectively used to represent a UInt32 or a UInt64, depending on whether you are running a 32-bit or 64-bit architecture.

H5T_NATIVE_LLONG

This represents a long long C type (__int64, if you are using a Windows system) and it is effectively used to represent an Int64 type.

H5T_NATIVE_ULLONG

This represents an unsigned long long C type (beware: this type does not have a counterpart on Windows systems) and it is effectively used to represent a UInt64 type.
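
The correspondence between the native C integer types above and fixed-width names can be checked on a given machine with a small sketch like the one below. It uses Python's standard ctypes module as a stand-in for the C compiler and does not touch HDF5. The table C_TYPES and the helper int_type_name are hypothetical names introduced for illustration.

```python
import ctypes

# C types behind the HDF5 native integer types listed above.
C_TYPES = {
    "H5T_NATIVE_SCHAR": ctypes.c_byte,
    "H5T_NATIVE_UCHAR": ctypes.c_ubyte,
    "H5T_NATIVE_SHORT": ctypes.c_short,
    "H5T_NATIVE_USHORT": ctypes.c_ushort,
    "H5T_NATIVE_INT": ctypes.c_int,
    "H5T_NATIVE_UINT": ctypes.c_uint,
    "H5T_NATIVE_LONG": ctypes.c_long,
    "H5T_NATIVE_ULONG": ctypes.c_ulong,
    "H5T_NATIVE_LLONG": ctypes.c_longlong,
    "H5T_NATIVE_ULLONG": ctypes.c_ulonglong,
}


def int_type_name(hdf5_name):
    """Return a name such as 'Int32' or 'UInt64' for this platform."""
    ctype = C_TYPES[hdf5_name]
    bits = 8 * ctypes.sizeof(ctype)
    # ctypes wraps out-of-range values, so -1 stays negative only
    # when the C type is signed.
    signed = ctype(-1).value < 0
    return ("Int" if signed else "UInt") + str(bits)


sizes = {name: int_type_name(name) for name in C_TYPES}
```

The entries for H5T_NATIVE_LONG and H5T_NATIVE_ULONG are the ones that vary between platforms, since the width of the C long type depends on the platform's data model.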

H5T_FLOAT

This includes the following datatypes:

H5T_NATIVE_FLOAT

This represents a float C type and it is effectively used to represent a Float32 type.

H5T_NATIVE_DOUBLE

This represents a double C type and it is effectively used to represent a Float64 type.

H5T_TIME

This includes the following datatypes:

H5T_UNIX_D32BE

This represents a POSIX time_t C type and it is effectively used to represent a 'Time32' aliasing type, which corresponds to an Int32 type.

H5T_UNIX_D64BE

This represents a POSIX struct timeval C type and it is effectively used to represent a 'Time64' aliasing type, which corresponds to a Float64 type.
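
To make the 'Time64' aliasing more concrete, the sketch below converts between the two members of a POSIX struct timeval (seconds and microseconds) and a single floating-point number of seconds. This only illustrates the in-memory idea. The function names are hypothetical, and the exact conversion and on-disk layout used by PyTables are assumptions not stated in this section.

```python
import math


def timeval_to_time64(tv_sec, tv_usec):
    """Combine seconds and microseconds into one float (Float64 in Python)."""
    return tv_sec + tv_usec / 1000000


def time64_to_timeval(value):
    """Split a float number of seconds into (tv_sec, tv_usec)."""
    tv_sec = int(math.floor(value))
    tv_usec = int(round((value - tv_sec) * 1000000))
    # Rounding can push the fractional part up to a full second.
    if tv_usec == 1000000:
        tv_sec += 1
        tv_usec = 0
    return tv_sec, tv_usec


stamp = timeval_to_time64(1200000000, 500000)
```

A 'Time32' value needs no such conversion, since it is just the integer number of seconds held in an Int32.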

H5T_STRING

The datatype used to describe strings in PyTables is H5T_C_S1 (i.e. a string C type) followed by a call to the HDF5 H5Tset_size() function to set their length.

H5T_ARRAY

This allows the construction of homogeneous, multidimensional arrays, so that you can include such objects in compound records. The types supported as elements of H5T_ARRAY data types are the ones described above. Currently, PyTables does not support nested H5T_ARRAY types.

H5T_COMPOUND

This allows the support of complex numbers. Its format is described below:

The H5T_COMPOUND type class contains two members. Both members must have the H5T_FLOAT atomic datatype class. The name of the first member should be "r" and represents the real part. The name of the second member should be "i" and represents the imaginary part. The precision property of both of the H5T_FLOAT members must be either 32 significant bits (e.g. H5T_NATIVE_FLOAT) or 64 significant bits (e.g. H5T_NATIVE_DOUBLE). They represent Complex32 and Complex64 types respectively.

Currently, PyTables does not support nested H5T_COMPOUND types, the only exception being support for complex numbers in Table objects as described above.
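
The two-member layout for complex numbers can be pictured with Python's standard struct module, as in the sketch below. It packs the real part ("r") first and the imaginary part ("i") second, using either two 32-bit or two 64-bit floats. The helper names are hypothetical and the little-endian byte order is an assumption made for the example; no HDF5 calls are involved.

```python
import struct

# Two floats per value: member "r" first, then member "i".
# Little-endian byte order is an assumption made for this example.
FORMATS = {32: "<ff", 64: "<dd"}


def pack_complex(value, bits=64):
    """Pack a Python complex as (r, i) with the given float precision."""
    return struct.pack(FORMATS[bits], value.real, value.imag)


def unpack_complex(data, bits=64):
    """Rebuild a Python complex from packed (r, i) members."""
    real, imag = struct.unpack(FORMATS[bits], data)
    return complex(real, imag)


packed32 = pack_complex(1.5 - 2.0j, bits=32)
packed64 = pack_complex(1.5 - 2.0j, bits=64)
```

The variant built from 32-bit members occupies 8 bytes per value and the variant built from 64-bit members occupies 16 bytes.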

D.3.2. Array format

D.3.2.1. Mandatory attributes

The following attributes are mandatory for array structures:

CLASS

Must be set to 'ARRAY'.

FLAVOR

This is meant to provide the information about the kind of object kept in the Array, i.e. when the dataset is read, it will be converted to the indicated flavor. It can take one of the following string values:

"numarray"

The read operations will return a numarray object.

"numpy"

The read operations will return a NumPy object.

"numeric"

The read operations will return a Numeric object.

"python"

The read operations will return a Python list object in case the dataset has dimensionality. If the dataset is a scalar, then an appropriate Python scalar will be returned instead.

TITLE

A string where the user can describe what this dataset is used for.

VERSION

Should contain the string '2.3'.

D.3.2.2. Storage Layout

An Array has a dataspace with an N-dimensional contiguous layout (if you prefer a chunked layout see EArray below).

D.3.2.3. Datatypes supported

The elements of Array must have either HDF5 atomic data types or a compound data type representing a complex number. The atomic data types can currently be one of the following HDF5 data type classes: H5T_BITFIELD, H5T_INTEGER, H5T_FLOAT and H5T_STRING. The H5T_TIME class is also supported for reading existing Array objects, but not for creating them. See the Table format description in section D.3.1 for more info about these types.

In addition to the HDF5 atomic data types, the Array format supports complex numbers with the H5T_COMPOUND data type class. See the Table format description in section D.3.1 for more info about this special type.

You should note that H5T_ARRAY class datatypes are not allowed in Array objects.

D.3.3. CArray format

D.3.3.1. Mandatory attributes

The following attributes are mandatory for CArray structures:

CLASS

Must be set to 'CARRAY'.

FLAVOR

This is meant to provide the information about the kind of objects kept in the CArray, i.e. when the dataset is read, it will be converted to the indicated flavor. It can take the same values as the Array object.

TITLE

A string where the user can describe what this dataset is used for.

VERSION

Should contain the string '1.0'.

D.3.3.2. Storage Layout

A CArray has a dataspace with an N-dimensional chunked layout.

D.3.3.3. Datatypes supported

The elements of CArray must have either HDF5 atomic data types or a compound data type representing a complex number. The atomic data types can currently be one of the following HDF5 data type classes: H5T_BITFIELD, H5T_INTEGER, H5T_FLOAT and H5T_STRING. The H5T_TIME class is also supported for reading existing CArray objects, but not for creating them. See the Table format description in section D.3.1 for more info about these types.

In addition to the HDF5 atomic data types, the CArray format supports complex numbers with the H5T_COMPOUND data type class. See the Table format description in section D.3.1 for more info about this special type.

You should note that H5T_ARRAY class datatypes are not allowed in CArray objects.

D.3.4. EArray format

D.3.4.1. Mandatory attributes

The following attributes are mandatory for EArray structures:

CLASS

Must be set to 'EARRAY'.

EXTDIM

(Integer) Must be set to the extensible dimension. Only one extensible dimension is supported right now.

FLAVOR

This is meant to provide the information about the kind of objects kept in the EArray, i.e. when the dataset is read, it will be converted to the indicated flavor. It can take the same values as the Array object (see D.3.2), except "Int" and "Float".

TITLE

A string where the user can describe what this dataset is used for.

VERSION

Should contain the string '1.3'.

D.3.4.2. Storage Layout

An EArray has a dataspace with an N-dimensional chunked layout.

D.3.4.3. Datatypes supported

The elements of EArray are allowed to have the same data types as the elements in the Array format. They can be one of the HDF5 atomic data type classes: H5T_BITFIELD, H5T_INTEGER, H5T_FLOAT, H5T_TIME or H5T_STRING; see the Table format description in section D.3.1 for more info about these types. They can also be an H5T_COMPOUND datatype representing a complex number; see the Table format description in section D.3.1.

You should note that H5T_ARRAY class data types are not allowed in EArray objects.
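
The role of the EXTDIM attribute from section D.3.4.1 can be sketched in plain Python as a shape computation: data appended to an EArray may only grow the dataset along the single extensible dimension. The helper extended_shape is a hypothetical name, and the check that all other dimensions must match is an assumption about how appends behave, not something this section specifies.

```python
def extended_shape(current_shape, new_shape, extdim):
    """Return the dataset shape after appending a block along extdim.

    Hypothetical helper for illustration; it works on shape tuples only
    and does not touch any HDF5 dataset.
    """
    if len(current_shape) != len(new_shape):
        raise ValueError("number of dimensions differs")
    for axis, (cur, new) in enumerate(zip(current_shape, new_shape)):
        # Only the extensible dimension is allowed to differ.
        if axis != extdim and cur != new:
            raise ValueError("dimension %d does not match" % axis)
    result = list(current_shape)
    result[extdim] += new_shape[extdim]
    return tuple(result)


grown = extended_shape((0, 3), (5, 3), extdim=0)
```

Here an empty dataset of shape (0, 3) with EXTDIM set to 0 grows to shape (5, 3) after a block of five rows is appended.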

D.3.5. VLArray format

D.3.5.1. Mandatory attributes

The following attributes are mandatory for VLArray structures:

CLASS

Must be set to 'VLARRAY'.

FLAVOR

This is meant to provide the information about the kind of objects kept in the VLArray, i.e. when the dataset is read, it will be converted to the indicated flavor. It can take one of the following values:

"numarray"

The dataset will be returned as a numarray object.

"numpy"

The dataset will be returned as a NumPy object.

"numeric"

The dataset will be returned as a Numeric object.

"python"

The dataset will be returned as a Python List object in case the dataset has dimensionality. If the dataset is a scalar, then an appropriate Python scalar will be returned instead.

"Object"

The elements in the dataset will be interpreted as pickled objects (i.e. objects serialized with the Python Pickle module) and returned as generic Python objects. Only one such object will be deserialized per entry. As the Pickle module is not normally available in other languages, this flavor won't be useful in general.

"VLString"

The elements in the dataset will be returned as Python String objects of any length, with the twist that Unicode strings are supported as well (provided you use the UTF-8 encoding, see below). However, only one such object will be deserialized per entry.

TITLE

A string where the user can describe what this dataset is used for.

VERSION

Should contain the string '1.2'.

D.3.5.2. Storage Layout

A VLArray has a dataspace with a 1-dimensional chunked layout.

D.3.5.3. Data types supported

The data type of the elements (rows) of VLArray objects must be the H5T_VLEN variable-length (or VL for short) datatype, and the base datatype specified for the VL datatype can be any atomic HDF5 datatype that is listed in the Table format description in section D.3.1. That includes the classes:

  • H5T_BITFIELD

  • H5T_INTEGER

  • H5T_FLOAT

  • H5T_TIME

  • H5T_STRING

  • H5T_ARRAY

They can also be an H5T_COMPOUND data type representing a complex number; see the Table format description in section D.3.1 for a detailed description.

You should note that this does not include another VL datatype, or a compound datatype that does not fit the description of a complex number. Note as well that, for the Object and VLString special flavors, the base for the VL datatype is always H5T_NATIVE_UCHAR. That means that the complete row entry in the dataset has to be used in order to fully serialize the object or the variable-length string.
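
The idea that one row holds one complete serialized object can be illustrated with the standard pickle module, as in the sketch below. An object is turned into a sequence of unsigned 8-bit values, which is what a row with an H5T_NATIVE_UCHAR base would carry, and then rebuilt from the whole sequence. The function names are hypothetical, and the default pickle protocol used here is an assumption; this section does not say which protocol PyTables uses.

```python
import pickle


def object_to_row(obj):
    """Serialize one Python object into a list of unsigned 8-bit values."""
    return list(pickle.dumps(obj))


def row_to_object(row):
    """Rebuild the object; the complete row is needed to deserialize it."""
    return pickle.loads(bytes(row))


original = {"name": "sample", "values": [1, 2, 3]}
row = object_to_row(original)
restored = row_to_object(row)
```

Rows produced this way have different lengths for different objects, which is why a variable-length datatype is needed to store them.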

In addition, if you plan to use the VLString flavor for your text data and your strings use 7-bit ASCII encoding, you do not need to convert them to the required UTF-8 encoding. The ASCII characters with values in the range [0x00, 0x7f] map directly to the Unicode characters in the range [U+0000, U+007F], and UTF-8 has the useful property that a UTF-8 encoded 7-bit ASCII string is indistinguishable from a traditional 7-bit ASCII string. So you can save your 7-bit ASCII strings with the VLString flavor without any further conversion.
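
This property is easy to confirm with Python's built-in string encoders. The short sketch below encodes the same 7-bit ASCII text both ways and compares the bytes, then shows that a non-ASCII character does change the UTF-8 result. The sample strings are arbitrary.

```python
text = "PyTables VLString"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For pure 7-bit ASCII text both encodings give identical bytes.
same = ascii_bytes == utf8_bytes

# A character outside the ASCII range needs more than one byte in UTF-8.
non_ascii = "caf\u00e9"
utf8_non_ascii = non_ascii.encode("utf-8")
```

Text that contains characters outside the ASCII range, on the other hand, must really be encoded as UTF-8 before it is stored.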