3.7. Dealing with nested structures in tables

PyTables supports the handling of nested structures (or nested datatypes, as you prefer) in table objects, allowing you to define arbitrarily nested columns.

An example will clarify what this means. Suppose that some pieces of information in your table are more closely related to each other than to the rest. You may want to tie them together, both to give your table a better structure and to be able to retrieve and deal with these groups more easily.

You can create such nested substructures just by nesting subclasses of IsDescription. Let's see one example (okay, it's a bit silly, but it will serve for demonstration purposes):


from tables import *

class Info(IsDescription):
    """A sub-structure of Test"""
    _v_pos = 2   # The position in the whole structure
    name = StringCol(10)
    value = Float64Col(pos=0)

colors = Enum(['red', 'green', 'blue'])  # An enumerated type

class NestedDescr(IsDescription):
    """A description that has several nested columns"""
    color = EnumCol(colors, 'red', dtype='UInt32', indexed=1) # indexed column
    info1 = Info()
    class info2(IsDescription):
        _v_pos = 1
        name = StringCol(10)
        value = Float64Col(pos=0)
        class info3(IsDescription):
            x = FloatCol(1)
            y = UInt8Col(1)

	

The root class is NestedDescr and both info1 and info2 are substructures of it. Note how info1 is actually an instance of the class Info that was defined prior to NestedDescr. There is also a third substructure, info3, which hangs from the substructure info2. You can also define the position of a substructure in its containing object by declaring the special class attribute _v_pos.

3.7.1. Nested table creation

Now that we have defined our nested structure, let's create a nested table, that is, a table with columns that contain other subcolumns.


>>> from tables import *
>>> fileh = openFile("nested-tut.h5", "w")
>>> table = fileh.createTable(fileh.root, 'table', NestedDescr)
>>>
	  

Done! Now, we have to feed the table with some values. The problem is how to refer to the nested fields. That's easy: just use a '/' character to separate the names at different nesting levels. Look at this:


>>> row = table.row
>>> for i in range(10):
...     row['color'] = colors[['red', 'green', 'blue'][i%3]]
...     row['info1/name'] = "name1-%s" % i
...     row['info2/name'] = "name2-%s" % i
...     row['info2/info3/y'] =  i
...     # All the rest will be filled with defaults
...     row.append()
...
>>> table.flush()
>>> table.nrows
10L
>>>
	  

You see? In order to fill the fields located in the substructures, we just need to specify their full path in the table hierarchy.
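The path convention can be pictured with a small plain-Python sketch. This is not how PyTables implements its row accessor; the dict-based record and the helper names make_record, set_by_path and get_by_path are assumptions made only for illustration. The idea shown is simply that a path is split on '/' and each component descends one nesting level.

```python
# Illustrative model only: a nested record as plain dicts (not the PyTables Row class).
def make_record():
    """Return a record with placeholder defaults for every field of NestedDescr."""
    return {
        "color": 0,  # value of 'red' in the enumeration
        "info1": {"name": "", "value": 0.0},
        "info2": {"name": "", "value": 0.0, "info3": {"x": 1.0, "y": 1}},
    }

def set_by_path(record, path, value):
    """Assign value to the existing field addressed by a '/'-separated path."""
    *parents, leaf = path.split("/")
    node = record
    for name in parents:
        node = node[name]  # descend one nesting level per path component
    if leaf not in node:
        raise KeyError(path)  # refuse to create fields that are not declared
    node[leaf] = value

def get_by_path(record, path):
    """Return the field addressed by a '/'-separated path."""
    node = record
    for name in path.split("/"):
        node = node[name]
    return node

record = make_record()
set_by_path(record, "info1/name", "name1-3")
set_by_path(record, "info2/info3/y", 3)
print(get_by_path(record, "info2/info3/y"))  # the value just assigned
print(get_by_path(record, "info2/info3/x"))  # untouched, still the default
```

Fields that are never assigned keep their defaults, which mirrors the "all the rest will be filled with defaults" comment in the session above.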

3.7.2. Reading nested tables: introducing NestedRecArray objects

Now, what happens if we want to read the table? Which data container will be used to keep the data? Well, it's worth trying it:


>>> nra = table[::4]
>>> print nra
NestedRecArray[
(((1.0, 0), 'name2-0', 0.0), ('name1-0', 0.0), 0L),
(((1.0, 4), 'name2-4', 0.0), ('name1-4', 0.0), 1L),
(((1.0, 8), 'name2-8', 0.0), ('name1-8', 0.0), 2L)
]
>>>
	  

We have read every fourth row of the table (rows 0, 4 and 8), giving a result of three rows. What about the container? Well, we can see that it is a new, mysterious object known as NestedRecArray. If we ask for more info on that:


>>> type(nra)
<class 'tables.nestedrecords.NestedRecArray'>
	  

we see that it is an instance of the class NestedRecArray, which lives in the nestedrecords module of the tables package. NestedRecArray is actually a subclass of the RecArray class from the records module of the numarray package. You can find more information about NestedRecArray objects in appendix B.

You can make use of the above object in many different ways. For example, you can use it to append new data to the existing table object:


>>> table.append(nra)
>>> table.nrows
13L
>>>
	  

Or, to create new tables:


>>> table2 = fileh.createTable(fileh.root, 'table2', nra)
>>> table2[:]
array(
[(((1.0, 0), 'name2-0', 0.0), ('name1-0', 0.0), 0L),
(((1.0, 4), 'name2-4', 0.0), ('name1-4', 0.0), 1L),
(((1.0, 8), 'name2-8', 0.0), ('name1-8', 0.0), 2L)],
descr=[('info2', [('info3', [('x', '1f8'), ('y', '1u1')]), ('name',
 '1a10'), ('value', '1f8')]), ('info1', [('name', '1a10'), ('value',
 '1f8')]), ('color', '1u4')], shape=3)
	  

Finally, we can select nested values that fulfill some condition:


>>> names = [ x['info2/name'] for x in table if x['color'] == colors.red ]
>>> names
['name2-0', 'name2-3', 'name2-6', 'name2-9', 'name2-0']
>>>
	  

Note that the row accessor does not provide the natural naming feature, so you have to completely specify the path of your desired columns in order to reach them.
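To see why the selection returns five names, the session so far can be modelled in plain Python. This sketch uses ordinary dicts keyed by full paths instead of a real table; COLORS and make_row are hypothetical names introduced here, and the enumeration values are the ones PyTables reports for the colors Enum in this tutorial.

```python
# Illustrative model only: rows as flat dicts keyed by full '/' paths.
COLORS = {"red": 0, "green": 1, "blue": 2}  # same values as the tutorial's Enum

def make_row(i):
    """Build the fields that the tutorial loop sets for row i."""
    return {
        "color": COLORS[["red", "green", "blue"][i % 3]],
        "info1/name": "name1-%s" % i,
        "info2/name": "name2-%s" % i,
        "info2/info3/y": i,
    }

rows = [make_row(i) for i in range(10)]
rows.extend(rows[::4])  # mimic table.append(table[::4]): rows 0, 4 and 8 again

# Every field is addressed by its complete path, as with the row accessor.
names = [r["info2/name"] for r in rows if r["color"] == COLORS["red"]]
print(len(rows))
print(names)
```

Rows 0, 3, 6 and 9 are red, and the appended copy of row 0 is red too, which accounts for the repeated 'name2-0' at the end of the list.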

3.7.3. Using Cols accessor

We can use the cols attribute object (see 4.7) of the table to quickly access the data located in the substructures we are interested in:


>>> table.cols.info2[1:5]
array(
[((1.0, 1), 'name2-1', 0.0),
((1.0, 2), 'name2-2', 0.0),
((1.0, 3), 'name2-3', 0.0),
((1.0, 4), 'name2-4', 0.0)],
descr=[('info3', [('x', '1f8'), ('y', '1u1')]), ('name', '1a10'),
 ('value', '1f8')],
shape=4)
>>>
	  

Here, we have used the cols accessor to reach the info2 substructure and a slice operation to get the subset of data we were interested in; you have probably recognized the natural naming approach here. We can continue and ask for the data in the info3 substructure:


>>> table.cols.info2.info3[1:5]
array(
[(1.0, 1),
(1.0, 2),
(1.0, 3),
(1.0, 4)],
descr=[('x', '1f8'), ('y', '1u1')],
shape=4)
>>>
	  

You can also use the _f_col method to get a handler for a column:


>>> table.cols._f_col('info2')
/table.cols.info2 (Cols), 3 columns
  info3 (Cols(1,), Description)
  name (Column(1,), CharType)
  value (Column(1,), Float64)
	  

Here, you've got another Cols object handler because info2 was a nested column. If you select a non-nested column, you will get a regular Column instance:


>>> ycol = table.cols._f_col('info2/info3/y')
>>> ycol
/table.cols.info2.info3.y (Column(1,), UInt8, idx=None)
>>>
	  

To sum up, the cols accessor is a very handy and powerful way to access data in your nested tables. Be sure to use it, especially when doing interactive work.
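The natural naming idea behind the accessor can be sketched in a few lines of plain Python. This is only a model built on nested dicts, not the PyTables Cols class; the Accessor class and its col method are hypothetical names, with col playing the role of a much simplified _f_col.

```python
# Illustrative model only: attribute access over nested dicts (not the PyTables Cols class).
class Accessor:
    """Expose the keys of a nested dict of columns as attributes."""

    def __init__(self, columns):
        self._columns = columns

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails, i.e. for column names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            child = self._columns[name]
        except KeyError:
            raise AttributeError(name) from None
        if isinstance(child, dict):
            return Accessor(child)  # nested column: hand back another accessor
        return child                # leaf column: hand back its data

    def col(self, path):
        """Resolve a '/'-separated path one component at a time."""
        node = self
        for name in path.split("/"):
            node = getattr(node, name)
        return node

columns = {
    "info2": {
        "info3": {"x": [1.0, 1.0, 1.0, 1.0], "y": [1, 2, 3, 4]},
        "name": ["name2-1", "name2-2", "name2-3", "name2-4"],
        "value": [0.0, 0.0, 0.0, 0.0],
    },
}
cols = Accessor(columns)
print(cols.info2.info3.y)
print(cols.col("info2/info3/y") == cols.info2.info3.y)
```

As in the session above, asking for a nested column yields another accessor, while asking for a leaf yields the column itself.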

3.7.4. Accessing meta-information of nested tables

Tables have an attribute called description which points to an instance of the Description class (see 4.8) and is useful to discover different meta-information about table data.

Let's see what it looks like:


>>> table.description
{
  "info2": {
    "info3": {
      "x": FloatCol(dflt=1, shape=1, itemsize=8, pos=0, indexed=False),
      "y": UInt8Col(dflt=1, shape=1, pos=1, indexed=False)},
    "name": StringCol(length=10, dflt=None, shape=1, pos=1, indexed=False),
    "value": Float64Col(dflt=0.0, shape=1, pos=2, indexed=False)},
  "info1": {
    "name": StringCol(length=10, dflt=None, shape=1, pos=0, indexed=False),
    "value": Float64Col(dflt=0.0, shape=1, pos=1, indexed=False)},
  "color": EnumCol(Enum({'blue': 2, 'green': 1, 'red': 0}), 'red',
 dtype='UInt32', shape=1, pos=2, indexed=1)}
>>>
	  

As you can see, it provides very useful information on both the formats and the structure of the columns in your table.

This object also provides a natural naming approach to access the metadata of subcolumns:


>>> table.description.info1
{
    "name": StringCol(length=10, dflt=None, shape=1, pos=0, indexed=False),
    "value": Float64Col(dflt=0.0, shape=1, pos=1, indexed=False)}
>>> table.description.info2.info3
{
      "x": FloatCol(dflt=1, shape=1, itemsize=8, pos=0, indexed=False),
      "y": UInt8Col(dflt=1, shape=1, pos=1, indexed=False)}
>>>
	  

There are other attributes that may be of interest to you:


>>> table.description._v_nestedNames
[('info2', [('info3', ['x', 'y']), 'name', 'value']), ('info1',
 ['name', 'value']), 'color']
>>> table.description.info1._v_nestedNames
['name', 'value']
>>>
	  

_v_nestedNames provides the names of the columns as well as their structure. You can see that the same attributes are available at the different levels of the Description object, because the levels are also Description objects themselves.
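The shape of these nested lists is easy to reproduce by hand, which may help when reading them. The sketch below is not PyTables code: it models a description as a nested dict of format strings, and nested_names and nested_descr are hypothetical helpers. It relies on dicts preserving insertion order (Python 3.7 or later).

```python
# Illustrative model only: a description as a nested dict of format strings.
def nested_names(description):
    """Return column names, with each nested group as a (name, [names]) pair."""
    names = []
    for name, child in description.items():
        if isinstance(child, dict):
            names.append((name, nested_names(child)))  # group: recurse
        else:
            names.append(name)                         # leaf: bare name
    return names

def nested_descr(description):
    """Return (name, format) pairs, with each nested group as (name, [pairs])."""
    return [
        (name, nested_descr(child) if isinstance(child, dict) else child)
        for name, child in description.items()
    ]

# Column order and format strings copied from the tutorial output.
description = {
    "info2": {"info3": {"x": "1f8", "y": "1u1"}, "name": "1a10", "value": "1f8"},
    "info1": {"name": "1a10", "value": "1f8"},
    "color": "1u4",
}
print(nested_names(description))
print(nested_names(description["info1"]))
print(nested_descr(description["info2"]["info3"]))
```

Applying the helper to a sub-dict gives the listing for that level only, in the same way that table.description.info1 exposes its own _v_nestedNames.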

There is a special attribute, called _v_nestedDescr, that can be useful for creating NestedRecArray objects that imitate the structure of the table (or of a subtable!):


>>> from tables import nestedrecords
>>> table.description._v_nestedDescr
[('info2', [('info3', [('x', '1f8'), ('y', '1u1')]), ('name', '1a10'),
 ('value', '1f8')]), ('info1', [('name', '1a10'), ('value', '1f8')]),
 ('color', '1u4')]
>>> nestedrecords.array(None, descr=table.description._v_nestedDescr)
array(
[],
descr=[('info2', [('info3', [('x', '1f8'), ('y', '1u1')]), ('name',
 '1a10'), ('value', '1f8')]), ('info1', [('name', '1a10'), ('value',
 '1f8')]),('color', '1u4')], shape=0)
>>> nestedrecords.array(None, descr=table.description.info2._v_nestedDescr)
array(
[],
descr=[('info3', [('x', '1f8'), ('y', '1u1')]), ('name', '1a10'),
('value', '1f8')], shape=0)
>>>
	  

See section 4.8 for the complete listing of attributes.

Finally, there is a special iterator of the Description class, called _v_walk, that returns the different columns of the table one by one:


>>> for coldescr in table.description._v_walk():
...     print "column-->",coldescr
...
column--> Description([('info2', [('info3', [('x', '1f8'), ('y',
 '1u1')]), ('name', '1a10'), ('value', '1f8')]), ('info1', [('name',
 '1a10'), ('value', '1f8')]), ('color', '1u4')])
column--> EnumCol(Enum({'blue': 2, 'green': 1, 'red': 0}), 'red',
 dtype='UInt32', shape=1, pos=2, indexed=1)
column--> Description([('info3', [('x', '1f8'), ('y', '1u1')]),
 ('name', '1a10'), ('value', '1f8')])
column--> StringCol(length=10, dflt=None, shape=1, pos=1, indexed=False)
column--> Float64Col(dflt=0.0, shape=1, pos=2, indexed=False)
column--> Description([('name', '1a10'), ('value', '1f8')])
column--> StringCol(length=10, dflt=None, shape=1, pos=0, indexed=False)
column--> Float64Col(dflt=0.0, shape=1, pos=1, indexed=False)
column--> Description([('x', '1f8'), ('y', '1u1')])
column--> FloatCol(dflt=1, shape=1, itemsize=8, pos=0, indexed=False)
column--> UInt8Col(dflt=1, shape=1, pos=1, indexed=False)
>>>
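The order of the listing above (each description first, then its plain columns, with nested descriptions visited level by level) can be reproduced with a queue. The following is only a sketch over nested dicts that yields the same ordering; it is not the actual implementation of _v_walk, and the walk function and its (path, kind) output are assumptions made for illustration.

```python
# Illustrative model only: level-by-level traversal of a nested dict description.
from collections import deque

def walk(description):
    """Yield (path, kind) pairs: each group, then its leaf columns; subgroups later."""
    pending = deque([("", description)])
    while pending:
        path, group = pending.popleft()
        yield (path or "/", "group")
        for name, child in group.items():
            child_path = path + "/" + name if path else name
            if isinstance(child, dict):
                pending.append((child_path, child))  # visit nested groups afterwards
            else:
                yield (child_path, "column")

# Column order copied from the tutorial output.
description = {
    "info2": {"info3": {"x": "1f8", "y": "1u1"}, "name": "1a10", "value": "1f8"},
    "info1": {"name": "1a10", "value": "1f8"},
    "color": "1u4",
}
for path, kind in walk(description):
    print(kind, path)
```

Because nested groups are queued rather than entered immediately, info3 is reported only after both info2 and info1 have been listed, just as in the session output.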
	  

Well, this is the end of this tutorial. As always, do not forget to close your files:


>>> fileh.close()
>>>
	  

Finally, you may want to have a look at your resulting data file:


$ ptdump -d nested-tut.h5
/ (RootGroup) ''
/table (Table(13L,)) ''
  Data dump:
[0] (((1.0, 0), 'name2-0', 0.0), ('name1-0', 0.0), 0L)
[1] (((1.0, 1), 'name2-1', 0.0), ('name1-1', 0.0), 1L)
[2] (((1.0, 2), 'name2-2', 0.0), ('name1-2', 0.0), 2L)
[3] (((1.0, 3), 'name2-3', 0.0), ('name1-3', 0.0), 0L)
[4] (((1.0, 4), 'name2-4', 0.0), ('name1-4', 0.0), 1L)
[5] (((1.0, 5), 'name2-5', 0.0), ('name1-5', 0.0), 2L)
[6] (((1.0, 6), 'name2-6', 0.0), ('name1-6', 0.0), 0L)
[7] (((1.0, 7), 'name2-7', 0.0), ('name1-7', 0.0), 1L)
[8] (((1.0, 8), 'name2-8', 0.0), ('name1-8', 0.0), 2L)
[9] (((1.0, 9), 'name2-9', 0.0), ('name1-9', 0.0), 0L)
[10] (((1.0, 0), 'name2-0', 0.0), ('name1-0', 0.0), 0L)
[11] (((1.0, 4), 'name2-4', 0.0), ('name1-4', 0.0), 1L)
[12] (((1.0, 8), 'name2-8', 0.0), ('name1-8', 0.0), 2L)
/table2 (Table(3L,)) ''
  Data dump:
[0] (((1.0, 0), 'name2-0', 0.0), ('name1-0', 0.0), 0L)
[1] (((1.0, 4), 'name2-4', 0.0), ('name1-4', 0.0), 1L)
[2] (((1.0, 8), 'name2-8', 0.0), ('name1-8', 0.0), 2L)
	  

Most of the code in this section is also available in examples/nested-tut.py.

All in all, PyTables provides a quite comprehensive toolset to cope with nested structures and address your classification needs. However, caveat emptor: be sure not to nest your data too deeply, or you will inevitably get lost trying to interpret heavily intertwined lists, tuples and description objects.