3.6. Using enumerated types

Beginning with version 1.1, PyTables supports enumerated types. Such a type is defined by providing an exhaustive set or list of the possible named values for a variable of that type. Enumerated variables of the same type are usually compared with one another for equality, and sometimes for order, but they are not usually operated upon.

Each enumerated value has an associated name and a concrete value. Every name is unique, and so is every concrete value. An enumerated variable always holds the concrete value, not its name. Usually the concrete value is not used directly, and frequently it is entirely irrelevant. For the same reason, an enumerated variable is not usually compared with concrete values outside its enumerated type. For that kind of use, standard variables and constants are more suitable.

PyTables supports enumerated types through the Enum class (see 4.17.4). Each instance of Enum is an enumerated type (or enumeration). For example, let us create an enumeration of colors[1]:

>>> import tables
>>> colorList = ['red', 'green', 'blue', 'white', 'black']
>>> colors = tables.Enum(colorList)
>>> 

Here we used a simple list giving the names of enumerated values, but we left the choice of concrete values up to the Enum class. Let us see the enumerated pairs to check those values:

>>> print "Colors:", [v for v in colors]
Colors: [('blue', 2), ('black', 4), ('white', 3), ('green', 1), ('red', 0)]
>>> 

Names have been given automatic integer concrete values. We can iterate over the values in an enumeration, but we will usually be more interested in accessing single values. We can get the concrete value associated with a name by accessing it as an attribute or as an item (the latter can be useful for names that are not valid Python identifiers):

>>> print "Value of 'red' and 'white':", (colors.red, colors.white)
Value of 'red' and 'white': (0, 3)
>>> print "Value of 'yellow':", colors.yellow
Value of 'yellow':
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "enum.py", line 222, in __getattr__
AttributeError: no enumerated value with that name: 'yellow'
>>> 
>>> print "Value of 'red' and 'white':", (colors['red'], colors['white'])
Value of 'red' and 'white': (0, 3)
>>> print "Value of 'yellow':", colors['yellow']
Value of 'yellow':
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "enum.py", line 181, in __getitem__
KeyError: "no enumerated value with that name: 'yellow'"
>>> 

Note how accessing a value that is not in the enumeration raises the appropriate exception. We can also go the other way and get the name that matches a concrete value by calling the enumeration, that is, by using the __call__() method of Enum:

>>> print "Name of value %s:" % colors.red, colors(colors.red)
Name of value 0: red
>>> print "Name of value 1234:", colors(1234)
Name of value 1234:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "enum.py", line 311, in __call__
ValueError: no enumerated value with that concrete value: 1234
>>> 

What we did there was use the enumerated type to convert a concrete value into a name in the enumeration. Of course, values outside the enumeration cannot be converted.
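To make the three access paths above concrete, here is a minimal pure-Python sketch of an enumeration-like class. It is not the PyTables implementation: MiniEnum and its internals are hypothetical names, the code is written for Python 3, and assigning consecutive integers starting at 0 to a list of names is an assumption chosen to match the output shown earlier.

```python
class MiniEnum(object):
    """Toy enumeration (hypothetical; not the PyTables Enum class)."""

    def __init__(self, names):
        if isinstance(names, dict):
            # Explicit concrete values were supplied.
            pairs = dict(names)
        else:
            # Assumption: names in a list get 0, 1, 2, ... in order.
            pairs = dict((name, value) for value, name in enumerate(names))
        if len(set(pairs.values())) != len(pairs):
            raise ValueError("concrete values must be unique")
        self._names = pairs
        self._values = dict((value, name) for name, value in pairs.items())

    def __getitem__(self, name):
        # Item access: name -> concrete value, KeyError if unknown.
        try:
            return self._names[name]
        except KeyError:
            raise KeyError("no enumerated value with that name: %r" % (name,))

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(
                "no enumerated value with that name: %r" % (name,))

    def __call__(self, value):
        # Call: concrete value -> name, ValueError if unknown.
        try:
            return self._values[value]
        except KeyError:
            raise ValueError(
                "no enumerated value with that concrete value: %r" % (value,))


colors = MiniEnum(['red', 'green', 'blue', 'white', 'black'])
days = MiniEnum({'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5})
```

With these definitions, colors.red and colors['red'] both give 0, colors(0) gives 'red', and colors(1234) raises ValueError, mirroring the behavior demonstrated in the sessions above.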

3.6.1. Enumerated columns

Columns of an enumerated type can be declared by using the EnumCol class (see 4.16.2). To see how this works, let us open a new PyTables file and create a table to collect the simulated results of a probabilistic experiment. In it, we have a bag full of colored balls; we take a ball out and record the time of extraction and the color of the ball.

>>> h5f = tables.openFile('enum.h5', 'w')
>>> 
>>> class BallExt(tables.IsDescription):
...     ballTime = tables.Time32Col()
...     ballColor = tables.EnumCol(colors, 'black', dtype='UInt8')
... 
>>> tbl = h5f.createTable(
...     '/', 'extractions', BallExt, title="Random ball extractions")
>>> 

We declared the ballColor column to be of the enumerated type colors, with a default value of black. We also stated that we are going to store concrete values as unsigned 8-bit integer values[2].
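When choosing an integer storage type for the concrete values, it can be worth confirming that every value fits. The following is a small sketch of such a check in plain Python 3; fits_unsigned is a hypothetical helper, not part of PyTables, and the dictionary simply repeats the color values printed earlier.

```python
def fits_unsigned(values, bits=8):
    """Return True if every value fits in an unsigned integer of `bits` bits."""
    limit = 2 ** bits
    return all(0 <= value < limit for value in values)


# Concrete values of the colors enumeration, as printed earlier.
color_values = {'red': 0, 'green': 1, 'blue': 2, 'white': 3, 'black': 4}

ok = fits_unsigned(color_values.values())   # all values are in 0..255
too_big = fits_unsigned([0, 256])           # 256 needs more than 8 bits
```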

Let us use some random values to fill the table:

>>> import time
>>> import random
>>> now = time.time()
>>> row = tbl.row
>>> for i in range(10):
...     row['ballTime'] = now + i
...     row['ballColor'] = colors[random.choice(colorList)]  # notice this
...     row.append()
... 
>>> 

Notice how we used the __getitem__() call of colors to get the concrete value to store in ballColor. You should know that this way of appending values to a table automatically checks the validity of enumerated values. For instance:

>>> row['ballTime'] = now + 42
>>> row['ballColor'] = 1234
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "hdf5Extension.pyx", line 2936, in hdf5Extension.Row.__setitem__
  File "enum.py", line 311, in __call__
ValueError: no enumerated value with that concrete value: 1234
>>> 

Be aware, however, that this check is only performed here and not in other methods such as tbl.append() or tbl.modifyRows(). Now, after flushing the table we can see the results of the insertions:

>>> tbl.flush()
>>>
>>> # Now print them!
>>> for r in tbl:
...     ballTime = r['ballTime']
...     ballColor = colors(r['ballColor'])  # notice this
...     print "Ball extracted on %d is of color %s." % (ballTime, ballColor)
...
Ball extracted on 1116501220 is of color white.
Ball extracted on 1116501221 is of color red.
Ball extracted on 1116501222 is of color blue.
Ball extracted on 1116501223 is of color white.
Ball extracted on 1116501224 is of color white.
Ball extracted on 1116501225 is of color green.
Ball extracted on 1116501226 is of color black.
Ball extracted on 1116501227 is of color red.
Ball extracted on 1116501228 is of color white.
Ball extracted on 1116501229 is of color white.
>>> 

As a last note, you may be wondering how to access the enumeration associated with ballColor once the file is closed and reopened. You can call tbl.getEnum('ballColor') (see 4.6.2) to get the enumeration back.
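Since tbl.append() and tbl.modifyRows() do not validate enumerated values, one option is to check a batch yourself before handing it over. The sketch below, in plain Python 3, shows the idea; find_invalid is a hypothetical helper, and the dictionary stands in for the (name, value) pairs that iterating over the colors enumeration yields.

```python
def find_invalid(valid_values, candidates):
    """Return the candidates that are not concrete values of the enumeration."""
    valid = set(valid_values)
    return [value for value in candidates if value not in valid]


# Stand-in for the colors enumeration used in this section.
color_values = {'red': 0, 'green': 1, 'blue': 2, 'white': 3, 'black': 4}

batch = [0, 3, 1234, 4]
bad = find_invalid(color_values.values(), batch)
if bad:
    message = "refusing to append invalid values: %r" % (bad,)
```

Only when the returned list is empty would the batch be passed on to a bulk method.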

3.6.2. Enumerated arrays

EArray and VLArray leaves can also be declared to store enumerated values by means of the EnumAtom class (see 4.16.3), which works very much like EnumCol for tables. Also, Array leaves can be used to open native HDF5 enumerated arrays.

Let us create a sample EArray containing ranges of working days as bidimensional values:

>>> workingDays = {'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5}
>>> dayRange = tables.EnumAtom(workingDays, shape=(0, 2), flavor='Tuple')
>>> earr = h5f.createEArray('/', 'days', dayRange, title="Working day ranges")
>>> 

Nothing surprising here, except for a couple of details. First, we use a dictionary instead of a list to explicitly set concrete values in the enumeration. Second, there is no explicit Enum instance created! Instead, the dictionary is passed as the first argument to the constructor of EnumAtom. If the constructor gets a list or a dictionary instead of an enumeration, it automatically builds the enumeration from it.

Now let us feed some data to the array:

>>> wdays = earr.getEnum()
>>> earr.append([(wdays.Mon, wdays.Fri), (wdays.Wed, wdays.Fri)])
>>> earr.append([(wdays.Mon, 1234)])
>>> 

Please note that, since we had no explicit Enum instance, we were forced to use getEnum() (see 4.12.2) to get it from the array (we could also have used dayRange.enum). Also note that we were able to append an invalid value (1234). Array methods do not check the validity of enumerated values.

Finally, we will print the contents of the array:

>>> for (d1, d2) in earr:
...     print "From %s to %s (%d days)." % (wdays(d1), wdays(d2), d2-d1+1)
... 
From Mon to Fri (5 days).
From Wed to Fri (3 days).
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
  File "enum.py", line 311, in __call__
ValueError: no enumerated value with that concrete value: 1234L
>>> 

That was an example of operating on concrete values. It also showed how the value-to-name conversion fails when a value does not belong to the enumeration.
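If a stored value might lie outside the enumeration, the conversion can be guarded so that one bad entry does not abort the whole loop. The following plain Python 3 sketch illustrates this with an inverted dictionary standing in for the enumeration; describe_range and day_names are hypothetical names, and reporting the offending value in a message is just one possible policy.

```python
working_days = {'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5}

# Invert the mapping: concrete value -> name.
day_names = dict((value, name) for name, value in working_days.items())


def describe_range(d1, d2):
    """Describe a range of days, tolerating values outside the enumeration."""
    try:
        first, last = day_names[d1], day_names[d2]
    except KeyError as err:
        # err.args[0] is the concrete value that had no matching name.
        return "Invalid day value: %r" % (err.args[0],)
    # Explicit concrete values make the arithmetic meaningful.
    return "From %s to %s (%d days)." % (first, last, d2 - d1 + 1)


lines = [describe_range(d1, d2) for d1, d2 in [(1, 5), (3, 5), (1, 1234)]]
```

The first two pairs produce the same sentences as the session above, while the third yields a message about the value 1234 instead of a traceback.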

Now we will close and remove the file, and this little tutorial on enumerated types is done:

>>> import os
>>> h5f.close()
>>> os.remove('enum.h5')
>>> 

Notes

[1] All these examples can be found in examples/enum.py.

[2] In fact, only integer values are supported right now, but this may change in the future.