In this section, we will learn how to browse the object tree and retrieve both the data and the meta-information about that data.
In examples/tutorial1-2.py you will find the working version of all the code in this section. As before, you are encouraged to use a Python shell and inspect the object tree during the course of the tutorial.
Let's start by opening the file we created in the last tutorial section.
>>> h5file = openFile("tutorial1.h5", "a")
This time, we have opened the file in "a"ppend mode. We use this mode to add more information to the file.
PyTables, following the Python tradition, offers powerful introspection capabilities, i.e. you can easily ask for information about any component of the object tree as well as search the tree.
To start with, you can get a preliminary overview of the object tree by simply printing the existing File instance:
>>> print h5file
Filename: 'tutorial1.h5' Title: 'Test file' Last modif.: 'Sun Jul 27 14:40:51 2003'
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector (Group) 'Detector information'
/detector/readout (Table(10,)) 'Readout example'
It looks like all of our objects are there. Now let's make use of the File iterator to list all the nodes in the object tree:
>>> for node in h5file:
...     print node
...
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
/detector/readout (Table(10,)) 'Readout example'
We can use the walkGroups method (see 4.2.2) of the File class to list only the groups in the tree:
>>> for group in h5file.walkGroups("/"):
...     print group
...
/ (Group) 'Test file'
/columns (Group) 'Pressure and Name'
/detector (Group) 'Detector information'
Note that walkGroups() actually returns an iterator, not a list of objects. Using this iterator with the listNodes() method is a powerful combination. Let's see an example listing of all the arrays in the tree:
>>> for group in h5file.walkGroups("/"):
...     for array in h5file.listNodes(group, classname = 'Array'):
...         print array
...
/columns/name Array(3,) 'Name column selection'
/columns/pressure Array(3,) 'Pressure column selection'
listNodes() (see 4.2.2) returns a list containing all the nodes hanging off a specific Group. If the classname keyword is specified, the method will filter out all instances which are not descendants of the class. We have asked for only Array instances. There is also an iterator counterpart called iterNodes() (see 4.2.2) that can be handy in some situations, for example when dealing with groups that contain a large number of nodes.
We can combine both calls by using the walkNodes(where, classname) special method of the File object (see 4.2.2). For example:
>>> for array in h5file.walkNodes("/", "Array"):
...     print array
...
/columns/name (Array(3,)) 'Name column selection'
/columns/pressure (Array(3,)) 'Pressure column selection'
This is a nice shortcut when working interactively.
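To make the equivalence between the nested walkGroups()/listNodes() loop and walkNodes() concrete, here is a toy model in plain Python. It is only a sketch: the Node, Group, Array and Table classes and the walk_groups, list_nodes and walk_nodes functions are hypothetical stand-ins written for this illustration, not the PyTables implementation, and the code uses syntax that also runs on current Python.

```python
# Toy model of an object tree; illustrative stand-ins, not PyTables classes.
class Node(object):
    def __init__(self, name):
        self.name = name
        self.path = "/"  # replaced when the node is added to a parent


class Group(Node):
    def __init__(self, name):
        Node.__init__(self, name)
        self.children = {}

    def add(self, child):
        prefix = "" if self.path == "/" else self.path
        child.path = prefix + "/" + child.name
        self.children[child.name] = child
        return child


class Array(Node):
    pass


class Table(Node):
    pass


def walk_groups(group):
    """Lazily yield group and every group below it (a generator)."""
    yield group
    for name in sorted(group.children):
        child = group.children[name]
        if isinstance(child, Group):
            for sub in walk_groups(child):
                yield sub


def list_nodes(group, cls=Node):
    """Return a list of the direct children of group that are instances of cls."""
    return [group.children[n] for n in sorted(group.children)
            if isinstance(group.children[n], cls)]


def walk_nodes(group, cls=Node):
    """Combine both: visit every group and filter its direct children."""
    for g in walk_groups(group):
        for node in list_nodes(g, cls):
            yield node


root = Group("")
columns = root.add(Group("columns"))
detector = root.add(Group("detector"))
columns.add(Array("name"))
columns.add(Array("pressure"))
detector.add(Table("readout"))

nested = [a.path for g in walk_groups(root) for a in list_nodes(g, Array)]
combined = [a.path for a in walk_nodes(root, Array)]
print(nested)
print(combined)
```

Both lists contain the same two array paths. In this simplified model the group walker is a generator, so groups are produced one at a time, while list_nodes() builds a complete list for each group.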
Finally, we will list all the Leaf instances, i.e. Table and Array instances (see 4.5 for detailed information on the Leaf class), in the /detector group. Note that only one instance of the Table class (i.e. readout) will be selected in this group (as should be the case):
>>> for leaf in h5file.root.detector._f_walkNodes('Leaf'):
...     print leaf
...
/detector/readout (Table(10,)) 'Readout example'
We have used a call to the Group._f_walkNodes(classname, recursive) method (4.4.2), using the natural naming path specification.
Of course you can do more sophisticated node selections using these powerful methods. But first, let's take a look at some important PyTables object instance variables.
PyTables provides an easy and concise way to complement the meaning of your node objects on the tree by using the AttributeSet class (see section 4.15). You can access this object through the standard attribute attrs in Leaf nodes and _v_attrs in Group nodes.
For example, let's imagine that we want to save the date indicating when the data in the /detector/readout table was acquired, as well as the temperature during the gathering process:
>>> table = h5file.root.detector.readout
>>> table.attrs.gath_date = "Wed, 06/12/2003 18:33"
>>> table.attrs.temperature = 18.4
>>> table.attrs.temp_scale = "Celsius"
Now, let's set a somewhat more complex attribute in the /detector group:
>>> detector = h5file.root.detector
>>> detector._v_attrs.stuff = [5, (2.3, 4.5), "Integer and tuple"]
Note how the AttributeSet instance is accessed with the _v_attrs attribute because detector is a Group node. In general, you can save any standard Python data structure as an attribute node. See section 4.15 for a more detailed explanation of how they are serialized for export to disk.
Retrieving the attributes is equally simple:
>>> table.attrs.gath_date
'Wed, 06/12/2003 18:33'
>>> table.attrs.temperature
18.399999999999999
>>> table.attrs.temp_scale
'Celsius'
>>> detector._v_attrs.stuff
[5, (2.2999999999999998, 4.5), 'Integer and tuple']
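The long decimal tails in this output (18.399999999999999 rather than 18.4) do not mean that the attributes were altered when stored. They come from the way the Python interpreters of that era displayed floats, namely with 17 significant digits. The short sketch below uses only the standard library (no PyTables) to reproduce the display and to check that the displayed text converts back to the original value:

```python
# Older Python interpreters displayed floats with 17 significant digits,
# which exposes the binary rounding of values such as 18.4 and 2.3.
temperature = 18.4
shown = "%.17g" % temperature
print(shown)  # 18.399999999999999

# The displayed string converts back to exactly the same float,
# so nothing was lost between setting and retrieving the attribute.
round_trips = float(shown) == temperature
print(round_trips)  # True

shown_tuple_item = "%.17g" % 2.3
print(shown_tuple_item)  # 2.2999999999999998
```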
You can probably guess how to delete attributes:
>>> del table.attrs.gath_date
If you want to examine the current user attribute set of /detector/readout, you can print its representation (try hitting the TAB key twice if you are on a Unix Python console with the rlcompleter module active):
>>> table.attrs
/detector/readout (AttributeSet), 2 attributes:
   [temp_scale := 'Celsius',
    temperature := 18.399999999999999]
You can get a list of all attributes or only the user or system attributes with the _f_list() method.
>>> print table.attrs._f_list("all")
['CLASS', 'FIELD_0_NAME', 'FIELD_1_NAME', 'FIELD_2_NAME', 'FIELD_3_NAME', 'FIELD_4_NAME', 'FIELD_5_NAME', 'FIELD_6_NAME', 'FIELD_7_NAME', 'NROWS', 'TITLE', 'VERSION', 'temp_scale', 'temperature']
>>> print table.attrs._f_list("user")
['temp_scale', 'temperature']
>>> print table.attrs._f_list("sys")
['CLASS', 'FIELD_0_NAME', 'FIELD_1_NAME', 'FIELD_2_NAME', 'FIELD_3_NAME', 'FIELD_4_NAME', 'FIELD_5_NAME', 'FIELD_6_NAME', 'FIELD_7_NAME', 'NROWS', 'TITLE', 'VERSION']
You can also rename attributes:
>>> table.attrs._f_rename("temp_scale","tempScale")
>>> print table.attrs._f_list()
['tempScale', 'temperature']
However, you cannot set, delete or rename read-only attributes:
>>> table.attrs._f_rename("VERSION", "version")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/home/falted/PyTables/pytables-0.7/tables/AttributeSet.py", line 249, in _f_rename
    raise AttributeError, \
AttributeError: Read-only attribute ('VERSION') cannot be renamed
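The listing and renaming rules can be modeled in a few lines of plain Python. This is only a sketch under stated assumptions: ToyAttributeSet, list_names, rename and SYSTEM_NAMES are hypothetical names invented for this illustration, and the assumption that system attributes are recognised by a fixed set of names is ours, not a description of how PyTables decides it.

```python
# Toy stand-in for an attribute set; NOT the PyTables AttributeSet class.
# Assumption: system attributes are recognised by a fixed set of names.
SYSTEM_NAMES = frozenset(["CLASS", "NROWS", "TITLE", "VERSION"])


class ToyAttributeSet(object):
    def __init__(self, **attrs):
        self._attrs = dict(attrs)

    def list_names(self, which="user"):
        """Return sorted attribute names for 'all', 'user' or 'sys'."""
        names = sorted(self._attrs)
        if which == "all":
            return names
        if which == "sys":
            return [n for n in names if n in SYSTEM_NAMES]
        if which == "user":
            return [n for n in names if n not in SYSTEM_NAMES]
        raise ValueError("which must be 'all', 'user' or 'sys'")

    def rename(self, old, new):
        """Rename a user attribute; system attributes are read-only."""
        if old in SYSTEM_NAMES:
            raise AttributeError(
                "Read-only attribute (%r) cannot be renamed" % old)
        self._attrs[new] = self._attrs.pop(old)


attrs = ToyAttributeSet(CLASS="TABLE", NROWS=10, TITLE="Readout example",
                        VERSION="2.0", temp_scale="Celsius",
                        temperature=18.4)
attrs.rename("temp_scale", "tempScale")
print(attrs.list_names("user"))

try:
    attrs.rename("VERSION", "version")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

The "user" and "sys" listings never overlap, and together they make up the "all" listing, which is the behaviour the session above shows for the real table.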
If you terminated your session now, you would be able to use the h5ls command to read the /detector/readout attributes from the file written to disk:
$ h5ls -vr tutorial1.h5/detector/readout
Opened "tutorial1.h5" with sec2 driver.
/detector/readout        Dataset {10/Inf}
    Attribute: CLASS     scalar
        Type:      6-byte null-terminated ASCII string
        Data:  "TABLE"
    Attribute: VERSION   scalar
        Type:      4-byte null-terminated ASCII string
        Data:  "2.0"
    Attribute: TITLE     scalar
        Type:      16-byte null-terminated ASCII string
        Data:  "Readout example"
    Attribute: FIELD_0_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "ADCcount"
    Attribute: FIELD_1_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "TDCcount"
    Attribute: FIELD_2_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "energy"
    Attribute: FIELD_3_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "grid_i"
    Attribute: FIELD_4_NAME scalar
        Type:      7-byte null-terminated ASCII string
        Data:  "grid_j"
    Attribute: FIELD_5_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "idnumber"
    Attribute: FIELD_6_NAME scalar
        Type:      5-byte null-terminated ASCII string
        Data:  "name"
    Attribute: FIELD_7_NAME scalar
        Type:      9-byte null-terminated ASCII string
        Data:  "pressure"
    Attribute: tempScale scalar
        Type:      8-byte null-terminated ASCII string
        Data:  "Celsius"
    Attribute: temperature {1}
        Type:      native double
        Data:  18.4
    Attribute: NROWS     {1}
        Type:      native int
        Data:  10
    Location:  0:1:0:1952
    Links:     1
    Modified:  2003-07-24 13:59:19 CEST
    Chunks:    {2048} 96256 bytes
    Storage:   470 logical bytes, 96256 allocated bytes, 0.49% utilization
    Type:      struct {
                   "ADCcount"         +0    native unsigned short
                   "TDCcount"         +2    native unsigned char
                   "energy"           +3    native double
                   "grid_i"           +11   native int
                   "grid_j"           +15   native int
                   "idnumber"         +19   native long long
                   "name"             +27   16-byte null-terminated ASCII string
                   "pressure"         +43   native float
               } 47 bytes
Attributes are a useful mechanism to add persistent (meta) information to your data.
Each object in PyTables has metadata information about the data in the file. Normally this meta-information is accessible through the node instance variables. Let's take a look at some examples:
>>> print "Object:", table
Object: /detector/readout Table(10,) 'Readout example'
>>> print "Table name:", table.name
Table name: readout
>>> print "Table title:", table.title
Table title: Readout example
>>> print "Number of rows in table:", table.nrows
Number of rows in table: 10
>>> print "Table variable names with their type and shape:"
Table variable names with their type and shape:
>>> for name in table.colnames:
...     print name, ':= %s, %s' % (table.coltypes[name], table.colshapes[name])
...
ADCcount := UInt16, 1
TDCcount := UInt8, 1
energy := Float64, 1
grid_i := Int32, 1
grid_j := Int32, 1
idnumber := Int64, 1
name := CharType, 1
pressure := Float32, 1
Here, the name, title, nrows, colnames, coltypes and colshapes attributes (see 4.6.1 for a complete attribute list) of the Table object give us quite a bit of information about the table data.
You can interactively retrieve general information about the public objects in PyTables by printing their internal doc strings:
>>> print table.__doc__
Represent a table in the object tree.

    It provides methods to create new tables or open existing ones, as
    well as to write/read data to/from table objects over the
    file. A method is also provided to iterate over the rows without
    loading the entire table or column in memory.

    Data can be written or read both as Row instances or numarray
    (NumArray or RecArray) objects or NestedRecArray objects.

    Methods:

        __getitem__(key)
        __iter__()
        __setitem__(key, value)
        append(rows)
        flushRowsToIndex()
        iterrows(start, stop, step)
        itersequence(sequence)
        modifyRows(start, rows)
        modifyColumn(columns, names, [start] [, stop] [, step])
        modifyColumns(columns, names, [start] [, stop] [, step])
        read([start] [, stop] [, step] [, field [, flavor]])
        reIndex()
        reIndexDirty()
        removeRows(start [, stop])
        removeIndex(column)
        where(condition [, start] [, stop] [, step])
        whereAppend(dstTable, condition [, start] [, stop] [, step])
        getWhereList(condition [, flavor])

    Instance variables:

        description -- the metaobject describing this table
        row -- a reference to the Row object associated with this table
        nrows -- the number of rows in this table
        rowsize -- the size, in bytes, of each row
        cols -- accessor to the columns using a natural name schema
        colnames -- the field names for the table (tuple)
        coltypes -- the type class for the table fields (dictionary)
        colshapes -- the shapes for the table fields (dictionary)
        colindexed -- whether the table fields are indexed (dictionary)
        indexed -- whether or not some field in Table is indexed
        indexprops -- properties of an indexed Table
The help function is also a handy way to see PyTables reference documentation online. Try it yourself with other object docs:
>>> help(table.__class__)
>>> help(table.removeRows)
To examine metadata in the /columns/pressure Array object:
>>> pressureObject = h5file.getNode("/columns", "pressure")
>>> print "Info on the object:", repr(pressureObject)
Info on the object: /columns/pressure (Array(3,)) 'Pressure column selection'
  type = Float64
  itemsize = 8
  flavor = 'numarray'
  byteorder = 'little'
>>> print " shape: ==>", pressureObject.shape
 shape: ==> (3,)
>>> print " title: ==>", pressureObject.title
 title: ==> Pressure column selection
>>> print " type: ==>", pressureObject.type
 type: ==> Float64
Observe that we have used the getNode() method of the File class to access a node in the tree, instead of the natural naming method. Both are useful, and depending on the context you will prefer one or the other. getNode() has the advantage that it can get a node from the pathname string (as in this example) and can also act as a filter to show only nodes in a particular location that are instances of class classname. In general, however, I consider natural naming to be more elegant and easier to use, especially if you are using the name completion capability present in most interactive consoles. Try this powerful combination of natural naming and completion, and see how pleasant it is to browse the object tree (well, as pleasant as such an activity can be).
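The two access styles can be compared with a small plain-Python model. This is a hedged sketch of the idea only: ToyGroup, its _add method and the get_node function are hypothetical stand-ins invented here, and nothing below is PyTables code.

```python
class ToyGroup(object):
    """Illustrative stand-in for a group; not the PyTables Group class."""

    def __init__(self):
        self._children = {}

    def _add(self, name, node):
        self._children[name] = node
        return node

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so child
        # nodes become reachable as plain attributes (natural naming).
        try:
            return self.__dict__["_children"][name]
        except KeyError:
            raise AttributeError(name)


def get_node(root, where, name=None):
    """Look a node up from a path string, optionally joined with a name."""
    path = where if name is None else where.rstrip("/") + "/" + name
    node = root
    for part in path.split("/"):
        if part:  # skip the empty pieces produced by the slashes
            node = getattr(node, part)
    return node


root = ToyGroup()
columns = root._add("columns", ToyGroup())
pressure = columns._add("pressure", [25.0, 36.0, 49.0])

by_path = get_node(root, "/columns", "pressure")
by_name = root.columns.pressure
print(by_path is by_name)
```

The path-based lookup works with strings built at run time, while the attribute-based lookup is what makes name completion possible in an interactive console.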
If you look at the type attribute of the pressureObject object, you can verify that it is a "Float64" array. By looking at its shape attribute, you can deduce that the array on disk is unidimensional and has 3 elements. See 4.10.1 or the internal doc strings for the complete Array attribute list.
Once you have found the desired Array, use the read() method of the Array object to retrieve its data:
>>> pressureArray = pressureObject.read()
>>> pressureArray
array([ 25., 36., 49.])
>>> print "pressureArray is an object of type:", type(pressureArray)
pressureArray is an object of type: <class 'numarray.numarraycore.NumArray'>
>>> nameArray = h5file.root.columns.name.read()
>>> nameArray
['Particle: 5', 'Particle: 6', 'Particle: 7']
>>> print "nameArray is an object of type:", type(nameArray)
nameArray is an object of type: <type 'list'>
>>>
>>> print "Data on arrays nameArray and pressureArray:"
Data on arrays nameArray and pressureArray:
>>> for i in range(pressureObject.shape[0]):
...     print nameArray[i], "-->", pressureArray[i]
...
Particle: 5 --> 25.0
Particle: 6 --> 36.0
Particle: 7 --> 49.0
>>> pressureObject.name
'pressure'
You can see that the read() method (see section 4.10.2) returns an authentic numarray object for the pressureObject instance by looking at the output of the type() call. A read() of the /columns/name Array (h5file.root.columns.name) returns a native Python list (of strings). The type of the object saved is stored as an HDF5 attribute (named FLAVOR) for objects on disk. This attribute is then read as Array meta-information (accessible through the Array.attrs.FLAVOR variable), enabling the read array to be converted into the original object. This provides a means to save a large variety of objects as arrays with the guarantee that you will be able to later recover them in their original form. See section 4.2.2 for a complete list of supported objects for the Array object class.
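The idea of recording the original object type next to the data can be sketched in plain Python. This is a toy model under stated assumptions: the save and read functions and the dictionary layout are hypothetical, and the real library stores the data in HDF5 and supports more object types than the two handled here.

```python
def save(obj):
    """Store obj as a flat tuple plus a flavor tag naming its original type."""
    if isinstance(obj, list):
        flavor = "list"
    elif isinstance(obj, tuple):
        flavor = "tuple"
    else:
        raise TypeError("unsupported object type: %s" % type(obj).__name__)
    # The dictionary stands in for a dataset and its FLAVOR attribute.
    return {"data": tuple(obj), "FLAVOR": flavor}


def read(stored):
    """Rebuild the original kind of object from the stored flavor tag."""
    converters = {"list": list, "tuple": tuple}
    return converters[stored["FLAVOR"]](stored["data"])


names = ['Particle: 5', 'Particle: 6', 'Particle: 7']
stored = save(names)
restored = read(stored)
print(stored["FLAVOR"])
print(type(restored).__name__)
```

Because the tag travels with the data, the reader does not need to know in advance which kind of object was saved.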