
Wednesday, March 2, 2011

Check If 'Group' Exists?

Sometimes you need to traverse the hierarchy of an HDF5 file to check whether a group or a link exists. It can be a 'Group' or even a dataset (in our running example, it could also be a packet table).

All you need to do is use H5Lexists; more information can be found here.
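
Here is a minimal sketch of how that check might look; the `fileID` handle and the group name 'MyGroup' are assumptions for illustration.

/* 
 * H5Lexists returns a positive value if the link exists,
 * 0 if it does not, and a negative value on error.
 */
htri_t exists = H5Lexists (fileID, "MyGroup", H5P_DEFAULT);
if (exists > 0)
{
    /* The link exists; it is safe to open the group. */
    hid_t groupID = H5Gopen (fileID, "MyGroup", H5P_DEFAULT);
    /* ... work with the group ... */
    H5Gclose (groupID);
}
else if (exists == 0)
{
    /* The link does not exist. */
}
else
{
    /* A negative value indicates an error in the lookup itself. */
}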

Thursday, February 24, 2011

Compound Data in Packet Table

As I promised, here I am back with some new stuff: the Packet Table in HDF5. Packet tables are one of the most flexible data structures in HDF5, allowing you to store either fixed- or variable-length data. If you are in a situation where you need to write data coming from multiple sources into a single object, packet tables are worth a look. There is one more related data structure, the 'Table'. Unlike the packet table, it only lets you write fixed-length data defined in a structure, and performance-wise it is recommended to use packet tables over 'Tables'. I too found it hard at first to implement compound data with a packet table, but after some research and digging into the HDF5 libraries I managed to do it. So here we go.

Consider a structure defined as:
/*
 *
 * Struct MyTable.
 *
 */
typedef struct 
{
    /// Field 1
    int field1;
    /// Field 2
    float field2;
    /// Field 3
    int field3;
} myTable;

Now, we need to create a data type:

/* Create the compound datatype used to write data into the table */
hid_t myDataType = H5Tcreate (H5T_COMPOUND, sizeof (myTable));

herr_t status;
status = H5Tinsert (myDataType, "Field 1",
            HOFFSET (myTable, field1), H5T_NATIVE_INT);

status = H5Tinsert (myDataType, "Field 2",
            HOFFSET (myTable, field2), H5T_NATIVE_FLOAT);

status = H5Tinsert (myDataType, "Field 3",
            HOFFSET (myTable, field3), H5T_NATIVE_INT);

Our compound datatype is now ready, and we can create a packet table that uses it.

/* Create the packet table */
hid_t ptMyTable = H5PTcreate_fl (file, "My Table", myDataType, chunk_size, compression);

Since variable-length data has memory-leak issues (check here), I created the packet table using the fixed-length API.
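
For completeness, here is a minimal sketch of how records could be appended; it assumes the `ptMyTable` handle created above is still open, and error handling is kept to a bare minimum.

/* Fill two records and append them to the packet table. */
myTable records[2];
records[0].field1 = 1;
records[0].field2 = 3.14f;
records[0].field3 = 10;
records[1].field1 = 2;
records[1].field2 = 2.71f;
records[1].field3 = 20;

herr_t err = H5PTappend (ptMyTable, 2, records);

/* Close the packet table and the compound datatype when done. */
H5PTclose (ptMyTable);
H5Tclose (myDataType);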

If you view the file with an HDF viewer such as HDFView, the packet table appears as a table with one row per packet and the columns 'Field 1', 'Field 2', and 'Field 3'.

Tuesday, February 22, 2011

Let's do Some Design & Code

There are many datatypes one can use with HDF5, but the base for all of them is the 'DataSet'. You can store your raw data in a dataset, in a table structure, or in a packet table; we will look at them one by one. Before starting, I'll show you the structure we are going to create in our file:

(Diagram: the file hierarchy to be created, with the root group at the top, a group below it, and a dataset inside that group.)

Here the circle represents a group and the rectangle stands for a dataset, the container into which we are going to put our data. I am sharing my experiences and mistakes here, and how I dealt with them to arrive at better solutions to my problems.

The root node is the file's default node/group, under which all the branches we create will hang throughout the development of the HDF5 file. If you are familiar with B-trees, your work with HDF5 will be much easier. The file can be created using the following API:

hid_t fileID = H5Fcreate ("../../FileName.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

More APIs related to file properties can be found here.

For the sake of a naming convention, and to make it easier to follow, I have named the newly created group 'MyGroup'. Now we'll see how we can create it with the HDF5 API. The HDF5 APIs to create, open, and delete a 'Group' are grouped under H5G.

/* Create group 'MyGroup' under the 'root' node */
hid_t groupID = H5Gcreate (fileID, "MyGroup", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

H5D is the next collection, which holds all the APIs required to work with datasets (our data containers). What we need now is to create a dataset under the group we have already created, 'MyGroup'. Let's give our dataset the name 'DataSet1'.

/* Create the dataset under 'MyGroup' */
hid_t dset = H5Dcreate (groupID, "DataSet1", filetype, filespace, H5P_DEFAULT, cparms, H5P_DEFAULT);

If you look at the parameters it takes, you may get confused. For an initial understanding, just assume that I have defined a few properties for our dataset, such as its datatype and whether chunking is enabled.

Remember, I started out by creating a new dataset for every new set of data. After testing this by running it in a thread that wrote data into the file, I found out that the number of datasets that can be created under one group is limited: I could not create more than 65556 datasets in a single group. This was a big, big problem for me. I had already spent a lot of time on this approach and could not easily back-track.

To deal with this problem I decided on another approach: instead of creating an entirely new dataset for every set of new data, I would use only one. For this I needed to understand more of the internals of datasets. Finally, I learned that I can reuse the same dataset, provided I extend its dimensions each time I write new data to it. This was another reason why I chose C to program HDF5, as these APIs were only available in C/C++ and not in .NET or Java.

Before putting it all together, I should let you know that I am going to write variable-length data to the dataset.

hid_t   file, dset, filespace, memspace, filetype, memtype, cparms;
herr_t  status;
hsize_t dims[1]    = {1};              /* initial size: one record */
hsize_t maxdims[1] = {H5S_UNLIMITED};  /* allow the dataset to grow */

/* 
 * Create a new file using the default properties.
 */
file = H5Fcreate (FILE, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);

/* 
 * Modify dataset creation properties, i.e. enable chunking.
 * Chunking is required for a dataset that will be extended.
 */
cparms = H5Pcreate (H5P_DATASET_CREATE);
status = H5Pset_chunk (cparms, 1, dims);

/*
 * Create dataspaces. The file dataspace must have its maximum
 * size set to H5S_UNLIMITED; otherwise the dataset cannot be
 * extended later.
 */
filespace = H5Screate_simple (1, dims, maxdims);
memspace  = H5Screate_simple (1, dims, NULL);

/*
 * Create file and memory datatypes for variable-length byte data.
 */
filetype = H5Tvlen_create (H5T_NATIVE_UCHAR);
memtype  = H5Tvlen_create (H5T_NATIVE_UCHAR);

/*
 * Create the dataset.
 */
dset = H5Dcreate (file, "DataSet1", filetype, filespace, H5P_DEFAULT, cparms, H5P_DEFAULT);

Up to here, we are done creating the file, and the dataset is ready for data to be written into it.

The next step was to write variable-length data in a thread/loop. This was the most interesting part for me, as I wanted to see how the dimensions of the dataset change at runtime. After reading further I understood that there is one more concept I needed to take into account: the hyperslab.

So, my 'Write' function was pretty simple:

hsize_t extdSize[1];
extdSize[0] = recordNumber + 1;

/* 
 * Extend the dataset so it can hold one more record.
 */
status = H5Dextend (dset, extdSize);

/* 
 * Define the memory space for the single record being written.
 */
memspace = H5Screate_simple (1, dims, NULL);

and finally, the most-awaited write:

hsize_t offset[1];
offset[0] = recordNumber;   /* rank-1 dataspace, so a single offset suffices */

/* 
 * Select a hyperslab  
 */
filespace = H5Dget_space (dset);

status = H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL,
    dims, NULL);

/* 
 * Write the variable-length data to the dataset. 'data' points to
 * an hvl_t whose 'p' member references the bytes and whose 'len'
 * member gives their count.
 */
status = H5Dwrite (dset, memtype, memspace, filespace, H5P_DEFAULT, data);
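
Once the loop finishes, the identifiers should be released. This is only a sketch, assuming the handles from the snippets above; note also that if you later read the variable-length data back, the buffers the library allocates for you need to be reclaimed with H5Dvlen_reclaim.

/* Release all the identifiers opened or created above. */
status = H5Dclose (dset);
status = H5Sclose (memspace);
status = H5Sclose (filespace);
status = H5Tclose (memtype);
status = H5Tclose (filetype);
status = H5Pclose (cparms);
status = H5Fclose (file);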