
Why FARFALLA?

There are lots of ways to make DSTs. Probably the simplest way is just to write a FORTRAN program with large common blocks and use unformatted FORTRAN READ and WRITE statements to move that information to and from the disk. This is the approach taken in the Muon Astronomy DST, for example. When you do things this way you don't need any external packages like ZEBRA or FARFALLA to help you. Although this technique is conceptually simple, in practice it has a lot of drawbacks for complex data sets.

Why do we say this? If the DST contains variable-length data (sometimes there are 50 ST hits; sometimes there are 100) and conditional I/O ("Only write the full scintillator waveform information if there has been a Caltech Slow monopole trigger with two faces, or if there has been a LIP trigger.") the unformatted WRITE statement becomes extremely complicated. It becomes difficult to read, write, and maintain. Furthermore, as the format of the WRITE statement evolves over time, it becomes difficult for the application programmers to keep their READ statements synchronized with the WRITE statement used for producing a particular file. If the READ statement does not agree perfectly with the WRITE statement that wrote the file, horrible things can happen.

Finally, when unformatted FORTRAN WRITE statements are used, the data files produced are probably not transportable among different machine types.

It was for these reasons and others that packages like ZEBRA were written. It is for these same reasons that we actually use ZEBRA in MACRO for our raw data. With ZEBRA the programmer can have complicated conditional structures in memory and can write them out to the disk with a single subroutine call, without the need for a complicated conditional WRITE statement. The data on disk is self-describing, so it can be read back in and the complicated structure will be exactly recreated in memory. This is true even if the program doing the reading is a newer version with more features than the program that did the writing. Also, with ZEBRA it is possible to write the data in a machine-independent way.
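To see why self-describing data helps, here is a minimal sketch in C++. It is not ZEBRA's (or FARFALLA's) actual file format; the record layout, the tag values, and the function names `write_bank` and `find_bank` are all assumptions made up for illustration. Each bank is written as a tag, a word count, and then its data words, with every word stored in a fixed byte order. A reader can then skip banks it does not know about by using the stored length, and the bytes on disk do not depend on the machine that wrote them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record layout: [tag][payload word count][payload words...],
// every 32-bit word stored big-endian so the bytes do not depend on the host CPU.
using Bytes = std::vector<std::uint8_t>;

// Append one 32-bit word in big-endian byte order.
void put_word(Bytes& out, std::uint32_t w) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((w >> shift) & 0xFFu));
}

// Read the 32-bit big-endian word that starts at byte offset pos.
std::uint32_t get_word(const Bytes& in, std::size_t pos) {
    assert(pos + 4 <= in.size());
    return (std::uint32_t(in[pos]) << 24) | (std::uint32_t(in[pos + 1]) << 16) |
           (std::uint32_t(in[pos + 2]) << 8) | std::uint32_t(in[pos + 3]);
}

// Write one bank: a tag, a length, then the data words.
void write_bank(Bytes& out, std::uint32_t tag,
                const std::vector<std::uint32_t>& words) {
    put_word(out, tag);
    put_word(out, static_cast<std::uint32_t>(words.size()));
    for (std::uint32_t w : words) put_word(out, w);
}

// Return the payload of the first bank carrying wanted_tag, skipping every
// other bank by its stored length. Returns an empty vector if none is found.
std::vector<std::uint32_t> find_bank(const Bytes& in, std::uint32_t wanted_tag) {
    std::size_t pos = 0;
    while (pos + 8 <= in.size()) {
        const std::uint32_t tag = get_word(in, pos);
        const std::size_t len = get_word(in, pos + 4);
        pos += 8;
        if (len > (in.size() - pos) / 4) break;  // truncated record
        if (tag == wanted_tag) {
            std::vector<std::uint32_t> words;
            for (std::size_t i = 0; i < len; ++i)
                words.push_back(get_word(in, pos + 4 * i));
            return words;
        }
        pos += 4 * len;  // unknown or unwanted bank: step over it
    }
    return {};
}
```

A reader that only knows about one kind of bank can call `find_bank` on a file written by a newer program that added other banks, and it will simply step over the ones it does not recognize. No conditional READ statement has to mirror the writer's logic.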

Once we have our data in independent banks, and we can read and write arbitrary structures of these banks to and from disk, a number of advantages suddenly appear. Perhaps most importantly, routines can add and delete banks from the structure. This modularizes tasks. If you want to create a track matching routine you can just assume that the track banks are already there, and match them independently of the core tracking code itself, adding banks with the matching information to the structure. This means that changes to the tracking code won't change your code (unless, of course, the format of the track banks is changed).

However, even ZEBRA has many shortcomings. Initializing ZEBRA, defining new bank types, and putting them into a structure is far from a simple programming task. Doing it properly can be tricky. In addition, once things are set up, navigating through a structure is not particularly easy. First of all, to find a bank you need to know a hard-coded location relative to a parent. Even once you have found the bank, to get a particular piece of data you need to know the data type of the word and the absolute location of that word in an array. So if what you want is an ERP ADC (an integer), you first need to find the bank and then, once you have its address, know which word the ADC is stored in. Getting the ADC and some more data might look like this:



IERPLINK    = LQ(PARENTADRESS - 5)
IADC0       = IQ(IERPLINK + 10)
ENERGY      = Q(IERPLINK + 24)
DOUBLEDATA  = DQ(IERPLINK/2 + 13)

The numbers like 5 and 10 are hard-coded into the program. You either have to look them up or keep files that define parameters for them so you don't need to remember them.

Finally, one of the biggest problems with ZEBRA is that it requires large amounts of overhead data to keep track of its structure. Each bank in ZEBRA requires at least 10 words of data describing the structure. This data must be written out to disk. So if you have banks of data that are 10 words long, you end up writing out twice as many words as there are words of real data. In addition, there are extra words of overhead (called pilot records) that ZEBRA requires when the structure is output. Remember, one of the points of DSTs is to save space!

Now, most of these problems weren't the fault of the programmers who wrote ZEBRA. When ZEBRA was written, the only programming tool high energy physicists had available to them was FORTRAN. Because of this, the code to handle complex data structures wound up taking thousands of lines. FORTRAN just wasn't designed to handle this sort of thing. However, in the 1990s we have new programming languages and tools available to us that are beginning to be used in the HEP community. These languages have many of the important features of ZEBRA built into the language already! If we start to use these tools we can overcome many of the programming difficulties imposed on us by outdated code.

At Caltech we have developed a new data structure/input-output package called FARFALLA. We wrote this package in C++ to address many of the problems that we pointed out earlier. Since we wrote the package in C++ we were able to use some ideas of object-oriented programming that are unfamiliar to many particle physicists. However, most of this is hidden from the user. We believe that once the user learns the basics of the FARFALLA package, navigating the data structure in memory is somewhat easier than in ZEBRA, and designing new data banks and putting them into the structure is immensely easier. Any bank can have an arbitrary number of children, and the data can be written to disk with just 2 words of overhead regardless of the number of children.
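The space argument is simple arithmetic, and a few lines of C++ make it concrete. This sketch assumes the overhead figures quoted in the text apply per bank (at least 10 words for ZEBRA, 2 words for FARFALLA) and ignores ZEBRA's pilot records; the function names are made up for illustration.

```cpp
// Words written to disk for one bank: payload plus per-bank bookkeeping.
constexpr int words_on_disk(int data_words, int overhead_words) {
    return data_words + overhead_words;
}

// Ratio of words on disk to words of real data (data_words must be > 0).
constexpr double expansion(int data_words, int overhead_words) {
    return static_cast<double>(words_on_disk(data_words, overhead_words)) /
           data_words;
}
```

For a bank holding 10 words of real data, `expansion(10, 10)` gives a factor of 2 for the ZEBRA-style overhead, while `expansion(10, 2)` gives a factor of 1.2 under the same assumptions. The difference shrinks as banks get larger, so it matters most for DSTs built from many small banks.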

In FARFALLA there are actually data types that describe our MACRO data. So instead of the piece of ZEBRA code we wrote earlier, getting an ERP ADC in FARFALLA looks like this:



F_getchild(eventBank,erpBank,i);  // i selects which ERP box hit you want (1, 2, etc.)
adc    = erpBank->adc0;
energy = erpBank->energy;

Don't worry if this seems a little strange to you. We will go through lots of examples later to show you how to actually use FARFALLA. A programmer who just wants to read existing FARFALLA DSTs needs to learn some simple C++ programming and the few FARFALLA commands that he or she needs. A more advanced user (or someone who wants to actually create new custom DSTs) needs to learn some more C++ programming and a few more FARFALLA commands. However, when we wrote this package we spent a lot of time thinking about how we could make it easier to use. So if you want a flexible data structure that is easy to modify and can be used to create custom DSTs for your group, FARFALLA is for you!
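To give a feeling for why typed banks are easier to navigate than word offsets, here is a minimal sketch in modern C++ (C++14). It is not FARFALLA's implementation: the `Bank` base class, the `TrackBank` type, and the `get_child` and `add_child` helpers are hypothetical names invented for this illustration, and only the `adc0` and `energy` field names are taken from the snippet above. The idea is that each bank is an object with named fields and a list of children, so a routine asks for "the i-th ERP bank" instead of computing an address.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical bank base class: every bank owns an arbitrary number of children.
struct Bank {
    virtual ~Bank() = default;
    std::vector<std::unique_ptr<Bank>> children;
};

// Hypothetical ERP hit bank; the field names follow the snippet above.
struct ErpBank : Bank {
    int   adc0   = 0;
    float energy = 0.0f;
};

// Hypothetical track bank, just to give the event a second kind of child.
struct TrackBank : Bank {
    float slope = 0.0f;
};

// Return the i-th child (counting from 1) of type T, or nullptr if there is none.
// This stands in for the idea behind F_getchild; it is not its real implementation.
template <typename T>
T* get_child(Bank& parent, std::size_t i) {
    std::size_t seen = 0;
    for (auto& child : parent.children) {
        if (T* typed = dynamic_cast<T*>(child.get())) {
            if (++seen == i) return typed;
        }
    }
    return nullptr;
}

// Attach a new child of type T to parent and return a pointer to it.
template <typename T>
T* add_child(Bank& parent) {
    auto child = std::make_unique<T>();
    T* raw = child.get();
    parent.children.push_back(std::move(child));
    return raw;
}
```

With these definitions, a routine can call `add_child<ErpBank>(event)` to hang a new bank on an event and later read `get_child<ErpBank>(event, 1)->adc0`, after checking the returned pointer is not null. Nothing in the calling code depends on where the bank sits among its siblings or on which word of the bank holds the ADC.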

We have written FARFALLA using g++, a free, publicly available C++ compiler from the Free Software Foundation that runs on almost all UNIX platforms.

We have written the following guides to help you use FARFALLA:





walter@
Wed Aug 10 11:42:05 PDT 1994