Visual aggregates

There is a number of built-in aggregates in Informix which are there for what seams like forever, and it didn’t change throughout many version now. If you’re using SQL, you surely use aggregates – COUNT, SUM, AVG, to name just a few. And if you want something more, there is a way to create your own aggregate. Only four functions need to be implemented, which can be done in SPL, Java or C. These functions are:

  • init – initialises the data structure needed to calculate the aggregate and states the data types aggregate can be used on, called once at the beginning of aggregate execution
  • iter – called once for each row in the dataset, adds the information brought by that row to current result value of the aggregate and passes it on to next iteration
  • combine – called if aggregate is running in parallel, to combine current results of two threads together
  • final – called at the end of aggregate calculation, returns the final result of the aggregate.

After that, one simple SQL statement creates an aggregate, for instance, if we want to create median aggregate, first we implement median_init, median_iter, median_combine and median_final functions and execute:

CREATE AGGREGATE median2 WITH
   (INIT = median_init, 
    ITER = median_iter,
    COMBINE = median_combine,
    FINAL = median_final);

and it is ready to be used is SQL:

SELECT employee_id, median (salary) FROM income GROUP BY 1;

There is a nice post from Andrew Ford on how to create aggregates, take a look at it: User Defined Aggregates.

The limitation of the aggregate in general, from a data research point of view, is that it returns a single value, which describes the selected dataset in a particular way. If we want to find out something about the dataset, what does it mean to get only one number that sums it – be it a count, an average (mathematical mean), median or anything else? Experienced statistician might get a grasp on data distribution having these numbers if a data set is relatively small, but for a bigger data set, one should export the data and analyse it with some other tool. Or, for instance, pull it into R. 

If an aggregate could return a picture, it would be possible to actually see the distribution of the data from within the SQL, before exploring it further and using some other tools. 

So the idea is to return the blob from the final function, which contains the picture of a histogram, for example. For that to work, one should use an SQL tool which is capable of interpreting the blob data as a picture and displaying it (like ifmx-sql-editor, available here). Since all the data needs to be collected, init, iter and combine functions should be used to collect it all and process it in the final function. Currently, it is not possible to create the picture in SPL, so some kind of external library, whether in C or in Java should be used. 

Here is an example on how to create a histogram aggregate using SPL functions and small Java class (all the code is available here!). In order to collect the data, a list data type is used. Each new row adds new value in the list and passes the list on to the next iteration. This is how the init, iter and combine functions look like:

CREATE FUNCTION histogram_init (dummy INT)
RETURNING LIST(INT NOT NULL);
  RETURN LIST{};
END FUNCTION;

CREATE FUNCTION histogram_iter
  (result LIST(INT NOT NULL), value INT)
  RETURNING LIST(INT NOT NULL);

  INSERT INTO TABLE (result) VALUES (value::INT)

  RETURN result;
END FUNCTION;

CREATE FUNCTION histogram_combine
(partial1 LIST(INT NOT NULL),
 partial2 LIST(INT NOT NULL))

RETURNING LIST(INT NOT NULL);

   INSERT INTO TABLE (partial1)
      SELECT * FROM TABLE(partial2);

   RETURN partial1;
END FUNCTION;

The Java class resides on the database server and is called from the final function via the SYSTEM statement. Created image is put on the server and collected with the fileToBlob function and returned to the user. 

Since one query can result in returning multiple images (if group by is used), the input data set name and the output image name both need to be pretty much unique and provided to the Java class which creates the image. To ensure uniqueness, a pseudo random number generator is used when creating file names. Here is the simplified version of the final function:

CREATE FUNCTION histogram_final (final LIST(INT NOT NULL)) 
RETURNING BLOB;

  DEFINE...

  LET rnd = sp_random();
  LET in_file = "in_" || rnd || ".txt";
  LET out_file = "out_" || rnd || ".png";

  FOREACH
    SELECT * INTO value FROM TABLE(final) ORDER BY 1

    SYSTEM "echo " || value || " >> " || in_file; 
  END FOREACH

  SYSTEM "sh Histogram " || in_file || out_file; 

  RETURN filetoBLOB("out_" || rnd || ".png", 'server');
END FUNCTION;

Java class should have a main method, accept two parameters, and create the given picture.  It can use the simple Graphics object to create and store the picture. This is the idea:

public class Histogram {


  public static void main(String[] args) {
    // read the data from the text file args[0]

    // calculate min, max, frequencies

    // set up image
    BufferedImage bImg = ...
    Graphics g = bImg.createGraphics();

    // draw histogram lines

    // draw min and max

    // write an image
    ImageIO.write(bImg, "png", new File(args[1]));
  } 
}

Here is the example of the call and a screenshot of the SQL editor:

Screen Shot 2018-10-06 at 16.36.33 copy

Simillary to this, other aggregates can be implemented as well. For instance box and whicker plot, violin plot, etc.

All the code mentioned here, as well as the median aggregate which is currently not implemented in informix, can be found at my GitHub (direct link: https://github.com/ognjenorel/ifmx-utils/tree/master/aggregates)     

Advertisements

, , , ,

  1. Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: