Posts Tagged java

Visual aggregates

There is a number of built-in aggregates in Informix which are there for what seams like forever, and it didn’t change throughout many version now. If you’re using SQL, you surely use aggregates – COUNT, SUM, AVG, to name just a few. And if you want something more, there is a way to create your own aggregate. Only four functions need to be implemented, which can be done in SPL, Java or C. These functions are:

  • init – initialises the data structure needed to calculate the aggregate and states the data types aggregate can be used on, called once at the beginning of aggregate execution
  • iter – called once for each row in the dataset, adds the information brought by that row to current result value of the aggregate and passes it on to next iteration
  • combine – called if aggregate is running in parallel, to combine current results of two threads together
  • final – called at the end of aggregate calculation, returns the final result of the aggregate.

After that, one simple SQL statement creates an aggregate, for instance, if we want to create median aggregate, first we implement median_init, median_iter, median_combine and median_final functions and execute:

CREATE AGGREGATE median2 WITH
   (INIT = median_init, 
    ITER = median_iter,
    COMBINE = median_combine,
    FINAL = median_final);

and it is ready to be used is SQL:

SELECT employee_id, median (salary) FROM income GROUP BY 1;

There is a nice post from Andrew Ford on how to create aggregates, take a look at it: User Defined Aggregates.

The limitation of the aggregate in general, from a data research point of view, is that it returns a single value, which describes the selected dataset in a particular way. If we want to find out something about the dataset, what does it mean to get only one number that sums it – be it a count, an average (mathematical mean), median or anything else? Experienced statistician might get a grasp on data distribution having these numbers if a data set is relatively small, but for a bigger data set, one should export the data and analyse it with some other tool. Or, for instance, pull it into R. 

If an aggregate could return a picture, it would be possible to actually see the distribution of the data from within the SQL, before exploring it further and using some other tools. 

So the idea is to return the blob from the final function, which contains the picture of a histogram, for example. For that to work, one should use an SQL tool which is capable of interpreting the blob data as a picture and displaying it (like ifmx-sql-editor, available here). Since all the data needs to be collected, init, iter and combine functions should be used to collect it all and process it in the final function. Currently, it is not possible to create the picture in SPL, so some kind of external library, whether in C or in Java should be used. 

Here is an example on how to create a histogram aggregate using SPL functions and small Java class (all the code is available here!). In order to collect the data, a list data type is used. Each new row adds new value in the list and passes the list on to the next iteration. This is how the init, iter and combine functions look like:

CREATE FUNCTION histogram_init (dummy INT)
RETURNING LIST(INT NOT NULL);
  RETURN LIST{};
END FUNCTION;

CREATE FUNCTION histogram_iter
  (result LIST(INT NOT NULL), value INT)
  RETURNING LIST(INT NOT NULL);

  INSERT INTO TABLE (result) VALUES (value::INT)

  RETURN result;
END FUNCTION;

CREATE FUNCTION histogram_combine
(partial1 LIST(INT NOT NULL),
 partial2 LIST(INT NOT NULL))

RETURNING LIST(INT NOT NULL);

   INSERT INTO TABLE (partial1)
      SELECT * FROM TABLE(partial2);

   RETURN partial1;
END FUNCTION;

The Java class resides on the database server and is called from the final function via the SYSTEM statement. Created image is put on the server and collected with the fileToBlob function and returned to the user. 

Since one query can result in returning multiple images (if group by is used), the input data set name and the output image name both need to be pretty much unique and provided to the Java class which creates the image. To ensure uniqueness, a pseudo random number generator is used when creating file names. Here is the simplified version of the final function:

CREATE FUNCTION histogram_final (final LIST(INT NOT NULL)) 
RETURNING BLOB;

  DEFINE...

  LET rnd = sp_random();
  LET in_file = "in_" || rnd || ".txt";
  LET out_file = "out_" || rnd || ".png";

  FOREACH
    SELECT * INTO value FROM TABLE(final) ORDER BY 1

    SYSTEM "echo " || value || " >> " || in_file; 
  END FOREACH

  SYSTEM "sh Histogram " || in_file || out_file; 

  RETURN filetoBLOB("out_" || rnd || ".png", 'server');
END FUNCTION;

Java class should have a main method, accept two parameters, and create the given picture.  It can use the simple Graphics object to create and store the picture. This is the idea:

public class Histogram {


  public static void main(String[] args) {
    // read the data from the text file args[0]

    // calculate min, max, frequencies

    // set up image
    BufferedImage bImg = ...
    Graphics g = bImg.createGraphics();

    // draw histogram lines

    // draw min and max

    // write an image
    ImageIO.write(bImg, "png", new File(args[1]));
  } 
}

Here is the example of the call and a screenshot of the SQL editor:

Screen Shot 2018-10-06 at 16.36.33 copy

Simillary to this, other aggregates can be implemented as well. For instance box and whicker plot, violin plot, etc.

All the code mentioned here, as well as the median aggregate which is currently not implemented in informix, can be found at my GitHub (direct link: https://github.com/ognjenorel/ifmx-utils/tree/master/aggregates)     

Advertisements

, , , ,

Leave a comment

Using Groovy in Daily DBA Work

Groovy is a relatively new programming language. I like to think of it as Java on steroids. It is based on Java, so all of Java code is automatically Groovy code, but not vice versa. Groovy is also compiled to Java byte-code, so it can run on every JRE. Better yet, you can use GDK (Groovy Development Kit) which provides many very easy to use APIs – working with files, XML is lot easier than in Java.

Some of its main characteristics are:

  • it is real object oriented language,
  • it is a dynamic language,
  • it can be used for scripting.

This last thing means that you can write Groovy script file and execute it from the command line without the need to compile it first, and that’s what makes it appealing. You get to use real powerful language within simple script. Of course, lots of things you’d write in Groovy could be done using shell scripts and if you’re not a programming soul, you won’t find many usefulness in it. Nevertheless, some of my next posts will include ready-to-use Groovy scripts that you may find useful, so here are some hints on what you need to do in order to run Groovy on your PC or server.

First of all, you’ll need JRE (Java Runtime Environment), which is probably already installed on every desktop PC and most of servers. As you’re probably  going to run scripts from shell or command line, ensure that you have JAVA_HOME environment variable set. It should point to main Java install directory which contains bin, lib and other directories.

After that, you need to download Groovy binaries from Groovy download site. I suggest to download zipped binaries. This file is suitable for all operating systems, as libraries are all jar files, and binaries contain both windows batch files and Unix/Linux shell scripts. All you need to do after that is unzip the file wherever you find suitable.

Final step is adjusting your PATH environment variable. It should contain both Java binaries directory and Groovy binaries directory (e.g. /usr/java/bin and /usr/groovy/bin).

You can check if all is set well by executing the command “groovy” from the command line. It should output standard usage message, something like this:

[~]> groovy
error: neither -e or filename provided
usage: groovy [options] [args]
options:
  -a,--autosplit <splitPattern>    split lines using splitPattern (default '\s')
                                   using implicit 'split' variable
  -c,--encoding <charset>          specify the encoding of the files
  -D,--define <name=value>         define a system property
  -d,--debug                       debug mode will print out full stack traces
  -e <script>                      specify a command line script
  -h,--help                        usage information
  -i <extension>                   modify files in place; create backup if
                                   extension is given (e.g. '.bak')
  -l <port>                        listen on a port and process inbound lines
  -n                               process files line by line using implicit
                                   'line' variable
  -p                               process files line by line and print result
                                   (see also -n)
  -v,--version                     display the Groovy and JVM versions

So now we can start groovying. As a teaser, here’s a readable example: calculate bufwaits ratio based on onstat -p output (version 11.50xC9):

groovy -e "out='onstat -p'.execute().in.text.readLines(); \
pg=out[5].tokenize()[1].toLong(); \
bwr=out[5].tokenize()[6].toLong(); \
bwai=out[17].tokenize()[0].toLong(); \
println '\nBufwaits Ratio: ' + ((pg+bwr)>0 ? (bwai*100)/(pg+bwr) : 0)"

Although Groovy doesn’t require a semicolon at the end of the statement, it has to be used here to delimit the statements because this is actually one-liner. You could make a script out of it, bw_ratio.groovy, with the same contents:

out='onstat -p'.execute().in.text.readLines()
pg=out[5].tokenize()[1].toLong()
bwr=out[5].tokenize()[6].toLong()
bwai=out[17].tokenize()[0].toLong()
println '\nBufwaits Ratio: ' + ((pg+bwr)>0 ? (bwai*100)/(pg+bwr) : 0)

and execute it:

[~/]> groovy bw_ratio.groovy 

Bufwaits Ratio: 0.0602958416

In addition to Groovy home page with loads of examples and information, there are many sites dedicated to this powerful technology, just google for Groovy. There are some cool books about it, but I’d definitely recommend “Groovy in Action” authored by Dierk Konig.

, , ,

1 Comment