Working with JSON Data from SQL

MongoDB support was introduced in Informix 12.10.xC2, bringing many cool things with it, one of them being the JSON and BSON data types. Putting the whole NoSQL and MongoDB story aside, these new data types let us work with semi-structured data directly from SQL, thanks to several new built-in functions. Of course, you could do the same with XML documents, but it took a while before all the necessary functions became available in Informix, and working with XML is still more complex than working with JSON because of the differences between the two data formats.

To put data into a JSON column, you can use the genBSON function or simply cast text to the JSON type. Here's an example – a tourist office database table storing various places people could visit. A single table with JSON data can store many different kinds of places – cities, regions, islands, landmarks etc. So the table could be defined as:

CREATE TABLE places (
  place_id SERIAL,
  numberOfVisitorsPerYear INT,
  place BSON
);

The place column could also be of the JSON type, but if you want to perform more meaningful queries on the table, stick with BSON: there are BSON functions we can use for searching, and BSON can always be cast to JSON for display.

Rows can be inserted via plain SQL:

INSERT INTO places VALUES (0, 500000, '{city: "Zagreb"}'::JSON);

Note that the last value needs to be cast to JSON in order to be able to run queries on it later with the bson_value functions. Here are some more rows with various attributes describing places:

INSERT INTO places VALUES (0, 600000, '{city: "Pula", country: "Croatia", population: 57000}'::JSON);
INSERT INTO places VALUES (0, 20000, '{mountain: "Velebit", country: "Croatia", height: 1757}'::JSON);
INSERT INTO places VALUES (0, 1000000, '{national_park: "Plitvice", country: "Croatia"}'::JSON);

The simplest way to find out what is stored in the table is to execute a query like this one:

SELECT place_id, numberOfVisitorsPerYear, place::JSON
FROM places;

which will return these results:

place_id  numberOfVisitorsPerYear  (expression)
1         500000                   {"city":"Zagreb"}
2         600000                   {"city":"Pula","country":"Croatia","population":57000}
3         20000                    {"mountain":"Velebit","country":"Croatia","height":1757}
4         1000000                  {"national_park":"Plitvice","country":"Croatia"}

However, the idea is to be able to search within JSON data. For that purpose, there are some new functions we can use:

  • bson_value_int(column, bson_attribute) – returns an integer value of the bson_attribute stored in a specified column of the row
  • bson_value_bigint(column, bson_attribute) – returns a bigint value of the bson_attribute stored in a specified column of the row
  • bson_value_double(column, bson_attribute) – returns a double value of the bson_attribute stored in a specified column of the row
  • bson_value_lvarchar(column, bson_attribute) – returns an lvarchar value of the bson_attribute stored in a specified column of the row

Here are some query examples:

-- find all destinations in Croatia 
SELECT *, place::JSON FROM places 
WHERE bson_value_lvarchar (place, 'country') = 'Croatia'; 

-- find all destinations without a specified country 
SELECT *, place::JSON FROM places 
WHERE bson_value_lvarchar (place, 'country') IS NULL; 

-- find all mountains higher than 1000 meters 
SELECT *, place::JSON FROM places 
WHERE bson_value_lvarchar (place, 'mountain') IS NOT NULL 
AND bson_value_int (place, 'height') > 1000; 

-- find all national parks in Croatia 
SELECT *, place::JSON FROM places 
WHERE bson_value_lvarchar (place, 'country') = 'Croatia' 
AND bson_value_lvarchar (place, 'national_park') IS NOT NULL; 
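One more query along the same lines, this time using bson_value_int on the population attribute, just as the height example above does (a sketch; only rows that actually carry a population attribute will match):

-- find all places with more than 50000 inhabitants 
SELECT *, place::JSON FROM places 
WHERE bson_value_int (place, 'population') > 50000; 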

There is another new function, genBSON, which generates BSON/JSON data from a relational table's data, depending on a query. It can be used to return JSON directly from a query or to insert JSON data into a column. The Informix Knowledge Center page for this function is informative and comes with some examples, so I'm not going to repeat it all here. As a continuation of our example, if the tourist office already has a relational table named cities in its database, that data can be imported into the places table with a single SQL statement:

-- cities relational table 
CREATE TABLE cities (
  city_id SERIAL, 
  city CHAR(30), 
  population INT, 
  country CHAR(30), 
  numberOfVisitorsPerYear INT
); 

-- copy the cities data into places table: 
INSERT INTO places 
SELECT 0, numberOfVisitorsPerYear, genBSON(ROW(city, country, population), 0, 1)::JSON
FROM cities; 
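Before running the INSERT, it can be useful to preview what genBSON produces for each row – the same expression works in a plain SELECT:

-- preview the generated JSON documents 
SELECT genBSON(ROW(city, country, population), 0, 1)::JSON 
FROM cities; 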

Or, if we don’t want to have a copy of cities in places, a view on structured and semi-structured data could be made (this one returning only JSON data):

CREATE VIEW places_and_cities (place) AS 
  SELECT place::JSON FROM places 
  UNION 
  SELECT genBSON(ROW(city, country, population), 0, 1)::JSON FROM cities; 
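The view can then be queried like any other table, returning one JSON document per place or city:

SELECT place FROM places_and_cities; 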

In conclusion, with JSON capabilities at hand, it's now pretty simple to mix structured and semi-structured data in a single database. But before we do so, we should make sure there is a real need to design our data model that way, bearing in mind the numerous advantages of keeping relational data in a relational database.


IBM Bluemix is generally available

I previously mentioned IBM Bluemix as a cool new thing that was open for beta testing, and as of today it is generally available. Bluemix is a powerful development platform in the cloud, with many services already included – some from IBM, some open source, some third-party. Some of the services can be used for free, while others have a monthly payment plan.

One of these services is our favorite database, Informix, which can be found under the cryptic name of Time Series Database. It can be used as a standard Informix database – it is not limited to TimeSeries data only.

There is also a one-month free trial, and I encourage all developers, DBAs, project managers and decision makers to have a look. Find Bluemix at www.bluemix.net and the news of general availability at the Bluemix blog here.


A Short Recap of the Informix >> 2014 Conference

As previously announced, our Informix >> 2014 (Fast Forward your Data) conference took place in Zagreb, Croatia on May 22nd, with some great presenters and topics. Jerry Keesee gave a great overview of the current state of Informix technologies, the Informix road map, IBM's software portfolio and Informix's role in it. Stuart Litel explained the work the IIUG is doing for all the Informix people and the product. He also talked about the IIUG Board of Directors Award, promoting once again one of this year's two winners, Adria IUG president Hrvoje Zoković.
Jan Musil gave two great live demo presentations, one about exploiting Genero to create native mobile applications for both Android and iOS devices, and the other about using the new MongoDB capabilities in Informix, including sharding.
Frederick Ho presented the impressive current state of the Informix Warehouse Accelerator, while Jean Georges Perrin showed the potential of employing the JSON capabilities in an existing information system. And finally, yours truly gave a talk about the Internet of Things and its impact on our future.

On the downside, attendance was lower than expected. However, we hope to get it back on track for the upcoming events.


A Look Back at the 2014 IIUG Conference

The 2014 edition of the annual International Informix Users Group conference was held last week in Miami, FL. It was another great opportunity to hear the news from the world of Informix, discuss the future of Informix and information management in general with the Informix architects, developers and managers, and, last but certainly not least, hang out with dear friends.

Looking at the session lineup, NoSQL integration has clearly become a very interesting point in Informix's life. Roughly 30% or more of the sessions covered some piece of this new feature and the possibilities for expanding it. It seems hybrid databases could take on a life of their own in this environment. Also, the implementation of the MongoDB API for Informix (the JSON Wire Listener) has enabled various frameworks and tools to connect to Informix, which opens up some interesting ideas – like graph database support through the TinkerPop API.

IoT (Internet of Things) is another new focal point for the Informix database. Mainly because of its great embeddability and auto-configuration features, Informix is a database of choice for many automation devices. Shaspa, a company producing ARM-based home automation control devices, was mentioned as an example more than a few times throughout the conference talks.

There is also a new logical organization in Informix being prepared for the next release. If a DBA handles many databases on a single Informix instance, this organization allows her to define each database as a tenant, confined to its own dbspace and other resources. This ensures that no single database exceeds the hardware limits, yet makes server administration easier, since the DBA only has to administer a single Informix instance. This kind of instance is referred to as a multi-tenant server, and it is now configurable in Informix as well.

IBM also has a nice new toy, currently in beta, called Bluemix. It is a cloud-based platform for building, managing and running applications of all types (web, mobile, big data…). Bluemix uses open standards and offers various tools as a service, some of which are Informix-based. Definitely worth taking a good look at: www.bluemix.net.

There were plenty more talks on various subjects regarding Informix – about 90 sessions, hands-on labs and half-day tutorials in total. All presentations should be available to IIUG members via the IIUG member site.

As for the venue, the conference hotel was the best we've seen at an IIUG conference, and downtown Miami proved to be a very interesting place to hold a conference – also within much closer reach for those of us coming from the other side of the Atlantic. Nevertheless, attendance seemed lower than in previous years, perhaps because of the higher prices of both the conference and the hotel.

And to conclude with some good news: the IIUG Board of Directors gives out two IIUG Board of Directors Awards every year, one to an IBMer and one to a non-IBMer. This year's award winners are Ms Anita McKeithen from IBM, and the president of our local Adria IUG, long-time Informix promoter and educator, Mr Hrvoje Zoković. Congratulations to both!

 


New Edition of the Informix >> Conference

Our local IUG group, Adria IUG (with members from countries near the Adriatic Sea: Croatia, Bosnia and Herzegovina, Serbia, Montenegro), organizes the one-day conference we call Informix >> (Fast Forward your Data). This year, the fourth edition of the conference will take place on May 22nd in Zagreb, Croatia. The complete list of speakers is yet to be finalized, but we're already happy to announce the three number ones in the Informix world who will come and share it all with our eager community. They are (in alphabetical order):

  • Jerry Keesee – IBM’s Worldwide Director of the Informix Business – the guy who knows the (Informix) future, even without the tea leaves!
  • Stuart Litel – The President of the International Informix Users Group – the ultimate president of every Informix user, need I say more?
  • Scott Pickett – The Biggest Informix Traveler of the world – the guy who brings Informix closer to the masses; the roadshow master; the Informix missionary!

Informix users and friends near Croatia, save the date and make sure you don't miss this great opportunity to listen to some good talks and to meet and get to know our guests! Looking forward to seeing you all there.

Edit: All the info about the conference in Croatian is now published here. The invitation in English is here.

And this is the complete line-up for the event:

09:00 Registration & Coffee
09:45 Opening (Hrvoje Zoković, Adria IUG, President)
10:00 Keynote (Jerry Keesee, IBM, Worldwide Director – Informix)
11:00 All about IIUG (Stuart Litel, IIUG, President)
11:30 Coffee Break
12:00 Informix & Mobile technologies – Genero Mobile demo (Jan Musil, IBM, Level 2 Certified IT Specialist)
13:00 Driving fast Business Decisions through Extremely Fast Analytics (Fred Ho, IBM, Chief Technologist – Informix Warehouse)
14:00 Lunch
15:00 JSon in your engine – what does that mean? (Jean Georges Perrin, IIUG Board of Directors)
16:00 Informix & NoSQL integration – demo (Jan Musil, IBM, Level 2 Certified IT Specialist)
17:00 Internet of Things (Ognjen Orel, Adria IUG Board of Directors)
17:30 Informix Q&A


Informix SWAT

Our friend Eric Vercelletto has been busy lately – he has built the Informix Special Worldwide Advanced Technicians (SWAT) site. This is a place where everyone in need of an Informix specialist of some kind (architect, DBA, developer, …) can register and get connected to an expert in that particular area. Also, if you're an Informix expert, make sure you visit the site and register yourself as one. Great idea, Eric 🙂


Reading the Logical Logs

Every now and then (hopefully not often), there is a need to browse through some logical logs and see what's been going on with your system. In this post, I'll cover the purpose and basics of logical logs and provide some info on how to read what's inside them.

In the simplest terms, logical logs provide a way to ensure minimal data loss in case the system comes to a halt. Every operation and every transaction that is executed is written to one or more logical logs. So if the system crashes for some reason, you'll see in the online log, when it is re-inited, that logical recovery is performed. That means the system goes through the logical logs written since the last checkpoint and re-executes the operations that were executed before the system halted. Transactions that were started but not completed are rolled back.

Knowing what is written to the logical logs, and when, provides an opportunity to explore the system's behavior in some cases. For example, it is a way to identify the busiest tables and indexes during some period of time, or to explore in detail what went on during a single transaction in order to detect possible errors in an end-user application modifying the data.

The size and the number of logical logs in your system are determined by the onconfig parameters LOGSIZE and LOGFILES, respectively. When all the logs are used (flag "U" in the following list), the system reuses the first one, and so on, in an endless cycle. Logs that are about to be reused should always be backed up first (flag "B" in the following list), using the onbar or ontape utilities. By setting the onconfig parameter LTAPEDEV to /dev/null you can discard the logical logs instead, but this should never be done in a production environment.

You can see the state of your logs by executing onstat -l. This is an example of the output:

address number flags uniqid begin size used %used
...
3b2a507e0 992 U-B---- 1066232 2:991053 1000 1000 100.00
3b2a50848 993 U-B---- 1066233 2:992053 1000 1000 100.00
3b2a508b0 994 U---C-L 1066234 2:993053 1000 190 19.00
3b2a50918 995 U-B---- 1064843 2:994053 1000 1000 100.00
3b2a50980 996 U-B---- 1064844 2:995053 1000 1000 100.00
...

Each logical log is identified by its uniqid (in the list above, the logs have uniqids 1066232 – 1066234 and 1064843 – 1064844). The flag "C" marks the current log being written to. The next log in the list has a smaller uniqid than the current one – it is the next one to be used (and overwritten in the logical log space). The log with the "L" flag is the log containing the last checkpoint.
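If you prefer SQL to onstat, roughly the same information should be available from the sysmaster database – a sketch, assuming the usual sysmaster:syslogs columns:

-- list the logical logs with their current/backed-up flags
SELECT number, uniqid, size, used, is_current, is_backed_up
FROM sysmaster:syslogs
ORDER BY number;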

The onlog utility is used to display the contents (or part of the contents) of the logical logs. The main thing you have to know is which logical log(s) you want to see (i.e. their uniqids). Apart from that, there are a number of filters that can be used to reduce the output and make it more readable (as you'll see, a lot of information gets written to the logs). The complete syntax of the onlog utility can be found here.

Filtering can be done by username, partition number (often referred to as partnum in Informix) and/or transaction id. The username obviously belongs to the user that executed the statements or commands resulting in data manipulation. The partition number identifies the table or index the operation is executed upon. The easiest way to find out the partnums of tables or indexes is to query the sysmaster:systabnames table. In some outputs the partnum is shown as a decimal number, in others as hexadecimal, so you may need to use the hex() function in your SQL. And finally, the transaction id (denoted here as xid) is a number identifying the transaction – all log entries that belong to the same transaction have the same xid (starting with the BEGIN entry and ending with COMMIT or ROLLBACK). But be careful with this one – it's not unique in any way. You'll often see the same xid being reused for the next transaction of the same user for some time, after which a new xid is assigned to that user, and the previous one may be assigned to some other user.
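As a sketch, a partnum lookup might look like this (test_table is the demo table used later in this post; hex() gives the hexadecimal form that some outputs use):

-- look up a table's partnum, in decimal and hexadecimal
SELECT tabname, partnum, hex(partnum) AS partnum_hex
FROM sysmaster:systabnames
WHERE tabname = 'test_table';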

If you're not backing up the logical logs, the onlog utility will only be able to display the contents of the logs currently on disk; but if you do have the logs backed up, you can fetch and display those as well. If the backup is made with the ontape utility, the backed-up logs can be fetched with onlog directly. If the backup is made with onbar, you need to use onbar -P (with the rest of the options the same as for onlog) to fetch the logs sitting in your backup.

Here’s an example of a logical log display (obtained by onlog -n):

addr len type    xid id link
18   172 HINSERT 155 1  3e7714    70013b 414d03 105
c4   320 HINSERT 155 0  18        60007c c64bc02 254
204  56  COMMIT  155 0  c4        11/30/2013 16:22:56

Three entries are shown above. The addr column is the address of the entry (hexadecimal) within the current log. Len is the length of the entry (decimal) in bytes. The next column is the type – one of the predefined identifiers specifying the action executed. The xid column is the transaction id. The id column mostly has the value 0, but can take other values if the entry is linked to the previous log. Link is the address of the previous entry belonging to the same transaction. The rest of the data in each row is type-specific. The list of all entry types and their corresponding additional data can be found here.

In a regular production environment, it seems reasonable to expect most entries to be of these types:

  • BEGIN, COMMIT, ROLLBACK – transaction start/end
  • HINSERT, HDELETE, HUPDATE, HUPBEF, HUPAFT – insert, delete or update row
  • ADDITEM, DELITEM – add or remove item from index

Reading through the documentation on the additional data for these types, we can see that the partition number is always involved, so it's fairly easy to infer which table or index an action was executed on. But that's about it. There is no information about which data was written, deleted etc. To find that out, we need to look at the long output of onlog, obtained with onlog -l. This option is not documented; the documentation merely states that "the listing is not intended for casual use." Nevertheless, there are some things you can find out from it. Here is an example of the onlog -l output for a single entry.

Say we have the table test_table with columns col1 of type INTEGER and col2 of type CHAR(10).
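For reference, here is a matching table definition (a minimal sketch based on the column types named above):

CREATE TABLE test_table (
  col1 INTEGER,   -- 4 bytes
  col2 CHAR(10)   -- 10 bytes, space-padded
);

Now we execute this statement: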

insert into test_table values(123000123, 'dva');

This will be written to the log as 3 entries – BEGIN, HINSERT and COMMIT – provided there are no primary or foreign keys, nor other indexes on the table (remember that when a single SQL statement is executed on its own, it is implicitly executed as a single-operation transaction). Let's take a look at the HINSERT entry:

addr     len  type     xid      id link
21f050   80   HINSERT  158      0  21f018   200324   103    14 
         00000050 00000028 00000112 00000000 ...P...( ........ 
         00000000 00000000 0000009e 0021f018 ........ .....!.. 
         6b03f454 00200324 00200324 00000103 k..T. .$ . .$.... 
         000e0004 00000000 00000000 8002084b ........ .......K 
         0754d53b 64766120 20202020 20207369 .T.;dva        si

Each byte is represented by two hexadecimal digits. The entire log entry is shown in both a hexadecimal and an ASCII view, side by side, which makes it easier to identify text strings. The first part of each entry is the header bytes, and the rest is operation-type dependent.

Since the header line and the hex dump describe the same entry, the header fields can be located among the words of the dump (note that some numbers in the header line are decimal, while the dump itself is hexadecimal):

  • entry length: 80 = 0x00000050, the first word of the dump
  • entry type and id: encoded in the header words that follow the length
  • entry xid: 158 = 0x0000009e
  • entry link: 21f018, the word 0x0021f018
  • the Informix internal clock is also written among the header bytes

Additional data for HINSERT are three numbers: the partition number (hexadecimal), the rowid, and the row size in bytes (decimal). In this example, those are: 200324, 103 and 14, and they can be spotted in the dump as well – 0x00200324 (appearing twice), 0x00000103, and the 0x000e half of the word 0x000e0004.

The 14 bytes of data in this example refer to one integer (4 bytes) and one char field defined as CHAR(10). The word 0x0754d53b is the integer 123000123, followed by the bytes 64 76 61 – the string "dva", clearly visible in the ASCII view. Remember that, even though we're inserting a three-letter word, the rest of the char field is filled with spaces (ASCII code 0x20), so all 10 bytes are written.
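As a quick sanity check on the integer bytes, the hex() function can confirm the value – a one-liner, assuming the single-row sysmaster:sysdual table is available (it is in recent Informix versions):

-- 123000123 rendered in hex: expect 0x0754D53B, the same bytes seen in the dump
SELECT hex(123000123) AS col1_hex
FROM sysmaster:sysdual;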

When observing update operations, note that both the before- and after-image are written to the log, but only the changed columns are written, not all of them. If the amount of data being written is small enough, then a single HUPDATE entry is written to the log; otherwise, two entries are written: HUPBEF and HUPAFT.

In conclusion, reading the logical logs can be very useful, and is sometimes the only way to get certain information about how your system works. But there are a lot of things one should be aware of before trying to read the log structure, the most important being:

  • the structure of the tables and indices
  • how the data are stored (how many bytes are needed for each data type etc.)
  • your system's endianness.

Given these important notes, and the fact that the logical log structure is not documented, it could be (and very likely is) different not only from one version to another, but also from one platform to another. The examples above were taken on a standalone Solaris SPARC system running Informix 11.70.FC5.
