In-Memory: is it all about speed?

In-memory computing certainly means getting data faster, probably very fast. But is that the only expectation you would have? Sure, it's nice not to wait long for reports or other decision-related queries, but is it all about speed?

Traditional RDBMSs have served operational business well, but fell short when you wanted to analyze data by different criteria. The data model simply was not optimized for that, so the rise of cubes, data marts and data warehouses began, and every larger system in need of extended reporting had two data stages - the operational one and the transformed one.

Now, the thing with in-memory databases like SAP HANA is that, thanks to their speed, no transformation optimization is needed. Cubes and data marts can be created logically, as models without materialization, built at runtime in memory.
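To make the idea of a "logical cube" concrete, here is a minimal Python sketch (with made-up sales data): instead of persisting a transformed, pre-aggregated copy of the operational table, the aggregation is computed in memory at query time.

```python
from collections import defaultdict

# Hypothetical operational sales records - in a traditional setup these
# would first be transformed into a materialized cube or data mart.
sales = [
    {"region": "EMEA", "product": "A", "revenue": 120.0},
    {"region": "EMEA", "product": "B", "revenue": 80.0},
    {"region": "APJ",  "product": "A", "revenue": 200.0},
    {"region": "APJ",  "product": "A", "revenue": 50.0},
]

def cube_view(rows, dimension):
    """Aggregate revenue per dimension at query time - a 'logical cube'
    computed in memory instead of a persisted, transformed copy."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row["revenue"]
    return dict(totals)

print(cube_view(sales, "region"))   # {'EMEA': 200.0, 'APJ': 250.0}
```

The point is not the trivial group-by itself, but that nothing was materialized: the "cube" exists only as a query over the operational data.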

With this in mind, your operational application is able to deliver the data analyses you previously expected only from a data warehouse infrastructure.

This means lower cost due to reduced infrastructure, and transformation errors are avoided.

So it's not all about speed - primarily yes, but more important to the user is deep, fast access to the system's data, and thus a wider view of the system's data status.

After all this praise, I see some weak points and dangers in the new concept too. Let me explain ...

It reminds me of when pictures were taken with analog cameras:

a lot more time was invested in setting the right aperture, the right timing, the right light and so on, and then you waited until your pictures had been developed. Today you just click - you see the result immediately, you take thousands of pictures of one scene and one will finally fit, and even if you fail, your post-processing will fix it!
It was similar when you created a data mart: you did a lot of analysis on how to get the data out of the system consolidated, business-proof and valuable. Generating the query directly on operational data can lead to lower quality in the answer the query really gives you. So that time is still needed for creating high-value data reports.

Now take a look at this video, which shows S/4HANA, an SAP HANA-optimized ERP solution, and how in-memory computing may change the way we work with data.

Live Demo of Simple Finance and Simple Logistics in SAP S/4 HANA

Have fun!




SAP HANA SPS 09 - Dynamic Tiering - pure in-memory goes disk-based

SAP has introduced SPS 09 for SAP HANA. These SPSs (Support Package Stacks) are more than minor release levels with error fixes: they introduce new functionality, sometimes mighty features. They are comparable to Oracle's patch sets and clearly show how vibrant the evolution of SAP HANA really is.

Among a lot of other new features, they introduced one called SAP HANA Dynamic Tiering. Hm ... to be honest, I had no idea what it was about when I heard of it for the first time. But ... hey!
The stuff is about data aging: hot and warm data, data which should be in memory and data which can reside on disk. But now, please sit down.

With that feature, SAP did no less than integrate a traditional, buffer-cached Sybase IQ column-store database engine into SAP HANA for keeping warm data.

This is really funny and a tremendous step indeed, because Oracle did it the other way around: integrating an in-memory column store into its traditional database engine.

And SAP did a lot of engineering to integrate the second engine so that, from the outside, both engines appear as one. Just an extended table definition is needed - something like another table type in the SAP HANA dictionary that points to the other database engine and its processing.
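The "two engines behind one interface" idea can be sketched in a few lines of Python. This is purely illustrative (not SAP HANA's actual mechanism): hot rows live in memory, aged-out rows move to a disk file, and reads go through a single interface that tries the hot tier first.

```python
import json
import os
import tempfile

class TieredTable:
    """Toy sketch of dynamic tiering: hot rows stay in memory, warm rows
    are moved to a disk file, but reads go through one interface.
    (Purely illustrative - not SAP HANA's actual mechanism.)"""

    def __init__(self, path):
        self.hot = {}                # in-memory store for hot rows
        self.path = path             # disk-based "extended storage"
        open(path, "w").close()      # start with an empty warm tier

    def insert(self, key, row):
        self.hot[key] = row

    def age_out(self, key):
        # Move a row from the hot tier to the warm, disk-based tier.
        row = self.hot.pop(key)
        with open(self.path, "a") as f:
            f.write(json.dumps({"key": key, "row": row}) + "\n")

    def get(self, key):
        if key in self.hot:                  # fast path: memory
            return self.hot[key]
        with open(self.path) as f:           # slow path: scan the disk tier
            for line in f:
                rec = json.loads(line)
                if rec["key"] == key:
                    return rec["row"]
        return None

# Usage: the caller never sees which tier a row came from.
table = TieredTable(os.path.join(tempfile.gettempdir(), "warm_tier.jsonl"))
table.insert(1, {"year": 2010, "amount": 500})
table.insert(2, {"year": 2015, "amount": 900})
table.age_out(1)                             # 2010 data is "warm" now
print(table.get(1))                          # still found, via the disk tier
```

The design point mirrors the blog's claim: the tiering is invisible to the consumer, only the access path (and its cost) differs.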

That is a tremendous step, because SAP HANA is now capable of storing tons of historic data on disk almost without limit - which adds another feature to compete with Oracle.

Future releases will improve the seamless integration of both engines, for example with a single backup tool and even unified crash recovery.

SAP HANA is on the run to bypass ...

It's simply amazing how SAP HANA is developing. Dynamic Tiering - alongside other cool features like tenant databases, comparable to what Oracle introduced in 12c - positions SAP HANA better and better against Oracle.




Learning Basic SQL with SAP HANA

This is a video series on learning basic SQL with SAP HANA.

Not only is it a cute introduction to basic SQL, but it is also an intro to working with SAP HANA Studio.

Basic SAP HANA SQL Course




openHPI: Course on In-Memory Data Management

Do you want to learn about the concepts and mechanics of in-memory data management?

The Hasso Plattner Institute in Potsdam, Germany provides a learning platform with online courses.

The courses last about six weeks and end with a final test, with a mini-exam each week. The courses and all material are completely free ...

Today, the 26th of August, the In-Memory Data Management course starts.

Here is the link to it: https://openhpi.de/course/imdb2013?locale=de


Have fun and take the time to dig deeper into how an in-memory database works.





What is the difference between SAP HANA and a traditional RDBMS like Oracle?

Yeshua Ben Elohim: Don't mix the old wine with the new wine, or put the new wine in the old bottles
- they will break!

Sometimes it's confusing and surprising to read blogs about in-memory database computing and the understanding of it. Right at the beginning of this blog post I must state: even if a traditional RDBMS processed all of its data in memory (all data in place), you still would not have an in-memory database at all!

Now, did I take too deep a look into the wine bottle?
Definitely not!

How traditional RDBMSs work

When traditional RDBMSs were introduced, memory was very, very expensive and disk space, by comparison, was a lot cheaper - so the disk-based RDBMS was born. Looking back from today, this was always meant to be an intermediate solution.

Understanding the buffer cache and its nature

Even though the data of a traditional RDBMS is disk-based, the only place to operate on that data is in the CPU registers, so a buffer cache is needed to bring a subset of the data nearer to the CPU without the need for I/O on every block access.

The buffer cache itself is nothing other than a small, virtual and logical memory window onto the complete disk-based data. Data blocks are read into the buffer cache, replacing other (already flushed) cached blocks; blocks changed in the buffer cache are written down to disk at checkpoint, and committed changes are continuously logged as a byte stream by the log writer.
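The block-replacement part of this mechanism can be sketched as a tiny LRU cache in Python (the "disk" is simulated by a dict; in a real RDBMS a miss would trigger physical I/O, and checkpointing/logging are omitted here):

```python
from collections import OrderedDict

class BufferCache:
    """Minimal LRU buffer cache sketch: a small memory window onto
    disk-based blocks."""

    def __init__(self, disk, capacity):
        self.disk = disk                    # block number -> block contents
        self.capacity = capacity            # how many blocks fit in memory
        self.cache = OrderedDict()          # cached blocks, in LRU order
        self.physical_reads = 0

    def read_block(self, block_no):
        if block_no in self.cache:          # cache hit: no I/O
            self.cache.move_to_end(block_no)
            return self.cache[block_no]
        self.physical_reads += 1            # cache miss: "I/O" from disk
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)  # evict least recently used block
        self.cache[block_no] = self.disk[block_no]
        return self.cache[block_no]

disk = {i: f"block-{i}" for i in range(4)}
bc = BufferCache(disk, capacity=2)
bc.read_block(0)                            # miss
bc.read_block(1)                            # miss
bc.read_block(0)                            # hit
print(bc.physical_reads)                    # 2
```

Every access pays for this bookkeeping, hit or miss - which is exactly the overhead the next paragraph is about.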

Because the buffer cache is a virtual window onto the file-block-oriented data, there is no capability of direct memory access to the data - or, more precisely, to a specific row of a table once it is loaded into the buffer cache. You need to maintain a lot of lists, semaphores and memory address translation to get a specific row from the buffer cache, because the unique identifier of a row is the rowid. The rowid is not a memory-based construct but a file-based one - it contains no direct information about where a specific row is located in the buffer cache, i.e. the row's starting address in memory. A lot of CPU cycles are needed to translate this virtual file cache nature into a memory-addressable one.
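A crude Python sketch of the contrast (the real translation involves hash chains, latches and much more, which are reduced here to a single dictionary lookup):

```python
# A file-based rowid identifies (file, block, slot) - not a memory address.
# To fetch a row via the buffer cache, the rowid must first be translated
# to a cached block, and only then can the slot be located inside it.

buffer_cache = {                      # (file_no, block_no) -> rows in block
    (1, 7): ["row-a", "row-b", "row-c"],
}

def fetch_by_rowid(rowid):
    file_no, block_no, slot = rowid
    block = buffer_cache.get((file_no, block_no))   # translation step
    if block is None:
        raise LookupError("block not cached - physical I/O needed")
    return block[slot]

print(fetch_by_rowid((1, 7, 1)))      # row-b

# An in-memory store can skip the translation entirely:
rows = ["row-a", "row-b", "row-c"]

def fetch_in_memory(index):
    return rows[index]                # direct addressing, no translation
```

Same row, but one path goes through an address-translation layer that the other simply does not have.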

Back to the intro: if you resized the buffer cache to hold the complete data in the cache,
you would still have all these virtual file-based mechanisms - no direct memory access to a row. You would still be dealing with an RDBMS that behaves disk-based. This is not in-memory databasing!
Do not mix up the old stuff with the new.

Real in-memory databasing - SAP HANA

Now that CPU and memory have grown in capacity and capability at a stellar rate, even a large amount of data can be held completely in memory.

Hence, on startup of an SAP HANA database, all data is loaded into memory - there is no need to check anymore whether a piece of data is already in memory or a read from disk is necessary. Thanks to the column store (vertical, column-wise storage, meaning the values of one attribute are stored sequentially in memory), the data is CPU-aligned:
no expensive virtual calculation of LRU lists or logical block addresses, but direct (pointer) addressing of data.

Additionally, with SAP HANA the data is dictionary-compressed. This means the table itself is modelled as a micro star schema: the table's data contains only integers (CPU-friendly and compact) or bitmaps
referencing the dictionary-maintained values of each column. Moreover, the use of advanced native CPU features such as SIMD (Single Instruction, Multiple Data) is supported.
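Dictionary compression itself is simple enough to show in a few lines of Python (the city values are made up for illustration):

```python
def dictionary_encode(column):
    """Dictionary-compress a column: distinct values go into a dictionary,
    the column itself becomes a compact array of small integer ids."""
    dictionary, ids = [], []
    positions = {}                      # value -> id, for O(1) encoding
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        ids.append(positions[value])
    return dictionary, ids

cities = ["Berlin", "Potsdam", "Berlin", "Berlin", "Walldorf"]
dictionary, ids = dictionary_encode(cities)
print(dictionary)                       # ['Berlin', 'Potsdam', 'Walldorf']
print(ids)                              # [0, 1, 0, 0, 2]

# A scan for 'Berlin' becomes a comparison against one integer over a
# tight integer array - exactly the kind of loop a CPU can run with SIMD.
berlin_id = dictionary.index("Berlin")
hits = [i for i, v in enumerate(ids) if v == berlin_id]
print(hits)                             # [0, 2, 3]
```

The string comparison happens once (against the dictionary); the scan over the column touches only compact integers.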

The main database storage is now RAM instead of the disks.

With this in mind, an SAP HANA database is able to be many times faster than a traditional RDBMS,
even if the data in the old-style RDBMS would fit completely into the buffer cache.

In a real in-memory database you won't find any rowids anymore ;)