Entries in In-Memory Database (3)


openHPI: Course on In-Memory Data Management

Do you want to learn about the concepts and mechanics of in-memory data management?

The Hasso Plattner Institute in Potsdam, Germany, provides a learning platform with online courses.

The courses are free, last about six weeks and end with a final exam. There is also a mini-exam every week, and all of the course material is completely free as well.

Today, the 26th of August, the In-Memory Data Management course starts.

Here is the link: https://openhpi.de/course/imdb2013?locale=de


Have fun, and take the time to dig deeper into how in-memory databases work.





What is the difference between SAP HANA and a traditional RDBMS like Oracle?

Yeshua Ben Elohim: Don't mix the old wine with the new wine, or put the new wine into old bottles;
it will break!

Reading blogs about in-memory database computing, I am sometimes confused and surprised by how it is understood. So right at the beginning of this blog post I must state: even if all the data of a traditional RDBMS were processed in memory (all data in place), you would still not have an in-memory database at all!

Did I take too deep a look into the wine bottle?
Definitely not!

How traditional RDBMS work

When traditional RDBMS were introduced, memory was very expensive, while disk space was a lot cheaper, so the disk-based RDBMS was born. Looking back from today, this was always meant to be an intermediate solution.

Understanding the buffer cache and its nature

Even though the data of a traditional RDBMS is disk based, the only place where that data can be operated on is the CPU registers. So the buffer cache is needed to bring a subset of the data closer to the CPU without the need for I/O on every block access.

The buffer cache itself is nothing other than a small, virtual and logical memory window onto the complete disk-based data. Data blocks read into the buffer cache replace other cached blocks (ones already flushed to disk), and blocks changed in the buffer cache are written to disk at checkpoint,
while committed changes are continuously logged as a byte stream by the log writer.
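
To make the replacement and checkpoint mechanics concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not how Oracle or any other RDBMS actually implements its buffer cache: the `BufferCache` class, the dict standing in for the data file and the integer block IDs are all hypothetical. Blocks are replaced in least-recently-used order, a changed (dirty) block is written back before its frame is reused or at `checkpoint()`, and every change is also appended to a redo log list that plays the role of the log writer.

```python
from collections import OrderedDict


class BufferCache:
    """Toy LRU buffer cache in front of a simulated data file (hypothetical model)."""

    def __init__(self, disk, capacity):
        self.disk = disk              # block_id -> bytes; stands in for the data file
        self.capacity = capacity      # number of block frames in the cache
        self.frames = OrderedDict()   # block_id -> bytearray, kept in LRU order
        self.dirty = set()            # block_ids changed since the last write-back
        self.redo_log = []            # append-only stream of changes (log writer)
        self.physical_reads = 0       # simulated disk I/Os

    def _get_frame(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)      # hit: mark as most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:       # cache full: evict the LRU block
            victim, data = self.frames.popitem(last=False)
            if victim in self.dirty:                # write back before reusing the frame
                self.disk[victim] = bytes(data)
                self.dirty.discard(victim)
        self.physical_reads += 1                    # miss: simulated disk read
        frame = bytearray(self.disk[block_id])
        self.frames[block_id] = frame
        return frame

    def read(self, block_id):
        return bytes(self._get_frame(block_id))

    def write(self, block_id, offset, payload):
        frame = self._get_frame(block_id)
        frame[offset:offset + len(payload)] = payload
        self.dirty.add(block_id)
        self.redo_log.append((block_id, offset, bytes(payload)))

    def checkpoint(self):
        for block_id in list(self.dirty):           # flush every dirty cached block
            self.disk[block_id] = bytes(self.frames[block_id])
        self.dirty.clear()
```

Even in this tiny model every access goes through a lookup, LRU bookkeeping and possibly a simulated disk read before the bytes are available.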

Because the buffer cache is a virtual window onto file-block-oriented data, there is no direct memory access to the data, or more precisely to a specific row of a table, once it has been loaded into the buffer cache. You need to maintain a lot of lists, semaphores and memory address translation machinery to get a specific row from the buffer cache, because the unique identifier of a row, the rowid, is not a memory-based construct but a file-based one: it contains no direct information about where a specific row is located in the buffer cache, i.e. the row's starting address in memory. A lot of CPU cycles are needed to translate this virtual, file-cache nature into a memory-addressable one.
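
The following sketch models the rowid problem. The assumptions are mine: a rowid is taken as (file number, block number, slot), cached blocks are found through hash chains protected by latches (plain locks here), and a per-block row directory maps the slot to a byte offset. All names (`RowId`, `BlockBuffer`, `HashedBufferCache`, `column_lookup`) are hypothetical. Compare the steps in `fetch_row` with `column_lookup` at the end, where the row position is simply an index into an in-memory array.

```python
import threading
from typing import NamedTuple


class RowId(NamedTuple):
    file_no: int    # which data file
    block_no: int   # which block inside that file
    slot: int       # which entry in the block's row directory


class BlockBuffer:
    """Hypothetical block image: raw bytes plus a row directory (slot -> byte offset)."""

    def __init__(self, raw, slot_offsets):
        self.raw = raw
        self.slot_offsets = slot_offsets


class HashedBufferCache:
    """Toy model: locating a cached row from a file-based rowid."""

    def __init__(self, files, n_buckets=8):
        self.files = files                                 # (file_no, block_no) -> BlockBuffer "on disk"
        self.buckets = [[] for _ in range(n_buckets)]      # hash chains of cached blocks
        self.latches = [threading.Lock() for _ in range(n_buckets)]
        self.steps = 0                                     # chain entries inspected

    def fetch_row(self, rowid):
        key = (rowid.file_no, rowid.block_no)
        b = hash(key) % len(self.buckets)                  # 1. hash the file address to a bucket
        with self.latches[b]:                              # 2. latch the chain
            for cached_key, buf in self.buckets[b]:        # 3. walk the chain
                self.steps += 1
                if cached_key == key:
                    break
            else:
                buf = self.files[key]                      # 4. miss: simulated disk read
                self.buckets[b].append((key, buf))
        start = buf.slot_offsets[rowid.slot]               # 5. row directory gives the offset
        end = buf.raw.index(b";", start)                   # 6. parse the row out of the block
        return buf.raw[start:end].decode()


def column_lookup(column, row_pos):
    # In-memory column: the row position is directly an index, no translation
    return column[row_pos]
```

For example, with `files = {(1, 7): BlockBuffer(b"ALICE;BOB;", [0, 6])}`, `HashedBufferCache(files).fetch_row(RowId(1, 7, 1))` returns `"BOB"` only after hashing, latching, a chain walk and a directory lookup, while `column_lookup(["ALICE", "BOB"], 1)` is a single index operation.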

Back to the introduction: if you resized the buffer cache to hold the complete data,
you would still have all these virtual, file-based mechanisms. There is no direct memory access to a row; you are still dealing with an RDBMS that behaves like a disk-based one. This is not in-memory databasing!
Do not mix up the old with the new.

Real In-Memory databasing - SAP HANA

Now that CPU and memory capacity and capabilities have grown at a stellar rate, even large amounts of data can be held completely in memory.

Hence, on startup of an SAP HANA database all data is loaded into memory, so there is no longer any need to check whether data is already in memory or whether a read from disk is necessary. Thanks to the column store (vertical, column-wise storage, meaning the values of one attribute are stored sequentially in memory), the data is CPU-aligned:
no expensive virtual bookkeeping such as LRU lists or logical block address translation, but direct (pointer) addressing of the data.
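
To illustrate the vertical layout, this sketch turns a small, made-up sales table into one contiguous array per attribute using Python's `array` module. It only shows the principle; the table, its column names and the `to_columns` helper are hypothetical, and a real column store such as SAP HANA's is far more elaborate.

```python
from array import array

# Hypothetical sales table, given row by row as (order_id, region, amount)
rows = [
    (1, "EMEA", 120),
    (2, "APJ", 80),
    (3, "EMEA", 45),
    (4, "AMER", 200),
]


def to_columns(rows):
    """Store each attribute's values sequentially: one array per column."""
    order_id = array("q", (r[0] for r in rows))   # signed 64-bit ints, one contiguous buffer
    region = [r[1] for r in rows]                 # strings; a real column store would dictionary-encode these
    amount = array("q", (r[2] for r in rows))
    return {"order_id": order_id, "region": region, "amount": amount}


columns = to_columns(rows)

# A scan over one attribute reads a single contiguous buffer ...
total = sum(columns["amount"])
# ... and row i is reassembled by indexing every column at position i.
row_2 = tuple(columns[name][2] for name in ("order_id", "region", "amount"))
```

Summing `amount` touches only that one array, whereas a row-oriented layout would have to step over `order_id` and `region` of every row as well.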

Additionally, SAP HANA dictionary-compresses the data, meaning each table is modelled as a micro star schema: the table data contains only integers (CPU-friendly and compact) or bitmaps
referencing the values maintained in the column's dictionary. Moreover, native advanced CPU features such as SIMD (single instruction, multiple data) are supported.
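
Here is a minimal sketch of dictionary encoding for a single string column, assuming a sorted dictionary; the helper names and the sample data are hypothetical. The sorted dictionary plays the role of the small dimension table in the micro star schema, and the integer vector `codes` plays the role of the fact table's foreign keys. A predicate is evaluated by looking the value up once in the dictionary and then comparing integers only. `bitmap_for` shows the bitmap alternative: one bit per row for a given dictionary value.

```python
from array import array
from bisect import bisect_left


def dictionary_encode(values):
    """Split a column into a sorted dictionary and an integer value-id vector."""
    dictionary = sorted(set(values))                      # each distinct value stored once
    value_id = {v: i for i, v in enumerate(dictionary)}
    codes = array("I", (value_id[v] for v in values))     # unsigned ints, contiguous
    return dictionary, codes


def positions_equal(dictionary, codes, value):
    """Evaluate `column = value` by comparing integers instead of strings."""
    i = bisect_left(dictionary, value)                    # binary search in the sorted dictionary
    if i == len(dictionary) or dictionary[i] != value:
        return []                                         # value does not occur in the column
    return [pos for pos, code in enumerate(codes) if code == i]


def bitmap_for(dictionary, codes, value):
    """Bitmap for one dictionary value: bit `pos` is set when row `pos` holds it."""
    bits = 0
    for pos in positions_equal(dictionary, codes, value):
        bits |= 1 << pos
    return bits


region = ["EMEA", "APJ", "EMEA", "AMER", "EMEA"]
dictionary, codes = dictionary_encode(region)
```

Python compares the codes one at a time; the benefit of SIMD is that a native engine can compare many packed integer codes with a single instruction.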

The main database storage is now RAM instead of disk.

With this in mind, SAP HANA can be many times faster than a traditional RDBMS,
even if the data of the old-style RDBMS fit completely into the buffer cache.

In a real in-memory database you won't find any rowids anymore ;)



SAP HANA - No 'The database is the bottleneck' anymore

SAP HANA is an in-memory database developed by SAP, based on various SAP technologies such as MaxDB, ...
The database is designed for easy scale-out and scale-up with almost linear throughput, and it is fully capable of replacing current disk-based RDBMS such as Oracle, DB2, Microsoft SQL Server or Sybase.
Not only do all operations run on data already held in memory; thanks to its column-store organization and dictionary-based compression, the data itself is also memory-optimized and aligned.
The time when everything was attempted in the application layer to bypass slow database access is gone. The time when the database had a bad reputation in development circles, especially in the Java and .NET communities, is gone. To get the best out of the database, database-centric development will become important again.
Thanks to its speed, no cubes need to be built and no expensive pre-aggregation has to take place. No, no: a new star is born.
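
As a sketch of why a pre-built cube is not strictly needed when scans are fast, the following computes SUM(amount) GROUP BY region at query time, directly on a dictionary-encoded column. The data and the `group_sum` helper are hypothetical; the value IDs serve directly as indexes into the accumulator list, so no string comparison happens during the scan.

```python
from array import array

# Hypothetical dictionary-encoded region column plus a measure column
dictionary = ["AMER", "APJ", "EMEA"]          # sorted distinct values
codes = array("I", [2, 1, 2, 0, 2])           # value id per row
amount = array("q", [120, 80, 45, 200, 30])   # measure per row


def group_sum(dictionary, codes, measure):
    """SUM(measure) GROUP BY the encoded column, computed on the fly."""
    totals = [0] * len(dictionary)            # one accumulator per value id
    for code, value in zip(codes, measure):
        totals[code] += value                 # the value id is used directly as an index
    return dict(zip(dictionary, totals))


result = group_sum(dictionary, codes, amount)
```

When new rows arrive, the next query simply includes them; there is no cube to rebuild.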