Big Data, fast data, predictive and prescriptive analytics: they are here, and they are here to stay. The opportunities are evident, and the technology is already available.
Making the right investment choices is complex, especially when a BI solution is already in place. In that case, what is the best approach to embrace the promises of Big Data?
The good news is that Big Data and traditional BI can make a powerful couple and still keep their specific strengths. BI will continue to deliver dashboards, reports and OLAP (Online Analytical Processing). In addition, the Enterprise Data Warehouse can be leveraged as a very valuable input for analytics.
Big Data adds value through the addition of new sources, and by enabling predictive and prescriptive analysis. These may be used in the BI tasks as well.
The bad news is that there is no single best way to move. There are a number of alternatives for making the transition from traditional BI to a BI/Big Data blend. Which alternative suits you best depends on a number of factors, such as strategy, short-term objectives, cost, available IT resources and operations.
This article proposes a number of alternatives for incorporating Big Data into an existing BI environment.
Differences between BI and Big Data
First let’s have a look at the difference between traditional BI and Big Data.
The core of traditional BI is an Enterprise Data Warehouse (EDW). It is generally used to store historical transactional data in a structured (relational/dimensional) way. The data is loaded by an ETL (Extract, Transform, Load) process, which generally draws data from operational systems and may use intermediate data stores (“staging databases”).
Besides the EDW, an ODS (Operational Data Store) may be used for real-time reporting. Data Marts may also be generated from the EDW; each contains a domain-specific subset of denormalized data that is easier to analyze.
Reporting and dashboards are all about gaining insight into what has happened or what is happening now: ‘Descriptive Analytics’. OLAP may be used to ‘drill down’ and ‘slice and dice’ through data to pinpoint root causes (e.g. to find out which product is responsible for a decline in operational results). This is referred to as ‘Diagnostic Analytics’.
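To make the difference between descriptive and diagnostic analytics concrete, here is a minimal sketch in plain Python. The sales rows, the product names and the `totals_by` helper are all made up for illustration; a real OLAP tool would do this against a cube or data mart.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, product, revenue). Values are made up.
SALES = [
    ("Q1", "A", 100), ("Q1", "B", 80), ("Q1", "C", 60),
    ("Q2", "A", 105), ("Q2", "B", 40), ("Q2", "C", 62),
]

def totals_by(rows, *dims):
    """Roll up revenue over the chosen dimensions (0 = quarter, 1 = product)."""
    out = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        out[key] += row[2]
    return dict(out)

# Descriptive: what happened? Total revenue per quarter.
per_quarter = totals_by(SALES, 0)

# Diagnostic: drill down to product level to find the largest decline.
per_product = totals_by(SALES, 0, 1)
products = sorted({row[1] for row in SALES})
change = {p: per_product[("Q2", p)] - per_product[("Q1", p)] for p in products}
culprit = min(change, key=change.get)

print(per_quarter)  # revenue dropped from Q1 to Q2
print(culprit)      # the product with the largest decline
```

The first roll-up only shows that revenue declined; adding the product dimension is the 'drill down' that points to the cause.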
BI generally runs within the physical borders and firewall of the enterprise (‘on-premise’). It is limited by the storage capacity and processing power of the hardware.
Big Data & Analytics
BI mainly focuses on descriptive and diagnostic analysis of manageable amounts of structured data. Big Data can use (virtually without limits) any kind of data, for instance website content, mail, documents, Twitter feeds and Internet of Things data. Analytics may leverage Big Data not only to observe the past: machine learning algorithms may be applied to make educated predictions about the future (predictive analytics), or even to suggest what we should do based on the available data (prescriptive analytics).
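As a rough illustration of the step from predictive to prescriptive analytics, the sketch below fits a straight line to a few data points and then turns the forecast into a suggested action. A simple least-squares fit stands in for a real machine learning model here, and the demand figures, stock level and ordering rule are all assumptions made for this example.

```python
# Hypothetical monthly demand figures; the numbers are illustrative only.
months = [1, 2, 3, 4, 5]
demand = [10, 12, 14, 16, 18]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Predictive: estimate demand for the next month (month 6).
slope, intercept = fit_line(months, demand)
forecast = slope * 6 + intercept

# Prescriptive: turn the forecast into a suggested action.
# The stock level and the ordering rule are assumptions for this sketch.
stock_on_hand = 15
order_quantity = max(0, round(forecast) - stock_on_hand)

print(forecast, order_quantity)
```

The predictive part answers "what will probably happen?", while the prescriptive part answers "what should we do about it?".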
Big Data provides new features and removes a number of the limitations that BI normally has to cope with. First of all, Big Data technologies (obviously) eliminate restrictions on the size of the data that can be processed. This is mainly achieved by using cloud- and cluster-based technologies that offer elastic storage and processing resources and a low entry threshold.
Furthermore, it opens up the possibility of using all kinds of data: not just structured, but also semi-structured (e.g. XML) and unstructured data (e.g. text and images).
Disruptive technologies such as cloud, the ever-growing open source community and falling prices for storage and computational power have made Big Data available to virtually any organization.
The typical Big Data architecture functions look like this:
The main capabilities of a Big Data solution are:
- Storage: To store and retrieve (large amounts of different kinds of) data
- Processing: To clean, transform and enrich the data
- Transfer: To move the data in, out and over the platform
- Analytics: To perform analysis on the data, in batch or in real time
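The four capabilities can be sketched as a toy pipeline. Everything here is a stand-in: a Python list plays the role of distributed storage, and the sensor records, field names and function names are hypothetical.

```python
import json
from collections import defaultdict

raw_store = []  # Storage: a list stands in for a distributed store (assumption)

def transfer_in(lines):
    """Transfer: move raw records onto the platform unchanged."""
    raw_store.extend(lines)

def process(raw):
    """Processing: parse, clean and normalise the raw records."""
    for line in raw:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleaning: drop records that cannot be parsed
        if isinstance(rec, dict) and "device" in rec and "temp" in rec:
            yield {"device": rec["device"].strip().lower(),
                   "temp": float(rec["temp"])}

def analyse(records):
    """Analytics (batch): average temperature per device."""
    readings = defaultdict(list)
    for rec in records:
        readings[rec["device"]].append(rec["temp"])
    return {dev: sum(vals) / len(vals) for dev, vals in readings.items()}

# Made-up sensor feed, including one malformed record.
transfer_in([
    '{"device": "Sensor-1 ", "temp": 20.5}',
    '{"device": "sensor-1", "temp": 21.5}',
    'not json',
    '{"device": "sensor-2", "temp": 18}',
])
averages = analyse(process(raw_store))
print(averages)
```

Note that the raw data is stored as it arrives and only cleaned when it is processed, which is what lets such a platform accept data of any kind.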
Extending BI with Big Data
Big Data is largely complementary to existing BI / EDW solutions. There are a number of alternatives to blend your BI solution with Big Data. Below we will discuss four options:
- Big Data and BI in parallel
- Big Data platform as data collector
- Data virtualization
- Dedicated BI + Big Data appliance
Big Data and BI in parallel
This solution adds a largely isolated Big Data platform to the landscape, next to the EDW. An important component is the ‘Interconnect’: a two-way connector that transports data between the EDW and the Big Data platform and performs all necessary transformations. (It can be seen as a kind of dedicated high-speed service bus.)
This solution enables all client applications (analytics and dashboards/reporting) to use both the EDW and the Big Data platform. Because it is an extension of the current EDW, implementing it has relatively little impact on operations.
Being decoupled by the Interconnect, the EDW and the Big Data platform remain independent. This means they can evolve separately, which makes the solution flexible and highly scalable.
The Interconnect is the crux of this architecture. Complicated data processing (mostly transformations, such as from NoSQL to relational format) can make it complex, and when ingesting large loads it may become a bottleneck.
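To give an idea of the kind of transformation the Interconnect has to perform, the sketch below flattens one nested, NoSQL-style document into relational rows. It only shows one direction (Big Data platform to EDW). The document layout, table names and the `to_relational` helper are assumptions, and an in-memory SQLite database stands in for the EDW.

```python
import sqlite3

# Hypothetical order document as it might come from a NoSQL store.
order_doc = {
    "_id": "o-1001",
    "customer": {"name": "Acme"},
    "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 5}],
}

def to_relational(doc):
    """Flatten one nested document into a parent row and child rows."""
    order_row = {"order_id": doc["_id"], "customer": doc["customer"]["name"]}
    line_rows = [
        {"order_id": doc["_id"], "line_no": n,
         "sku": item["sku"], "qty": item["qty"]}
        for n, item in enumerate(doc["items"], start=1)
    ]
    return order_row, line_rows

# An in-memory SQLite database stands in for the EDW.
edw = sqlite3.connect(":memory:")
edw.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer TEXT)")
edw.execute(
    "CREATE TABLE order_lines "
    "(order_id TEXT, line_no INTEGER, sku TEXT, qty INTEGER)"
)

order_row, line_rows = to_relational(order_doc)
edw.execute("INSERT INTO orders VALUES (:order_id, :customer)", order_row)
edw.executemany(
    "INSERT INTO order_lines VALUES (:order_id, :line_no, :sku, :qty)",
    line_rows,
)

# Once loaded, the data can be queried like any other relational data.
total_qty = edw.execute(
    "SELECT SUM(qty) FROM order_lines WHERE order_id = ?", ("o-1001",)
).fetchone()[0]
print(total_qty)
```

Even in this small case one document becomes rows in two tables; with many document shapes and high volumes, it is easy to see how the Interconnect can grow complex or turn into a bottleneck.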
Big Data platform as data collector
In the second option, the Big Data platform is used to collect all data from all sources. This includes the transaction systems that were traditionally accessed through ETL and fed directly into the EDW. After collection by the Big Data platform, all data (even unstructured data) is transformed into structured data and fed into the EDW.
The architecture is simple, highly scalable and easy to maintain because data flows in only one direction (like ETL in traditional BI). The Big Data platform acts as a one-stop shop for enriching operational data with Big Data insights.
The solution will, however, require a major change in the (ETL) process that feeds data into the EDW.
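The one-way flow of this option can be sketched as follows. The source names, the product catalogue and the uniform row layout are hypothetical; the point is only that structured and unstructured inputs both pass through the platform and leave it as structured rows for the EDW.

```python
import re

PRODUCTS = {"alpha", "beta"}  # hypothetical product catalogue

def structure(source, payload):
    """Turn one raw item into uniform (source, product, measure, value) rows."""
    if source == "transactions":  # already structured
        return [(source, payload["product"], "units_sold", payload["units"])]
    if source == "social":  # unstructured text: count product mentions
        words = re.findall(r"[a-z]+", payload.lower())
        return [(source, w, "mentions", 1) for w in words if w in PRODUCTS]
    raise ValueError(f"unknown source: {source}")

def collect_and_load(inbox, edw_rows):
    """One-way flow: every source -> Big Data platform -> EDW."""
    for source, payload in inbox:
        edw_rows.extend(structure(source, payload))
    return edw_rows

# Made-up input: one transaction record and one free-text message.
inbox = [
    ("transactions", {"product": "alpha", "units": 3}),
    ("social", "Loving the new Alpha, much better than beta!"),
]
edw_rows = collect_and_load(inbox, [])
for row in edw_rows:
    print(row)
```

Because the transaction data now also passes through `structure`, the existing ETL into the EDW has to be redesigned around the platform, which is the major change mentioned above.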
Data virtualization
Data virtualization adds a Big Data platform next to the EDW and decouples both the EDW and the Big Data platform from the data consumers by means of a service layer. This virtualization hides the details of where and how data is stored. Consumers only need to know the service interface in order to use the data. The services are standardized and based on semantic technologies, generally expressed in business terms.
All consumers have instant access without any additional interconnect tool. Data redundancy is prevented and non-relational data is made accessible through standardized services.
The loose coupling separates data producers from consumers, hence the implementation of services may be altered without modifying the consumers.
However, data virtualization software may be complex and expensive, and it may become a bottleneck.
Furthermore, setting up a data virtualization implementation can be costly. It requires a very good Common Data Model (and hence a data management function) in order to set up the right service interfaces in advance.
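A minimal sketch of such a service layer is shown below. The class names, the `customer_profile` method and the sample data are all hypothetical; commercial data virtualization products offer far more, but the principle of hiding the storage behind a business-level interface is the same.

```python
class EdwSource:
    """Stands in for the relational EDW (assumption: rows as dicts)."""
    def __init__(self, rows):
        self._rows = rows

    def total_revenue(self, customer_id):
        return sum(r["amount"] for r in self._rows
                   if r["customer_id"] == customer_id)

class BigDataSource:
    """Stands in for a non-relational store of support messages."""
    def __init__(self, docs):
        self._docs = docs

    def topics(self, customer_id):
        return sorted({d["topic"] for d in self._docs
                       if d["customer_id"] == customer_id})

class CustomerInsightService:
    """Service layer: the only interface consumers need to know."""
    def __init__(self, edw, big_data):
        self._edw = edw
        self._big_data = big_data

    def customer_profile(self, customer_id):
        # Expressed in business terms; storage details stay hidden.
        return {
            "customer_id": customer_id,
            "total_revenue": self._edw.total_revenue(customer_id),
            "support_topics": self._big_data.topics(customer_id),
        }

service = CustomerInsightService(
    EdwSource([
        {"customer_id": 1, "amount": 120},
        {"customer_id": 1, "amount": 80},
        {"customer_id": 2, "amount": 50},
    ]),
    BigDataSource([
        {"customer_id": 1, "topic": "billing"},
        {"customer_id": 1, "topic": "delivery"},
    ]),
)
profile = service.customer_profile(1)
print(profile)
```

A consumer calling `customer_profile` does not know which part of the answer comes from which store, so either source can be replaced without changing the consumer. The hard part in practice is agreeing on the shape of that profile up front, which is where the Common Data Model comes in.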
Dedicated appliance for BI + Big Data
A BI + Big Data appliance is a dedicated environment, usually from a single supplier, that offers one total solution including relational and non-relational storage, processing, an internal interconnect and an operating system.
It is not really an extension to BI as the solution will replace the current BI landscape.
It is one single system (a one-stop shop) with relatively low system maintenance. It allows for real-time analytics.
Although operating it has many advantages, the solution is very costly, and so is its implementation, especially because it replaces your current BI environment. Combined with the risk of vendor lock-in, this makes a sound business case essential.
There is more than one way to benefit from Big Data alongside your BI needs. We discussed four, and there may be several solutions that combine aspects of them. All options come with their pros and cons and should be carefully analyzed for cost and impact. A proper business case is essential for success.