Creating Cubes in SSAS
It was the 1980s when the first relational database management system was created and data scientists across the globe began talking about tables and their relationships with one another. The word "table" has been described as a collection of related data held in a structural format within a database. Every piece of meaningful information has been treated as data. We should thank E.F Codd for introducing such a concept that revolutionized the industry. Flat-files and network file storages are no more. It was a history after 1980. If you are wondering why I am talking about this in this article, here is the answer! The total article revolves around data. From storing your personal information during account creation in Amazon (or any e-marketing or ecommerce website) to purchase and delivery, everything is data.
The way the world manipulated data during 1980s was different from the way it is manipulated now. If you are a .Net developer and are developing your applications for a large company handling millions or even trillions of records every day like mine or your application is manipulating data every day, this article will be an eye-opener. This article is a collection of many reference materials used when working in a real-world application.
Table of Contents
- Introduction
- What a Data Warehouse is and why it is used
- Single Version of truth
- Performance
- Simplicity
- Data Persistence
- Creating your first SSAS project
- Data Source
- Data Source Views
- Cubes
- Dimension
- Mining Structures
- Roles
- Assemblies
- Miscellaneous
- Data Source
- Data Source Views
- Dimensions
- Cubes
- What are Measures
- What are Measure Groups
- Conclusion
Introduction
As real-world product developers, our everyday activity involves connections to the database. Either it is the way user information is stored or we retrieve the information from the database. Most of the applications naturally do perform the four operations in common of insert, update, delete and select. We do these four operations from various providers, say SQL Server, Azure, MongoDB and so on. These are from various data sources. But to the end-user, this will be a simple experience when he/she is using this product (application). For example, when logging into Amazon, the authentication is verified by a web service connected to a data source from its servers located in Australia (just saying, for example). While you search products (mobiles, furniture or whatever) the data is retrieved from a data source located in a local server in the USA. We surf Amazon as a single product. If we have our product running in production, the number of users who do these preceding four operations from the first day of our launch until date will not be constant. As our product gains popularity, we gain more users. The solid reason to write this article is performance. The greater the number of users, the slower the performance.
Here are the ways elaborated on of how a Data Warehouse helps us to solve complex problems related to query processing. Again, If you are a .Net developer and developing your application for a large company handling millions or trillions of records every day (as does mine) or your application is manipulating data every day, this article will be an eye-opener.
What a Data Warehouse is and why it is used
A Data Warehouse is a database, as the name implies, but it is actually much more than that in terms of its functionality. First of all, data does not originate from a Data Warehouse. Instead, it originates from source systems, such as a Customer Relationship Management (CRM) system, HR System or Financial Management system and so on, in the form of SQL tables, CSV files and so on. The data from these source systems are brought to a Data Warehouse so that a Data Warehouse becomes a central repository with all of the companie's relevant information. That makes the question of why would we do such a thing? Why do we need to copy data when it already exists in the system and place it into the central repository? In order to answer that question, we need to talk about how decisions are made in organizations and what is the impact of a decision or a cumulative decision of an organization. We could agree that an organization's success is defined by the cumulative ability to make good decisions to have successful outcomes. How are we going to make the decision makers to make their decisions? We are not only talking about the decisions made by top executives but also the operational decisions to be taken in a day-to-day basis. In order to make decisions, we need to provide decision makers with proper information. We have our internal data such as data from the data sources and we may also need to have external information such as physical and financial (such as weather is physical and financial is stock market) information. For decision making, all of these must be brought together to provide meaningful information. This is where a Data Warehouse is important. It combines all the internal and external data from various data sources and keeps it in a single place called a warehouse. This helps decision makers compare the environments and make proper decisions. So, why can't we query those from each source separately? There are four solid reasons for that.
Single Version of truth
We may have user information in our database of financial system and weather information system. Querying each of them differently for user information adds redundancy and Data Warehouse solves this problem by means of having all the data in a single repository.
Performance
Querying millions of data from various data sources is much more complex and gains less performance than querying a single centralized data source.
Simplicity
Our normal database architecture suggests us to use normalized data and when the user wants information, it takes some time for us when we join user information split across tables or data sources. Data Warehouses avoids these problems.
Data Persistence
Organizations want to align their data in a way that sometimes is not the way our data sources align the data. For example, if the user or the decision-maker wants to update information about the user when the temperature is above 10 degrees then it is not persisted when we have different data sources. A Data Warehouse helps you avoid this problem.
A Data Warehouse is not a product or a technology but it is a concept. Let's build a Data Warehouse in this article.
SQL Server Analysis Services (SSAS) in MSBI
This will create a multidimensional database from a single dimensional database.
Had we installed Visual Studio, we would have Business Intelligence Development Studio (BIDS) that does all these activities.
Creating your first SSAS application
I have the SQL Server 2008 installed and the business intelligence suite of vs 2008 is suitable for me. Even though MSBI consists of three main parts, in this article we will have an introduction to what SSAS is.
Step 1
Open Visual Studio and click on the new project. There is an option named Business intelligence to select a BI related project on the left pane of the opened window. Click on the Analysis Services Project, name it and click OK.
Step 2
This project is a SSAS project and it will create a Data Warehouse for us. We can see eight folders and let us know what they are. These folders are the layers to create our SSAS Data Warehouse.
Data Sources
Contains a set of data sources from the Oracle or SQL Server databases. SSAS supports a set of data providers that can be chosen when adding data sources.
Data Source Views
A data source view contains a logical model of the schema that is used by analysis services multidimensional database objects. They can be built over one or more data sources, letting you define multidimensional and data mining objects that integrate data from multiple sources.
Cubes
Without creating it we cannot query a Data Warehouse. SQL Server stores data in the form of tables but in SSAS, the same happens in the form of cubes. We will explain this shortly.
Dimensions
Dimensions contain the necessary information to query the cubes. They contain information about the fact tables to contain measure values to query the cubes.
Mining Structures
This is the least-used function in MSBI nowadays. This feature was not devoloped much after SQL Server 2005. SSAS has included 9 data mining algorithms. We will see them shortly.
Roles
This helps you to create and manage roles for the specific Data Warehouse.
Assemblies
Assemblies let you create external, user-defined functions using any Common Language Runtime (CLR) language, such as Microsoft Visual Basic .NET or Microsoft Visual C#. You can also use Component Object Model (COM) Automation languages such as Microsoft Visual Basic or Microsoft Visual C++.
Miscellaneous
If you are going to add documents, images or any other things apart from all the preceding, you are welcome to add it here.
Let us discuss each one of them clearly.
Creating a Data Source
In an Analysis Services Multidimensional Model, a data source object represents a connection to the data source from which we will process the data. A multi-dimensional model should at least contain only one data source object.
Choosing a data provider
We can choose types of data providers, namely SQL Server data providers and the Oracle provider. For SQL Server data providers, we can use the SQL Server native client and for Oracle we can use some third-party data sources.
Set up credentials
We can set up credentials with either Windows Authentication or SQL Server Authentication as we use in our normal day-to-day applications. In servers it is advisable to use Windows Authentication.
Let us switch back to our application open in Visual Studio to create a data source.
Right-click on the datasource folder and choose new data source.
From the data source wizard, do the following options.
Click on the new button to create a new datasource from the SQL Server or Oracle db.
From the connection manager choose the type of provider, server name and you can set the authentication.
After providing the database information click OK and it takes you back to the Data Source Wizard with the connection added. Click next.
The next window asks for impersonation information. We can choose one of four types of account and I am choosing Inherit since it is the safest for this demo.
And now click Next and it takes you to set the data source name. You can see the connection string for verification in this window.
The data source is finally created!
We can have multiple data sources created the same way.
To edit all the information, just click on the data source you have created and click open. It allows you to edit all the information you have configured. The data source can be anything, like SQL Server, Oracle, access, DB2, Informatica, and so on.
Thus we have created the data source.
Creating a Data Source VIew
A Data Source View contains the metadata about the data source. A data source view can be used with cubes, dimensions and mining structures. This data source view is stored in an XML format. They can be built over one or more data sources or meta data. A data source view may contain relationships, primary keys, object names and calculated columns. This cannot be queried by the client application.
Step 1
Right-click on the "Data Sources" view and then click New Data Source View.
Step 2
Select a specific data source.
Step 3
You can also add a data source by clicking the new data source button. Once the data source is selected, click Next. You can see select tables and options. With this, you can select the view or table you want.
Step 4
After selecting tables and views you can see the completing wizard. Click complete.
That's it. We have created a data source view.
Creating Dimensions
Dimensions are business objects or entities that contain a list of attributes that describe the object (dimension itself) and they are usually derived from some code tables.
The following is the procedure to create dimensions.
Step 1
Right-click on the dimension folder and click select new dimension. Click Next if there are any prompts.
Step 2
A Cube Wizard page appears to "Select Creation Method". In that select "Use an existing table". Then click on the "Next" Button.
Step 3: The dialog will guide you to choose a data source, table name and name column.
Step 4
Choose your data source. Choose from the list of table selected in the data source view and choose a name column. Preferred is the primary key column. Once you click next, it will show you a set of related tables with foreign key constraints to it, if any. Click Next.
Step 5
You need to select what all the dimensions are that you want.
Step 6
In the next screen, you will see the summary of Dimensions. Here provide the name of the Dimension, in other words ProductDimension and then click on the Finish button.
Step 7
Now you can see the dimensions created In the Solution Explorer.
Creating Cubes
A cube includes measures in measure groups, business logic, plus a collection of dimensions that provide context for evaluating the numerical data that a measure provides. Both measures and measure groups are an essential component of a cube. A cube cannot exist without at least one of each.
What Measures are
A measure represents a column that contains quantifiable data, usually numeric, that can be aggregated. Measures represent some aspect of organizational activity, expressed in monetary terms (such as revenue, margins, or costs) or as counts (inventory levels, number of employees, customers, or orders), or as a more complex calculation that incorporates business logic.
What Measure Groups are
Measure Groups are the collection of measures values.
Step 1
Right-click on the cubes folder and select new cube.
Step 2
Select the "Use existing table" option and click Next.
Step 3
A Cube Wizard will appear to "Select Measure Group Table" with the table included in the data source view. Select the tables you want for the cube.
Step 4
In the Cube Wizard to "Select Measures", select the measures you want and then click Next.
Step 5
Select the existing dimensions and click Next.
Step 6
Select the new dimensions you want.
Step 7
The complete wizard appears with the cube name.
Step 8
We can see the cube being created.
Conclusion
Thus cube creation in SSAS is required for reducing the query processing time to improve performance. In the next article I will cover how to query these cubes and how to integrate this cube with our .Net application.