Slice of Health Data: Data models in PowerPivot

“To accomplish great things we must first dream, then visualize, then plan... believe... act! “ Alfred A. Montapert

The above quote is very appropriate for this posting. Before any PowerPivot model can be used for analysis, you must visualize how to put the data together. Thankfully version 2 introduces the “Diagram View” (aka MS Access relationship window). From the “Diagram View”, you can do the following:

· Add/delete tables from one of several data sources (Oracle, SQL Server data cube, MS Excel, etc.)

· Connect the tables together by defining their relationship to one another

· Visualize how the different tables connect and relate to each other

Below are common configurations

Model #1 – the single table

This is the simplest model and probably the most common data model to use. You can connect to a data cube and present the data in PowerPivot as a single table. Using DAX, you can extend the usefulness of the standard Excel PivotTable by creating new metrics.

Model #2 – data blending

Data blending involves combining data from different data sources. Perhaps, you have data in SQL Server and a table in MS Excel. PowerPivot now provides the mechanism to merge this data. This was my main attraction to PowerPivot. Previous to PowerPivot, this task required a developer to create a SSIS package. I’ll admit that I still use MS Access to blend data. However, MS Access can’t connect to a SSAS based data cube.

The picture above shows a line connected the two tables. In SQL terms, I created an outer join between the two tables. In the PowerPivot window, I can make this join (or relationship in PowerPivot terms) by simply dragging the DepartmentName field in the left table to the DepartmentName in the right table.

Caution: Please make sure the values in the second table are truly unique. PowerPivot will not connect the tables and produce an error message. You can always copy a single column from the “Data View” window and paste into an Excel worksheet to check for duplicates. Nothing can be more frustrating watching that error message pop up when you know there are no duplicates. More than likely, there are.

Model #3 – Simple Star Schema

The simple star schema model is the typical format for data cubes. There is one table with all the relevant data (fact table) and several look-up tables (dimension tables).

The fun with this model comes when you create your PivotTable. Excel will display the three tables as separate trees in the PivotTable’s field list. You can drag the dimCustomer’s name in the report filter and the dimFactInternetSales’s SalesAmount in the value section. By Selecting a single customer, the PivotTable will display only the SalesAmount for that customer thanks to the relationship established as pictured.

Model #4 – Market Basket Analysis

Market Basket analysis is the fancy term for a query like, “Find all office visits in 2011 for Dr. Smith that had a MRI after that visit”. One filtered data-set (all office visits in 2011 for Dr. Smith ) is required as input to create another filtered data-set (MRI after that visit). Typically, it takes a sub-query in SQL or MDX using a data cube. You can perform this query using PowerPivot!

In Alberto Ferrari’s blog, he answers the question “Of all the customers who have bought a mountain bike, how many have never bought at least one mountain tire tube?”. To understand the DAX formulas that he uses, the concept of “Filter context” must be understood.

“Filter context” is the topic of my next blog post, PowerPivot’s Filter Context from a SQL perspective

Slice of Health Data

Tuesday, August 14, 2012

Data models in PowerPivot

No comments:

Post a Comment