Tuesday, August 14, 2012

PowerPivot’s Filter Context from a SQL perspective


“The attempt to combine wisdom and power has only rarely been successful and then only for a short while.” – Albert Einstein

I figure I best write down my thoughts before my newfound wisdom escapes me. Every time I think I’ve finally understood “Filter Context”, it turns out that I really don’t get it.

To be honest, I’ve read many definitions and descriptions of “Filter Context”. None of them helped me master the concept. I don’t come from the world of the multi-dimensional language MDX. Until recently, I didn’t see the usefulness of PivotTables when a solid SQL query can provide the same information.

Then, it came to me! What if I used the knowledge that I do have? Look at PivotTables and “Filter Context” through the perspective of SQL. Using this new perspective, I believe “Filter Context” is the following:

the key concept in PowerPivot that lets you use Excel-like functions to dynamically alter the criteria of each cell in a PivotTable. Instead of relying on the PivotTable’s built-in process of creating criteria from the row value, column value, report filter, and/or slicer, you can use DAX functions to define the criteria yourself. This capability removes much of the PivotTable’s limitation in solving business problems.

You can now move beyond the standard SUM and COUNT formulas and create functions similar to those in SQL. “Filter Context” isn’t a feature like something you can select in an Excel ribbon or pop-up menu. It’s a behavior, similar to Excel’s cell references. Once you understand, for example, what the cell reference “A1:C3” means, you can use it in all sorts of Excel functions. Just like cell references, “Filter Context” is fundamental to creating DAX formulas.

I took my thoughts a little further. How does a PivotTable operate? It executes a SQL statement in each cell. Those SQL statements’ “where” clauses are automatically created based on the cell’s position in the PivotTable.

Let me start with a very basic data-set (call it data_table),

ID | Product | Color | amount
1  | Pen     | Red   | 2
3  | Pencil  | Blue  | 6
5  | Pen     | Blue  | 10
7  | Pencil  | Red   | 10
Create a standard PivotTable from the above data table,

I create the PivotTable below by:
·         Dragging Color into Row labels
·         Dragging Product into Column labels
·         Dragging amount into Values and leaving the “Summarize values as” SUM function
·         No Slicer or PivotTable’s report filter was used

You get something like below,

Sum of amount | Pen | Pencil
Blue          | 10  | 6
Red           | 2   | 10

Not too useful!

If I apply my SQL analogy, the PivotTable really looks like,

Row Label (Color) / Column Label (Product):
Blue / Pen:    select sum(amount) from data_table where Color = 'Blue' and Product = 'Pen'
Blue / Pencil: select sum(amount) from data_table where Color = 'Blue' and Product = 'Pencil'
Red / Pen:     select sum(amount) from data_table where Color = 'Red' and Product = 'Pen'
Red / Pencil:  select sum(amount) from data_table where Color = 'Red' and Product = 'Pencil'

Each of the 4 cells in my PivotTable executes a query against my data_table. In the cell that intersects Pen and Blue, the SUM(amount) function executes only for rows in the data_table where Color = 'Blue' and Product = 'Pen'. No matter where I place this PivotTable in my worksheet or the entire workbook, this cell will always execute the SUM(amount) function where Color = 'Blue' and Product = 'Pen'.
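The per-cell queries can be sketched outside of Excel, say with SQLite in Python (the table and column names follow the example above; this is an analogy, not what the PivotTable engine literally runs):

```python
import sqlite3

# Build the example data_table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_table (ID INTEGER, Product TEXT, Color TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO data_table VALUES (?, ?, ?, ?)",
    [(1, "Pen", "Red", 2), (3, "Pencil", "Blue", 6),
     (5, "Pen", "Blue", 10), (7, "Pencil", "Red", 10)],
)

def pivot_cell(color, product):
    """The query a PivotTable cell implicitly runs: SUM(amount) with a
    WHERE clause built from the cell's row and column coordinates."""
    row = conn.execute(
        "SELECT SUM(amount) FROM data_table WHERE Color = ? AND Product = ?",
        (color, product),
    ).fetchone()
    return row[0]

print(pivot_cell("Blue", "Pen"))     # 10
print(pivot_cell("Red", "Pencil"))   # 10
```

Each call corresponds to one cell: the coordinates supply the WHERE clause, and the aggregation is fixed at SUM.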

What’s the big deal?
The standard PivotTable executes these SQL statements each time you refresh the table or alter the PivotTable’s criteria. Basically, the PivotTable dynamically creates the “where” clause based on the location of the SQL statement inside the PivotTable. The only way to alter this SQL statement is to select a different aggregation function (COUNT, AVERAGE, MAX, etc.). Not very flexible or useful for healthcare analytics!

In order to make a useful function, say the ratio of the amount of Blue Pens to all Pens and Pencils, a PowerPivot function, or more precisely a DAX function, needs to be created. PowerPivot allows you to create your own “where” clause, join multiple tables, and utilize functions other than COUNT, AVERAGE, and MAX. This turns Excel’s PivotTable into a true Business Intelligence tool.

EXAMPLE DAX FORMULA #1 - Create a formula to compute Total Amount of all Pens and Pencils

In terms of SQL, I need to create a “where” clause that ignores both the row label and column label criteria for each cell of the PivotTable. That would look like,

Row Label (Color) / Column Label (Product):
Blue / Pen:    select sum(amount) from data_table where (ignore Color = 'Blue' and ignore Product = 'Pen')
Blue / Pencil: select sum(amount) from data_table where (ignore Color = 'Blue' and ignore Product = 'Pencil')
Red / Pen:     select sum(amount) from data_table where (ignore Color = 'Red' and ignore Product = 'Pen')
Red / Pencil:  select sum(amount) from data_table where (ignore Color = 'Red' and ignore Product = 'Pencil')

That is, I want my PivotTable to ignore the built-in “where” clause of “where Color = 'Blue' and Product = 'Pen'” and utilize a “where” clause of my creation. I want the total of all Pens and Pencils to appear in each cell of my PivotTable.

To ignore the PivotTable’s built-in where clause statement, I created the following DAX formula,

TotalAmount=CALCULATE(sum(data_table[amount]),all(data_table))

This is analogous to writing,

SELECT SUM(amount)
FROM data_table

I want to compute the total amount regardless of where this function is placed in the PivotTable. Even if the end user slices the PivotTable by Red and dices it by Pen, leaving only one cell, I want to override the PivotTable’s built-in criteria. The SQL statement is a simple one: sum all values in the column [amount] in the table data_table.

Let’s look at the DAX formula (called a measure in PowerPivot terminology) again,

TotalAmount=CALCULATE(sum(data_table[amount]),all(data_table))

DAX function breakdown
·         TotalAmount – name of the measure
·         CALCULATE(<expression>,<filter1>,…) – the function that changes the criteria (changes the “filter context”). In my example, CALCULATE sums all the amounts in the table regardless of the cell’s position in the PivotTable or the criteria selected by the user.
o   <expression> -
§  sum(data_table[amount]) – simple formula to sum the amounts in the data_table
o   <filter1> -
§  all(data_table) – return all the rows in data_table regardless of the cell in which this formula is located.
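As an analogy only (not how the PowerPivot engine actually evaluates measures), the built-in Sum of amount and the TotalAmount measure can be mimicked in a few lines of Python; the rows and names come from the example above:

```python
# Each row of data_table as a dict.
data_table = [
    {"ID": 1, "Product": "Pen",    "Color": "Red",  "amount": 2},
    {"ID": 3, "Product": "Pencil", "Color": "Blue", "amount": 6},
    {"ID": 5, "Product": "Pen",    "Color": "Blue", "amount": 10},
    {"ID": 7, "Product": "Pencil", "Color": "Red",  "amount": 10},
]

def sum_of_amount(filter_context):
    """The built-in measure: SUM(amount) over rows matching the cell's
    filter context (a dict of column -> required value)."""
    return sum(r["amount"] for r in data_table
               if all(r[col] == val for col, val in filter_context.items()))

def total_amount(filter_context):
    """CALCULATE(sum(data_table[amount]), all(data_table)): ALL() strips
    every filter, so the cell's coordinates are ignored."""
    return sum_of_amount({})

cell = {"Color": "Blue", "Product": "Pen"}
print(sum_of_amount(cell))   # 10 - respects the filter context
print(total_amount(cell))    # 28 - the same in every cell
```

The point of the sketch: both measures receive the same filter context, but TotalAmount throws it away before summing.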

In the PowerPivot’s Field list, right click on the table and select new measure. Excel exposes the dialog box below for you to enter your DAX function,
[screenshot: the New Measure dialog box]


By selecting only the TotalAmount measure, the PivotTable looks like,

TotalAmount | Pen | Pencil
Blue        | 28  | 28
Red         | 28  | 28

If I include both the Sum of amount and TotalAmount measures, the PowerPivot table looks like,

Row Labels | Pen: Sum of amount | Pen: TotalAmount | Pencil: Sum of amount | Pencil: TotalAmount
Blue       | 10                 | 28               | 6                     | 28
Red        | 2                  | 28               | 10                    | 28


EXAMPLE DAX FORMULA #2 - rank the 4 items in the data table by amount

I want to rank each row of data_table by its amount. In SQL, I would write (using T-SQL):

SELECT ID, amount, RANK() OVER (ORDER BY amount) as Rank
FROM data_table

Leave it to Microsoft to provide multiple ways to calculate rank in DAX. I’m choosing to review a longer version for demonstration purposes (PowerPivot version 2 has a built-in RANK function),

Rank=COUNTROWS(
    FILTER(
        ALL(data_table[ID]),
        data_table[SumAmount] >
            CALCULATE(data_table[SumAmount], VALUES(data_table[ID]))
    )
) + 1

DAX function breakdown
·         COUNTROWS(<table>) – counts the number of rows in the table produced by the expression inside the parentheses
·         FILTER(<table>,<filter>) – returns a table to the COUNTROWS function that contains only the rows whose amounts are greater than the current row’s amount.
o   First, ignore the row and column criteria in each cell
o   Then compute the ranking comparison,
data_table[SumAmount] > CALCULATE(data_table[SumAmount],VALUES(data_table[id]))
§  For each row, evaluate whether that row’s amount is greater than the current row’s amount
§  In order for this formula to work, the measure data_table[SumAmount] must be used:
·         SumAmount =sum(data_table[amount])
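The COUNTROWS(FILTER(ALL(…), …)) + 1 pattern boils down to: count how many IDs have a larger total, then add one. A Python sketch (the per-ID totals are taken from the result table below; again an analogy, not the engine):

```python
# Per-ID totals (SumAmount), matching the result table below.
sum_amount = {1: 2, 3: 6, 5: 10, 7: 12}

def rank(current_id):
    """COUNTROWS(FILTER(ALL(data_table[ID]), [SumAmount] > current)) + 1:
    count the IDs whose total exceeds the current ID's total, plus one."""
    current = sum_amount[current_id]          # CALCULATE([SumAmount], VALUES([ID]))
    bigger = [i for i in sum_amount           # ALL(data_table[ID])
              if sum_amount[i] > current]     # the FILTER predicate
    return len(bigger) + 1                    # COUNTROWS(...) + 1

for i in sorted(sum_amount):
    print(i, rank(i))   # 1->4, 3->3, 5->2, 7->1
```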

Again, a view of the dialog box for entering the DAX function,
[screenshot: the New Measure dialog box]

The resulting PivotTable looks like,

ID | SumAmount | Rank
1  | 2         | 4
3  | 6         | 3
5  | 10        | 2
7  | 12        | 1

PowerPivot allows you to go beyond the standard PivotTable by:
·         Creating powerful, dynamic functions
·         Creating useful formulas such as pmpm, per 1000, and percent compliance

There is a lot of nuance in “Filter Context” that I didn’t describe. I wanted to share my new understanding of the key PowerPivot concept. Key items to remember:
·         When a measure is placed in a PivotTable, its execution in each cell depends on the intersection of row, column, report filter, and slicers. In essence, PowerPivot allows you to change this intersection by using DAX functions
·         Think of “Filter Context” as changing the “where” clause of a SQL statement.
·         PowerPivot allows for the creation of useful functions inside PivotTables, similar to functions in a standard Excel worksheet. Percent compliance rates, per member per month, and encounters per 1000 are a few of the metrics that can now be calculated on the fly inside an Excel PivotTable.


Data models in PowerPivot


“To accomplish great things we must first dream, then visualize, then plan... believe... act!” – Alfred A. Montapert



The above quote is very appropriate for this posting. Before any PowerPivot model can be used for analysis, you must visualize how to put the data together. Thankfully, version 2 introduces the “Diagram View” (aka the MS Access relationship window). From the “Diagram View”, you can do the following:
·         Add/delete tables from one of several data sources (Oracle, SQL Server data cube, MS Excel, etc.)
·         Connect the tables together by defining their relationship to one another
·         Visualize how the different tables connect and relate to each other

Below are common configurations

Model #1 – the single table

[diagram: a single table in Diagram View]

This is the simplest model and probably the most common data model to use. You can connect to a data cube and present the data in PowerPivot as a single table. Using DAX, you can extend the usefulness of the standard Excel PivotTable by creating new metrics.

Model #2 – data blending

[diagram: two tables connected by a relationship on DepartmentName]

Data blending involves combining data from different data sources. Perhaps you have data in SQL Server and a table in MS Excel. PowerPivot now provides the mechanism to merge this data. This was my main attraction to PowerPivot. Prior to PowerPivot, this task required a developer to create an SSIS package. I’ll admit that I still use MS Access to blend data. However, MS Access can’t connect to an SSAS-based data cube.

The picture above shows a line connecting the two tables. In SQL terms, I created an outer join between the two tables. In the PowerPivot window, I can make this join (or relationship, in PowerPivot terms) by simply dragging the DepartmentName field in the left table to the DepartmentName field in the right table.

Caution: Please make sure the values in the second table are truly unique. If they aren’t, PowerPivot will not connect the tables and will produce an error message. You can always copy a single column from the “Data View” window and paste it into an Excel worksheet to check for duplicates. Nothing is more frustrating than watching that error message pop up when you know there are no duplicates. More than likely, there are.
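The copy-and-paste duplicate check above can also be scripted; a minimal sketch, assuming the lookup column has been exported to a list (the DepartmentName values here are hypothetical):

```python
from collections import Counter

def duplicates(column_values):
    """Return the values that appear more than once in a would-be lookup
    column; the column must have none of these before PowerPivot will
    accept it as the 'one' side of a relationship."""
    return sorted(v for v, n in Counter(column_values).items() if n > 1)

# Hypothetical DepartmentName column pasted out of the Data View window.
dept = ["Cardiology", "Oncology", "Radiology", "Oncology"]
print(duplicates(dept))   # ['Oncology'] - this is why the relationship fails
```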

Model #3 – Simple Star Schema

[diagram: a fact table linked to several dimension tables]

The simple star schema model is the typical format for data cubes. There is one table with all the relevant data (fact table) and several look-up tables (dimension tables).

The fun with this model comes when you create your PivotTable. Excel will display the three tables as separate trees in the PivotTable’s field list. You can drag the dimCustomer name into the report filter and the FactInternetSales SalesAmount into the values section. By selecting a single customer, the PivotTable will display only the SalesAmount for that customer, thanks to the relationship established as pictured.
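The relationship does the work of a join: picking a customer in the report filter restricts which fact rows get summed. A rough sketch with made-up rows (the table names, the CustomerKey column, and all values are hypothetical simplifications of the figure):

```python
# Hypothetical dimension and fact rows; CustomerKey is the relationship column.
dim_customer = [{"CustomerKey": 1, "Name": "Ann"},
                {"CustomerKey": 2, "Name": "Bob"}]
fact_internet_sales = [
    {"CustomerKey": 1, "SalesAmount": 100.0},
    {"CustomerKey": 2, "SalesAmount": 40.0},
    {"CustomerKey": 1, "SalesAmount": 60.0},
]

def sales_for(name):
    """Report filter on the customer name: the relationship propagates the
    selected CustomerKey(s) to the fact table before summing."""
    keys = {c["CustomerKey"] for c in dim_customer if c["Name"] == name}
    return sum(f["SalesAmount"] for f in fact_internet_sales
               if f["CustomerKey"] in keys)

print(sales_for("Ann"))   # 160.0
```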


Model #4 – Market Basket Analysis
[diagram: market basket model]

Market Basket analysis is the fancy term for a query like, “Find all office visits in 2011 for Dr. Smith that had an MRI after that visit”. One filtered data-set (all office visits in 2011 for Dr. Smith) is required as input to create another filtered data-set (MRIs after those visits). Typically, this takes a sub-query in SQL or MDX against a data cube. You can perform this query using PowerPivot!
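In SQL, the two-step filter is a sub-query. A sketch with SQLite in Python, using hypothetical tables (visits, scans, and all their columns and rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (patient_id INTEGER, provider TEXT, visit_date TEXT);
CREATE TABLE scans  (patient_id INTEGER, scan_type TEXT, scan_date TEXT);
INSERT INTO visits VALUES (1, 'Dr. Smith', '2011-03-01'), (2, 'Dr. Smith', '2011-06-10'),
                          (3, 'Dr. Jones', '2011-04-05');
INSERT INTO scans  VALUES (1, 'MRI', '2011-05-01'), (3, 'MRI', '2011-07-01');
""")

# First filtered set (2011 visits to Dr. Smith) feeds the second (a later MRI).
rows = conn.execute("""
    SELECT DISTINCT v.patient_id
    FROM visits v
    WHERE v.provider = 'Dr. Smith'
      AND v.visit_date LIKE '2011%'
      AND EXISTS (SELECT 1 FROM scans s
                  WHERE s.patient_id = v.patient_id
                    AND s.scan_type = 'MRI'
                    AND s.scan_date > v.visit_date)
""").fetchall()
print(rows)   # [(1,)] - only patient 1 had an MRI after a 2011 Dr. Smith visit
```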

In his blog, Alberto Ferrari answers the question, “Of all the customers who have bought a mountain bike, how many have never bought at least one mountain tire tube?”. To understand the DAX formulas that he uses, the concept of “Filter Context” must be understood.

“Filter context” is the topic of my next blog post, PowerPivot’s Filter Context from a SQL perspective

Sunday, August 12, 2012

What’s the deal with PowerPivot?


“Insanity: doing the same thing over and over again and expecting different results.” – Albert Einstein

I’ve struggled for months not only to understand PowerPivot but to make use of it in my daily healthcare analytics. I read the same Microsoft documentation over and over, hoping to uncover the “magic” of DAX. As I dive deeper into Microsoft’s BI stack, PowerPivot keeps coming up.
I now realize that there are two core concepts that I must master:
· The various data models supported by PowerPivot (Check out Data models in PowerPivot)
· The practical use of “Filter Context” (Check out PowerPivot’s Filter Context from a SQL perspective)

Before I write about my newfound understanding of PowerPivot, I want to describe why PowerPivot is even needed. In my daily life, I use SQL to extract data from relational databases like Oracle and SQL Server. I use Excel to present that data in an interactive model or to simply pass data to an end-user.
Before data cubes, PivotTables had become an afterthought for me. PivotTables make it possible to display data cubes in Excel. Mainly, PivotTables are analytic tools that allow one to dynamically query data. In SQL terms, PivotTables perform aggregation functions like SUM and COUNT in each cell. This is great until you want to produce a useful metric like ER rates per 1000 or percent compliance rates. There are tricks, such as using a PivotTable as the source for another PivotTable, but PivotTables aren’t as flexible as a well-written SQL statement.
Enter PowerPivot! Now you can,
· Extract data from multiple data sources (text, data cubes, relational databases)
· Create complex formulas that answer real-world questions using DAX
· Prototype new data-sets and pass to cube developers for the creation of robust data cubes.
· Perform Business Intelligence analysis and ad-hoc analysis without knowing MDX (the language of cubes)
· Share your PowerPivot model through SharePoint to allow others to directly connect to it
That’s nice but, what’s the big deal with PowerPivot? Let’s take a typical data-set and a PivotTable,

ID | Product | Color | amount
1  | Pen     | Red   | 2
3  | Pencil  | Blue  | 6
5  | Pen     | Blue  | 10
7  | Pencil  | Red   | 10
I created the PivotTable below by:
· I created an Excel Table (using my favorite shortcut Ctrl-T)
· Click “Summarize with PivotTable” from the table’s design ribbon
· Dragged Color into Row labels
· Dragged Product into Column labels
· Dragged amount into Values and leaving the “Summarize values as” SUM function
· No Slicer or PivotTable’s report filters were used
You get something like below,

Sum of amount | Pen | Pencil
Blue          | 10  | 6
Red           | 2   | 10
Not too useful. Let me switch to my SQL analogy. One can think of each cell in the PivotTable as executing a single dynamically created SQL statement.
Row Label (Color) / Column Label (Product):
Blue / Pen:    select sum(amount) from data_table where Color = 'Blue' and Product = 'Pen'
Blue / Pencil: select sum(amount) from data_table where Color = 'Blue' and Product = 'Pencil'
Red / Pen:     select sum(amount) from data_table where Color = 'Red' and Product = 'Pen'
Red / Pencil:  select sum(amount) from data_table where Color = 'Red' and Product = 'Pencil'

Each of the 4 cells in my PivotTable executes a query against my data_table. In the cell that intersects Pen and Blue, the SUM(amount) function executes only for rows in the data_table where Color = 'Blue' and Product = 'Pen', producing the amount of 10. No matter where I place this PivotTable in my worksheet or the entire workbook, this cell will always execute the SUM(amount) function where Color = 'Blue' and Product = 'Pen'. By adding a field to the report filter or adding a slicer, the “where” clause automatically includes that additional attribute.
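A sketch of how that implicit “where” clause grows as coordinates and slicers are added (the Color and Product names follow the example; the slicer field is hypothetical):

```python
def implicit_where(row=None, column=None, slicers=None):
    """Assemble the WHERE clause a PivotTable cell implicitly uses from its
    row value, column value, and any active slicers or report filters."""
    parts = []
    if row:
        parts.append("Color = '%s'" % row)
    if column:
        parts.append("Product = '%s'" % column)
    for field, value in (slicers or {}).items():
        parts.append("%s = '%s'" % (field, value))
    return " and ".join(parts)

print(implicit_where(row="Blue", column="Pen"))
# Color = 'Blue' and Product = 'Pen'
print(implicit_where(row="Blue", column="Pen", slicers={"ID": "5"}))
# Color = 'Blue' and Product = 'Pen' and ID = '5'
```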
What’s so great about PowerPivot? It allows you to change the “where” clause statement. This wasn’t possible prior to PowerPivot’s introduction. This is typically the domain of the cube developers who utilize Microsoft’s SSAS and MDX. The door is now open for an Excel Power user or analyst to create useful cube-like data models. Now, I can compute the sum of all items from my data_table and place the result next to individual amounts already in the PivotTable. I can blend my data_table with another table and display a related item in the PivotTable.
In my next post, I’ll discuss the data model concept behind PowerPivot. In order to utilize DAX, you have to understand how to arrange your data.

Wednesday, April 4, 2012

Top 5 takeaways from the 2012 Pay for Performance Summit


I attended the 7th National Pay for Performance (P4P) summit last month. Below are my key takeaways.
1. I learned “New” buzz words:
a. Collaboration - Healthcare costs declines can only come through collaboration
“There is a growing recognition that our ability to control costs and improve quality will require an effective partnership with informed and engaged consumers.” – Dr. Judith Hibbard. From the presentation of Elizabeth Mitchell Chief Executive Officer, Maine Health Management Coalition, Portland, ME
b. Win-win game – The zero sum game just isn’t working.
“Would you rather have a small win you can get?  Or a “big win” that you can’t?” - Harold Miller, Executive Director, Center for Healthcare Quality and Payment Reform; President and Chief Executive Officer, Network for Regional Healthcare Improvement, Pittsburgh, PA
Read his presentation at http://www.ehcca.com/presentations/pfpsummit7/miller_2_1.pdf
c. Pioneer ACO – The CMS Innovation Center initiative designed to support organizations with experience operating as Accountable Care Organizations (ACOs) or in similar arrangements in providing more coordinated care to beneficiaries at a lower cost to Medicare. Read CMS’s info on Pioneer ACO at http://innovations.cms.gov/Files/fact-sheet/Pioneer-ACO-General-Fact-Sheet.pdf
Read the P4P Summit presentation at http://www.ehcca.com/presentations/pfpsummit7/thompson_ms4.pdf
2. “There isn’t a good or bad provider [in terms of cost]. Most docs are in the middle. Change is about moving the middle to a new middle” – Howard Beckman MD, FACP
Chief Medical Officer, Focused Medical Analytics, Pittsford, NY
Read one of his presentations at http://www.ehcca.com/presentations/pfpsummit7/2.3_2.pdf
3. “Motivation is human energy directed to a goal” - Arnold Milstein, MD, MPH
Professor of Medicine and Director of the Clinical Excellence Research Center, Stanford University, Stanford, CA
4. Tasks for the healthcare force:
a. Translational efficiency – providers need to copy what the “best of the best” are doing to produce great patient outcomes at low cost.
b. Without slowing down quality gains, need to decrease 2% pmpm per capita per year
From Arnold Milstein, MD, MPH Professor of Medicine and Director of the Clinical Excellence Research Center, Stanford University, Stanford, CA
5. “Those who go fast, go alone. Those who go slow, go together” – Gail Amundson, MD
President and Chief Executive Officer, Quality Quest for Health of Illinois, Peoria, IL
Read her presentation at http://www.ehcca.com/presentations/pfpsummit7/1.5.pdf. I didn’t know that she co-authored the Evidence-Based Cervical Cancer screening measure used by NCQA and IHA.

Friday, March 2, 2012

Top 5 reasons for a SSRS report failure


Aladdin: You're a prisoner?
Genie: It's all part and parcel, the whole "genie gig":
[grows to a gigantic size]
Genie: PHENOMENAL COSMIC POWERS!
[shrinks down inside the lamp]
Genie: Itty-bitty living space!
From Aladdin, 1992, Walt Disney Pictures

Like the genie in the bottle, SQL Server Reporting Services provides a phenomenal platform if it’s in the hands of an experienced report developer. You have to know exactly how to arrange your expressions.

If you’ve created just one report, you’ve already encountered multiple error messages. What’s more frustrating is that after you successfully complete the report, the end user can encounter unforeseen error messages. I thought it appropriate to share the “excuses”.

Top 5 excuses to give for a SSRS report failure 

5. That’s strange, it works on my computer.

This actually happens. You create the report in the integrated development environment on your computer, deploy it to the server, and the end user views the report subject to several layers of permissions. There are many areas where an error can occur. For true effectiveness, follow up this excuse with some techno-babble explanation.

4. I gave you the wrong link.

Give this excuse if you need a few moments to fix the issue. You can follow it up by explaining that the end-user doesn’t have access to the link provided.

3. It must be an issue with the database, I’ll contact the dba.

When in doubt, blame the dba. Sorry, dbas, this is a universally acceptable excuse. It’s like saying I didn’t hang up on you, it was my cell phone carrier’s fault; I passed through a “dead zone”.

2. I forgot to grant you permission to view the report

This excuse is probably the most reliable one to give. It can buy you up to a day to address the real report problem.

1. The leap year caused the report to fail but it’s fixed now

This excuse can be used anytime in a leap year. It’s not restricted to 2/29! Everyone remembers the Y2K computer problems. However, you must be careful who you give this excuse to; it could backfire.

Monday, February 20, 2012

New SQL Server 2012 BI tools

“Three things cannot be long hidden: the sun, the moon, and the truth.” – Buddha

In Microsoft’s case, try as they might to give their new products cute names that no one understands, BI professionals have uncovered the key features. I bring this up because many organizations are Microsoft-centric, and Microsoft doesn’t make it easy to understand how to fully utilize their technologies.
Microsoft will launch the newest version of its venerable database system, SQL Server 2012 (online launch event), on March 7.
New tools in SQL Server 2012 that I found:
  • SQL Server Data Tools (SSDT) – the tool to develop database models, SSRS reports, SSIS packages, and more.

It will replace the Business Intelligence Development Studio (BIDS). Check out SQL Server Pro’s write-up.
  • BI Semantic Tabular Model – Allows an analyst, as well as a developer, to create data models that can be published on SharePoint. In Microsoft’s vision (see the diagram below), developers create multi-dimensional models (cubes) and business analysts create tabular models.
Thanks to James Serra’s blog entry for clearing up Microsoft’s marketing mumbo-jumbo.

An analyst can design a data model by visually linking tables and writing DAX (Data Analysis Expressions language) formulas to encapsulate business logic.
Check out this slide deck
Check out demo
  • PowerView – an interactive data exploration, visualization, and presentation experience. It provides intuitive ad-hoc reporting for business users such as data analysts, business decision makers, and information workers.

Features:
      • Interactive, web-based report authoring & sharing
      • Powerful BI data layout with new visualization
      • Multi-view storyboarding in the tool, with export to PowerPoint to present interactively with data
      • Runs off PowerPivot BI data, or other SQL 2012 tabular models
      • Smart and powerful querying, with zero configuration for filtering
Check out the Demo 1 or Demo 2

Sunday, February 12, 2012

Trust me, data cubes are great


"Come with me if you want to live" 
 Arnold Schwarzenegger in Terminator 2: Judgment Day 


Two thoughts come to mind as I remember this quote:
  1. Linda Hamilton's character must take a leap of faith and quickly
  2. The Terminator has a plan that will weather the upcoming change, i.e. Terminator T-1000's relentless pursuit.
In the case of healthcare analytics, perhaps the above quote should read something like, "Trust me, this new technology (EHR, data cubes, etc.) will work if you intend to survive healthcare reform".

Trust and new technology... Can those words be used in the same sentence? I'm optimistic, so yes.

I'm faced with transitioning to data cubes. I'm told I'll have little need for my old skills. The new technology will be more efficient and robust. Over the years, I've become quite proficient with Microsoft Excel/Access and SQL. I feel comfortable. How can I trust the results of a data cube when time is of the essence?

Two actions are needed to answer this question:
  • Design a cube with a solid specification
  • Test the cube with a robust Q/A checklist. This ensures the specification was followed and business logic developed appropriately. 
Given where my mind is today, I want to focus on the Q/A checklist. I googled "cube testing" and most links (this one or this one) are from a developer's perspective. If I relied on these links, it would be as if my mechanic believed my car was fixed; I need to be assured in terms I can understand.


I developed the list below with the help of Google searches, colleagues, and mostly from frustrating experiences. 
1.    Ensure requested cube attributes are present 
2.    Ensure requested cube attributes have appropriate display names
3.    Challenge the logic behind each attribute. 
     If you are transitioning from a legacy system to a data cube, this is a vital step.
4.    Ensure cube attributes list their properties correctly, whether sourced from the underlying database or a custom list.
5.    Ensure relationships between fact table and dimension tables actually work as advertised
6.    Place extreme conditions into the cube to see how it reacts
7.    Look for known trends in the legacy system and see if they are present in the data cube
8.    Does the data cube answer specified business questions?
9.    Can data cube based reports replace your legacy reports?
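Checklist item 7 can be partially automated: pull the same known totals from the legacy system and the cube and compare them. A minimal sketch, assuming both sides have been exported as month-to-total dicts (the metric and the numbers are hypothetical):

```python
def reconcile(legacy, cube, tolerance=0.0):
    """Compare legacy totals against cube totals month by month and
    return the months whose absolute difference exceeds the tolerance."""
    mismatches = {}
    for month in sorted(set(legacy) | set(cube)):
        diff = abs(legacy.get(month, 0) - cube.get(month, 0))
        if diff > tolerance:
            mismatches[month] = diff
    return mismatches

# Hypothetical monthly encounter totals from both systems.
legacy = {"2011-01": 120, "2011-02": 135, "2011-03": 128}
cube   = {"2011-01": 120, "2011-02": 131, "2011-03": 128}
print(reconcile(legacy, cube))   # {'2011-02': 4}
```

Any month the function flags is a place to challenge the cube's business logic before trusting its reports.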