2

Removing Outliers Using Pyramid

Notice - all required files to reproduce the example are attached

A customer has asked our CS team how to remove outliers from a list in a dynamic way.

Before you approach this - how would we define an outlier?

Typically an outlier is a data point that is more than a number of standard deviations away from the average.

The exact number would be you tolerance for outliers. a small number (say - 1 or 0.5) would suggest an outlier is just slightly away from the curve, while a large number (say - 5) would suggest a great deviation from the average.

How to implement:

The foundations:

The first step is just getting the average and STDEV of all members in our column, and setting up a global parameter form

Average:

Standard Deviation:

 

tolerance parameter:

 

Define the threshold

next, we need to create a calculation that will determine for every element (sub category in my example) whether it is within the threshold or not:

 Lets make sure we understand this: the Revenue measure is taken from context (so - it will be the revenue for each sub category in the consequent formulas). The average is the average for all sub categories. The abs() functions essentially returns the distance between the revenue for a specific sub category and the average, regardless if it's above or below.

Then we subtract the tolerance parameter by the actual standard deviation - therefore - a value above 0 means this category should be considered as an outlier and therefore be removed from our desired results.

Create a list

Now all we need is a list that will filter all sub categories based on the defined tolerance:

 So - if the sub category is outside our tolerance - it will be filtered out.

And now we can see it in a discover:

 Notice, when we increase our tolerance we see more results.

Attachments:

Home Depot Model.pie - the data model I've used. 

home_depot_abc_xyz_enriched - will be the data source for the model.

remove outliers.pie - the pie file with all the calculations and a couple of discovers - one with the filtered list and another with the relevant measures.

I hope it's useful, and thank you  for the inspiration

1 reply

null
    • Shir_BenGal
    • 13 hrs ago
    • Reported - view

    Moshe, thank you very much for the clear and detailed explanation.

    Definitely going to give this a try!

Content aside

  • 2 Likes
  • 13 hrs agoLast active
  • 1Replies
  • 32Views
  • 2 Following