Prediction Wizard - Part 1 (Overview)
by David Novick, Technical Writer, Pyramid Analytics
and Bar Amit, Data Scientist, Pyramid Analytics
and Imbar Marinescu bar, Developer, Pyramid Analytics
This blog post provides a conceptual overview of the Prediction Wizard and describes three data set scenarios for running the predictive models the wizard generates.
The Prediction icon appears in the Data Discovery/Analytics ribbon only when two conditions are met:
- The data model must be a tabular model created within the Data Modeling tool.
- The user must have permission to "process" the data model in the Data Modeling tool.
The wizard provides a series of dialogs for specifying the data sources and input settings. Once you fill in the dialogs, you can save your settings as a "wizard template," which you can then use to generate a "predictive model." You can create as many wizard templates as you like, each with a unique name.
You can also create several versions of the same wizard template, all sharing the same template name. In this case, you specify which version is the active one used to generate a predictive model.
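To make the template/version relationship concrete, here is a minimal sketch of how it could be modeled as a data structure: one template name, several numbered versions, and exactly one active version. The `WizardTemplate` class, its methods, and the settings keys are hypothetical illustrations, not part of Pyramid's product or API, and making the first version active by default is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class WizardTemplate:
    """Hypothetical stand-in for a wizard template: one name, many numbered versions."""
    name: str
    versions: Dict[int, Dict[str, Any]] = field(default_factory=dict)
    active_version: Optional[int] = None

    def add_version(self, settings: Dict[str, Any]) -> int:
        # A new version keeps the template name and gets the next version number.
        number = len(self.versions) + 1
        self.versions[number] = settings
        if self.active_version is None:
            self.active_version = number  # first version is active by default (assumption)
        return number

    def set_active(self, number: int) -> None:
        # Choose which version will be used to generate the predictive model.
        if number not in self.versions:
            raise KeyError(f"{self.name!r} has no version {number}")
        self.active_version = number

    def active_settings(self) -> Dict[str, Any]:
        # The settings that would feed predictive-model generation.
        if self.active_version is None:
            raise ValueError(f"{self.name!r} has no versions yet")
        return self.versions[self.active_version]


template = WizardTemplate("Sales Trend")
template.add_version({"target": "Sales", "inputs": ["Price", "Discount"]})
template.add_version({"target": "Sales", "inputs": ["Price", "Discount", "Location"]})
template.set_active(2)
```

Keeping every version under one name, with a single pointer to the active one, means switching versions never requires renaming or copying the template.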
Once you have created a wizard template, you can use it to generate a predictive model. A predictive model operates as a black-box algorithm: it first trains on a target item X and learns how to calculate X from various input items (A, B, C, etc.). For example, your predictive model might learn a sales trend X based on price, discount, location and a list of unique purchaser IDs. Later on, the predictive model can be run on other data sets to "predict" the sales trend X for each data set.
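The train-then-predict cycle described above can be illustrated with a small sketch. Because the post treats the model as a black box, this example simply assumes ordinary least-squares linear regression (X ≈ w0 + w1·A + w2·B) for illustration; it is not necessarily the algorithm the Prediction Wizard uses. The class and method names are hypothetical, and only numeric inputs (price, discount) are shown, since categorical items such as location would first need encoding.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


class LinearPredictiveModel:
    """Hypothetical black box: learns target X as a linear function of the inputs."""

    def __init__(self, target: str, inputs: Sequence[str]):
        self.target = target
        self.inputs = list(inputs)
        self.weights: List[float] = []

    def _features(self, row: Row) -> List[float]:
        return [1.0] + [row[name] for name in self.inputs]  # leading 1.0 = intercept

    def train(self, rows: Sequence[Row]) -> None:
        # Least squares via the normal equations (F^T F) w = F^T y.
        feats = [self._features(r) for r in rows]
        y = [r[self.target] for r in rows]
        k = len(feats[0])
        ftf = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
        fty = [sum(f[i] * t for f, t in zip(feats, y)) for i in range(k)]
        self.weights = _solve(ftf, fty)

    def run(self, rows: Sequence[Row]) -> List[float]:
        # Predict X for every row of whichever data set is passed in.
        return [sum(w * v for w, v in zip(self.weights, self._features(r))) for r in rows]


# Toy training data where Sales = 100 - 2*Price + 3*Discount exactly.
training_rows = [
    {"Price": p, "Discount": d, "Sales": 100 - 2 * p + 3 * d}
    for p, d in [(10, 0), (12, 5), (15, 2), (20, 10), (8, 1)]
]
model = LinearPredictiveModel(target="Sales", inputs=["Price", "Discount"])
model.train(training_rows)
new_rows = [{"Price": 11, "Discount": 4}, {"Price": 18, "Discount": 0}]
predictions = model.run(new_rows)  # approximately [90.0, 64.0]
```

Note that `run` never needs the target column: once trained, the model only reads the input items, which is what allows it to be applied to a different data set.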
During the "training stage" you can use a subset of your database to develop the predictive model quickly. Once the predictive model produces satisfactory results, you can retrain it on a larger data set or even on the entire data set. Once created, a predictive model can be run "as is" on the current data set or on a different data set.
There are three different scenarios for running your predictive model. In all three, the predictive model ALWAYS executes on the full data set that is currently loaded.
The simplest scenario is running your predictive model on the same data set on which it was trained. This might seem redundant, but you could, for example, train your predictive model on a subset containing 20% of the data and then run it on the full 100%.
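The "train on 20%, run on 100%" pattern can be sketched as follows. The helper name, the fixed seed, and the toy data are hypothetical, and a simple mean stands in for a real predictive model; the point is only that training sees a reproducible random sample while the run produces a prediction for every loaded row.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def training_subset(rows: Sequence[T], fraction: float, seed: int = 42) -> List[T]:
    """Return a reproducible random sample containing `fraction` of the rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(list(rows), k)


# Hypothetical full data set of 1,000 orders.
full = [{"Order": i, "Sales": float(i % 50)} for i in range(1000)]

train = training_subset(full, 0.20)                         # train on 20% of the rows
mean_sales = sum(r["Sales"] for r in train) / len(train)    # placeholder "model"
predictions = [mean_sales for _ in full]                    # run on 100% of the rows
```

Fixing the seed keeps the training subset identical between runs, so changes in results come from changes to the template settings rather than from a different sample.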
The second scenario is when you have a different data set based on the SAME data model. For example, suppose two sales departments share a common data model but each creates its own data sets for different purposes. In this case, both departments can use the predictive model freely, with no data compatibility issues, since both data sets are based on the same data model.
The third scenario is the most challenging. Suppose you have developed a powerful predictive model that you want to share with another corporate department that uses a second data model. You can in fact "share" your predictive model with that department, ON CONDITION that every data element used by the predictive model (dimensions, hierarchies, categories, measures, etc.) also appears in the second data model. These data elements must be defined with exactly the same names (case-sensitive) in both data models.
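Before sharing a predictive model across data models, the name requirement above amounts to a case-sensitive subset check. This is a minimal sketch assuming you can list the element names used by the predictive model and those defined in the second data model; the function name and the sample element names are hypothetical, and this is not a Pyramid API.

```python
from typing import Iterable, Set


def missing_elements(required: Iterable[str], target_model: Iterable[str]) -> Set[str]:
    """Return names the predictive model uses that the second data model lacks.

    The comparison is exact and case-sensitive, so "Discount" and "discount"
    count as different elements.
    """
    return set(required) - set(target_model)


# Hypothetical element names (dimensions, hierarchies, measures, etc.).
required = {"Sales", "Price", "Discount", "Location", "Purchaser ID"}
second_model = {"Sales", "Price", "discount", "Location", "Purchaser ID", "Region"}

gaps = missing_elements(required, second_model)
if gaps:
    print("Cannot share the predictive model; missing:", sorted(gaps))
```

Extra elements in the second data model (such as "Region" here) do not matter; only elements the predictive model relies on must match exactly.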
To continue in this blog series, see Prediction Wizard - Part 2 (Wizard Templates vs. Predictive Models).