Prediction Wizard - Part 3 (Use Case for Bike Sales)
by David Novick, Technical Writer, Pyramid Analytics
and Bar Amit, Data Scientist, Pyramid Analytics
and Imbar Marinescu bar, Developer, Pyramid Analytics
In an effort to get our customers off to a quick start with the new Prediction Wizard, the R developers at Pyramid Analytics have created this use case to illustrate how prediction capabilities could be of assistance to a typical bike sales team.
In our use case, the fictional VGD-BIKES is a chain store with sales offices throughout the USA. The Boston branch has a tradition of organizing popular trail races on Thanksgiving Weekend each year. The winner of each age/gender group is awarded a free trail bike. Participation costs $30 per person, but if you fill in an online survey you can enter the race for free. The survey is used to gather personal data from participants, such as name, age, gender, address, profession, income level, home ownership, marital status, number of cars owned, and number of children (dependents).
Bob, the Sales Manager of the Boston branch, organizes the Thanksgiving races and leads all sales activities there. After the races each year, Bob is left with approximately 1000 names of new participants who serve as a "prospect base" for the coming year.
Bob has detailed sales records for 2016, meaning he can compare the 2016 survey data against the actual 2016 sales results. Bob wants to use the new Prediction Wizard to create a predictive model based on the 2016 data. The predictive model will be used in evaluating the “prospect base” for the coming year (2017) and for future years. The predictive model could also be used by other branches of VGD-BIKES.
Bob loads into BI Office the data model created by the VGD-BIKES database team. Next he opens the Data Discovery/Analytics ribbon and selects the Advanced Prediction command.
NOTE: The Prediction icon is active only when the user is working with a tabular data model and has read/write access to that model.
The Prediction icon appears when you load a tabular model for which you have read/write access.
The Welcome Screen appears, as shown below. This screen has no fields, but it allows Bob to pause for a second and take in the big picture before he proceeds. The navigation panel on the left (Welcome, Algorithm, etc.) allows Bob to navigate between the various wizard dialogs in an interactive fashion. The recommended approach is to proceed one-by-one through the dialogs in consecutive order. However, it’s OK to go back and make changes in earlier dialogs when necessary.
Proceed step-by-step through dialogs. You can go back and make changes when needed.
In the Algorithm dialog, Bob relies on the default values for the first three parameters.
- Prediction Output
Bob will be using all 1000 names from the 2016 data to train his predictive model, so there is no need to specify a smaller subset. For larger data models, it’s recommended to perform initial training with a smaller subset.
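For readers who want to picture what "training with a smaller subset" means, here is a minimal Python sketch of drawing a reproducible random sample from a larger set of records. It is only an illustration: the function name, the `fraction` and `seed` parameters, and the stand-in records are assumptions made for this example, and it does not show how the wizard itself selects a subset.

```python
import random

def sample_training_subset(rows, fraction, seed=0):
    """Return a reproducible random subset of rows for an initial training pass."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one row, but never ask for more rows than exist.
    k = min(len(rows), max(1, round(len(rows) * fraction)))
    # A seeded generator makes the same subset come back on every run.
    return random.Random(seed).sample(rows, k)

# Hypothetical stand-in for survey records: just a key per person.
rows = [{"profileKey": i} for i in range(1000)]
subset = sample_training_subset(rows, fraction=0.1)
print(len(subset))
```

Fixing the seed is a deliberate choice here: it lets two training runs be compared on exactly the same sample.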
To learn more about the prediction output/algorithm options, see the BI Office help.
Bob’s main goal is to evaluate which types of trail race participants are most likely to purchase a new bike in the future. With this goal in mind, Bob makes the following two selections:
- For Dimension, Bob selects “Customer Profile”. This dimension contains lots of personal info that was gathered through the online survey.
- For Hierarchy, Bob selects “Purchase Bike” from the list. When first training a predictive model, this parameter contains a YES/NO value which indicates whether the person did (or did not) purchase a bike in the previous year (2016). When Bob later runs the predictive model on the data set for 2017, the YES/NO parameter will be a predicted YES/NO value.
The Prediction Wizard is designed to predict a SINGLE attribute or metric.
In this dialog, Bob needs to select a unique number, value or string that serves to identify the target of his prediction. Bob selects the "CustomerProfile" dimension and then the "profileKey" hierarchy. The profileKey contains a unique identifier for each person who participates in Bob's online survey.
The wizard requires a unique number, value or string to identify the prediction target.
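Because the key must be unique per person, it can be worth checking the survey data for repeated keys before running the wizard. The sketch below shows one way to do that in Python. The record layout is an assumption made for this example (only the `profileKey` name comes from the use case), and the sample rows are invented.

```python
from collections import Counter

def duplicate_keys(rows, key_field="profileKey"):
    """Return the sorted key values that appear on more than one row."""
    counts = Counter(row[key_field] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

# Invented rows: key 101 is repeated, so it cannot identify one person.
rows = [{"profileKey": 101}, {"profileKey": 102}, {"profileKey": 101}]
dupes = duplicate_keys(rows)
print(dupes)
```

An empty result means every row can be identified unambiguously by its key.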
Now comes the challenging part. Bob needs to choose a number of attribute hierarchies to serve as input for his predictive model. Bob chooses two main attributes, though he could choose as many as he likes. For prediction purposes, his goal is to discover those attributes that serve as major “statistical splicers” of the data.
- Home Owner – By choosing the home owner parameter, Bob hopes to reveal a connection between home ownership and the purchase of new bikes. Are home owners more (or less) likely than renters to purchase new bikes?
- Marital Status – Here Bob seeks a connection between marital status and new bike purchases. Are married people more (or less) likely than single folks to buy new bikes?
Specify one or more attributes as input for the prediction algorithm.
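The questions Bob asks about home owners and married people boil down to comparing purchase rates between groups. As a rough illustration, the Python sketch below computes the share of buyers for each value of an attribute. The field names (`HomeOwner`, `MaritalStatus`, `PurchaseBike`) and the four rows are invented for this example; they are not taken from the VGD-BIKES data model, and this is not the calculation the wizard performs internally.

```python
from collections import defaultdict

def purchase_rate_by(rows, attribute, target="PurchaseBike"):
    """Return {attribute value: fraction of rows whose target is "YES"}."""
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for row in rows:
        totals[row[attribute]] += 1
        if row[target] == "YES":
            buyers[row[attribute]] += 1
    return {value: buyers[value] / totals[value] for value in totals}

# Invented sample rows, for illustration only.
rows = [
    {"HomeOwner": "Yes", "MaritalStatus": "Married", "PurchaseBike": "YES"},
    {"HomeOwner": "Yes", "MaritalStatus": "Single", "PurchaseBike": "NO"},
    {"HomeOwner": "No", "MaritalStatus": "Single", "PurchaseBike": "NO"},
    {"HomeOwner": "No", "MaritalStatus": "Married", "PurchaseBike": "NO"},
]
rates = purchase_rate_by(rows, "HomeOwner")
print(rates)
```

An attribute whose groups show clearly different rates is a stronger candidate input than one whose groups all look alike.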
In this dialog, Bob selects three measures to serve as input for his predictive model.
The selected measures are used together with the selected categories to predict who will (or will not) purchase a bike.
- Sum Cars - The number of cars owned by the person.
- Sum Children - The number of children (dependents) that live with the person.
- Sum Income - The person's income level.
Specify one or more numeric items as input for the prediction algorithm.
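To see how categories and measures can work together as input, it helps to picture each person as a single row of numbers. The Python sketch below shows one common way to build such a row: each category value becomes a 0/1 flag (one-hot encoding), and the measures are appended as plain numbers. This is an assumption about how such inputs are typically combined, not a description of the wizard's generated R script; the field names and the sample person are invented.

```python
def encode_row(row, category_levels, measures):
    """Turn one record into a numeric vector: one-hot categories, then measures."""
    vector = []
    for field, levels in category_levels.items():
        # One 0/1 flag per known level of the category.
        vector.extend(1.0 if row[field] == level else 0.0 for level in levels)
    # Numeric measures are appended unchanged.
    vector.extend(float(row[name]) for name in measures)
    return vector

# Hypothetical field names and levels.
category_levels = {"HomeOwner": ["Yes", "No"], "MaritalStatus": ["Married", "Single"]}
measures = ["SumCars", "SumChildren", "SumIncome"]

# Invented sample person.
row = {"HomeOwner": "Yes", "MaritalStatus": "Single",
       "SumCars": 2, "SumChildren": 1, "SumIncome": 60000}
features = encode_row(row, category_levels, measures)
print(features)  # [1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 60000.0]
```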
This dialog shows the R Script generated interactively by the wizard based on Bob's current wizard selections. For most wizard users, the script exists in background mode and can be safely ignored.
NOTE: For R mavens, it’s possible to edit the script when necessary. Just remember that when you check the “Customize Script” box, the script content is disconnected from your wizard selections and becomes a manual script. If you later uncheck the box, the script content will be automatically regenerated based on the current content of the wizard dialogs (and ALL of your manual script changes will be lost).
R Script is auto-generated based on your current wizard selections.
In the “Final” dialog, Bob enters a number of output parameters. Bob types in "BIKE-PRED" as the caption and also types in a short description. Bob also selects the Save Predictive Model and Save Design options. Then Bob clicks OK to put the wizard into action:
- The wizard creates a wizard template named "BIKE-PRED".
- The wizard creates a predictive model named "BIKE-PRED" based on the 2016 data. The predictive model can be run on Bob's 2017 survey data, and may also be run on data collected at other branches of VGD-BIKES.
The caption serves as the name for both the wizard template and the predictive model.
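The idea of training once on 2016 data, saving the result, and reusing it on 2017 prospects can be sketched in a few lines of Python. The "model" below is deliberately simple: a lookup table that remembers the most common outcome for each combination of input values, with the overall majority as a fallback. It is a stand-in chosen for readability, not the algorithm the wizard uses, and the field names, rows, and JSON storage are all assumptions made for this example.

```python
import json
from collections import Counter, defaultdict

def train(rows, inputs, target):
    """Learn the most common target value for each combination of inputs."""
    votes = defaultdict(Counter)
    for row in rows:
        combo = "|".join(str(row[field]) for field in inputs)
        votes[combo][row[target]] += 1
    default = Counter(row[target] for row in rows).most_common(1)[0][0]
    table = {combo: c.most_common(1)[0][0] for combo, c in votes.items()}
    return {"inputs": inputs, "table": table, "default": default}

def predict(model, row):
    """Look up the learned outcome; fall back to the overall majority."""
    combo = "|".join(str(row[field]) for field in model["inputs"])
    return model["table"].get(combo, model["default"])

# Invented 2016 training rows with known outcomes.
rows_2016 = [
    {"HomeOwner": "Yes", "MaritalStatus": "Married", "PurchaseBike": "YES"},
    {"HomeOwner": "Yes", "MaritalStatus": "Married", "PurchaseBike": "YES"},
    {"HomeOwner": "Yes", "MaritalStatus": "Single", "PurchaseBike": "NO"},
    {"HomeOwner": "No", "MaritalStatus": "Single", "PurchaseBike": "NO"},
    {"HomeOwner": "No", "MaritalStatus": "Married", "PurchaseBike": "NO"},
]
model = train(rows_2016, ["HomeOwner", "MaritalStatus"], "PurchaseBike")

saved = json.dumps(model)      # the "saved" predictive model, as text
restored = json.loads(saved)   # later: reload it for a new data set

# Invented 2017 prospect with no known outcome yet.
prospect = {"HomeOwner": "Yes", "MaritalStatus": "Married"}
print(predict(restored, prospect))  # YES
```

The point of the sketch is the workflow rather than the model: once saved, the same trained object can be applied to any data set that carries the same input fields, which is what makes sharing it with other branches possible.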
To continue in this blog series, see Prediction Wizard - Part 4 (Use Case Results).