by Michael Raam, Principal Data Analytics, Pyramid Analytics
As machine learning becomes more widespread in the BI community, there is a growing need for easy, non-technical access to advanced analytics. R is one of the most widely used languages in machine learning and advanced analytics, but it is highly technical and has a steep learning curve, which puts it out of reach for most business users.
Starting with BI Office V6, Pyramid Analytics lets users work with R through a collection of visual, point-and-click wizards, bringing the power of R to the wider BI community.
In this blog post, we will focus on clustering and its visualization.
Figures 1 and 2. Cluster Examples.
Before we can visualize clusters, we need to create them. Clustering is done through a point-and-click wizard that requires no knowledge of R, while still letting users who know R customize the code for advanced scenarios.
A key advantage of R in Pyramid is that it works on top of any data source, even one that does not support R natively, such as SQL Server 2014.
In our example, we will group the members of the Product category into three clusters based on their Sales and Over Head values.
Step 1: Open the clustering wizard from the Analytics menu on the ribbon.
Figure 3. Open Clustering from Ribbon.
Step 2: Define the cluster logic.
2.1 - Choose the algorithm type from a preloaded list (users can always add more algorithms to the list). Here we choose “KMEANS”.
Figure 4. Begin Cluster Wizard.
2.2 - Choose the category to be clustered: in this case, Product.
Figure 5. Items to Group.
2.3 - Provide the values to cluster on (here, Sales and Over Head). Note that different algorithms require different numbers of values; more information is available online.
Figure 6. Numbers to Group by.
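The wizard generates R code for this step, and that script is not reproduced here. As a rough illustration of what the KMEANS option computes, the following Python sketch implements Lloyd's k-means algorithm with k = 3 on two values per item. The product names and their Sales / Over Head figures are hypothetical, and seeding the centroids with the first k points is an assumption made for reproducibility; R's `kmeans()` defaults to the Hartigan-Wong algorithm with randomly chosen starting rows. Treat it as a sketch of the technique, not the script Pyramid produces.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means: return a cluster index (0..k-1) for each point."""
    # Assumption: seed the centroids with the first k points so the result
    # is reproducible (R's kmeans() chooses random rows instead).
    centroids = list(points[:k])
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(pt[0] for pt in members) / len(members),
                    sum(pt[1] for pt in members) / len(members),
                )
    return labels


# Hypothetical (Sales, Over Head) values per product; values are not rescaled.
products: Dict[str, Point] = {
    "Bikes": (900.0, 300.0),
    "Helmets": (120.0, 40.0),
    "Tires": (450.0, 150.0),
    "Frames": (880.0, 310.0),
    "Gloves": (100.0, 35.0),
    "Chains": (430.0, 160.0),
}

labels = kmeans(list(products.values()), k=3)
assignments = dict(zip(products, labels))
print(assignments)
# → {'Bikes': 0, 'Helmets': 1, 'Tires': 2, 'Frames': 0, 'Gloves': 1, 'Chains': 2}
```

Because the two inputs are used on their raw scales, the value with the larger range (Sales here) dominates the distance; real scripts often standardize the inputs first, which is one of the customizations an advanced user might make in the script review step.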
2.4 - Review and customize the R script: on this screen, advanced users can review the R code and modify it if needed; other users can simply click “OK”.
Figure 7. Customize R Script.
2.5 - Define naming and access: on this screen we can name the clustering and add a description. We can also save the logic for reuse and grant other users access to the clusters we created.
Figure 8. Save as Public, Reusable Script.
2.6 - When the wizard finishes, it presents all the sets and custom calculations it created. The result consists of:
- Sets, one per cluster, each containing the Products assigned to that cluster.
- A calculation that aggregates the items in each set.
- A measure showing the value of the items (Products) in each cluster.
Figure 9. Cluster Creation.
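To make the structure of the wizard's output concrete, here is a small Python sketch that assumes the cluster assignments are already known: it groups products into one set per cluster and then sums a value (hypothetical Sales figures) over each set, mirroring the sets and per-cluster aggregation listed above. The function names and data are hypothetical and are not part of BI Office, and summing is only one plausible aggregation; in BI Office the generated calculation defines the actual logic.

```python
from collections import defaultdict
from typing import Dict, Set


def build_cluster_sets(assignments: Dict[str, int]) -> Dict[int, Set[str]]:
    """Group item names into one set per cluster index."""
    sets: Dict[int, Set[str]] = defaultdict(set)
    for item, cluster in assignments.items():
        sets[cluster].add(item)
    return dict(sets)


def aggregate_per_cluster(
    sets: Dict[int, Set[str]], values: Dict[str, float]
) -> Dict[int, float]:
    """Sum a measure over the members of each cluster set."""
    return {
        cluster: sum(values[item] for item in members)
        for cluster, members in sets.items()
    }


# Hypothetical cluster assignments and Sales values.
assignments = {"Bikes": 0, "Frames": 0, "Helmets": 1,
               "Gloves": 1, "Tires": 2, "Chains": 2}
sales = {"Bikes": 900.0, "Frames": 880.0, "Helmets": 120.0,
         "Gloves": 100.0, "Tires": 450.0, "Chains": 430.0}

cluster_sets = build_cluster_sets(assignments)
sales_per_cluster = aggregate_per_cluster(cluster_sets, sales)
print(sales_per_cluster)  # → {0: 1780.0, 1: 220.0, 2: 880.0}
```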
2.7 - The final result is a report in which the elements created by the clustering wizard are automatically selected, giving the user a ready-made visualization based on the clustering.
In this example, the color of each data point indicates which of the three clusters it belongs to.
Figure 10. Viewing the Model in a Point Chart.
We look forward to seeing you work with clustering and the other R-based advanced algorithms in BI Office.