How to Create a Custom Bin Range in Data Model
The Data Model module includes a Bin algorithm under the Machine Learning section that will generate bins based on standard deviation or percentiles. How do I create my own custom ranges?
For example, I would like to use the following logic:
- Ages 0 to 17
- Ages 18 to 25
- Ages 26 to 39
- Ages 40 to 54
- Ages 55 and over
3 replies
-
Custom ranges can be created a number of ways such as using a SQL query, Excel file, R or Python Script. I am going to show you how you can do it using a Calculated Column. Calculated Columns use the PQL language. Refer to the help file for more information on the PQL syntax (LINK).
Here is the PQL syntax to do what you are asking:case( criterias( ([Age]<18) ,([Age]<26) ,([Age]<40) ,([Age]<55) ,([Age]>=55) ) , dataSet( "0-17" , "18-25" , "26-39" , "40-54" , "55+") ,"na")
Here is a screenshot showing the code in action:
-
Here is an example using Python.
Steps
- Add a Python Node
- Paste the following code in a quick script
# check age value ageGroup=[] for i in range(0,len(ageColumn)): if (ageColumn[i] < 18): ageGroup.append("0 - 17") elif (ageColumn[i] < 26): ageGroup.append("18-25") elif (ageColumn[i] < 40): ageGroup.append("26-39") elif (ageColumn[i] < 55): ageGroup.append("40-54") elif (ageColumn[i] >= 55): ageGroup.append("55+") else: ageGroup.append("NA") DataFrame1= pa.DataFrame({'AgeGroup':ageGroup} )
- Configure Inputs and Outputs
-
The original syntax changed at some point. The new PQL logic is as follows:
case(
criteria(
([Age]<18)
,([Age]<26)
,([Age]<40)
,([Age]<55)
,([Age]>=55)
)
, criteriaResults(
"0-17"
, "18-25"
, "26-39"
, "40-54"
, "55+")
,"na")