1

How to Create a Custom Bin Range in Data Model

The Data Model module includes a Bin algorithm under the Machine Learning section that will generate bins based on standard deviation or percentiles.  How do I create my own custom ranges?

For example, I would like to use the following logic:

  • Ages 0 to 17
  • Ages 18 to 25
  • Ages 26 to 39
  • Ages 40 to 54
  • Ages 55 and over

3 replies

null
    • Jason_Picker
    • 6 yrs ago
    • Reported - view

    Custom ranges can be created a number of ways such as using a SQL query, Excel file, R or Python Script.  I am going to show you how you can do it using a Calculated Column.  Calculated Columns use the PQL language.  Refer to the help file for more information on the PQL syntax (LINK).


    Here is the PQL syntax to do what you are asking:

    case(
    criterias(
     ([Age]<18)
    ,([Age]<26)
    ,([Age]<40)
    ,([Age]<55)
    ,([Age]>=55)
    )
    , dataSet(
      "0-17"
    , "18-25"
    , "26-39"
    , "40-54"
    , "55+")
    ,"na")

    Here is a screenshot showing the code in action:

    • Jason_Picker
    • 5 yrs ago
    • Reported - view

    Here is an example using Python.  

    Steps

    • Add a Python Node
    • Paste the following code in a quick script
    # check age value
    
    ageGroup=[]
    for i in range(0,len(ageColumn)):
        if (ageColumn[i] < 18):
            ageGroup.append("0 - 17")
        elif (ageColumn[i] < 26):
            ageGroup.append("18-25")
        elif (ageColumn[i] < 40):
            ageGroup.append("26-39")
        elif (ageColumn[i] < 55):
            ageGroup.append("40-54")
        elif (ageColumn[i] >= 55):
            ageGroup.append("55+")
        else:
            ageGroup.append("NA")
    
    DataFrame1= pa.DataFrame({'AgeGroup':ageGroup}
    )
    • Configure Inputs and Outputs
    • Jason_Picker.1
    • 1 yr ago
    • Reported - view

    The original syntax changed at some point.  The new PQL logic is as follows:

    case(
    criteria(
     ([Age]<18)
    ,([Age]<26)
    ,([Age]<40)
    ,([Age]<55)
    ,([Age]>=55)
    )
    , criteriaResults(
      "0-17"
    , "18-25"
    , "26-39"
    , "40-54"
    , "55+")
    ,"na")

Content aside

  • Status Answered
  • 1 Likes
  • 1 yr agoLast active
  • 3Replies
  • 151Views
  • 3 Following