
Splitting dataset Train Error

I am trying to split a dataset into train and test sets using a Python node and Learn & Predict.

I get the error below, related to __pyramid_eval_out.write. How can it be fixed?

 

Error

PyramidException: scripting error: Scripting process execution for the following command has failed: 

E:\Pyramid\python\Scripts\conda run -p E:\Pyramid\repository\general\scripting_Environments\00000000-0000-0000-0000-000000000001 python E:\Pyramid\repository\general\d04da7f0-194d-49d6-b851-de1272dec2ba.py E:\\Pyramid\\repository\\general\\3a19b566-a6ef-41e8-8387-f3ab492d0ca3.csv E:\\Pyramid\\repository\\general\\d559025a-7019-4f2d-b400-bd6def866f0b.txt E:\\Pyramid\\repository\\general\\1b5012da-2f68-4e65-9f34-19823c453e0d.paModel '' E:\\Pyramid\\repository\\general\\57b7beff-b3c8-43c8-ad49-dfb9ae6d4ac0.txt

ERROR conda.cli.main_run:execute(41): `conda run python E:\Pyramid\repository\general\d04da7f0-194d-49d6-b851-de1272dec2ba.py E:\\Pyramid\\repository\\general\\3a19b566-a6ef-41e8-8387-f3ab492d0ca3.csv E:\\Pyramid\\repository\\general\\d559025a-7019-4f2d-b400-bd6def866f0b.txt E:\\Pyramid\\repository\\general\\1b5012da-2f68-4e65-9f34-19823c453e0d.paModel '' E:\\Pyramid\\repository\\general\\57b7beff-b3c8-43c8-ad49-dfb9ae6d4ac0.txt` failed. (See above for error)

Traceback (most recent call last):
  Pyramid Script, line 97, in <module>
    __pyramid_eval_out.write(__pyramidEval)
TypeError: write() argument must be str, not None

6 replies

    • Nadav
    • 1 yr ago

    Hi Ilya. The direct cause of the error is that the pyramid_eval function returns None when it is supposed to return a string. I can't see the pyramid_eval function in the post, only pyramid_learn, but if you didn't change the default code, then the function contains only the templated comments and no actual evaluation code, so its return value is None.
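The failure mode described above can be reproduced outside Pyramid. The sketch below is only an illustration: the function names, the (model, df) signature, and the placeholder accuracy value are assumptions made for the example and are not taken from Pyramid's script template. It shows that a function body with no return statement yields None, and that writing None to a text stream raises a TypeError, while returning a string avoids it.

```python
import io


def pyramid_eval_stub(model, df):
    """Hypothetical template-only function: no return statement, so it returns None."""
    pass


def pyramid_eval_fixed(model, df):
    """Hypothetical fixed version: always returns the evaluation summary as a string."""
    accuracy = 0.0  # placeholder value; a real script would compute a metric here
    return f"accuracy={accuracy}"


def write_eval(result):
    """Mimic the wrapper step that writes the evaluation result to a text stream."""
    out = io.StringIO()
    out.write(result)  # raises TypeError when result is None
    return out.getvalue()


try:
    write_eval(pyramid_eval_stub(None, None))
except TypeError as exc:
    print("stub fails:", exc)

print("fixed writes:", write_eval(pyramid_eval_fixed(None, None)))
```

Whatever the evaluation computes, converting the result to a string before returning it is what keeps the write step from failing.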

    I recommend using the marketplace icon (next to the Learn & Predict Script dropdown) to select a real script with implementation,  like KNN. Then the script will be operational, and you'll be able to modify it to fit your specific needs.

    Hope that helps. If it's still not working, please paste the whole script into the post so I can help you with it more efficiently.

      • Ilya_Tsyrulnik
      • 1 yr ago

      Nadav    many thanks for the explanation.

      My goal is only to split the dataset in the table generated by the previous steps of the data flow. To do so, I connected a Python node.

      Can you please give an example of the configuration? Is there a def pyramid function for the train/test split?
       

      I understand that normally something like this should work, but I am not sure how to configure the node:

      import pandas as pd
      from sklearn.model_selection import train_test_split

      data = pd.read_csv('your_dataset.csv')

      features = ['Column1', 'Column2']

      X = data[features]
      y = data['predict']

      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

      output['X_train'] = X_train
      output['X_test'] = X_test
      output['y_train'] = y_train
      output['y_test'] = y_test

      • Lara_Grysczyk
      • 1 yr ago

      Hi Ilya.
      For what you are trying to achieve, you need to use the regular Script instead of the Learn and Predict Script, as you are not doing any machine learning yet. You are just preparing your data.

      You could do this via a Python script or with the pre-built blocks by Pyramid.

      For the latter option, you first need to give each row a random number (in a new column), e.g. from 0 to 9 (that is 10 numbers). Then, in the next step, you can filter for e.g. the numbers 1 and 2 to get your 0.2 set. With that, your rows are split and randomized.

      Then, in the next step, you could use the Learn and Predict script to run your machine learning algorithm. Although there are multiple pre-built blocks by Pyramid, there are also pre-built scripts in the Pyramid Marketplace that you can modify further.
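The random-number approach just described can be sketched in plain Python. This is an illustration only, not Pyramid's pre-built blocks: the helper names are hypothetical, the rows are stand-in integers, and the row count of 10585 is borrowed from later in this thread. Each row gets a random bucket from 0 to 9, and rows in buckets 1 and 2 become the roughly 20% test set.

```python
import random
from collections import Counter


def assign_buckets(n_rows, n_buckets=10, seed=42):
    """Give every row a random integer bucket in the range 0..n_buckets-1."""
    rng = random.Random(seed)
    return [rng.randrange(n_buckets) for _ in range(n_rows)]


def split_by_bucket(rows, buckets, test_buckets=(1, 2)):
    """Rows whose bucket is in test_buckets go to the test set, the rest to train."""
    train = [row for row, b in zip(rows, buckets) if b not in test_buckets]
    test = [row for row, b in zip(rows, buckets) if b in test_buckets]
    return train, test


rows = list(range(10585))  # stand-in for the table rows
buckets = assign_buckets(len(rows))
train, test = split_by_bucket(rows, buckets)

counts = Counter(buckets)          # rows per bucket: close to equal, not exactly equal
test_share = len(test) / len(rows)  # close to 0.2, not exactly 0.2
print(sorted(counts.items()))
print(f"test share: {test_share:.3f}")
```

Because each bucket is drawn independently, the bucket sizes are only approximately equal, so the resulting split is approximately, not exactly, 80/20.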

       

       

      If that did not answer your question, don't hesitate to reach out

      Lara :)

      • Ilya_Tsyrulnik
      • 1 yr ago

      Lara Grysczyk many thanks for your help. Is the distribution of the random values 0-9 equal? I mean, when 10 values are assigned, does each value occur in 10% of the dataset?

      • Lara_Grysczyk
      • 1 yr ago

      Hi Ilya Tsyrulnik,

      I just checked on what I did there:

      So as you can see, the records are not evenly distributed across the values 0 to 7. But when I checked all the records, I had 10585 records in total before the split, and 79.2% of the data after the split. So the split was fairly accurate.
      But you could try it with the regular Python script as well.
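If an exact 80/20 split matters more than the simplicity of random buckets, a script can shuffle the row positions and cut at a fixed count. The sketch below is a plain-Python illustration under assumptions: exact_split is a hypothetical helper, the rows are stand-in integers, and the row count of 10585 comes from the check above. With pandas, DataFrame.sample(frac=0.2, random_state=42) followed by dropping the sampled index is a comparable approach.

```python
import random


def exact_split(rows, test_size=0.2, seed=42):
    """Shuffle row positions, then take a fixed number of them as the test set."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx = set(indices[:n_test])
    # Keep the original row order inside each subset.
    train = [row for i, row in enumerate(rows) if i not in test_idx]
    test = [row for i, row in enumerate(rows) if i in test_idx]
    return train, test


rows = list(range(10585))  # stand-in for the table rows
train, test = exact_split(rows)
print(len(train), len(test))
```

Fixing the seed makes the split reproducible between runs, which helps when comparing models trained on the same data.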

    • Lead Software Engineer
    • Vijayan_Krishnan
    • 1 yr ago

    Please select the Python environment variable.
