Splitting dataset Train Error

Ilya_Tsyrulnik
updated 1 yr ago
6 replies
Answered

I am trying to split a dataset into train and test sets using the Python node and Learn & Predict.

I get the error below, related to __pyramid_eval_out.write. How can I fix it?

Error

PyramidException: scripting error: Scripting process execution for the following command has failed:

E:\Pyramid\python\Scripts\conda run -p E:\Pyramid\repository\general\scripting_Environments\00000000-0000-0000-0000-000000000001 python E:\Pyramid\repository\general\d04da7f0-194d-49d6-b851-de1272dec2ba.py E:\\Pyramid\\repository\\general\\3a19b566-a6ef-41e8-8387-f3ab492d0ca3.csv E:\\Pyramid\\repository\\general\\d559025a-7019-4f2d-b400-bd6def866f0b.txt E:\\Pyramid\\repository\\general\\1b5012da-2f68-4e65-9f34-19823c453e0d.paModel '' E:\\Pyramid\\repository\\general\\57b7beff-b3c8-43c8-ad49-dfb9ae6d4ac0.txt

ERROR conda.cli.main_run:execute(41): `conda run python E:\Pyramid\repository\general\d04da7f0-194d-49d6-b851-de1272dec2ba.py E:\\Pyramid\\repository\\general\\3a19b566-a6ef-41e8-8387-f3ab492d0ca3.csv E:\\Pyramid\\repository\\general\\d559025a-7019-4f2d-b400-bd6def866f0b.txt E:\\Pyramid\\repository\\general\\1b5012da-2f68-4e65-9f34-19823c453e0d.paModel '' E:\\Pyramid\\repository\\general\\57b7beff-b3c8-43c8-ad49-dfb9ae6d4ac0.txt` failed. (See above for error)

Traceback (most recent call last):
  Pyramid Script, line 97, in <module>
    __pyramid_eval_out.write(__pyramidEval)
TypeError: write() argument must be str, not None


- Nadav
- 1 yr ago
Hi Ilya. The direct cause of the error is that the pyramid_eval function returns None when it is supposed to return a string. I can't see pyramid_eval in your post, only pyramid_learn, but if you didn't change the default code, the function contains only the template comments and no actual evaluation code, so it returns None.

I recommend using the marketplace icon (next to the Learn & Predict Script dropdown) to select a real script with an implementation, such as KNN. The script will then be operational, and you'll be able to modify it to fit your specific needs.

Hope that helps. If it's still not working, please paste the whole script into the post so I can help you more efficiently.
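To illustrate Nadav's explanation, here is a minimal plain-Python reproduction. It assumes nothing about Pyramid's internals: pyramid_eval_template, pyramid_eval_fixed, their (model, data) parameters, and write_eval are hypothetical names, and an in-memory text stream stands in for Pyramid's output file. A function with no return statement returns None, and writing None to a text stream raises the same kind of TypeError as in the traceback.

```python
import io


def pyramid_eval_template(model, data):
    """Hypothetical stand-in for an unedited template function."""
    # Only placeholder comments and `pass`: there is no return statement,
    # so Python implicitly returns None.
    pass


def pyramid_eval_fixed(model, data):
    """Hypothetical fix: always return a string, even a trivial one."""
    return "evaluation not implemented yet"


def write_eval(eval_fn):
    # Mimics the wrapper line `__pyramid_eval_out.write(__pyramidEval)`,
    # using an in-memory text stream instead of Pyramid's output file.
    with io.TextIOWrapper(io.BytesIO(), encoding="utf-8") as out:
        out.write(eval_fn(None, None))  # raises TypeError if given None
        out.seek(0)
        return out.read()


try:
    write_eval(pyramid_eval_template)
except TypeError as exc:
    print("template version failed:", exc)

print("fixed version wrote:", write_eval(pyramid_eval_fixed))
```

Any real evaluation code can compute whatever metrics it needs, as long as the function finally returns them formatted as a string.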
- Ilya_Tsyrulnik
  
  1 yr ago
  
  Nadav, many thanks for the explanation.
  
  My goal is only to split the table generated in the previous steps of the data flow. To do so, I connect a Python node.
  
  Can you please give an example configuration? Is there a def pyramid function for the train/test split?
  
  I understand that something like this should normally work, but I'm not sure how to configure the node:
  
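  import pandas as pd
  from sklearn.model_selection import train_test_split

  # Placeholder file name; replace with your actual data source
  data = pd.read_csv('your_dataset.csv')

  # Feature columns and target column (example names)
  features = ['Column1', 'Column2']
  X = data[features]
  y = data['predict']

  # Hold out 20% of the rows as the test set; random_state makes the split reproducible
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

  # 'output' was not defined in the original snippet; a plain dict stands in for it here
  output = {}
  output['X_train'] = X_train
  output['X_test'] = X_test
  output['y_train'] = y_train
  output['y_test'] = y_test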
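If the node expects a single table rather than four separate objects (an assumption; check how your Pyramid node passes data on), one option is to split the whole DataFrame with scikit-learn's train_test_split and tag each row instead. A minimal sketch; split_table and the sample columns are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_table(data, test_size=0.2, seed=42):
    """Return one DataFrame with an extra 'split' column ('train' or 'test')."""
    # train_test_split accepts a DataFrame directly and keeps its index
    train, test = train_test_split(data, test_size=test_size, random_state=seed)
    # Tag each part, then stack them back into a single table
    return pd.concat([train.assign(split="train"), test.assign(split="test")])


data = pd.DataFrame({
    "Column1": range(10),
    "Column2": range(10, 20),
    "predict": [0, 1] * 5,
})
result = split_table(data)
print(result["split"].value_counts())
```

Downstream steps can then filter on the split column, which keeps the features and the target aligned in one table.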
- Lara_Grysczyk
  
  1 yr ago
  
  Hi Ilya,
  For what you are trying to achieve, you need to use the regular Script instead of the Learn and Predict Script, because you are not doing any machine learning yet; you are just preparing your data.
  
  You could do this either with a Python script or with Pyramid's pre-built blocks.
  
  For the latter option, you first need to give each row a random number, e.g., from 0 to 9 (10 values). Then, in the next step, you can filter for, e.g., numbers 1 and 2 to get your 0.2 set. This way your rows are randomized and split.
  
  In the next step, you could then use the Learn and Predict script to run your machine learning algorithm. Besides Pyramid's pre-built blocks, there are also pre-built scripts in the Pyramid Marketplace that you can customize further.
  
  If that did not answer your question, don't hesitate to reach out.
  
  Lara :)
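The random-number filter approach Lara describes can be sketched in pandas and NumPy as follows. This illustrates the same idea in code rather than with Pyramid's pre-built blocks; bucket_split and its parameters are hypothetical. Because each row's number is drawn independently, the test share comes out close to, but not exactly, 20%.

```python
import numpy as np
import pandas as pd


def bucket_split(df, test_buckets=(1, 2), n_buckets=10, seed=42):
    """Split rows by a random bucket number, mirroring the filter approach."""
    rng = np.random.default_rng(seed)
    # One random integer per row in [0, n_buckets), i.e. 0-9 by default
    buckets = rng.integers(0, n_buckets, size=len(df))
    # Rows whose bucket is 1 or 2 form the roughly 20% test set
    is_test = np.isin(buckets, test_buckets)
    return df[~is_test], df[is_test]


data = pd.DataFrame({"value": range(10_000)})
train, test = bucket_split(data)
print(len(train), len(test))
```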
- Ilya_Tsyrulnik
  
  1 yr ago
  
  Lara Grysczyk, many thanks for your help. Is the distribution of the random values 0-9 uniform? I mean, when 10 values are assigned, does each value occur in 10% of the dataset?
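A quick simulation shows what to expect: every value from 0 to 9 is equally likely for each row, so each value's share is about 10%, but the realized shares differ slightly from exactly 10%. The row count of 10,585 is taken from Lara's reply in this thread; the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n_rows = 10_585                              # record count reported in this thread
values = rng.integers(0, 10, size=n_rows)    # one random value 0-9 per row

# Fraction of rows holding each value, ordered 0..9
shares = pd.Series(values).value_counts(normalize=True).sort_index()
print(shares.round(3))
```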
- Lara_Grysczyk
  
  1 yr ago
  
  Hi Ilya Tsyrulnik,
  
  I just checked what I did there:
  
  As you can see, the values from 0 to 7 are not evenly distributed. But when I checked all the records, I had 10,585 records in total before the split and 79.2% of the data after the split, so the split was fairly accurate.
  But you could also try it with the regular Python script.
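A rough check of these numbers, assuming the intended share kept after the split was 80% and that every row was assigned independently: the kept share then behaves like a binomial proportion with N = 10,585 and p = 0.8.

```latex
\begin{aligned}
N &= 10585, \qquad p = 0.8 \\
\operatorname{SD}(\hat{p}) &= \sqrt{\frac{p(1-p)}{N}} = \sqrt{\frac{0.8 \times 0.2}{10585}} \approx 0.0039 \\
z &= \frac{0.800 - 0.792}{0.0039} \approx 2
\end{aligned}
```

Under that assumption, 79.2% is roughly two standard deviations below 80%: somewhat larger than a typical deviation, but still within what random assignment can produce. If an exact share matters, shuffling the rows and cutting at a fixed row count gives a deterministic size instead.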
- Lead Software Engineer
- Vijayan_Krishnan
- 1 yr ago
Please select the Python environment variable.
