Using Python or R to pivot excel workbook
We have a workbook which has a list of names, and then a column with their performance each week i.e.
John Smith, 1,4,10,8,4,6, Finance, Analyst
with column headers being name, week 1, week 2, week 3, Week 4, Week 5, Week 6, Department, Role
I am wondering if its possible if someone could help me "pivot" this data so instead it shows
John Smith, Finance, Analyst, Week 1, 1
John Smith, Finance, Analyst, Week 2, 4
John Smith, Finance, Analyst, Week 3, 10
and so fourth.
With headers being Name, Department, Role, Week, Score
Is there any possibility anyone is comfortable with R / Python and helping me do this? I managed to do it in R using reshape2 however no idea how I'd import it.
Thanks
Nick
4 replies
-
Nicholas Campbell
You can use pivot function in Python script.
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pivot.html for more details.
-
Hi Nicholas,
I'm adding an explanation for using the scripting nodes in pyramid:
Create a data flow , upload the excel file you want to pivot.
Connect a python/ R node to the data you want to process.
You can choose specific columns as input or just click on add all button in order to use the all input data.
In your script you can now process the input DataFrame ( it's name is the one that typed in the Data Frame field under the input section/card).
In the script section remember to import the packages you want to use, if you need to install a specific package type it's name in the packages text boxes.
Put the processed data inside a new DataFrame, this DataFrame will be the data that will come out of the node.
Remember to declare the script output structure in the output section, you can declare the output columns by yourself or just use the auto Detect option and let us do the magic (click on auto detect button > type the name of the output DataFrame in the text field).
In the output options you have two options:
* add to existing table- in order to append your output DataFrame to the input data of the node.
* create new table- In case you want your output DataFrame will be the output of the node.
Let me know if you need further assistance,
Daniel
-
Hi Nick,
An exemple of unpivot function with Python :
import pandas as pd data =pd.DataFrame({'Etab':Param1,'ADM': Param2,'LOG': Param3,'EDUC':Param4,'MED':Param5,'SOIN':Param6}) OutputDF=data.melt(id_vars='Etab', var_name='Fil', value_name='Hor')
You have to adjust names of Output.
I hope that helps you.
Laurent
-
Assuming this is the structure of the data: Department Name Role Week1 Week2 Week3 Week4 Week5 Week6 0 Finance John Smith Analyst 1 4 10 8 4 6 1 Money Cool Guy The Boss 1 4 10 8 4 6 this script I believe does what you need (inputDF is the input, outputDF the output): ############ outputData = [] for i in range(0,len(inputDF)): curr = inputDF.iloc[i] for weekIndex in range(1,7): outputData.append( {'name': curr['Name'], 'WeekIndex': weekIndex,'Value': curr['Week' + str(weekIndex)], 'Department': curr['Department'], 'Role': curr['Role']}) outputDF = pd.DataFrame(outputData) ############ print of the results : Department Role Value WeekIndex name 0 Finance Analyst 1 1 John Smith 1 Finance Analyst 4 2 John Smith 2 Finance Analyst 10 3 John Smith 3 Finance Analyst 8 4 John Smith 4 Finance Analyst 4 5 John Smith 5 Finance Analyst 6 6 John Smith 6 Money The Boss 1 1 Cool Guy 7 Money The Boss 4 2 Cool Guy 8 Money The Boss 10 3 Cool Guy 9 Money The Boss 8 4 Cool Guy 10 Money The Boss 4 5 Cool Guy 11 Money The Boss 6 6 Cool Guy