Trimming and Cleaning Whitespace with Python Node in Model
Hello,
I know how to trim whitespace and replace it with regex in Model, but I wanted to challenge myself and get familiar with the script node.
I can run this block of code in my dev environment to get my desired results:
import pandas as pd
import re
df = pd.read_csv("List table - whitespacemiddleofwords-test.csv")
def whitespace_remover(dataframe):
    # iterate columns
    for i in dataframe.columns:
        # check datatype of each columns
        if dataframe[i].dtype == 'object':
            # strip whitespace from middle and ending
            dataframe[i] = dataframe[i].str.replace(r'\s+', ' ', regex=True).str.strip()
        else:
            pass
whitespace_remover(df)
df.to_csv('no whitespace.csv', index=False)
           
When I try to adapt this for the Python node, the following block gives me an output that is NULL:
import pandas as pd
import re
outputDF = pd.DataFrame()
def whitespace_remover(dataframe):
    for i in dataframe.columns:
        if dataframe[i].dtype == 'object':
            dataframe[i] = dataframe[i].str.replace(r'\s+', ' ').str.strip()
        else:
            pass
df = whitespace_remover(inputDF['Param1'].to_frame())
outputDF['Column1'] = df
           

I've tried other variations without the function, and the output is either the same or an exact copy of the original column, with no whitespace removed or trimmed.
I'm trying to create something like the below output:

Any help will be greatly appreciated!!
2 replies
Hi Stefano,
It looks like the function never returns the DataFrame, so df ends up as None.
Please try this and let me know if it works.
import pandas as pd
def whitespace_remover(dataframe):
    for i in dataframe.columns:
        if dataframe[i].dtype == 'object':
            dataframe[i] = dataframe[i].str.replace(r'\s+', ' ', regex=True).str.strip()
    return dataframe
df = whitespace_remover(inputDF[['col']])
outputDF = pd.DataFrame()
outputDF['Column1'] = df['col']
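To see the missing return in isolation, here is a minimal sketch that can be run outside the node. The sample inputDF is a hypothetical stand-in for the frame the node would normally supply, and the two function names are made up for the comparison. The .copy() call is my own addition to avoid modifying a slice of the input.

```python
import pandas as pd

# Hypothetical stand-in for the node's inputDF; dtype=object mirrors the
# dtype check used in the thread.
inputDF = pd.DataFrame({"col": ["  hello   world ", "foo\t\tbar  "]}, dtype=object)

def remover_without_return(dataframe):
    # Cleans the columns but has no return statement, so the call yields None.
    dataframe = dataframe.copy()
    for name in dataframe.columns:
        if dataframe[name].dtype == "object":
            dataframe[name] = dataframe[name].str.replace(r"\s+", " ", regex=True).str.strip()

def remover_with_return(dataframe):
    # Same cleaning, but the result is handed back to the caller.
    dataframe = dataframe.copy()
    for name in dataframe.columns:
        if dataframe[name].dtype == "object":
            dataframe[name] = dataframe[name].str.replace(r"\s+", " ", regex=True).str.strip()
    return dataframe

missing = remover_without_return(inputDF[["col"]])
cleaned = remover_with_return(inputDF[["col"]])

print(missing)                  # None
print(cleaned["col"].tolist())  # ['hello world', 'foo bar']
```

Assigning None to an output column is what would show up as NULL in the node's output.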
           
Hi,
Works! Thank you so much! I should have known to return the result -_-;
I didn't realize I needed to index twice into 'col'.
Thanks again!
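For anyone finding this later, here is a small sketch of what indexing "twice" does. Single brackets give a Series, while double brackets give a one-column DataFrame, which is what a function that loops over .columns expects. The sample inputDF is a hypothetical stand-in for the node's input. As far as I can tell, .to_frame() gives the same one-column frame, and a Series can also be cleaned directly without the loop.

```python
import pandas as pd

# Hypothetical stand-in for the node's inputDF.
inputDF = pd.DataFrame({"col": [" a  b ", "c   d"]})

as_series = inputDF["col"]               # single brackets: a Series
as_frame = inputDF[["col"]]              # double brackets: a one-column DataFrame
via_to_frame = inputDF["col"].to_frame() # Series converted to a DataFrame

print(type(as_series).__name__)          # Series
print(type(as_frame).__name__)           # DataFrame
print(as_frame.equals(via_to_frame))     # True

# A Series supports the .str accessor directly, so no column loop is needed.
cleaned = as_series.str.replace(r"\s+", " ", regex=True).str.strip()
print(cleaned.tolist())                  # ['a b', 'c d']
```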