
Trimming and Cleaning Whitespace with Python Node in Model

Hello,

 

I know how to trim and use regex to replace whitespace in Model, but I thought I'd challenge myself and get familiar with the script node.

I can run this block of code in my dev environment to get my desired results:

import pandas as pd
import re

df = pd.read_csv("List table - whitespacemiddleofwords-test.csv")

def whitespace_remover(dataframe):
    # iterate columns
    for i in dataframe.columns:
        # check datatype of each columns
        if dataframe[i].dtype == 'object':
            # strip whitespace from middle and ending
            dataframe[i] = dataframe[i].str.replace(r'\s+', ' ').str.strip()
        else:
            pass

whitespace_remover(df)
df.to_csv('no whitespace.csv', index=False)

 

When I try to adapt this for the Python node, this block gives me an output that is NULL:

import pandas as pd
import re

outputDF = pd.DataFrame()

def whitespace_remover(dataframe):
    for i in dataframe.columns:
        if dataframe[i].dtype == 'object':
            dataframe[i] = dataframe[i].str.replace(r'\s+', ' ').str.strip()
        else:
            pass

df = whitespace_remover(inputDF['Param1'].to_frame())

outputDF['Column1'] = df

I've tried other variations without the function, and the output is either the same or an exact copy of the original column, with no whitespace removed or trimmed.

 

I'm trying to create something like the below output:

 

Any help will be greatly appreciated!!

2 replies

    • Solutions Engineering Manager
    • Kelly_Lu
    • 8 days ago

    Hi Stefano, 

    It looks like the function never returns the DataFrame, so df ends up as None.

    Please try this and let me know if it works. 

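    import pandas as pd

    def whitespace_remover(dataframe):
        # Collapse runs of whitespace and trim every string (object) column
        for i in dataframe.columns:
            if dataframe[i].dtype == 'object':
                dataframe[i] = dataframe[i].str.replace(r'\s+', ' ', regex=True).str.strip()
        return dataframe  # without this line the caller receives None

    # Double brackets select a one-column DataFrame; .copy() keeps inputDF itself untouched
    df = whitespace_remover(inputDF[['col']].copy())

    outputDF = pd.DataFrame()
    outputDF['Column1'] = df['col']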
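
To see why a missing return would lead to a NULL column, here is a minimal sketch that can be run outside the script node. The sample inputDF, the column name col, and the two function names are stand-ins made up for illustration; in the node, inputDF is supplied for you.

```python
import pandas as pd

# Hypothetical stand-in for the inputDF that the script node supplies.
# dtype=object is set explicitly so the 'object' check below applies.
inputDF = pd.DataFrame({"col": ["  hello    world ", "foo\t\tbar"]}, dtype=object)

def clean_without_return(dataframe):
    # Cleans the columns but has no return statement, so a call evaluates to None
    for name in dataframe.columns:
        if dataframe[name].dtype == "object":
            dataframe[name] = dataframe[name].str.replace(r"\s+", " ", regex=True).str.strip()

def clean_with_return(dataframe):
    # Same cleaning, but hands the DataFrame back to the caller
    for name in dataframe.columns:
        if dataframe[name].dtype == "object":
            dataframe[name] = dataframe[name].str.replace(r"\s+", " ", regex=True).str.strip()
    return dataframe

missing = clean_without_return(inputDF[["col"]].copy())
cleaned = clean_with_return(inputDF[["col"]].copy())

print(missing)                  # None
print(cleaned["col"].tolist())  # ['hello world', 'foo bar']
```

Assigning the None result to outputDF['Column1'] gives the column nothing to hold, which is consistent with the NULL output described in the question.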

    • Stefano_Sena
    • 8 days ago

    Hi,

    Works! Thank you so much! I should have known to return the result -_-;

    I didn't realize I needed to index twice into 'col'. 

    Thanks again!
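
On the double brackets: as a rough illustration, single brackets return a Series while a list of column names returns a one-column DataFrame. Since `.to_frame()` in the original attempt also produces a DataFrame, the missing return was likely the deciding factor. The sample data and the column name col below are assumptions for the sake of a runnable sketch.

```python
import pandas as pd

# Hypothetical sample data; in the script node, inputDF is supplied for you
inputDF = pd.DataFrame({"col": ["  a   b ", "c\td"]})

series = inputDF["col"]    # single brackets: a Series
frame = inputDF[["col"]]   # list of names: a one-column DataFrame

print(type(series).__name__)  # Series
print(type(frame).__name__)   # DataFrame

# A Series can also be cleaned directly, with no helper function or column loop
cleaned = series.str.replace(r"\s+", " ", regex=True).str.strip()
print(cleaned.tolist())  # ['a b', 'c d']
```

For a single column, working on the Series directly keeps the script short; the function with the column loop pays off when several text columns need the same treatment.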
