Python and Excel
What is the best way to read Excel (XLS) files with Python (not CSV files).
Is there a built-in package that is supported by default in Python to do this task?
2 replies
-
Hi Konoha,
There are many and varied Python packages for handling Excel files. Note that .xls and .xlsx are very different formats and require different Python packages to handle.
The standard Pyramid Python virtual machine (VM) does not include any of these packages but they can be added via the administration console. You can check what packages are installed in the potentially multiple Python VMs managed by Pyramid:
Choose the Python VM you want (there may only be one) and click on the Packages button. This will list the Python packages currently installed in that VM:
You can add additional packages to the VM via the administration console. If you don't have access to this, please consult your system administrator:
You will have to research and find the most suitable Python package for what you want to do, then add it to the Pyramid Python VM. Here's a link to get you started:
Hope that helps.
Ian
-
The quickest way I have found to do it is to use the glob2 library. Here is an example that will read in all XLS files in a particular directory and load the results into a data frame that can be used in the model builder.
import xlrd import glob2 import numpy as np import pandas as pd all_files = glob2.glob('C:\Data\*.xls') df = pd.DataFrame() for f in all_files: data = pd.read_excel(f) data['File']=f df = df.append(data)
Here is a screenshot of the code being used in a model:
By the way, the latest versions of the software will load multiple Excel files using the Multi-File node now (version 2020.13 or higher)