Concerns about Python Access
Hello,
On our Pyramid installation, we're looking to host multiple tenants. However, we've encountered an issue where we can access the host's file system through Pyramid, regardless of the security settings in place. This poses a significant security risk as a tenant's data may not be secure.
Through the Python node in Model, we're able to access everything available to the OS user account that runs Pyramid. This includes full access to the file system, including the IMDB repositories. Below is an example:
Here's a script that reads the file system, providing all files, the current user, and file access details:
import os
import subprocess
import pandas as pd
import getpass

def list_files(start_path):
    file_info = []
    for root, dirs, files in os.walk(start_path):
        for name in files:
            file_path = os.path.join(root, name)
            ls_output = subprocess.check_output(['ls', '-ahl', file_path]).decode('utf-8')
            user_name = getpass.getuser()
            file_info.append({'File_Name': name, 'File_Path': file_path, 'User_Name': user_name, 'LS_Output': ls_output})
        for name in dirs:
            dir_path = os.path.join(root, name)
            ls_output = subprocess.check_output(['ls', '-ahl', dir_path]).decode('utf-8')
            user_name = getpass.getuser()
            file_info.append({'File_Name': name, 'File_Path': dir_path, 'User_Name': user_name, 'LS_Output': ls_output})
    return file_info

file_info_list = list_files('/opt/Pyramid/repository/imdata')
df = pd.DataFrame(file_info_list)
Snippet of the result:
These are the IMDBs.
The issue here is that we can reach any file on the host and read it whenever the account running Pyramid has permission to do so. For the IMDBs, the setup makes it easy to create a connection to an IMDB, potentially allowing access to another tenant's data.
How can we ensure that this does not happen and that our clients' data remains secure?
8 replies
-
Pyramid includes a variety of capabilities (Python, R, command line access, SQL) that can all be abused if not managed: "with great power comes great responsibility".
That's the point of an all-powerful analytics platform: it comes with the tools necessary to wire up complex, sophisticated workflows and functions in the company. It's not entirely unique to Pyramid or analytics (you could use the Python engine in Excel to delete a user's hard drive if you wanted to).
So first, if you can't trust your users, remove their access to those features through user and role profiles. You cannot hand open-ended Python scripting to untrustworthy people and expect the platform to magically determine what they are doing with it - regardless of use case or scenario. So too with the other power capabilities in Model.
Secondly, you can install IMDB on separate machines from the servers that run Python and R (DSML service). That way no one can access any IMDB files on those servers. BTW, this also happens to be the correct approach to a deployed Pyramid cluster.
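If it helps, here is a rough sketch of how you might confirm the separation from inside a Python node once IMDB has been moved off the DSML server. It only uses the standard library. The function name check_path_exposure is made up for illustration, and the path is the one from the original post, so adjust it to your own install. Treat it as a quick sanity check, not a security audit.

```python
import os

def check_path_exposure(paths):
    """Report whether each path is visible to the account running this script.

    Hypothetical helper for illustration; not part of Pyramid.
    """
    results = {}
    for path in paths:
        results[path] = {
            "exists": os.path.exists(path),
            "readable": os.access(path, os.R_OK),
            "writable": os.access(path, os.W_OK),
        }
    return results

# Path taken from the original post; it is an assumption that your
# installation uses the same location.
report = check_path_exposure(["/opt/Pyramid/repository/imdata"])
for path, flags in report.items():
    print(path, flags)
```

On a server that hosts only the Python/R service, you would hope to see exists and readable both come back False for the IMDB repository path. If they come back True, the IMDB files are still reachable from that machine.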
-
To block access to Python capabilities for users, turn off access to these items:
-
As it can be difficult to anticipate the resources consumed by the IMDB service, it is recommended practice for production systems to isolate the IMDB service on its own server so that it does not impact the core Pyramid analytic services.
Hope that helps.
Ian