Best Practice for Isolating Self-Service Workloads in a Single Installation

TKH_TKH
3 mths ago
9 replies
Answered

Hello PA-Team,

I'm looking for advice on a common enterprise challenge: how to ensure the stability of my critical corporate reports in a mixed-use environment where self-service users are also active.

The Problem:
In my single Pyramid installation, I have both corporate reports (running on a dedicated In-Memory server) and self-service models. I've experienced situations where a heavy query from a self-service user overloads the runtime servers, causing the entire installation to become unstable. This directly impacts my business-critical corporate reporting.

My Goal:
I want to isolate the workloads to prevent self-service activity from affecting corporate report availability, within a single Pyramid installation.

I know that setting up separate Pyramid instances is one solution, but for operational reasons, I am looking for ways to manage this within one environment.

The Question:
What are the recommended best practices or architectural patterns within Pyramid to achieve this kind of workload isolation?

For example, is there any way to influence the router's behavior to dedicate specific runtime or task servers to specific data sources (like a particular In-Memory server)? Or are there other methods for resource governance that I might be missing?

Any insights or alternative solutions would be greatly appreciated.

Thanks

- VP Product Management
- Ian_Macdonald
- 3 mths ago
Hi

It's not clear from your description whether you are running a single Pyramid instance on one server, or a single Pyramid instance across multiple servers.

In small deployments it is possible to run all of the Pyramid services on one machine, although we would always recommend splitting out the In-Memory DB engine and the database engine hosting the Pyramid repository onto their own dedicated servers.

After that, I recommend consulting our Help page on scaling Pyramid deployments and reading through the PDF Scaling Guide downloadable from that page.

Come back here with further questions if required.

Hope that helps.

Ian
- TKH_TKH
  
  3 mths ago
  
  Hi Ian,
  
  Thank you for the quick response and the links.
  
  To clarify, my question is specifically about managing workloads within an existing multi-node cluster, which I've set up according to the scaling guide recommendations.
  
  The core issue I'm facing is that all my runtime servers in the cluster seem to form a single, shared pool. This means that a few heavy queries from my self-service users can overload all available runtimes, which in turn makes my critical corporate reports unavailable.
  
  My question is less about physical scaling and more about logical workload isolation.
  
  Is there any way I can configure the Router to dedicate specific runtime servers to specific data sources? For example:
  
  Runtime-Servers 1 & 2 should only handle queries for my Corporate-IMDB.
  
  Runtime-Servers 3 & 4 should handle all other queries (e.g., from my Self-Service-IMDB).
  
  This would allow me to guarantee resources for my critical reports while still providing a sandbox for self-service users, all within a single Pyramid installation.
  
  Thanks
- TKH_TKH
  
  3 mths ago
  
  Just to follow up: if my suggestion of mapping servers to data sources isn't possible, I'm open to other ideas.
  
  Are there any alternative solutions or workarounds to protect critical reports from heavy self-service queries in a single installation?
  
  Thanks.
- VP Product Management
- Ian_Macdonald
- 3 mths ago
Hi

Currently you cannot allocate a specific runtime engine to a specific data source. You could, though, run multiple IMDB servers and reserve one for your production solutions and the other(s) for self-service / sandbox use. That way at least the IMDBs are isolated between production and self-service.

With regard to balancing the load, take a read of this Help page, in particular the RunTime servers load balancing algorithm settings and their failover strategy. Judicious selection of these settings may help you, although which selections will give you the best scenario will be a bit of an art and trial and error based on your workloads.

There are also detailed settings for the RunTime engine that you can read up on and potentially tweak to help you as well.

Hope that helps.

Ian
- TKH_TKH
  
  3 mths ago
  
  Thanks!
- "making the sophisticated simple"
- AviPerez
- 3 mths ago
For real time queries:

99.9% of the workload in almost all querying in Pyramid is handled by the data source. Therefore, the primary focus for ensuring critical query performance should be the servers powering those data sources. And if you use Pyramid's in-memory database, ensure it's well resourced and running on its own hardware to get the best performance.

Earmarking the application tier to specific data sources is a fool's errand and will produce a complex maintenance mess, tremendous inefficiency and very little impact (if any). There is no need for a workaround - just ensure that the main application tiers are ample for the volume of queries flowing through them. If you are still having performance problems, focus on the data-source servers.

For publications and ETL/data flows:

Pyramid itself consumes far more resources for these activities. That's why the task engine is separate from the runtime - and it can be bifurcated into publication jobs and ETL jobs. This allows you to run two or more task engines for each flavor - each with job queue throttling and the ability to set that by peak/off-peak times. If you find that these activities are eating your real-time querying resources, ensure they run on their own hardware, schedule them all to run off-peak, and lower the number of concurrent jobs executed at any one time.
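
To make the peak/off-peak throttling idea concrete, here is a minimal sketch in Python. It is not Pyramid's API or configuration format; the peak window, the limits table, and the function names are all assumptions made up for illustration. It only shows the logic of capping concurrent jobs per job type depending on the time of day.

```python
from datetime import time

# Hypothetical peak window and limits; real values depend on hardware and workload.
PEAK_START, PEAK_END = time(8, 0), time(18, 0)
LIMITS = {
    "publication": {"peak": 2, "off_peak": 8},
    "etl": {"peak": 1, "off_peak": 6},
}

def is_peak(now):
    """Return True when `now` (a datetime.time) falls inside the peak window."""
    return PEAK_START <= now < PEAK_END

def max_concurrent_jobs(job_type, now):
    """Look up the concurrency cap for a job type at the given time of day."""
    window = "peak" if is_peak(now) else "off_peak"
    return LIMITS[job_type][window]

def jobs_to_start(job_type, running, queued, now):
    """How many queued jobs may start now without exceeding the cap."""
    free_slots = max(0, max_concurrent_jobs(job_type, now) - running)
    return min(free_slots, queued)
```

With these made-up numbers, an ETL queue would run at most one job at 10:00 and up to six at 22:00, which keeps heavy batch work away from the hours when real-time queries matter most.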
- TKH_TKH
- 3 mths ago
Thank you for the detailed architectural insight. I need to clarify that my primary concern here is fault tolerance and stability, not just query performance.

You are correct that the IMDB server is the root cause of the load. However, the symptom I am experiencing is the complete failure (crash) of the runtime servers, which leads to a total application outage for all users (both corporate and self-service).

I appreciate the suggested solutions, but let me explain why they don't solve this specific stability issue:

Isolating IMDB Servers: I already run separate IMDB servers for corporate and self-service models. The problem remains because the runtime server pool is still shared. A single query to the self-service IMDB can crash a runtime server, which then causes a cascading failure that takes down the entire pool, making the corporate reports unavailable.

Scaling Resources: My IMDB servers are already well-resourced according to your recommendations (64+ vCPUs, 3x+ RAM to disk size). However, infinite hardware scaling is not a substitute for a robust fault tolerance mechanism within the software.

So, my question remains: how can I contain a failure? If a self-service query overloads an IMDB and causes a runtime server to crash, how can I prevent that failure from bringing down the entire platform for everyone?

The ability to dedicate specific runtime servers to specific data sources seems like the most direct way to create this "bulkhead" and ensure fault tolerance. I am open to any other architectural patterns you might suggest to solve this problem of failure isolation.
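
For readers unfamiliar with the term, the "bulkhead" idea can be sketched generically in Python. This is not a Pyramid feature or API (the reply above states that runtime engines cannot currently be allocated to data sources); the class, the workload names, and the slot counts are hypothetical. The sketch only shows the pattern: each workload class gets its own fixed number of slots, so exhausting one compartment does not consume capacity reserved for another.

```python
import threading

class Bulkhead:
    """Caps concurrent work for one workload class."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_run(self, fn, *args):
        # Reject immediately instead of queueing when the compartment is full.
        if not self._slots.acquire(blocking=False):
            return ("rejected", None)
        try:
            return ("ok", fn(*args))
        finally:
            # Always free the slot, even if fn raises.
            self._slots.release()

# Hypothetical workload classes, named after the two IMDB roles in this thread.
BULKHEADS = {
    "corporate": Bulkhead("corporate", max_concurrent=8),
    "self_service": Bulkhead("self_service", max_concurrent=2),
}

def submit(workload, fn, *args):
    """Run fn inside the compartment reserved for the given workload."""
    return BULKHEADS[workload].try_run(fn, *args)
```

In this sketch, once two self-service queries are in flight, a third is rejected while corporate queries still find free slots in their own compartment.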

Thanks!
- "making the sophisticated simple"
- AviPerez
- 3 mths ago
- Reported - view
If your RTEs are dying, figure out why they are dying.
- TKH_TKH
  
  3 mths ago
  
  Hi Avi,
  
  Thank you for the direct feedback. You mentioned that if my RTEs are dying, I need to "figure out why they are dying". I completely agree, but this is where I need your team's expertise.
  
  Could you please provide some guidance on the specific steps I should take to diagnose this?
  
  For example:
  
  What specific logs (and log levels) should I be looking at in Pyramid to find the root cause of an RTE crash due to a query?
  
  Are there particular performance metrics or threads within the runtime server that I should monitor?
  
  Are there any configuration settings (in Pyramid or the OS) that could help make the RTEs more resilient to overload instead of crashing?
  
  My goal is to understand the failure and prevent it, but I lack the deep diagnostic tools to do so effectively. Any "best practices" for this kind of investigation would be incredibly helpful.
  
  Thanks!
