Best Practice for Isolating Self-Service Workloads in a Single Installation
Hello PA-Team,
I'm looking for advice on a common enterprise challenge: how to ensure the stability of my critical corporate reports in a mixed-use environment where self-service users are also active.
The Problem:
In my single Pyramid installation, I have both corporate reports (running on a dedicated In-Memory server) and self-service models. I've experienced situations where a heavy query from a self-service user overloads the runtime servers, causing the entire installation to become unstable. This directly impacts my business-critical corporate reporting.
My Goal:
I want to isolate the workloads to prevent self-service activity from affecting corporate report availability, within a single Pyramid installation.
I know that setting up separate Pyramid instances is one solution, but for operational reasons, I am looking for ways to manage this within one environment.
The Question:
What are the recommended best practices or architectural patterns within Pyramid to achieve this kind of workload isolation?
For example, is there any way to influence the router's behavior to dedicate specific runtime or task servers to specific data sources (like a particular In-Memory server)? Or are there other methods for resource governance that I might be missing?
Any insights or alternative solutions would be greatly appreciated.
Thanks
8 replies
-
Hi
It's not clear from your description whether you are running a single Pyramid instance on one server, or a single Pyramid instance across multiple servers.
In small deployments it is possible to run all of the Pyramid services on one machine, although we would always recommend splitting out the In-Memory DB engine and the database engine hosting the Pyramid repository to their own dedicated servers.
After that, I recommend consulting our Help page on scaling Pyramid deployments and reading through the PDF Scaling Guide downloadable from that page.
Come back here with further questions if required.
Hope that helps.
Ian
-
Hi
Currently you cannot allocate a specific runtime engine to a specific data source. You could have multiple IMDB servers, though, and reserve one for your production solutions and the other(s) for self-service / sandbox use. That way at least the IMDBs can be isolated between production and self-service.
With regard to balancing the load, take a read of this Help page, in particular the RunTime servers load balancing algorithm settings and their failover strategy. Judicious selection of these settings may help you, although which selections will give you the best scenario will be a bit of an art and trial and error based on your workloads.
There are also detailed settings for the RunTime engine that you can read up on and potentially tweak to help you as well.
Hope that helps.
Ian
-
For real time queries:
99.9% of the workload in almost all querying in Pyramid is handled by the data source. Therefore the primary focus for ensuring critical query performance revolves around the servers powering those data sources. And if you use Pyramid's in-memory database, ensure it's well resourced to get the best performance, running on its own hardware.
Earmarking the application tier to specific data sources is a fool's errand and will produce a complex maintenance mess, tremendous inefficiency and very little impact (if any). There is no need for a workaround - just ensure that the main application tiers are ample for the volume of queries flowing through them. If you are still having performance problems, focus on the data-source servers.
For publications and ETL/data flows:
Pyramid's own resource usage is much greater for these activities. That's why the task engine is separate from the runtime - and it can be bifurcated into publication jobs and ETL jobs. This allows you to run 2 or more different task engines for each flavor - each with job queue throttling and the ability to set that by peak/off-peak times. If you find that these activities are eating your real-time querying resources, ensure they run on their own hardware, schedule them all to run off-peak, and lower the number of concurrent jobs that are executed at any one time.
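As a rough illustration of what peak/off-peak job queue throttling means in general, here is a minimal Python sketch. It is not Pyramid's API or configuration: the time window, the limits and the function names are all hypothetical, chosen only to show the idea of a concurrency cap that changes with the time of day.

```python
from collections import deque
from datetime import time

# Hypothetical values for illustration only; these are not Pyramid settings.
PEAK_START = time(8, 0)
PEAK_END = time(18, 0)
PEAK_LIMIT = 1      # concurrent jobs allowed during business hours
OFF_PEAK_LIMIT = 4  # concurrent jobs allowed outside business hours


def concurrency_limit(now: time) -> int:
    """Return how many jobs may run at once at the given time of day."""
    return PEAK_LIMIT if PEAK_START <= now < PEAK_END else OFF_PEAK_LIMIT


def start_jobs(queue: deque, running: set, now: time) -> list:
    """Move jobs from the queue into the running set until the cap is hit."""
    started = []
    while queue and len(running) < concurrency_limit(now):
        job = queue.popleft()
        running.add(job)
        started.append(job)
    return started
```

With five queued jobs, a call at 10:00 starts only one of them; a later call at 22:00 (with that job still running) starts three more, and the fifth stays queued until a slot frees up.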
-
Thank you for the detailed architectural insight. I need to clarify that my primary concern here is fault tolerance and stability, not just query performance.
You are correct that the IMDB server is the root cause of the load. However, the symptom I am experiencing is the complete failure (crash) of the runtime servers, which leads to a total application outage for all users (both corporate and self-service).
I appreciate the suggested solutions, but let me explain why they don't solve this specific stability issue:
Isolating IMDB Servers: I already run separate IMDB servers for corporate and self-service models. The problem remains because the runtime server pool is still shared. A single query to the self-service IMDB can crash a runtime server, which then causes a cascading failure that takes down the entire pool, making the corporate reports unavailable.
Scaling Resources: My IMDB servers are already well-resourced according to your recommendations (64+ vCPUs, 3x+ RAM to disk size). However, infinite hardware scaling is not a substitute for a robust fault tolerance mechanism within the software.
So, my question remains: how can I contain a failure? If a self-service query overloads an IMDB and causes a runtime server to crash, how can I prevent that failure from bringing down the entire platform for everyone?
The ability to dedicate specific runtime servers to specific data sources seems like the most direct way to create this "bulkhead" and ensure fault tolerance. I am open to any other architectural patterns you might suggest to solve this problem of failure isolation.
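To make the "bulkhead" idea concrete, here is a generic Python sketch of the pattern. It is only an illustration of the concept and says nothing about how Pyramid's router or runtime servers work internally; the `Bulkhead` class, its names and its limits are hypothetical. Each workload class gets its own capped compartment, so saturation or failure in one compartment cannot use up the capacity of the other.

```python
import threading


class Bulkhead:
    """Caps concurrent work for one workload class; rejects when full."""

    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_run(self, fn, *args):
        # Non-blocking acquire: a full compartment rejects new work
        # instead of queuing it or borrowing capacity from elsewhere.
        if not self._slots.acquire(blocking=False):
            return ("rejected", None)
        try:
            return ("ok", fn(*args))
        except Exception as exc:
            # A failing task is contained and reported, not propagated.
            return ("failed", exc)
        finally:
            self._slots.release()


# Hypothetical compartments: one per workload class.
corporate = Bulkhead("corporate", max_concurrent=2)
self_service = Bulkhead("self_service", max_concurrent=1)
```

In this sketch, once the single self-service slot is busy, further self-service work is rejected, while `corporate.try_run(...)` still succeeds because it draws on a separate set of slots.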
Thanks!
-
If your RTEs are dying, figure out why they are dying.