
Publication Performance

I have a 5-page Publication with over 300 dynamic text elements and 4 Discovers. I work in higher education, and this is an academic program profile that looks at 5 years of data for a degree major. Filters consist of College, Department, and Major.

All of the data is coming from 6 Microsoft OLAP Cubes.

Generating one Publication as a PDF takes between 5 and 6 minutes. This is too long.

 

We have a pretty vanilla installation of Pyramid 2023 on a single server. Server specs are an Intel Xeon E5-2670 CPU at 2.60 GHz (12 processors) and 20 GB of RAM. The OS is Windows Server 2022.

 

Is there anything that can be done to reduce the Publication runtime either from a Pyramid settings perspective or a hardware perspective?

 

Thanks!

8 replies

    • NPANS
    • 3 mths ago

    Please provide more context on your environment and setup:

    1.  Who else and what else is using this machine when these things are being processed? Are other people using this solution at the same time?
      1. A single server is not the minimum recommended approach for Pyramid (check the scaling guide in help). 
      2. Where is the database repository? It should not be on the same machine; having it there can slow things down considerably.
      3. How many threads are you running concurrently? Too many concurrent threads, on a single machine no less, will slow things down.
    2. Have you looked at the query times in the transaction log?
      1. How long are the queries taking to resolve on your cubes?
      2. Are your cubes performing well in general?
    3. How many reports are you generating per batch?
    4. Is the server a VM? If so, is it sharing cores with other VMs, such that you do not truly have 12 cores fully at your disposal?
    5. The 300 text elements are being driven off the 4 queries:
      1. If you have very large queries to answer so many figures, you should change them into several smaller queries that do less work. Large queries with irrelevant data impact performance - unless you are using every element to resolve the 300 items.
      2. Are your 300 items reading 300 independent objects and building them into sentences? You are better off constructing larger, whole sentences with multiple values that resolve from multiple data points, producing fewer DT objects.
    • Ray_Buechler
    • 3 mths ago

    Thanks for the response. My answers to your questions are inline below.

     

    Please provide more context on your environment and setup:

    1.  Who else and what else is using this machine when these things are being processed? Are other people using this solution at the same time?

      Currently this is on our development server. We have 4 users who have access to this server. Normally only one person, usually me, is using the server when this publication is being processed.
      1. A single server is not the minimum recommended approach for Pyramid (check the scaling guide in help). 

        We have 198 total users and have nowhere near 25% concurrent use in our production environment.
      2. Where is the database repository? It should not be on the same machine; having it there can slow things down considerably.

        The database repository is on a separate server.
      3. How many threads are you running concurrently? Too many concurrent threads, on a single machine no less, will slow things down.

        There are about 4,000 concurrent threads.
    2. Have you looked at the query times in the transaction log?
      1. How long are the queries taking to resolve on your cubes?

        The queries are taking 223,030 ms (about 223 seconds) in total to run.
      2. Are your cubes performing well in general?

        I am not a SSAS expert, but we do have an outside consultant who monitors the health and performance of our cubes and ensures that they are running optimally.
    3. How many reports are you generating per batch?

      1 report per batch is being run
    4. Is the server a VM? If so, is it sharing cores with other VMs, such that you do not truly have 12 cores fully at your disposal?

      It is a VM. I will have to do some checking with our IT team on the VM setup.
    5. The 300 text elements are being driven off the 4 queries:

      Let me try to provide a little more context. There are 3 filters (College, Department, Major) on the Publication, but it is the Major filter that is doing most of the heavy lifting. The College filter is only being used to populate a page heading placeholder; the Department filter is also used to populate the page heading, plus filter 1 Discover and 1 dynamic text element. Major is what filters the rest of the data.
      1. If you have very large queries to answer so many figures, you should change them into several smaller queries that do less work. Large queries with irrelevant data impact performance - unless you are using every element to resolve the 300 items.

        I don't think these are particularly large queries. I am using approximately 32 Discovers to populate the dynamic text. These Discovers are looking at data for a 5-year period for undergraduate students, filtered by Major.
      2. Are your 300 items reading 300 independent objects and building them into sentences? You are better off constructing larger, whole sentences with multiple values that resolve from multiple data points, producing fewer DT objects.

        I am using Matrix tables with dynamic text populating the cell values. I have attached a screenshot to show you what I am doing.

    • NPANS
    • 3 mths ago

    Based on your responses, the most obvious issue is the query times. If the queries are taking 223 seconds (just under 4 minutes) to run, then Pyramid itself is only consuming about 1 minute of your 5-minute processing time (roughly 80% vs. 20%) - which is not too bad considering it needs to read, compile and render a PDF with at least 300 objects in it. This means your cubes are likely the limiting factor. It's worth trying to work out why the queries are so slow.
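    If you want to confirm this on the SSAS side (complementing Pyramid's transaction log), one option is to query the SSAS dynamic management views from an MDX window in SSMS. A minimal sketch - the 5,000 ms threshold here is arbitrary, and if your SSAS version rejects the WHERE clause, drop it and scan the output instead:

        -- Lists current/recent commands on the SSAS instance;
        -- long elapsed times identify the slow MDX.
        SELECT SESSION_SPID, COMMAND_START_TIME,
               COMMAND_ELAPSED_TIME_MS, COMMAND_TEXT
        FROM $SYSTEM.DISCOVER_COMMANDS
        WHERE COMMAND_ELAPSED_TIME_MS > 5000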

    Your responses are not quite clear on concurrency: 4,000 threads sounds really wrong. It should be far lower. You've only got 12 CPUs on this machine. This could be a contributing factor: you're stretching your processing too much and it's working against you.

    Last comment: using dynamic text to construct your one example grid is awfully cumbersome. If all these figures are coming from 1 cube, you could easily build the entire grid as a single query. It should respond far quicker; the results will be easier to display and maintain. You can easily construct these values using formulas and then just build a grid that pulls all the formulas into that singular query. Another approach, if you have licensed it, is to use the Tabulate tool. It's easier than dynamic text for this purpose. 

    If you cannot resolve some of this, you may have to contact Pyramid support or the CS team to help you find the issue. We run reports that are 450-500 pages long, with about 1000 different queries, visualizations and dynamic text. They run on relational data but render in about 8 minutes each. Longer, but the workload is far bigger. And for scale, we run the reports with the task engine running on a dedicated server in the cluster.

      • Ray_Buechler
      • 3 mths ago

      Based on your responses, the most obvious issue is the query times. If the queries are taking 223 seconds (just under 4 minutes) to run, then Pyramid itself is only consuming about 1 minute of your 5-minute processing time (roughly 80% vs. 20%) - which is not too bad considering it needs to read, compile and render a PDF with at least 300 objects in it. This means your cubes are likely the limiting factor. It's worth trying to work out why the queries are so slow.

      I will definitely investigate this. It looks like around 40 out of 300-plus queries account for almost half the total query runtime.

      Your responses are not quite clear on concurrency: 4,000 threads sounds really wrong. It should be far lower. You've only got 12 CPUs on this machine. This could be a contributing factor: you're stretching your processing too much and it's working against you.

      Maybe I am looking at the wrong thing. The number I provided was CPU threads.

      Last comment: using dynamic text to construct your one example grid is awfully cumbersome. If all these figures are coming from 1 cube, you could easily build the entire grid as a single query. It should respond far quicker; the results will be easier to display and maintain. You can easily construct these values using formulas and then just build a grid that pulls all the formulas into that singular query. Another approach, if you have licensed it, is to use the Tabulate tool. It's easier than dynamic text for this purpose. 

      It is definitely cumbersome. We do not have Tabulate; unfortunately, it is cost-prohibitive for us.

      I guess I am not sure what you mean by building the grid as a single query. In the example I provided there are 3 dimensions (Major, Student Type, and Ethnicity), and I could put them on a single Discover, but the way those dimensions get nested, you end up with something that visually does not work.

      If you cannot resolve some of this, you may have to contact Pyramid support or the CS team to help you find the issue. We run reports that are 450-500 pages long, with about 1000 different queries, visualizations and dynamic text. They run on relational data but render in about 8 minutes each. Longer, but the workload is far bigger. And for scale, we run the reports with the task engine running on a dedicated server in the cluster.

      Thanks again for the insight.  It appears that something is definitely slowing things down for us.

      • "making the sophisticated simple"
      • AviPerez
      • 3 mths ago

      On point #3 - the complexity of your various queries is what can be solved quite easily, using the Formulate tools. The way I read this is that the columns are dates (or pseudo-dates) from 1 hierarchy. Everything else, going down the rows, can be construed as different measures - each with its own complex mixture of selections and elements. These custom measures are easily built, point-and-click, in Formulate.

      For instance, using the Adventure Works cubes as an example, I have this grid showing years from the date dimension (similar to yours). Then I have these measures "dp1" and "dp2" on the rows. These are not standard measures - they're composite measures.

      In fact, dp1 is Sales (measures) for the United States (Geography dimension).

      dp2, meanwhile, is completely independent of dp1. It is Expenses (measures) for Australia (Geography dimension) and Bikes (Product dimension).

      Together, we are showing a bunch of different "data points" or "tuples" by each year in the original grid above. We're mixing up selections and logic without building multiple different queries. 

      This also produces a single query - which should be much faster than multiple queries - delivered in a single grid that just needs some custom formatting (very doable in Pyramid). Much easier than gluing it together as dynamic text.
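      If it helps to see this outside the UI, the single query behind a grid like that boils down to MDX along these lines (a sketch only - the Adventure Works measure and hierarchy names are approximate, and Formulate builds the equivalent point-and-click so you never write this by hand):

          WITH
            -- dp1: Sales for the United States
            MEMBER [Measures].[dp1] AS
              ([Measures].[Internet Sales Amount],
               [Geography].[Country].[United States])
            -- dp2: expenses for Australia, bikes only
            MEMBER [Measures].[dp2] AS
              ([Measures].[Internet Total Product Cost],
               [Geography].[Country].[Australia],
               [Product].[Category].[Bikes])
          SELECT
            [Date].[Calendar Year].Members ON COLUMNS,
            { [Measures].[dp1], [Measures].[dp2] } ON ROWS
          FROM [Adventure Works]

      One statement, one round trip, however many composite measures you stack on the rows.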

      • Ray_Buechler
      • 3 mths ago

      @Avi Perez

      Thanks, Avi. Just so I am clear: in my example, I would create formulas for Continuing, New Freshman, New Transfer, Readmit/Returning, etc., and then add them to a query that I would use on the Publication?

      • "making the sophisticated simple"
      • AviPerez
      • 3 mths ago

      Yes - Continuing, New Freshman, New Transfer, Readmit/Returning, etc. are all separate "formulas" which would produce "custom measures". You use the "Formulate" tool (orange) to build out each formula and save each as a separate item into your content folders. Then you build a new query in Discover (green) with time on the columns and a selection of all your new measure formulas. It will build a singular query and should be a lot more efficient overall. It will also be a lot easier to manage/change and should be far simpler to format.

      • So one formula, representing "Asian" above (and guessing on my part), would be a simple data point made up of measure.studentCount and Ethnicity.asian.
      • Another one, "Hispanics" above, would be another simple data point: measure.studentCount and Ethnicity.hispanic.
      • Another one, "Transfer", would be another simple data point: measure.studentCount and StudentType.transfer.
      • Another one, "Freshman", would be another simple data point: measure.studentCount and StudentType.freshman.

      These would be 4 separate formulas. Build each one. Save them. Then add them as the measures to your grid. And you'll recreate what you have above.
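      For reference, those four formulas plus the grid resolve to MDX roughly like the following (cube, measure and hierarchy names are guesses at your model - Formulate generates the equivalent for you):

          WITH
            MEMBER [Measures].[Asian] AS
              ([Measures].[Student Count], [Ethnicity].[Ethnicity].[Asian])
            MEMBER [Measures].[Hispanic] AS
              ([Measures].[Student Count], [Ethnicity].[Ethnicity].[Hispanic])
            MEMBER [Measures].[Transfer] AS
              ([Measures].[Student Count], [Student Type].[Student Type].[Transfer])
            MEMBER [Measures].[Freshman] AS
              ([Measures].[Student Count], [Student Type].[Student Type].[Freshman])
          SELECT
            -- the 5 academic years across the columns
            [Academic Year].[Academic Year].Members ON COLUMNS,
            { [Measures].[Asian], [Measures].[Hispanic],
              [Measures].[Transfer], [Measures].[Freshman] } ON ROWS
          FROM [Enrollment]                      -- hypothetical cube name
          WHERE ([Major].[Major].[Biology])      -- stands in for the Major filter

      The Major slicer filters everything once, and every cell of the grid comes back in a single query instead of one query per data point.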

      • Ray_Buechler
      • 3 mths ago

      Thanks. That's what I thought you meant. I should be able to use this approach with portions of the Publication.
