3

Direct Query vs Ingestion: When to Use Each

Choosing the right data connection approach for your Pyramid model

What's the Difference?

Direct Query queries data at its source in real time. Every report interaction sends a query to the source database — no data is copied.

Ingestion uses Pyramid Data Flow to extract, transform, and load data into a target — typically Pyramid's In-Memory Database (IMDB), but it can be any SQL database.

Side-by-Side Comparison

Factor Direct Query Ingestion (Data Flow)
Data freshness Always current As fresh as last refresh
Number of sources One database only* Multiple sources combined
Transformation Minimal - Custom Columns Full control via Data Flow
Query performance Depends on source engine Depends on the target (see note regarding IMDB**)
Cost to source High — every interaction hits source One-time cost per refresh cycle
Setup effort Low — point and connect Higher — design flows, schedule refreshes
Source dependency Reports break if source is down Reports work on cached data

* Notice that while a Direct Query model is limited to one database, Pyramid allows you to mix models in a Present, or Mash data in Tabulate - so this limitation will not always require an ingestion model.

** Pyramid IMDB is a high end In Memory Database - it's capacity is capped by the In Memory limit of 2B rows in a single table, however it's performance is also dependant on the underlying hardware. For very large data sets (hundreds of millions of rows, hundreds of columns, etc.) you may see the performance slowing heavily. read more about it here, or contact our team for further advice.

When to Use Direct Query

Direct Query is the right choice when all three of these conditions are true:

  1. Single source — all data for the model lives in one database.
  2. Source can handle it — the database engine is built for analytical workloads (e.g., Snowflake, BigQuery, Redshift, or a well-tuned data warehouse) - or there just isn't a lot of data.
  3. Query cost is acceptable — frequent, heavy queries against the source won't blow your budget or degrade other workloads.

The main advantage is real-time data with zero pipeline maintenance. If your data warehouse is already built for analytics, Direct Query lets you use that investment directly.

When to Use Ingestion

Use Pyramid Data Flow to ingest when any one of these applies:

  1. Multiple sources — your model combines data from 2+ databases or systems.
  2. Transformation needed — raw data needs cleaning, reshaping, or enrichment before analysis.
  3. Subset of large data — the source is massive but you only need a filtered slice for this use case.
  4. Speed over freshness — you want IMDB query speed rather than depending on a transactional source that isn't optimized for analytics.

Pyramid's Data Flow is built in — you don't need external ETL tools. The trade-off is that data is only as fresh as your last refresh, and you need to design and maintain the flows.

Quick Decision Flowchart

1. Do you need data from more than one source?

  • Yes → Use Ingestion.
  • No → continue.

2. Do you need to transform or filter the data before analysis?

  • Yes → Use Ingestion.
  • No → continue.

3. Is your source an analytics-grade database that can handle the query load at acceptable cost?

  • Yes → Direct Query works.
  • No → Use Ingestion to offload queries to IMDB.

Example Scenarios

All data in Snowflake, real-time dashboard → Direct Query. Single source, engine built for analytics, data freshness matters. Skip pipeline maintenance and always show current numbers.

Sales in Salesforce + inventory in SAP + finance in a data warehouse → Ingestion. Multiple sources need to be combined. Use Data Flow to consolidate nightly into IMDB and build one unified model.

Large transactional database, only need 6 months for reporting → Ingestion. Source isn't optimized for analytics and the full dataset is too heavy. Ingest a filtered subset into IMDB for fast, cost-effective querying.

Key Takeaways

Direct Query = real-time, single-source, zero pipeline overhead — but your source database needs to be up to the task.

Ingestion via Data Flow = multi-source, transformed, optimized — but data freshness depends on your refresh schedule.

You can use both. Many teams use Direct Query for operational dashboards that need live data, and Ingestion for historical or cross-system analytics.

Reply

null