Querying Unstructured Data with Pyramid
Customers often tell us that they lack effective methods for analyzing unstructured data. Getting the data in is proving far simpler than getting analysis of the data out. Pyramid now has an answer - starting with data connectors for MongoDB.
Unstructured vs Structured
It seems many organizations have found that numerous data-related problems can be solved effectively with the simplicity and speed of unstructured data stores - especially in the realm of IoT. Because the target system requires no formalized structure, it is significantly faster to load incoming data into a framework that is not concerned with how the data looks, its data types, its structure or even its size. This makes the business of storing data significantly simpler (and more cost effective) than in the relational world, where data must be carefully inserted into databases and tables, must meet data type and sizing standards, and must be committed carefully so that it can be rolled back in the event of failure. The unstructured universe, in many respects, is unconcerned with most of these 'requirements'.
Indeed, it goes a step further. Related data that would be spread across multiple tables in a relational world can be stored as a single parcel of data, in one addition to the unstructured system, using 'nested' data fragments and records. This, again, simplifies the data capture process dramatically.
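To make the contrast concrete, here is a minimal Python sketch. The sensor reading, its field names and the `to_relational_rows` helper are hypothetical illustrations, not part of Pyramid or MongoDB. It shows one nested parcel of data and the rows the same data would occupy in a relational design.

```python
# Hypothetical IoT reading stored as ONE nested document (all names are illustrative).
reading = {
    "device_id": "sensor-17",
    "location": {"site": "plant-a", "floor": 2},
    "samples": [
        {"ts": "2020-01-01T00:00:00Z", "temp_c": 21.5},
        {"ts": "2020-01-01T00:01:00Z", "temp_c": 21.7},
    ],
}


def to_relational_rows(doc):
    """Split the nested document into the rows a relational design would need."""
    device_row = {"device_id": doc["device_id"]}
    location_row = {"device_id": doc["device_id"], **doc["location"]}
    sample_rows = [{"device_id": doc["device_id"], **s} for s in doc["samples"]]
    return {
        "devices": [device_row],
        "locations": [location_row],
        "samples": sample_rows,
    }


tables = to_relational_rows(reading)
row_count = sum(len(rows) for rows in tables.values())
print(f"1 document write vs {row_count} row inserts across {len(tables)} tables")
```

The document store accepts the whole reading in one write, while the relational layout needs coordinated inserts into several tables, each with its own types and keys.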
Anybody in the data business will comfortably admit that there is no value in collecting data if you cannot get some type of insight from it. That includes accounting data to see the status of a business, web logs to see which pages are most visited, and fitness device recordings to monitor your health. Put simply, if we cannot analyze the data to derive some understanding, collecting it is pointless. In business terms, there needs to be a way to monetize the information - usually done with numerous tools and applications, some specialized and others generic.
Using structured data in classic business analytics is the easiest example of such monetization - a very mature concept that is almost universally embraced. However, the same cannot be said for unstructured data. Marrying classic business analytic tools and technologies with the fluidity of unstructured data has proven to be a challenge.
And the reason is obvious: classic data analytic technologies today require structure.
Various intermediate tools address the gap by 'scanning' the unstructured data, proposing a likely structure, and allowing the analysis tools to query it in a common language. Pyramid with MongoDB's BI Connector is one such example. (Pyramid with Presto or Apache Drill are others.)
The connector, using a 'probabilistic schema' model, allows Pyramid to query the unstructured MongoDB data sources directly, giving end users the ability to analyze the data in the same way as they would a relational/structured data stack.
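The connector's internal algorithm is not described here, so the following is only a sketch of one plausible reading of a 'probabilistic schema': sample some documents, count the value types seen for each field, and propose the most frequent type together with how often it occurred. The `infer_schema` function and the sample documents are hypothetical and are not the BI Connector's API.

```python
from collections import Counter, defaultdict


def infer_schema(documents):
    """Return {field: (most_common_type_name, share_of_sampled_documents)}."""
    type_counts = defaultdict(Counter)
    for doc in documents:
        for field, value in doc.items():
            type_counts[field][type(value).__name__] += 1

    total = len(documents)
    schema = {}
    for field, counts in type_counts.items():
        type_name, hits = counts.most_common(1)[0]
        schema[field] = (type_name, hits / total)
    return schema


# Hypothetical sample: fields may be missing or change type between documents.
sample = [
    {"device_id": "a1", "temp_c": 21.5},
    {"device_id": "a2", "temp_c": 22.0, "battery": 87},
    {"device_id": "a3", "temp_c": "n/a"},
    {"device_id": "a4", "temp_c": 20.9},
]

schema = infer_schema(sample)
for field, (type_name, share) in schema.items():
    print(f"{field}: {type_name} ({share:.0%} of sampled documents)")
```

A field that appears in only a minority of documents, or with mixed types, still receives a proposed column type. The share gives a rough measure of how far that proposal can be trusted.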
Whether the data is stored as flat fragments, as separate entities or as nested data fragments in more complicated structures, the tools are able to decipher the schemas and present them to the end users for analysis.
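As an illustration of how nested fragments can be presented in tabular form, the sketch below flattens nested objects into dotted column names and unwinds an array into child rows keyed back to the parent. The `flatten` and `unwind` helpers and the order document are hypothetical, and the sketch assumes that array elements are themselves objects. It is not the mechanism used by Pyramid or the connector.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into dotted column names; list fields are skipped."""
    row = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        elif not isinstance(value, list):
            row[name] = value
    return row


def unwind(doc, array_field, key_field):
    """Turn one array field into child rows linked to the parent by key_field."""
    return [
        {key_field: doc[key_field], **flatten(item, prefix=f"{array_field}.")}
        for item in doc.get(array_field, [])
    ]


# Hypothetical nested order document.
order = {
    "order_id": 1,
    "customer": {"name": "Ada", "city": "Leeds"},
    "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}

parent_row = flatten(order)
child_rows = unwind(order, "items", "order_id")
print(parent_row)
print(child_rows)
```

The parent row and the child rows share `order_id`, so an analysis tool can join them the same way it would join two relational tables.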
Pyramid and MongoDB
What makes the Pyramid implementation of this solution interesting, however, is that we have brought to MongoDB the same level of simplicity and elegance available on all our other data sources.
Using Pyramid's virtual schema and modeling framework, classic joins and metrics can be used to embellish the schema - giving end users the simplicity and freedom to analyze MongoDB data the same way they might analyze data in MS SQL Server, Oracle, Teradata or SAP HANA.
In Pyramid's discovery tool, MongoDB data is treated like any other data source, with all associated analytic and visualization capabilities intact. Users can build custom formulas, lists and KPIs against the data, and put it into presentations, dashboards and analytic publications just as they would with any other data source.
By giving unstructured data a full seat at the proverbial main table, users can start unlocking the many interesting insights captured in the data itself.