PDF / Document files as a Data Source in Modeller
 
Hi Avi / Pyramid Dev team,
Often, information useful to a Pyramid client is stored in PDF or other document formats - for example, where the client has many suppliers who do not use any form of direct data interchange.
I deal with many customers who would be good examples of this use case - smaller retailers who purchase goods from numerous wholesale suppliers and receive invoices in a similar format to those attached here - PDFs, Word docs etc.
As shown in the image, perhaps a way of tagging certain label names and extracting both the label name and its associated value - by name, relative position or similar - would allow Pyramid to pull the relevant fields and values from Word or PDF documents, to be used alongside other customer data for greater overall insight.
I understand some other BI toolsets are now adding this functionality.
I look forward to your thoughts on this.
Thanks and regards,
 Richard.
2 replies
- 
Hi Richard,
You can do this already through the Python Source Block in Model. Generally, we would be looking for tables of data in the PDF to load into a DB for analysis, which can be done with a single line of Python code. This article gives a good explanation and guide on how to do this, as well as more complex examples where one needs to deal with multiple tables.
There may also be a need to retrieve individual elements of the document, such as the invoice number or the name and address that make up the "header" for the table(s), and/or elements from specific PDF 'fields', such as those used in online PDF forms. You can also do this via Python, with a good explanation available on the same site as the previous article.
In the meantime, we'll monitor the votes for this idea.
Hope that helps.
Ian
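For anyone wanting to try the "individual elements" part, here is a minimal sketch of pulling label/value pairs (as Richard describes) out of text that has already been extracted from a PDF. It uses only the Python standard library. The label names, the sample invoice text and the `extract_fields` helper are all hypothetical and are not taken from the linked articles. Getting the text out of the PDF in the first place, and handing the result back to the Python Source Block, would depend on whichever PDF library and output format you use, so neither is shown here.

```python
import re

# Hypothetical label names; adjust these to match the supplier's invoice layout.
LABELS = ["Invoice Number", "Invoice Date", "Total Due"]


def extract_fields(text, labels=LABELS):
    """Return {label: value} for each label that starts a line in `text`.

    Assumes each field appears on its own line as 'Label: value'
    (the colon may also be '#' or absent).
    """
    fields = {}
    for label in labels:
        pattern = re.compile(
            rf"^\s*{re.escape(label)}\s*[:#]?\s*(.+?)\s*$",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(text)
        if match:
            fields[label] = match.group(1)
    return fields


# Made-up text standing in for what a PDF text extractor might return.
sample_text = """ACME Wholesale Ltd
Invoice Number: INV-1001
Invoice Date: 2021-03-15
Total Due: 249.50
"""

fields = extract_fields(sample_text)
print(fields)
```

Matching by label name like this is the simplest of the options Richard mentions. Matching by relative position would need the layout coordinates that some PDF libraries expose, which is beyond this sketch.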