I would like to open a discussion about the knowledge base (KB). As you all know, it is now possible to upload PDFs to the KB (@Team Thanks guys :D). The issue with unstructured data is that AI often struggles with it, which is why a high-quality processing pipeline is extremely important.
I have already suggested offering a choice between the built-in service for processing PDFs and an external API, such as Mistral OCR https://mistral.ai/news/mistral-ocr or alternatives such as Undatas.io (see https://undatas.io/blog/posts/can-undatasio-really-deliver-superior-pdf-parsing-quality-sample-based-evidence-speaks/).
I think that, when it comes to optimisation, it is useful to have the flexibility to pick the most cost-efficient option for the material being processed.
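To make the idea concrete, here is a rough sketch of what per-document backend selection could look like. Everything in it is hypothetical: the `PdfInfo` fields, the backend names, the `max_external_pages` threshold, and the placeholder parser functions are assumptions for illustration, not the product's actual API or any vendor's client library.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PdfInfo:
    """Hypothetical description of an uploaded PDF."""
    path: str
    pages: int
    has_text_layer: bool  # False for scanned, image-only documents


Parser = Callable[[PdfInfo], str]


def builtin_parser(doc: PdfInfo) -> str:
    # Placeholder for the built-in PDF processing service.
    return f"[builtin] {doc.path}"


def external_ocr_parser(doc: PdfInfo) -> str:
    # Placeholder for a call to an external OCR API (assumed, not a real client).
    return f"[external_ocr] {doc.path}"


PARSERS: Dict[str, Parser] = {
    "builtin": builtin_parser,
    "external_ocr": external_ocr_parser,
}


def choose_backend(doc: PdfInfo, max_external_pages: int = 200) -> str:
    """Pick the cheapest backend that should still give usable text."""
    if doc.has_text_layer:
        # Text can be extracted directly, so paid OCR is unnecessary.
        return "builtin"
    if doc.pages > max_external_pages:
        # Cap the cost of very large scanned documents.
        return "builtin"
    return "external_ocr"


def parse(doc: PdfInfo) -> str:
    """Route the document to the selected backend."""
    return PARSERS[choose_backend(doc)](doc)


if __name__ == "__main__":
    print(parse(PdfInfo("report.pdf", pages=12, has_text_layer=True)))
    print(parse(PdfInfo("scan.pdf", pages=12, has_text_layer=False)))
```

The point of the sketch is only that the routing rule is separate from the parsers, so the rule (or a manual per-upload override) can be tuned without touching the backends.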
I would be interested in gathering your feedback, especially from people with experience in the matter.
------ What would be your dream pipeline when building your KB? ------
Planned
Feature Request
10 months ago

John Bubliner