Structured OCR with GPT Vision

The code for this post is here. I’m continuously impressed with GPT Vision the more I use it. I’ve recently been working on a side project that involves parsing PDFs with embedded tables, images, and flow charts for a RAG (retrieval-augmented generation) chatbot, and I found the most painful part to be parsing and structuring the PDFs. There are a lot of knobs to mess with (query rewriting, chunking, embedding, retrieval, reranking, etc.), all of which are next to useless if the input text is garbled....

December 1, 2023 · Binal Patel