Structured OCR with GPT Vision

The code for this post is here. I’m more impressed with GPT Vision the more I use it. I’ve recently been working on a side project that involves parsing PDFs with embedded tables, images, and flow charts for a RAG (retrieval-augmented generation) chatbot, and found the most painful part to be parsing and structuring the PDFs. There are a lot of knobs to mess with (query rewriting, chunking, embedding, retrieval process, reranking, etc.), all of which are next to useless if the input text is garbled....

December 1, 2023 · Binal Patel

Ranking Anything with GPT4

I’ve been fascinated by this paper: Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents, and have been trying out its ideas with success in a few personal projects. The authors essentially found that GPT4 is excellent at ranking things. Given a set of items (candidates) and a query, it can rank those items very well (as well as or better than the current SOTA models on the benchmarks they evaluated)....

April 30, 2023 · Binal Patel