Binal Patel

Structured OCR with GPT Vision

The code for this post is here. The more I use GPT Vision, the more impressed I am. I’ve recently been working on a side project that involves parsing PDFs with embedded tables, images, and flow charts for a RAG (retrieval-augmented generation) chatbot, and found the most painful part to be parsing and structuring the PDFs. There are a lot of knobs to mess with (query rewriting, chunking, embedding, retrieval process, reranking, etc.), which are all next to useless if the input text is garbled....

Ranking Anything with GPT4

I’ve been fascinated by this paper: Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agents, and have been trying out its ideas with success in a few personal projects. The authors essentially found that GPT4 is excellent at ranking things. Given a set of items (candidates) and some query, it can rank these items very well (as well as or better than the current SOTA models on the benchmarks they evaluated)....

Extracting and Structuring Recipes Using GPT3

The code for this post is here. The second experiment I’ve tried is using GPT3 to extract and structure data, and I have been pretty impressed. The example below took me about an hour to set up, most of it spent iterating on the prompts I’m using as directions to the model. An additional bonus I wasn’t expecting: this also turned out to be a decent recipe generator. If I input just a recipe name like Pumpkin Pie, it’ll generate/hallucinate structured ingredients and instructions....