A donut with the word “document” written in sprinkles sits on a desk covered with papers. A rubber stamp with the word “Data” on it is also on the desk.

“Multimodal AI models are getting even faster with Groq’s newest LLaVA v1.5 7B release. It’s reportedly 4x as fast as GPT-4o and can have conversations with images, audio, and text. LLaVA v1.5 7B is currently free in “Preview Mode” for developers.”

“friends, here I’m talking about multimodal RAG or document retrieval if you want short, structured and concise answers from documents of same structure and have labelled data I suggest fine-tuning a model like Donut or LayoutLM series or UDOP” / X

Google

“🩺 Enhancing Healthcare Diagnostics with Multimodal RAG Systems With Qdrant & Gemini, you can transform the way healthcare professionals approach diagnostics by combining the power of *both* text and image data. 🖼 🔠 In this article, Pragnesh Prajapati shows how to create a 

Google Gemini will again support AI image generation of people

Google Photos: Search improvements and early access to Ask Photos

Google’s AI-powered Ask Photos feature begins US rollout | TechCrunch

Segmentation

“Those are some crispy and consistent depth maps! DepthCrafter looks like the new SOTA for video depth estimation tasks. Take any video, of any length, and get temporally coherent depth maps that you can use for VFX, or as an input to other AI models. Quick thread 


Discover more from Ethan B. Holland
