Ethan B. Holland

Over 56,100 manually organized AI links and counting

security footage of a black bear walking through a Japanese village. The bear has a green computer recognition square around it. --ar 5:3 --style raw

Multimodality News: Week Ending 05/03/2024

May 3, 2024


Japan To Trial AI Bear Spotting System After Sharp Rise In Attacks – News18

https://www.news18.com/viral/japan-to-trial-ai-bear-spotting-system-after-sharp-rise-in-attacks-8872190.html

Meta’s Llama 3 400b: Multi-modal, longer context, potentially multiple models (r/LocalLLaMA)

Posted by u/domlincog in r/LocalLLaMA


Made this quick fun project AI recipe helper to learn some LLMs implementation on web in JS.

Running on:
– GPT-4 Vision to seek ingredients in groceries, receipts, or finished meals
– GPT-4 to generate recipes with guidance
– DALL-E 3 to generate recipe image previews pic.twitter.com/Q8CI6om2ex
— Jakub Zegzulka (@jakubzegzulka) April 29, 2024


Delighted to share ✨Med-Gemini✨ – our new family of multimodal models for medicine unlocking new possibilities for health – https://t.co/oemI52WBou

More accurate multimodal conversations about medical images🩻, surgical videos📽️, genomics🧬, ultra-long health records📚, ECGs🫀… pic.twitter.com/gZ6WT4Mw3y
— Alan Karthikesalingam (@alan_karthi) April 30, 2024

A large Shangtang multi-modal model with 600 billion parameters was released, and the performance surpassed GPT-4 Turbo

https://news.futunn.com/en/post/41290101/a-large-shangtang-multi-modal-model-with-600-billion-parameters?level=2&data_ticket=1716075562321790


LLM-AD

Large Language Model based Audio Description System

The development of Audio Description (AD) has been a pivotal step forward in making video content more accessible and inclusive. Traditionally, AD production has demanded a considerable amount of skilled labor, pic.twitter.com/8UKqPPmpVD
— AK (@_akhaliq) May 3, 2024


PLLaVA

Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands pic.twitter.com/CrQBwCUpIo
— AK (@_akhaliq) April 29, 2024


Be Sure To Read This Week’s Main Post:

This week’s executive overview and top links are here:

AI News #31: Week Ending 05/03/2024 with Executive Summary and Top 95 Links

The post you just read is a deep-dive extension of my weekly newsletter, This Week In AI, an executive summary of the top things to know in AI. Each week, I create an accessible overview so that laypeople can feel confident they are conversant with the week’s AI developments. I include a curated list of must-click links of the week, to offer everyone a hands-on opportunity to explore the most intriguing updates in artificial intelligence across various categories, including robotics, imagery, video, AR/VR, science, ethics, and more. Beyond the overview, I post these topic-based deeper dives (below). If you haven’t read this week’s overview, I recommend starting there.

Credits/Sources

Most of these weekly links come from just a few prolific oversharing sources. Please follow them, as they work hard to find the news each week and they make it a lot easier for me to compile.