Ethan B. Holland

The Viggle AI Meme’s Impact on Image-to-Video Awareness

April 8, 2024

In April 2024, a company called Viggle AI powered a meme frenzy. For free, anyone could take a single still photo and swap themselves in for a person in a video. The immediate hit was “Lil Yachty Walks Out On Stage”: a 2022 clip of Lil Yachty began morphing into the Joker, politicians, celebrities, and tech leaders.

Here’s my daughter (a dancer) going on stage based on a still image of her at a dance competition.

This is the single still image that I uploaded to Viggle. No need to mask it or anything. Viggle guessed what the back of the outfit looks like.

Here’s a bit about how this works.

The critical prep for the Lil Yachty meme was done by a guy on Twitter with the handle AIWarper. He is the unsung hero of this one.

Since the Viggle template needs a clean reference video, AIWarper rotoscoped Lil Yachty using After Effects to create an isolated video of just Yachty.

Firstly, I rotoscoped the subject using After effects. This isn’t necessary but I needed a mask for a later step anyways so – why not right? pic.twitter.com/Gg2wbu26zK
— A.I.Warper (@AIWarper) April 8, 2024

How does one use Viggle?

Viggle runs inside Discord, which is sadly a gatekeeper for a lot of people who don’t understand how Discord works. It’s basically a simple chat-based command-line interface: Viggle is a bot, and you give it commands.

For the Lil Yachty meme, AIWarper loaded the isolated clip into Viggle as a stored prompt.

From there, anyone can join Viggle on Discord, upload a still photo, and call up the reference.

Discord commands usually start with /, and with Viggle it’s /animate. After /animate, you simply make a few choices (the key ones are below).

Image: the image you want to animate.

MotionPrompt: $lil_yachty_stage_entrance (you have to find that by looking through the stored prompts, but it’s easy to spot since everyone’s using it)

Background: select “from template”. In this case, the choices are a plain white background, a green screen, or the MotionPrompt template (aka the concert and the stage).

What is Viggle?

Viggle announced their new video swapping feature in March, and it looks a LOT like earlier tools from ByteDance.

Here’s the Viggle announcement:

VIGGLE'S NEW MODEL IS HERE!

Controllable video generation is our mission, and character is where we start.

Animate any character just as you want, simply with a prompt, or by uploading a reference video featuring clear motion.

Try Viggle for FREE: https://t.co/11cDhfqisU! pic.twitter.com/hVWWyE4IOa
— ViggleAI (@ViggleAI) March 14, 2024

ByteDance Work Based on TikTok Training

Here are some similar tools that preceded Viggle:

Text to Video
“ByteDance Introduces MagicVideo-V2: A Groundbreaking End-to-End Pipeline for High-Fidelity Video Generation from Textual Descriptions – MarkTechPost”
https://www.marktechpost.com/2024/01/16/bytedance-introduces-magicvideo-v2-a-groundbreaking-end-to-end-pipeline-for-high-fidelity-video-generation-from-textual-descriptions/

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation – https://magicvideov2.github.io/

Another precursor was DiffPortrait3D.

Bytedance announces DiffPortrait3D

Controllable Diffusion for Zero-Shot Portrait View Synthesis

paper page: https://t.co/lOpCdNXgUi

given an unposed portrait image, diffportrait3d can synthesize plausible but consistent facial details with retained both identity and facial… pic.twitter.com/AODei632q9
— AK (@_akhaliq) December 22, 2023

Motion/Object Isolation (aka Segmentation)

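Depth Anything: turns images and videos into depth maps
A model released by ByteDance. It works very well; see the video for actual results.

Project page: https://t.co/QHbp8tPQbC
Online demo for videos: https://t.co/ROdrmoIF2G
Online demo for images: https://t.co/hnuxjjQoWM
GitHub: https://t.co/76X2Cv7Tb8 pic.twitter.com/mGc3l1wFbs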
— Gorden Sun (@Gorden_Sun) February 12, 2024

Segmentation will eventually remove the need for video rotoscoping and masking.

This AI helps you go to sleep real fast. Kidding, it counts sheep pic.twitter.com/sYW1R7NWi4
— Hieu 🚀 (@hieuSSR) February 6, 2024

Tracking Anything with Decoupled Video Segmentation

paper page: https://t.co/xqfwTkf78V

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To… pic.twitter.com/6FSShhURHV
— AK (@_akhaliq) September 8, 2023

Once the computer can track all objects in a video… it can replace all objects.

REAL-TIME object detection WITHOUT TRAINING

YOLO-World is a new SOTA open-vocabulary object detector that outperforms previous models in terms of both accuracy and speed. 35.4 AP with 52.0 FPS on V100.

↓ read more pic.twitter.com/SoqFyEk41V
— SkalskiP (@skalskip92) February 6, 2024
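
To make the “track it, then replace it” idea concrete, here’s a tiny sketch in plain Python. It assumes a segmentation model has already handed you a mask for each frame (1 where the tracked object is, 0 everywhere else). The function names and the toy 3x3 “frames” are made up by me for illustration. This is not how Viggle or any of the tools above actually work under the hood; it only shows why a good mask is the hard part and the swap itself is easy.

```python
from typing import List

Frame = List[List[str]]  # toy "image": each cell stands in for one pixel
Mask = List[List[int]]   # 1 = pixel belongs to the tracked object, 0 = background


def replace_masked_pixels(frame: Frame, mask: Mask, replacement: Frame) -> Frame:
    """Return a new frame where every masked pixel is taken from `replacement`."""
    if not (len(frame) == len(mask) == len(replacement)):
        raise ValueError("frame, mask, and replacement must have the same height")
    out: Frame = []
    for frame_row, mask_row, repl_row in zip(frame, mask, replacement):
        if not (len(frame_row) == len(mask_row) == len(repl_row)):
            raise ValueError("frame, mask, and replacement must have the same width")
        # Keep the original pixel unless the mask says it is part of the object.
        out.append([r if m else f for f, m, r in zip(frame_row, mask_row, repl_row)])
    return out


def swap_subject_in_video(
    frames: List[Frame], masks: List[Mask], replacements: List[Frame]
) -> List[Frame]:
    """Apply the per-frame swap to a whole clip (one mask per frame)."""
    return [
        replace_masked_pixels(f, m, r)
        for f, m, r in zip(frames, masks, replacements)
    ]


# Hypothetical toy data: a 3x3 frame with the performer in the middle column.
frame = [
    ["stage", "stage", "stage"],
    ["stage", "yachty", "stage"],
    ["stage", "yachty", "stage"],
]
mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
]
new_subject = [["joker"] * 3 for _ in range(3)]

swapped = swap_subject_in_video([frame], [mask], [new_subject])
print(swapped[0])
```

In a real pipeline the mask would come from a segmentation or tracking model and the replacement pixels from a generative model, but the last step is this same per-pixel choice.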

The Biggest Precursor: Magic Animates

Magic Animates – Bring Your Photos to Dance – https://magicanimates.com/
See how close this looks to the segmentation examples above.

And now… Viggle… is just the beginning.

Viggle’s website is here – https://viggle.ai/ and it sends you to Discord.

Viggle says it’s powered by “JST-1, the first video-3d foundation model with actual physics understanding”. I’m about two weeks behind on my weekly newsletter, but I’ve never heard of JST-1… we shall see.

Matt Wolfe has a great tutorial on YouTube on how to use Viggle, if you want to try it yourself.

What does this mean?

Images: Stable Diffusion got people addicted to text-to-image AI, and led to common usage of Midjourney and DALL-E.

Chat: ChatGPT broke open chatbots.

Face Swapping: InsightFace (the API most tools use) brought face swapping into the mainstream.

Video swapping: Viggle is a big deal… because it starts to connect many themes: virtual reality, augmented reality, even Gaussian splatting… with text- and image-based generative video… and also latent consistency models. Just as chat became multimodal (chatting about videos and images), generative AI just went multimodal as well.

In one year, we’ve gone from making silly images of otters on a plane, to face swapping, to full 3D video people swapping… using a single image.

I’d recommend following two trends: object segmentation and Gaussian splatting. Segmentation will allow for tracking and swapping. Splatting will assist with the generation of smooth frames for the tough-to-render angles and elements that are not shown in a single photo.

I dedicated an entire newsletter to segmentation in December 2023.

https://ethanbholland.com/2023/12/24/ai-news-11-week-ending-12-22-2023-with-executive-summary-and-top-7-stories/

Here are a few more articles, if you’re interested in skimming more on segmentation. I usually put any new links into the AR/VR category of this newsletter.
