On Tuesday, Stability AI released Stable Video Diffusion, a new free AI research tool that can turn any still image into a short video, with mixed results. It’s an open-weight preview of two AI models that use a technique called image-to-video, and it can run locally on a machine with an Nvidia GPU.
Last year, Stability AI made waves with the release of Stable Diffusion, an “open weight” image synthesis model that launched a wave of open image synthesis projects and inspired a large community of hobbyists who built on the technology with their own custom improvements. Now Stability wants to do the same with AI video synthesis, although the technology is still in its infancy.
Currently, Stable Video Diffusion consists of two models: one that can produce image-to-video synthesis at 14 frames of length (called “SVD”), and another that generates 25 frames (called “SVD-XT”). They can operate at varying speeds from 3 to 30 frames per second, and they output short MP4 video clips (usually 2 to 4 seconds long) at a resolution of 576×1024.
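The “2 to 4 seconds” figure follows directly from dividing the fixed frame counts by the chosen playback rate. As a minimal sketch (the helper function here is our own illustration, not part of any Stability AI tooling):

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Length of a generated clip: frame count divided by playback rate."""
    return num_frames / fps

# SVD produces 14 frames, SVD-XT produces 25; playback can run 3-30 fps.
# At a mid-range 7 fps, the two models yield roughly 2- and 3.6-second clips.
for frames in (14, 25):
    print(f"{frames} frames @ 7 fps -> {clip_duration_seconds(frames, 7):.2f} s")
```

At the extremes, 25 frames played back at 3 fps stretches past 8 seconds, while 14 frames at 30 fps lasts under half a second, so the commonly quoted 2-to-4-second range reflects typical mid-range playback speeds rather than a hard limit.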
In our local tests, a 14-frame generation took about 30 minutes to create on an Nvidia RTX 3060 graphics card, but users can experiment with running the models much faster in the cloud through services like Hugging Face and Replicate (some of which may require payment). In our experiments, the generated animation typically keeps a portion of the scene static and adds panning and zooming effects or animates smoke or fire. People depicted in photos often do not move, although we did get one Getty Images photo of Steve Wozniak to slightly come alive.
(Note: Other than the Getty Images photo of Steve Wozniak, the other images animated in this article were generated with DALL-E 3 and animated using Stable Video Diffusion.)
Given these limitations, Stability emphasizes that the model is still early and is intended for research only. “While we eagerly update our models with the latest advancements and work to incorporate your feedback, this model is not intended for real-world or commercial applications at this stage. Your insights and feedback on safety and quality are important to refining this model for its eventual release,” the company wrote on its website.
It is worth noting, though perhaps not surprising, that the Stable Video Diffusion research paper does not reveal the source of the models’ training datasets, saying only that the research team used “a large video dataset comprising roughly 600 million samples” that they curated into the Large Video Dataset (LVD), which consists of 580 million annotated video clips spanning 212 years of content duration.
Stable Video Diffusion is far from the first AI model to offer this kind of functionality. We’ve previously covered other AI video synthesis methods, including those from Meta, Google, and Adobe. We’ve also covered the open source ModelScope and what many consider the best AI video model at the moment, Runway’s Gen-2 model (Pika Labs is another AI video vendor). Stability AI says it is also working on a text-to-video model, which would allow short video clips to be created with written prompts instead of images.
The Stable Video Diffusion source code and weights are available on GitHub, and another easy way to test it locally is to run it through the Pinokio platform, which handles installation dependencies easily and runs the model in its own environment.