Another image-to-video model, this time wan-i2v, which claims to be the next evolution in video generation.

Built upon the mainstream diffusion transformer paradigm, Wan2.1 achieves significant advancements in generative capabilities through a series of innovations, including our novel spatio-temporal variational autoencoder (VAE), scalable pre-training strategies, large-scale data construction, and automated evaluation metrics. These contributions collectively enhance the model’s performance and versatility.