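With this ComfyUI workflow, you can feed in an audio clip (a song or speech, for example) and a reference image, and generate a high-quality video whose lip movements are synced to the audio.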

This technique lets you generate videos that lip-sync to an audio track using just a reference image and a sound file. I have included two versions of the workflow and focus mainly on the low-VRAM version, which uses GGUF models and, surprisingly, delivered better quality in my tests.