Comparison of Image-to-Video WAN2.2 and LTX2.3 Locally

What if you could take a single photo and turn it into moving video? You can now do this locally on your own PC, with no cloud fees. Using the local AI video generation tool WAN2GP, I'll walk you through, from start to finish, how to use two image-to-video models: WAN 2.2 and LTX 2.3.

What is WAN2GP? — The Starting Point for Local AI Video Generation

WAN2GP is, simply put, a local launcher (execution tool) that lets you run multiple AI video and image generation models from one window. Much like a single smartphone app that bundles YouTube, Netflix, music, and more, installing WAN2GP gives you a choice of AI models such as WAN 2.1, WAN 2.2, LTX Video, Hunyuan Video, and Flux.

The core of this tool is that your computer's GPU performs the computation directly instead of sending requests to internet servers. In other words, there are no monthly subscription fees and no credits to run out: once it is installed, you can use it as much as you like. However, your PC needs an NVIDIA GPU with at least 6GB of VRAM (the graphics card's dedicated memory).
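A quick way to check the GPU requirement before installing anything is to ask the NVIDIA driver directly. The sketch below assumes the nvidia-smi utility that ships with the NVIDIA driver is on your PATH. The function names and the small tolerance are my own choices for illustration, not part of WAN2GP.

```python
import shutil
import subprocess

MIN_VRAM_GIB = 6      # minimum stated in this article
TOLERANCE_GIB = 0.25  # assumption: cards often report slightly under nominal size


def parse_gpu_line(line):
    """Parse one 'name, memory_in_MiB' line into (name, GiB)."""
    name, mem_mib = [part.strip() for part in line.rsplit(",", 1)]
    return name, int(mem_mib) / 1024


def list_gpus():
    """Return [(name, vram_gib), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    gpus = []
    for line in result.stdout.splitlines():
        try:
            gpus.append(parse_gpu_line(line))
        except ValueError:
            continue  # skip blank or unexpected lines
    return gpus


if __name__ == "__main__":
    found = list_gpus()
    if not found:
        print("No NVIDIA GPU detected (or nvidia-smi is not installed).")
    for name, gib in found:
        ok = gib >= MIN_VRAM_GIB - TOLERANCE_GIB
        print(f"{name}: {gib:.1f} GiB VRAM ({'OK' if ok else 'below the 6GB minimum'})")
```

If the script reports no GPU, fix the driver installation first; nothing later in this guide will work without it.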

The two models I'll focus on in this article are WAN 2.2 and LTX 2.3. WAN 2.2 is the latest version of the open-source video generation model developed by Alibaba's Tongyi Lab; it supports up to 1080p Full HD output and is well suited to checking results quickly. LTX 2.3 is a model specialized in mood and quality, used to create the final polished version of longer videos. Knowing when to use each of the two is the key focus of today's content.

Installation Method — Just Get It Right the First Time

There are two main ways to install WAN2GP. If you’re a beginner, I strongly recommend the Pinokio-based one-click installation. Pinokio is an installation management tool that automatically handles complex dependency issues (version conflicts between programs). However, if the program gets corrupted after an update, reinstallation may be necessary.

If you want more direct control, you can choose manual installation. The manual installation procedure is as follows:

①Open a terminal (command prompt) and enter git clone https://github.com/deepbeepmeep/Wan2GP.git to copy the WAN2GP source code from GitHub to your PC.
②Use the cd Wan2GP command to move into the downloaded folder.
③Create a virtual environment with Python 3.10.x. If using conda, enter conda create -n wan2gp python=3.10.9. A virtual environment is an isolated workspace just for this program, which prevents conflicts with other programs.
④After activating the virtual environment, install the PyTorch build that matches your CUDA driver version. CUDA is the software layer that lets NVIDIA GPUs handle AI computations. If the versions don't match, generation becomes extremely slow or VRAM errors occur.
⑤Finally, install the remaining required packages all at once with the pip install -r requirements.txt command.
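Before installing PyTorch and the requirements, it can help to confirm that the activated environment really is the Python 3.10.x one. This is a minimal sketch of such a check. The function name is hypothetical, and CONDA_DEFAULT_ENV is only set when a conda environment is active.

```python
import os
import sys

REQUIRED = (3, 10)  # the manual install above uses Python 3.10.x


def is_supported_python(version=None):
    """True if the given (or running) interpreter is Python 3.10.x."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor == REQUIRED


if __name__ == "__main__":
    env = os.environ.get("CONDA_DEFAULT_ENV", "(no conda env active)")
    print(f"conda env: {env}")
    print(f"python:    {sys.version.split()[0]}")
    print("ok" if is_supported_python() else "expected Python 3.10.x")
```

Run it inside the activated environment; if it reports a different version, you are probably still in your system Python.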

⚠️ Warning: On first run, the model files are downloaded and the cache is configured automatically. This can take 20 minutes or more, and the screen may appear frozen, which is normal. Do not force quit; wait for it to complete.

Converting Images to Video with WAN 2.2 — Quick Results Verification Strategy

WAN 2.2 uses an MoE (Mixture of Experts) structure. Put simply, instead of one giant AI, multiple specialist AIs divide roles and share the work. As a result, quality has improved over previous versions, and the model is well suited to checking results quickly. It supports up to 1080p Full HD output, artifacts (image corruption) have been greatly reduced, and text inside videos is rendered more stably.
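To make the "specialists dividing roles" idea concrete, here is a toy sketch. It assumes, based on my understanding of WAN 2.2's published design, that one expert handles the early, high-noise denoising steps and another handles the later, low-noise steps. The boundary value, the function names, and the evenly spaced schedule are all hypothetical; no real model is involved.

```python
def pick_expert(noise_level, boundary=0.9):
    """Choose which expert handles a step.

    noise_level: 1.0 means pure noise, 0.0 means a clean frame.
    boundary: hypothetical switch-over point between the two experts.
    """
    if noise_level >= boundary:
        return "high_noise_expert"  # rough layout and motion
    return "low_noise_expert"       # fine detail and texture


def schedule(num_steps, boundary=0.9):
    """List the expert used at each of num_steps evenly spaced noise levels."""
    levels = [1.0 - i / num_steps for i in range(num_steps)]
    return [pick_expert(level, boundary) for level in levels]


if __name__ == "__main__":
    for step, expert in enumerate(schedule(8, boundary=0.5)):
        print(step, expert)
```

The point of the sketch is only that each step runs a single expert, so the model can hold more total capacity without every step paying for all of it.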

Here’s how to start an image-to-video task with WAN 2.2:

①Run the WAN2GP launcher and select WAN 2.2 (I2V, Image to Video) mode from the model selection menu.
②Upload the image to use as the first frame. This image becomes the opening scene of the video.
③Start with the resolution set to 720p. Attempting 1080p from the start can cause VRAM shortage errors. Once results are stable at 720p, raise it gradually.
④Set FPS (frames per second) to 16, and set the video length with the num_frames value. At 16 FPS, num_frames=16 gives approximately 1 second of video.
⑤Enter a prompt (a description of the motion to apply to the video). Example: "Handheld feel, micro-shakes, realistic motion blur, cinematic mood". This one line specifies both a handheld feel with realistic shake and a cinematic atmosphere.
⑥Generate several short versions, and when you get a result you like, be sure to record its Seed value (the number used to reproduce a result). Reusing the same Seed with the same settings lets you recreate that result.
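The relationship between num_frames, FPS, and clip length in step ④ is simple division. The helpers below are a small sketch of that arithmetic. The function names are my own, and the launcher may round or adjust the frame count it actually uses.

```python
import math


def duration_seconds(num_frames, fps=16):
    """Clip length in seconds for a given frame count."""
    return num_frames / fps


def frames_for_duration(seconds, fps=16):
    """Smallest frame count that covers the requested length."""
    return math.ceil(seconds * fps)


if __name__ == "__main__":
    print(duration_seconds(16))      # 16 frames at 16 FPS
    print(frames_for_duration(3))    # frames needed for a 3-second clip
```

Working backward from the length you want keeps test clips short, which matters because VRAM use grows with frame count.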

💡 TIP
It's best to avoid the low 480p setting. According to user reports, frame-corruption artifacts occur frequently at 480p. Working at 720p or higher tends to give the most realistic results.

Raising Completion Quality with LTX 2.3’s Two-Stage Workflow

LTX 2.3 is a model focused on mood and quality. If you do everything in LTX 2.3 from the start, each attempt takes a long time, and an attempt that goes in the wrong direction is wasted. So in practice, it's best to follow a two-stage workflow.

①Stage 1 — Direction Verification (WAN 2.2’s role): First, quickly generate several short videos with WAN 2.2. This is the stage to verify whether the direction of movement, speed, and overall atmosphere match what you want. In this process, find the optimal prompt and Seed value.
②Stage 2 — Final Version Production (LTX 2.3’s role): Once direction is decided, input the same image and prompt into LTX 2.3 to generate the final version. LTX 2.3 supports longer video lengths and expresses texture (surface quality) and overall video mood more richly.
③When entering a prompt in LTX 2.3, describe the cinematic atmosphere more specifically. Example: "A calm cinematic shot of a rainy street at night, neon reflections, slow dolly forward". Specifying even the camera movement (slow dolly forward, meaning the camera moves slowly forward) gets you video much closer to your intention.
④When using images containing people, facial features may appear to shake at roughly 2-second intervals. This is a known current limitation of the model, so you can expect more stable results from landscape or object images rather than people.
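Keeping the two stages connected is easier if every trial is written down. The sketch below is one hypothetical way to do that with a plain JSON Lines file. None of these names come from WAN2GP; you copy the values from the launcher by hand. It carries the image and prompt into the final pass but asks for a fresh seed, because a seed found in one model is not expected to reproduce the same motion in a different model.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Trial:
    model: str
    image: str
    prompt: str
    seed: int
    num_frames: int
    note: str = ""


def log_trial(trial, path="trials.jsonl"):
    """Append one trial to the log file."""
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(trial)) + "\n")


def load_trials(path="trials.jsonl"):
    """Read all logged trials back, oldest first."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as f:
        return [Trial(**json.loads(line)) for line in f if line.strip()]


def final_pass_from(trial, seed, extra_prompt="", final_model="LTX 2.3"):
    """Build the stage 2 settings from the chosen stage 1 trial."""
    prompt = f"{trial.prompt}, {extra_prompt}" if extra_prompt else trial.prompt
    return Trial(final_model, trial.image, prompt, seed,
                 trial.num_frames, note="final pass")
```

For example, after logging a WAN 2.2 trial you like, final_pass_from(best, seed=7, extra_prompt="slow dolly forward") gives you the settings to enter in LTX 2.3.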

💡 TIP
If you feel the results lack consistency, try making your prompt simpler. Often, a concise prompt with just 2-3 core descriptions creates more consistent video than complex ones.

Common Errors and Solutions

I've compiled the most common errors you'll encounter when starting local AI video generation, along with their solutions. Don't panic; check them in the order below.

①When a VRAM shortage (OOM error) occurs: First, reduce the num_frames value. Cutting the frame count in half significantly reduces VRAM usage. If that does not resolve it, lower the resolution; treat resolution reduction as a last resort.
②When generation speed is abnormally slow: This usually means the PyTorch and CUDA versions don't match. Check which CUDA version your GPU driver supports, and reinstall the PyTorch build that matches it. The official PyTorch website (pytorch.org) lists the supported combinations.
③When the program malfunctions after an update: This happens when the old launcher conflicts with new files. Rather than spending time trying to repair the environment, a clean reinstall via the standard installation route is faster.
④When there’s no response after first run: Model download and cache configuration are in progress. This can take 20 minutes or more, and even if the screen appears frozen, wait without force quitting and it will proceed normally.
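The order in ① (frames first, resolution last) can be written as a small fallback rule. This is only a sketch of the decision logic, not something WAN2GP does for you. The settings class, the 16-frame floor, and the resolution ladder are my own assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Heights to step down through. 480 is included only as a last resort,
# since this article reports frequent artifacts at 480p.
RESOLUTION_LADDER = (1080, 720, 480)
MIN_FRAMES = 16  # assumption: about 1 second at 16 FPS


@dataclass(frozen=True)
class GenSettings:
    height: int
    num_frames: int


def reduce_after_oom(s: GenSettings) -> Optional[GenSettings]:
    """Return the next settings to try after an OOM error, or None if out of options."""
    halved = s.num_frames // 2
    if halved >= MIN_FRAMES:
        return replace(s, num_frames=halved)   # first: halve the frame count
    lower = [h for h in RESOLUTION_LADDER if h < s.height]
    if lower:
        return replace(s, height=max(lower))   # last resort: drop one resolution step
    return None


if __name__ == "__main__":
    settings = GenSettings(height=720, num_frames=64)
    while settings is not None:
        print(settings)
        settings = reduce_after_oom(settings)
```

Starting from 720p with 64 frames, the rule halves the frame count twice before it ever touches the resolution.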

⚠️ Warning: If the PyTorch and CUDA versions don't match, computation may fall back to the CPU even though a GPU is present, making generation dozens of times slower. If generation after installation is much slower than expected, check this first.
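You can check for this CPU fallback from inside the activated environment. The sketch below uses standard PyTorch calls (torch.cuda.is_available, torch.version.cuda, torch.cuda.get_device_name). The function name and the returned dictionary are just for illustration.

```python
def describe_torch_device():
    """Report whether the installed PyTorch build can actually use the GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False, "cuda_available": False,
                "cuda_build": None, "device": "none"}
    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "cuda_available": available,
        # None here means a CPU-only PyTorch build was installed.
        "cuda_build": torch.version.cuda,
        "device": torch.cuda.get_device_name(0) if available else "cpu",
    }


if __name__ == "__main__":
    for key, value in describe_torch_device().items():
        print(f"{key}: {value}")
```

If cuda_available is False on a machine with an NVIDIA GPU, reinstalling the PyTorch build that matches your driver is the likely fix.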

Key Summary You Can Use Right Now

I've summarized today's content one line at a time. Remembering just these points is enough to start producing results right away.

①WAN2GP is a launcher for running multiple AI video models locally. You need an NVIDIA GPU and VRAM of 6GB or more.
②Beginners should choose Pinokio one-click installation; those wanting direct control should choose manual installation.
③Use WAN 2.2 for quick direction verification and LTX 2.3 for final version production, dividing their roles.
④Start at 720p, and once you get a result you like, record the Seed value for reproducibility.
⑤For VRAM errors, respond in order: reduce num_frames → reduce resolution. For speed issues, first check PyTorch + CUDA version combination.

Pick one image now and turn it into a video

The two-stage workflow is direction-setting with WAN 2.2, then finalization with LTX 2.3. Start short, generate often, and make plenty: this is the fastest way to improve.


Posted on Jan 29, 2025
