Wan Video LoRA Training

This guide is under heavy development. For now, treat it only as inspiration for WAN video generation.

1) Environment

```bash
# 1) Conda/venv
conda create -n wan22 python=3.10 -y
conda activate wan22

# 2) PyTorch + tools (CUDA 12.x build)
pip install --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# xformers wheels are built against specific torch versions; check that this
# step does not replace the CUDA build of torch installed above
pip install xformers accelerate transformers datasets peft bitsandbytes==0.43.3 safetensors einops
pip install opencv-python pillow tqdm

# 3) Trainer
git clone https://github.com/Wan-Video/DiffSynth-Studio.git
cd DiffSynth-Studio
# `|| true` keeps going if some pinned requirements fail; read the output
pip install -r requirements.txt || true
cd ..
```

(The A100 supports BF16, so we use it throughout.)
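Before touching the trainer, a small stdlib-only Python check can confirm the installation. It reports which of the packages installed above are present and whether the GPU supports BF16. This is a sketch: the package list simply mirrors the pip commands in this section, and `installed_versions` / `bf16_available` are hypothetical helper names.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
REQUIRED = [
    "torch", "torchvision", "torchaudio", "xformers", "accelerate",
    "transformers", "datasets", "peft", "bitsandbytes", "safetensors",
    "einops", "opencv-python", "pillow", "tqdm",
]


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def bf16_available():
    """True only if torch imports, a CUDA GPU is visible, and it supports BF16."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available() and torch.cuda.is_bf16_supported()


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:15} {version or 'MISSING'}")
    print("BF16 on GPU:", bf16_available())
```

On an A100 with the CUDA build of torch, the last line should report `True`.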

2) Place the models

Put the WAN 2.2 weights (T2V/I2V/V2V/S2V, depending on the task) plus the VAE and text encoder into the folders the trainer expects. Alternatively, pass them with --model_name_or_path (a Hugging Face repo ID or a local path). Examples (replace as needed):

- `--model_name_or_path Wan-AI/Wan2.2-T2V-A14B` (text-to-video)
- `--model_name_or_path Wan-AI/Wan2.2-I2V-A14B` (image-to-video)
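The trainer decides how it interprets --model_name_or_path. A common Hugging Face convention is that an existing local directory takes precedence over a Hub repo ID. Assuming the trainer follows that convention, a quick check like the hypothetical `model_source` below shows which of the two a given value will resolve to:

```python
import os
import re

# Shape of a Hub repo ID such as "Wan-AI/Wan2.2-T2V-A14B" (org/name).
HUB_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def model_source(name_or_path: str) -> str:
    """Classify a --model_name_or_path value as 'local', 'hub', or 'invalid'.

    An existing local directory wins, mirroring the usual Hugging Face
    convention (an assumption about the trainer used here).
    """
    if os.path.isdir(name_or_path):
        return "local"
    if HUB_ID.match(name_or_path):
        return "hub"
    return "invalid"
```

For example, `model_source("Wan-AI/Wan2.2-T2V-A14B")` returns `"hub"` unless a directory with that relative path exists in the current working directory.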

3) Prepare the dataset

A JSONL file with one line per clip. Minimal fields:

```json
{"video": "/data/myset/clip_0001.mp4", "prompt": "a cozy coffee shop scene at golden hour", "fps": 24, "seconds": 4, "resolution": "1280x720"}
{"video": "/data/myset/clip_0002.mp4", "prompt": "rainy neon city street, cinematic", "fps": 24, "seconds": 4, "resolution": "1280x720"}
```

For T2V you can also reference stills/frames instead of a `video` (or an empty dummy video). What matters are the prompts and the target specs (`fps`, `seconds`, `resolution`). Save the file as, e.g., /data/wan22/train.jsonl and, optionally, /data/wan22/val.jsonl.
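Before training, it helps to validate every line of the JSONL file. The sketch below checks only the minimal fields listed above: their types, a non-empty prompt, a `WIDTHxHEIGHT` resolution and, optionally, that the video file exists. The helper names are hypothetical, and your trainer may require additional fields.

```python
import json
import os
import re

# Minimal fields from the example above (an assumption; the trainer may need more).
REQUIRED_FIELDS = {
    "video": str,
    "prompt": str,
    "fps": (int, float),
    "seconds": (int, float),
    "resolution": str,
}
RESOLUTION = re.compile(r"^\d+x\d+$")  # e.g. "1280x720"


def validate_record(rec, check_files=False):
    """Return a list of problems for one dataset record (empty list means OK)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in rec:
            errors.append(f"missing field '{field}'")
        elif isinstance(rec[field], bool) or not isinstance(rec[field], expected):
            errors.append(f"field '{field}' has type {type(rec[field]).__name__}")
    prompt = rec.get("prompt")
    if isinstance(prompt, str) and not prompt.strip():
        errors.append("empty prompt")
    res = rec.get("resolution")
    if isinstance(res, str) and not RESOLUTION.match(res):
        errors.append(f"resolution '{res}' is not WIDTHxHEIGHT")
    video = rec.get("video")
    if check_files and isinstance(video, str) and not os.path.isfile(video):
        errors.append(f"video not found: {video}")
    return errors


def write_jsonl(records, path, check_files=False):
    """Validate all records first, then write one JSON object per line."""
    for i, rec in enumerate(records):
        problems = validate_record(rec, check_files)
        if problems:
            raise ValueError(f"record {i}: " + "; ".join(problems))
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Load a JSONL file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Running `validate_record(rec, check_files=True)` over `read_jsonl("/data/wan22/train.jsonl")` catches missing clips before a long run starts rather than hours into it.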

4) `accelerate` config (one-time)

```bash
accelerate config default
```

Or pin it explicitly in ~/.cache/huggingface/accelerate/default_config.yaml:

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: 'NO'   # quoted: YAML 1.1 parses a bare NO as boolean false
mixed_precision: bf16
num_processes: 1
gpu_ids: "0"
dynamo_backend: 'NO'
```

(For multi-GPU on the same host, set distributed_type: MULTI_GPU, set num_processes to the number of GPUs, and list them in gpu_ids, e.g. "0,1".)
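Two details matter when you write this file by hand. First, NO must be quoted ('NO'), because YAML 1.1 parsers such as PyYAML read an unquoted NO as boolean false. Second, switching to several GPUs also means raising num_processes and listing the extra gpu_ids. The hypothetical generator below covers both cases and uses only the keys shown above:

```python
def accelerate_config(num_gpus=1, mixed_precision="bf16"):
    """Render a single-host accelerate config with the keys used above."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    # Quote 'NO' so YAML 1.1 parsers do not turn it into boolean false.
    distributed = "MULTI_GPU" if num_gpus > 1 else "'NO'"
    gpu_ids = ",".join(str(i) for i in range(num_gpus))  # "0" or "0,1,..."
    lines = [
        "compute_environment: LOCAL_MACHINE",
        f"distributed_type: {distributed}",
        f"mixed_precision: {mixed_precision}",
        f"num_processes: {num_gpus}",
        f'gpu_ids: "{gpu_ids}"',
        "dynamo_backend: 'NO'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(accelerate_config(2), end="")
```

Printing the result first lets you review it before saving it as default_config.yaml.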

5) LoRA fine-tuning (A100 40 GB, stable starting point)

T2V example (720p, 4 s, BF16):

```bash
conda activate wan22
cd DiffSynth-Studio

accelerate launch \
  train_wan_lora.py \
  --model_name_or_path "Wan-AI/Wan2.2-T2V-A14B" \
  --output_dir /data/wan22_lora_out \
  --dataset_json /data/wan22/train.jsonl \
  --resolution 720 --fps 24 --clip_seconds 4 \
  --train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --max_train_steps 20000 \
  --learning_rate 1e-4 --warmup_steps 500 \
  --lora_rank 64 --lora_alpha 64 \
  --use_bf16 --enable_xformers --gradient_checkpointing \
  --checkpointing_steps 1000 \
  --validation_json /data/wan22/val.jsonl --validation_steps 2000
```

For I2V (image-to-video), only the model changes:

```bash
accelerate launch train_wan_lora.py \
  --model_name_or_path "Wan-AI/Wan2.2-I2V-A14B" \
  ...  # remaining flags as in the T2V example
```
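It is worth checking what the T2V flags add up to. The sketch below rests on two assumptions that you should verify against train_wan_lora.py. The first is that --max_train_steps counts optimizer updates, as in the common Hugging Face training scripts. The second is that the model wants frame counts of the form 4k+1, which is what WAN's reference code works with. All function names are hypothetical.

```python
def clip_frames(fps, seconds):
    """Raw frame count implied by --fps and --clip_seconds."""
    return round(fps * seconds)


def snap_to_4k_plus_1(frames):
    """Largest count <= frames of the form 4k+1 (assumed WAN frame constraint)."""
    if frames < 1:
        raise ValueError("need at least one frame")
    return (frames - 1) // 4 * 4 + 1


def clips_seen(max_train_steps, batch_size, grad_accum, num_gpus=1):
    """Clips consumed, assuming max_train_steps counts optimizer updates."""
    return max_train_steps * batch_size * grad_accum * num_gpus


def passes_over_dataset(max_train_steps, batch_size, grad_accum, dataset_size, num_gpus=1):
    """How many times each clip is seen on average."""
    return clips_seen(max_train_steps, batch_size, grad_accum, num_gpus) / dataset_size
```

For the T2V example, 24 fps × 4 s gives 96 frames (93 after snapping). 20,000 steps × 1 × 8 consume 160,000 clips, so a dataset of, say, 500 clips would be seen about 320 times. That number is a useful sanity check against overfitting.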

Proven tweaks on an A100 40 GB

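- If VRAM is plentiful: --train_batch_size 2 or --lora_rank 96/128.
- If VRAM is tight: keep --train_batch_size 1 and lower the resolution, the clip length or --lora_rank. Raising --gradient_accumulation_steps (e.g. 12–16) only increases the effective batch size; it does not reduce memory per step.
- For style/character LoRAs, max_train_steps 6k–12k and rank 32–64 are often enough.
- Prefer BF16 over FP16 on the A100 (more stable training).
- Keep --gradient_checkpointing and --enable_xformers enabled.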
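Rank and alpha interact. In the standard LoRA formulation, a d_in × d_out linear layer gains rank × (d_in + d_out) trainable parameters, and the update B·A is scaled by alpha / rank. This is PEFT's default scaling; whether train_wan_lora.py uses the same formula is an assumption. The sketch below makes both numbers visible. The 5120 layer width in the usage note is only an illustrative size, not a value taken from WAN.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    The update is B @ A with A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    return rank * (d_in + d_out)


def lora_scale(alpha, rank):
    """Scale applied to B @ A in the standard formulation: alpha / rank."""
    return alpha / rank
```

For a 5120 × 5120 projection, rank 64 adds 655,360 parameters and rank 128 adds 1,310,720. Going from --lora_rank 64 to 128 while keeping --lora_alpha 64 halves the scale from 1.0 to 0.5. Raise alpha together with the rank if you want to keep the scale at 1.0.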

6) Resume / Checkpoints

```bash
# Repeat the exact flags of the original run (elided here as ...), then add:
accelerate launch train_wan_lora.py \
  ... \
  --resume_from_checkpoint "/data/wan22_lora_out/checkpoint-10000"
```
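To avoid typing the checkpoint path by hand, a small helper can pick the newest checkpoint-N directory. It compares N numerically, because a plain string sort would rank checkpoint-2000 above checkpoint-10000. The function and the file name latest_ckpt.py in the usage note are hypothetical; the checkpoint-N naming comes from the path above.

```python
import re
import sys
from pathlib import Path

CHECKPOINT = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint-N directory with the largest N, or None if there is none.

    N is compared as an integer: a string sort would pick checkpoint-2000
    over checkpoint-10000.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for entry in root.iterdir():
        match = CHECKPOINT.match(entry.name)
        if match and entry.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), entry
    return best_path


if __name__ == "__main__" and len(sys.argv) > 1:
    ckpt = latest_checkpoint(sys.argv[1])
    if ckpt is None:
        sys.exit(f"no checkpoint-N directory in {sys.argv[1]}")
    print(ckpt)
```

Saved as latest_ckpt.py, it can fill in the flag: `--resume_from_checkpoint "$(python latest_ckpt.py /data/wan22_lora_out)"`.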

7) Inference (quick test)

Many WAN workflows (CLI/ComfyUI) can reference LoRAs in the loader node. A CLI example (illustrative; the script name and flags depend on your inference tool):

```bash
python infer_wan.py \
  --model_name_or_path "Wan-AI/Wan2.2-T2V-A14B" \
  --lora_path "/data/wan22_lora_out" \
  --prompt "cozy coffee shop at golden hour, bokeh" \
  --negative_prompt "distorted faces, artifacts" \
  --resolution 720 --fps 24 --seconds 4 \
  --output /data/wan22/samples/test001.mp4 \
  --use_bf16 --enable_xformers
```

(In ComfyUI: WAN loader → attach LoRA(s) → render.)
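For a quick test with several prompts, generating the commands is less error-prone than editing them by hand, because `shlex.join` quotes prompts that contain spaces or commas. This sketch reuses the script name and flags of the illustrative CLI example above, so adapt `BASE` to the inference tool you actually use. `render_commands` is a hypothetical name.

```python
import shlex

# Script name and flags copied from the illustrative CLI example above (assumptions).
BASE = [
    "python", "infer_wan.py",
    "--model_name_or_path", "Wan-AI/Wan2.2-T2V-A14B",
    "--lora_path", "/data/wan22_lora_out",
    "--resolution", "720", "--fps", "24", "--seconds", "4",
    "--use_bf16", "--enable_xformers",
]


def render_commands(prompts, out_dir="/data/wan22/samples",
                    negative="distorted faces, artifacts"):
    """One shell-quoted command per prompt, writing test001.mp4, test002.mp4, ..."""
    commands = []
    for i, prompt in enumerate(prompts, start=1):
        argv = BASE + [
            "--prompt", prompt,
            "--negative_prompt", negative,
            "--output", f"{out_dir}/test{i:03d}.mp4",
        ]
        commands.append(shlex.join(argv))  # quotes prompts with spaces/commas
    return commands


if __name__ == "__main__":
    for cmd in render_commands(["cozy coffee shop at golden hour, bokeh"]):
        print(cmd)
```

Keeping the prompt list fixed across checkpoints makes the rendered clips directly comparable.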

8) Multi-GPU (optional, single host)

N × A100 40 GB speeds up training, LoRA included. Example with 2 GPUs:

```bash
accelerate config  # set distributed_type=MULTI_GPU, num_processes=2
# ... stands for the remaining flags from the T2V example
accelerate launch \
  --multi_gpu \
  train_wan_lora.py \
  ... \
  --train_batch_size 1 --gradient_accumulation_steps 8
```

With 4+ GPUs, consider enabling --seq_parallel if the trainer offers it (saves VRAM).
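The 2-GPU example keeps --train_batch_size 1 and --gradient_accumulation_steps 8. Under data parallelism every process draws its own batch, so the effective batch per optimizer update becomes 1 × 8 × 2 = 16 instead of the single-GPU 8. To keep the single-GPU hyperparameters comparable, divide the accumulation steps by the GPU count. A minimal sketch with hypothetical helper names:

```python
def effective_batch(per_gpu_batch, grad_accum, num_gpus):
    """Clips per optimizer update when every GPU process draws its own batch."""
    return per_gpu_batch * grad_accum * num_gpus


def grad_accum_for(target, per_gpu_batch, num_gpus):
    """Accumulation steps that keep a target effective batch size."""
    per_step = per_gpu_batch * num_gpus
    if target % per_step:
        raise ValueError(f"target {target} is not a multiple of {per_step}")
    return target // per_step
```

`grad_accum_for(8, 1, 2)` returns 4. In other words, --gradient_accumulation_steps 4 on two GPUs matches the single-GPU run with 8.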

9) Hyperparameter quick start

- General: lr=1e-4, rank=64, alpha=64, steps=10k–20k, bs=1, ga=8–12.
- Character: preferably rank=64–128, steps=8k–12k, short 2–4 s clips.
- Style: rank=32–64, steps=6k–10k, more diverse data.
- Evaluation: every 1–2k steps, render 2–4 fixed prompts plus 2 prompts from your actual use case.
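The evaluation cadence also determines how much rendering the run costs. The sketch below uses hypothetical helpers. It lists the evaluation steps, picks out the ones that coincide with a saved checkpoint (assuming checkpoints are written every --checkpointing_steps, as in section 5), and counts the clips to render.

```python
def eval_steps(max_train_steps, every):
    """Steps at which to render the fixed evaluation prompts."""
    if every <= 0:
        raise ValueError("every must be positive")
    return list(range(every, max_train_steps + 1, every))


def eval_steps_with_checkpoint(max_train_steps, every, checkpointing_steps):
    """Evaluation steps that coincide with a saved checkpoint-N directory."""
    return [s for s in eval_steps(max_train_steps, every) if s % checkpointing_steps == 0]


def clips_to_render(max_train_steps, every, fixed_prompts, real_prompts):
    """Total evaluation clips over the whole run."""
    return len(eval_steps(max_train_steps, every)) * (fixed_prompts + real_prompts)
```

With 20k steps and a render every 2k, there are 10 evaluation points. At 4 fixed + 2 real prompts each, that is 60 clips over the run. With --checkpointing_steps 1000, any interval that is a multiple of 1000 lines up with a saved checkpoint.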
