These are the billions of numerical “dials” that store what the model has learned.
What are weights (or parameters)?
A diffusion model starts generation with this.
What is random noise?
This component reads the prompt and translates it into a representation the model can work with.
What is the Text Encoder (Gemma 3)?
This acronym stands for Low-Rank Adaptation.
What is LoRA?
This is LTX-2’s biggest differentiator versus most competitors.
What is joint audio-video generation?
This is the learning phase where the model studies millions of examples.
What is training?
This analogy compares diffusion generation to gradually carving a statue.
What is the sculptor analogy?
This component acts as the creative engine generating video and audio.
What is the Transformer (DiT)?
LoRA customization trains this instead of retraining the full 22B-parameter model.
What is a small adapter?
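Why the adapter is small comes down to arithmetic: a low-rank update replaces one large matrix with two thin ones. The sketch below assumes a single hypothetical 4096 x 4096 weight matrix and a rank of 16. Neither number is taken from LTX-2, and the function names are illustrative only.

```python
def full_matrix_params(d_out: int, d_in: int) -> int:
    """Weights updated when the whole d_out x d_in matrix is retrained."""
    return d_out * d_in


def lora_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Weights in a low-rank adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical layer size and rank, chosen only for illustration.
full = full_matrix_params(4096, 4096)
adapter = lora_adapter_params(4096, 4096, rank=16)
print(f"full: {full:,}  adapter: {adapter:,}  share: {adapter / full:.2%}")
```

For this assumed layer the adapter holds under 1% of the weights of the matrix it modifies, which is the intuition behind "small adapter".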
This competitor’s best-known strength is a user interface that creative teams already know.
What is Runway?
This is the phase where the trained model actually generates new content.
What is inference?
This photography analogy explains how outputs sharpen over time.
What is the Polaroid analogy?
This component compresses video into latent space and expands it back to full quality.
What is the VAE?
This analogy compares LoRA to changing the look of a camera without replacing it.
What is the camera filter analogy?
This phrase explains why LTX-2’s audio sync feels more realistic.
What is “sync is built in, not bolted on”?
LTX-2 contains approximately this many parameters.
What is 22 billion?
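For self-hosting conversations, the parameter count translates roughly into memory for the weights alone. The sketch below is back-of-the-envelope arithmetic using the approximate 22 billion figure above. The precisions listed are common storage formats, not a statement of how LTX-2 is actually distributed, and the estimate ignores activations and any other runtime overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights only, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


PARAMS = 22e9  # approximate figure from the card above

# Common storage precisions; which one applies is an assumption to verify.
for label, size in [("32-bit float", 4), ("16-bit float", 2), ("8-bit integer", 1)]:
    print(f"{label}: about {weight_memory_gb(PARAMS, size):.0f} GB")
```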
This process removes noise step-by-step until a clean image or video appears.
What is denoising?
This component converts generated audio representations into playable sound.
What is the Vocoder?
This LoRA variant uses reference video context during generation.
What is IC-LoRA?
Customer calls should always follow this conversation order.
What is Outcome → Differentiator → Technical?
This term means customers can download and run the model on their own infrastructure.
What are open weights?
Instead of predicting noise directly, LTX-2 predicts this, allowing faster and straighter generation paths.
What is velocity (flow matching)?
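A toy sketch of why a straight path needs few steps. It assumes the common flow-matching setup where a sample moves along the line x_t = (1 - t) * noise + t * clean, with t = 0 as pure noise and t = 1 as the clean result. The `oracle_velocity` function is a hypothetical stand-in for the trained network (it is handed the answer, which a real model never is), and nothing here is LTX-2's actual sampler.

```python
import random


def oracle_velocity(x_t, t, target):
    """Ideal velocity on the straight path: (target - x_t) / (1 - t)."""
    return [(c - x) / (1.0 - t) for x, c in zip(x_t, target)]


def generate(target, steps, seed=0):
    """Euler integration from random noise (t = 0) toward the target (t = 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from random noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = oracle_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]  # follow the velocity
    return x


clean = [0.5, -1.0, 2.0]  # stand-in for a clean latent
for steps in (1, 4, 50):
    result = generate(clean, steps)
    error = max(abs(r - c) for r, c in zip(result, clean))
    print(f"{steps:>2} steps -> max error {error:.2e}")
```

With a perfect velocity and a perfectly straight path, one step lands in the same place as fifty. A real model's predictions are imperfect, so it still takes several steps, but straighter paths are what make a low step count workable.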
This architectural feature keeps audio and video synchronized throughout generation.
What is cross-attention?
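A minimal sketch of the mechanism, assuming standard scaled dot-product attention in which each audio position builds its output as a weighted mix of video positions. Learned projections and multiple heads are omitted, and the choice of audio as the querying stream is an assumption for illustration; a joint model may attend in both directions.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Each query (audio position) mixes values from the other stream (video)."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        mixed = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy data: one audio query, two video positions with identical keys,
# so both receive equal weight and the output is the average of the values.
audio_queries = [[1.0, 0.0]]
video_keys = [[0.5, 0.5], [0.5, 0.5]]
video_values = [[0.0, 2.0], [4.0, 6.0]]
print(cross_attention(audio_queries, video_keys, video_values))
```

Because every audio position consults the video positions at each step, timing information flows between the two streams during generation instead of being aligned afterward.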
These three enterprise advantages are unlocked by open weights.
What are customization, self-hosting, and IP ownership?
This is the best answer when a rep doesn’t know a technical detail.
What is “Let me verify that with engineering and get back to you”?