Every model, one workspace.
Explore the AI models available in Scenetra. Generate images, create videos, and synthesize audio — all in a visual node-based workflow editor.
Image Models (10)
Nano Banana 2
Generate high-quality images at up to 4K resolution using Google's Gemini 3.1 Flash model. Nano Banana 2 is optimized for speed, supports text-to-image and image editing with reference images, and offers flexible aspect ratios.
GPT Image 1.5
OpenAI's latest image generation model with superior instruction following and prompt adherence. GPT Image 1.5 supports text-to-image generation and image editing when input images are provided, with transparent background support.
Flux 2 Pro
Black Forest Labs' professional-grade image model delivering high-quality outputs with precise control. Flux 2 Pro supports text-to-image generation and image editing with flexible sizing, safety controls, and megapixel-based pricing.
Seedream 4.5
ByteDance's upgraded image generation model with improved quality, text rendering, and image editing capabilities. Seedream 4.5 excels at detailed compositions and accurate text in images, making it ideal for marketing and creative content.
Grok Image
xAI's new entrant in AI image generation. Grok Image creates and edits images from text prompts with flexible aspect ratios. A fresh competitor bringing xAI's reasoning capabilities to visual content creation.
Flux 2
Black Forest Labs' versatile image generation model with customizable acceleration and flexible sizing. Flux 2 delivers high-quality text-to-image outputs with support for image editing when input images are provided.
Z-Image Turbo
A blazing-fast text-to-image model with high-quality outputs and customizable acceleration. Z-Image Turbo generates images in 1 to 8 inference steps, making it ideal for rapid prototyping and real-time creative workflows.
Nano Banana Pro
The professional tier of Google's Gemini-powered image generation. Nano Banana Pro delivers higher quality outputs with support for up to 4K resolution, flexible aspect ratios, multiple output formats, and image editing with reference images.
Nano Banana
Google's Nano Banana is a fast, low-cost image generation and editing model. Supports text-to-image and reference-based image editing with up to 14 reference images. Best for high-volume creative iteration.
Flux 2 Max
Black Forest Labs' most powerful Flux model for text-to-image generation and image editing. Highest quality outputs in the Flux family with support for up to 4 megapixels and multi-image editing (up to 8 reference images).
Video Models (19)
Kling 3.0 Pro
Kuaishou's flagship video generation model delivering stunning visual fidelity. Kling 3.0 Pro supports text-to-video, image-to-video with start/end frames, element references for character consistency, and native audio generation.
Veo 3.1
Google's premier video generation model supporting text-to-video, image-to-video, and first-last-frame video creation. Veo 3.1 delivers up to 4K resolution with native audio synthesis, making it one of the most versatile video models available.
Sora 2 Pro
OpenAI's professional video generation model with 1080p support. Sora 2 Pro generates high-quality videos from text prompts or input images with flexible aspect ratios and duration controls.
Seedance 1.5 Pro
ByteDance's advanced video generation model supporting text-to-video, image-to-video, audio synthesis, and extended duration videos. Seedance 1.5 Pro offers fast iteration with competitive pricing and flexible resolution options.
Hailuo 2.3
MiniMax's Hailuo 2.3 delivers 768p videos with both text-to-video and image-to-video support. A cost-effective option for social media content, quick iterations, and template-based video creation.
Grok Video
xAI's video generation model creating videos from text prompts or images. Grok Video supports up to 15 seconds of output with flexible aspect ratios and resolution options, bringing xAI's capabilities to video content creation.
Wan 2.6
A versatile text-to-video and image-to-video model with built-in audio support and prompt expansion. Wan 2.6 delivers 15-second videos at up to 1080p resolution, making it a solid all-rounder for creative video generation.
LTX-2 Pro
Lightricks' LTX Video 2.0 Pro delivers high-fidelity text-to-video and image-to-video generation with audio synthesis at resolutions up to 4K. A strong budget-friendly option with professional output quality.
Kling 3.0 Standard
Kuaishou's standard-tier video generation model offering text-to-video and image-to-video with support for start/end frames, element references for character consistency, and native audio generation. A cost-effective alternative to Kling 3.0 Pro.
Kling o3 Standard
Kling's Omni 3 Standard model for text-to-video and image-to-video generation with native audio support. Features start/end frame control, flexible durations up to 15 seconds, and multiple aspect ratios for versatile video creation.
Sora 2
OpenAI's video generation model for text-to-video and image-to-video creation. Sora 2 generates high-quality 720p videos up to 12 seconds with flexible aspect ratios, making it accessible for creative video generation at a competitive price.
Seedance 2
ByteDance's Seedance 2 generates cinematic videos from text prompts with synchronized audio, dialogue, and sound effects. Native multi-input support including reference images, videos, and audio for fine-grained creative control.
Seedance 2 Reference-to-Video
Generate videos from up to 9 reference images, 3 reference videos, and 3 audio tracks using Seedance 2.0 by ByteDance. Reference subjects directly in your prompt as [Image1], [Image2], [Video1], etc. for character consistency and motion transfer.
Happy Horse
Alibaba's Happy Horse generates cinematic videos from text prompts or animates a first-frame image into a full video. It outputs 1080p video with durations of 3-15 seconds and supports both text-to-video and image-to-video workflows.
Happy Horse Reference-to-Video
Alibaba's Happy Horse reference-to-video generates videos from 1-9 reference images. Reference each subject in your prompt as `character1`, `character2`, ... `character9` (order matches your uploaded images) for consistent multi-character scenes. 720p or 1080p output, 3-15 second durations.
Grok Reference-to-Video
Generate videos with consistent subject appearance from up to 7 reference images using xAI's Grok Imagine Video. Reference each image in your prompt as @Image1, @Image2, etc. for style and content guidance.
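The three reference-to-video entries above each use a different placeholder syntax for uploaded images: [Image1] for Seedance 2, character1 for Happy Horse, and @Image1 for Grok. As a rough illustration of how those placeholders line up with upload order, here is a minimal Python sketch. The dictionary keys, the function name, and the sample prompt are hypothetical and are not Scenetra API names. The image limits (9, 9, and 7) come from the descriptions above, and the sketch assumes the backticks around `character1` are page formatting rather than part of the token.

```python
# Placeholder templates and image limits, taken from the model descriptions.
# The keys below are made-up labels for this sketch, not real model identifiers.
IMAGE_PLACEHOLDERS = {
    "seedance-2-reference": ("[Image{n}]", 9),
    "happy-horse-reference": ("character{n}", 9),
    "grok-reference": ("@Image{n}", 7),
}


def image_placeholders(model: str, image_count: int) -> list[str]:
    """Return one prompt placeholder per uploaded image, in upload order."""
    template, max_images = IMAGE_PLACEHOLDERS[model]
    if not 1 <= image_count <= max_images:
        raise ValueError(
            f"{model} accepts 1 to {max_images} reference images, got {image_count}"
        )
    # Numbering starts at 1 and follows the order of the uploaded images.
    return [template.format(n=i) for i in range(1, image_count + 1)]


# Example: two uploaded character images for Happy Horse Reference-to-Video.
tags = image_placeholders("happy-horse-reference", 2)
prompt = f"{tags[0]} hands a letter to {tags[1]} on a rainy street."
print(prompt)
```

The same helper returns `@Image1`, `@Image2`, and so on for the Grok entry, so a prompt can be drafted once and re-tagged per model.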
Kling v2.6
Kling v2.6 generates videos from text prompts or first-frame images with native synchronized audio support, including dialogue in Chinese and English. Built for short-form social and cinematic content.
Veo 3.1 Fast
Google Veo 3.1 Fast generates high-quality videos from text or images at a fraction of the cost of standard Veo 3.1. Supports text-to-video, image-to-video, and first-last-frame mode with native audio at up to 4K resolution.
Veo 3.1 Lite
Google Veo 3.1 Lite is the most affordable Veo tier, optimized for high-volume video generation. Generates 720p or 1080p videos from text or images with optional synchronized audio.
Your vision, every model.
Use all these models together in Scenetra's visual workflow editor. Chain outputs, batch variations, and bring your own API keys.
Get Started Free