Every model, one workspace.

Explore the AI models available in Scenetra. Generate images, create videos, and synthesize audio — all in a visual node-based workflow editor.

Image Models (10)

Nano Banana 2

Generate high-quality images at up to 4K resolution using Google's Gemini 3.1 Flash model. Nano Banana 2 is optimized for speed, supports text-to-image and image editing with reference images, and offers flexible aspect ratios.

Text → Image · Image Editing
Google Gemini · From $0.045 per image

GPT Image 1.5

OpenAI's latest image generation model, with stronger instruction following and prompt adherence. GPT Image 1.5 supports text-to-image generation, image editing when input images are provided, and transparent backgrounds.

Text → Image · Image Editing
OpenAI · $0.02 per image

Flux 2 Pro

Black Forest Labs' professional-grade image model delivering high-quality outputs with precise control. Flux 2 Pro supports text-to-image generation and image editing with flexible sizing, safety controls, and megapixel-based pricing.

Text → Image · Image Editing
Black Forest Labs · From $0.03 per image

Seedream 4.5

ByteDance's upgraded image generation model with improved quality, text rendering, and image editing capabilities. Seedream 4.5 excels at detailed compositions and accurate text in images, making it ideal for marketing and creative content.

Text → Image · Image Editing
ByteDance · $0.04 per image

Grok Image

Grok Image is xAI's new entrant in AI image generation. It creates and edits images from text prompts with flexible aspect ratios, bringing xAI's reasoning capabilities to visual content creation.

Text → Image · Image Editing
xAI · $0.02 per image

Flux 2

Black Forest Labs' versatile image generation model with customizable acceleration and flexible sizing. Flux 2 delivers high-quality text-to-image outputs with support for image editing when input images are provided.

Text → Image · Image Editing
Black Forest Labs · From $0.025 per image

Z-Image Turbo

A blazing-fast text-to-image model with high-quality outputs and customizable acceleration. Z-Image Turbo generates images in as few as 1 to 8 inference steps, making it ideal for rapid prototyping and real-time creative workflows.

Text → Image
PrunaAI · $0.01 per image

Nano Banana Pro

The professional tier of Google's Gemini-powered image generation. Nano Banana Pro delivers higher quality outputs with support for up to 4K resolution, flexible aspect ratios, multiple output formats, and image editing with reference images.

Text → Image · Image Editing
Google Gemini · From $0.06 per image

Nano Banana

Google's Nano Banana is a fast, low-cost image generation and editing model. It supports text-to-image generation and reference-based image editing with up to 14 reference images, making it best suited to high-volume creative iteration.

Text → Image · Image Editing
Google Gemini · From $0.039 per image

Flux 2 Max

Black Forest Labs' most powerful Flux model for text-to-image generation and image editing. Flux 2 Max produces the highest-quality outputs in the Flux family, with support for up to 4 megapixels and multi-image editing with up to 8 reference images.

Text → Image · Image Editing
Black Forest Labs · From $0.03 per megapixel

Video Models (19)

Kling 3.0 Pro

Kuaishou's flagship video generation model delivering stunning visual fidelity. Kling 3.0 Pro supports text-to-video, image-to-video with start/end frames, element references for character consistency, and native audio generation.

Text → Video · Image → Video
Kuaishou · From $0.084 per second

Veo 3.1

Google's premier video generation model supporting text-to-video, image-to-video, and first-last-frame video creation. Veo 3.1 delivers up to 4K resolution with native audio synthesis, making it one of the most versatile video models available.

Text → Video · Image → Video
Google · From $0.20 per second

Sora 2 Pro

OpenAI's professional video generation model with 1080p support. Sora 2 Pro generates high-quality videos from text prompts or input images with flexible aspect ratios and duration controls.

Text → Video · Image → Video
OpenAI · From $0.30 per second

Seedance 1.5 Pro

ByteDance's advanced video generation model supporting text-to-video, image-to-video, audio synthesis, and extended duration videos. Seedance 1.5 Pro offers fast iteration with competitive pricing and flexible resolution options.

Text → Video · Image → Video
ByteDance · $0.05 per second

Hailuo 2.3

MiniMax's Hailuo 2.3 video generation model delivering 768p videos with both text-to-video and image-to-video support. A cost-effective option for social media content, quick iterations, and template-based video creation.

Text → Video · Image → Video
MiniMax · From $0.28 per video

Grok Video

xAI's video generation model creating videos from text prompts or images. Grok Video supports up to 15 seconds of output with flexible aspect ratios and resolution options, bringing xAI's capabilities to video content creation.

Text → Video · Image → Video
xAI · From $0.05 per second

Wan 2.6

A versatile text-to-video and image-to-video model with built-in audio support and prompt expansion. Wan 2.6 delivers up to 1080p resolution and videos up to 15 seconds long, making it a solid all-rounder for creative video generation.

Text → Video · Image → Video
Wan Video · From $0.10 per second

LTX-2 Pro

Lightricks' LTX Video 2.0 Pro delivers high-fidelity text-to-video and image-to-video generation with audio synthesis at resolutions up to 4K. A strong budget-friendly option with professional output quality.

Text → Video · Image → Video
Lightricks · From $0.06 per second

Kling 3.0 Standard

Kuaishou's standard-tier video generation model offering text-to-video and image-to-video with support for start/end frames, element references for character consistency, and native audio generation. A cost-effective alternative to Kling 3.0 Pro.

Text → Video · Image → Video
Kuaishou · From $0.042 per second

Kling o3 Standard

Kling's Omni 3 Standard model for text-to-video and image-to-video generation with native audio support. Features start/end frame control, flexible durations up to 15 seconds, and multiple aspect ratios for versatile video creation.

Text → Video · Image → Video
Kuaishou · From $0.042 per second

Sora 2

OpenAI's video generation model for text-to-video and image-to-video creation. Sora 2 generates high-quality 720p videos up to 12 seconds with flexible aspect ratios, making it accessible for creative video generation at a competitive price.

Text → Video · Image → Video
OpenAI · From $0.20 per second

Seedance 2

ByteDance's Seedance 2 generates cinematic videos from text prompts with synchronized audio, dialogue, and sound effects. It natively accepts multiple inputs, including reference images, videos, and audio, for fine-grained creative control.

Text → Video
ByteDance · From $0.08 per second

Seedance 2 Reference-to-Video

Generate videos from up to 9 reference images, 3 reference videos, and 3 audio tracks using Seedance 2.0 by ByteDance. Reference subjects directly in your prompt as [Image1], [Image2], [Video1], etc. for character consistency and motion transfer.

Image → Video · Reference-to-Video
ByteDance · From $0.08 per second

Happy Horse

Alibaba's Happy Horse generates cinematic videos from text prompts or animates a first-frame image into a full video. It outputs 1080p video with durations of 3 to 15 seconds and supports both text-to-video and image-to-video workflows.

Text → Video · Image → Video
Alibaba · From $0.14 per second

Happy Horse Reference-to-Video

Alibaba's Happy Horse reference-to-video generates videos from 1-9 reference images. Reference each subject in your prompt as `character1`, `character2`, ... `character9` (order matches your uploaded images) for consistent multi-character scenes. 720p or 1080p output, 3-15 second durations.

Image → Video · Reference-to-Video
Alibaba · From $0.14 per second

Grok Reference-to-Video

Generate videos with consistent subject appearance from up to 7 reference images using xAI's Grok Imagine Video. Reference each image in your prompt as @Image1, @Image2, etc. for style and content guidance.

Image → Video · Reference-to-Video
xAI · From $0.05 per second

Kling v2.6

Kling v2.6 generates videos from text prompts or first-frame images with native synchronized audio support, including dialogue in Chinese and English. Built for short-form social and cinematic content.

Text → Video · Image → Video
Kuaishou · From $0.07 per second

Veo 3.1 Fast

Google Veo 3.1 Fast generates high-quality videos from text or images at a fraction of the cost of standard Veo 3.1. Supports text-to-video, image-to-video, and first-last-frame mode with native audio at up to 4K resolution.

Text → Video · Image → Video
Google · From $0.10 per second

Veo 3.1 Lite

Google Veo 3.1 Lite is the most affordable Veo tier, optimized for high-volume video generation. Generates 720p or 1080p videos from text or images with optional synchronized audio.

Text → Video · Image → Video
Google · From $0.03 per second

Your vision, every model.

Use all these models together in Scenetra's visual workflow editor. Chain outputs, batch variations, and bring your own API keys.
