The State of AI Image and Video Generation: Midjourney, Sora, and What Actually Works

Quick summary

Image generation reaches new quality benchmarks
Video AI finally becomes practical
Comparative analysis of current generation tools
Practical workflows for content teams
What research says about AI generation quality

Tool Updates

Why This Matters Now

The point of this issue is not to chase every announcement. The useful signal is what changed for the builders, creators, teams, and buyers who have to make decisions with imperfect information.

For this issue, I have kept the analysis grounded in what can be acted on: which workflows are becoming more practical, which claims still need verification, and where teams should slow down before treating a polished demo as production reality.

The State of Image and Video Generation

As 2025 closes, the generative AI landscape has solidified into clear categories. The chaos of early experimentation has settled into understood tradeoffs and established workflows. Here’s where we stand.

Image Generation: The Maturation

Midjourney Maintains Quality Lead

Midjourney V7 consolidated its position as the highest-quality image generation tool. The improvements in character consistency and prompt adherence address what had been its most significant remaining weaknesses. For professional content requiring maximum quality, Midjourney remains the choice.

Key improvements in V7:

Significantly better face consistency across images
Improved text rendering (still not perfect, but much better)
Better style transfer from reference images
More reliable prompt interpretation

Cost remains a factor: a subscription is required, and the per-image economics are not ideal for high-volume use. But for quality-critical work, the output justifies the expense.

FLUX.1 Changes the Economics

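Black Forest Labs’ FLUX.1 fundamentally changed the economics of AI image generation. Open weights mean teams can self-host and generate images without per-image fees. Licensing differs by variant: FLUX.1 [schnell] is released under Apache 2.0, while FLUX.1 [dev] carries a non-commercial license, so check the terms against your use case.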

FLUX.1’s position:

Quality competitive with Midjourney for many use cases
Open weights enable customization and fine-tuning
Apache 2.0 licensing on FLUX.1 [schnell] reduces legal uncertainty for commercial use
Infrastructure costs can be much lower than per-image API fees at scale

The tradeoffs: deploying and optimizing FLUX.1 requires more technical expertise. It is not as turnkey as Midjourney’s hosted service, and infrastructure management becomes the team’s responsibility.
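Whether self-hosting actually beats per-image fees depends on volume. The sketch below computes a break-even point under a simple model: a flat per-image fee on one side, and GPU rental plus a fixed monthly overhead (standing in for the engineering time mentioned above) on the other. Every price and throughput figure is a made-up placeholder, and the function names are hypothetical.

```python
from typing import Optional

# Hypothetical cost model: every price and throughput figure is a placeholder,
# not a quote from Midjourney, Black Forest Labs, or any cloud provider.


def monthly_cost_per_image_fee(images: int, fee_per_image: float) -> float:
    """Monthly cost when each generated image is billed individually."""
    return images * fee_per_image


def monthly_cost_self_hosted(images: int, gpu_hour_price: float,
                             images_per_gpu_hour: float,
                             fixed_monthly_overhead: float) -> float:
    """Monthly cost when running open weights on rented GPUs.

    fixed_monthly_overhead stands in for the engineering and operations
    time that self-hosting shifts onto the team.
    """
    gpu_hours = images / images_per_gpu_hour
    return fixed_monthly_overhead + gpu_hours * gpu_hour_price


def break_even_images(fee_per_image: float, gpu_hour_price: float,
                      images_per_gpu_hour: float,
                      fixed_monthly_overhead: float) -> Optional[float]:
    """Monthly volume above which self-hosting is cheaper, or None if never."""
    marginal_self_hosted = gpu_hour_price / images_per_gpu_hour
    saving_per_image = fee_per_image - marginal_self_hosted
    if saving_per_image <= 0:
        return None  # self-hosting never catches up on marginal cost
    return fixed_monthly_overhead / saving_per_image


# Placeholder inputs: $0.05/image fee, $2.00/GPU-hour, 200 images/GPU-hour,
# $2,000/month of overhead.
print(round(break_even_images(0.05, 2.00, 200, 2000)))  # → 50000
```

At those invented numbers both options cost about $2,500 at 50,000 images a month; below that volume the per-image fee is cheaper. The fixed overhead term is usually the one teams underestimate.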

The Vector Generation Breakthrough

Recraft v3 changed what’s possible with AI-generated vector content:

Consistent icon sets across multiple images
SVG export that works cleanly in design tools
Style transfer maintaining brand guidelines
Complex illustrations with multiple elements

For teams building design systems or marketing materials, Recraft v3 significantly accelerates workflows.
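“Works cleanly in design tools” is worth checking before an SVG enters a design system. The sketch below runs a few generic sanity checks on any exported SVG using Python’s standard library: a `viewBox` is present, the file contains real vector shapes, and it does not hide an embedded raster `<image>`. These heuristics are assumptions of ours, not Recraft’s documented export format, and `check_svg_export` is a hypothetical helper. Note that `xml.etree.ElementTree` is not hardened against maliciously constructed XML, so use it on files you trust.

```python
import xml.etree.ElementTree as ET

# Element names that carry actual vector geometry.
SHAPE_TAGS = {"path", "rect", "circle", "ellipse", "polygon", "polyline", "line"}


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://www.w3.org/2000/svg}'."""
    return tag.rsplit("}", 1)[-1]


def check_svg_export(svg_text: str) -> list[str]:
    """Return problems found in an exported SVG; an empty list means it passed.

    The checks are illustrative heuristics, not a spec-compliance test.
    """
    root = ET.fromstring(svg_text)
    if local_name(root.tag) != "svg":
        return ["root element is not <svg>"]
    problems = []
    if "viewBox" not in root.attrib:
        problems.append("missing viewBox (may not scale cleanly)")
    tags = {local_name(el.tag) for el in root.iter()}
    if "image" in tags:
        problems.append("embedded <image> element (raster content inside the vector file)")
    if not tags & SHAPE_TAGS:
        problems.append("no vector shape elements found")
    return problems
```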

Video Generation: The Breakthrough Year

Video AI spent years as “almost ready.” 2025 was the year it actually became ready.

Sora’s Disney Deal Changes the Landscape

OpenAI’s Sora secured a major licensing deal with Disney, bringing AI-generated video into mainstream content production. While the full implications will take years to play out, the signal is clear: video generation reached quality where major studios take it seriously.

According to recent comparisons, Sora 2 represents OpenAI’s most advanced video generation model, released in September 2025 and continuously improved since. The model excels at realistic motion and scene consistency.

Key Sora improvements:

Consistent character and object representation over time
Improved physics simulation (liquids, fabrics, object interactions)
Better camera movement and scene composition
Text-to-video with reasonable quality on simple scenes

Runway Gen-3 Development

Runway continued development of Gen-3, establishing itself as the choice for professional video production workflows:

Better control over camera movement
Improved consistency with provided reference images
Stronger motion handling for complex scenes
Better integration with traditional editing workflows

Pika and the Short-Form Revolution

Pika carved out a strong position in short-form video content:

Quick generation of content suited to social media
Easier prompt interface for non-experts
Better for abstract/conceptual content than photorealistic
Lower compute requirements enabling faster iteration

Comparative Analysis: When to Use Each Tool

For Product Photography

Best choice: FLUX.1 (commercial production) or Midjourney (highest quality)

Product photography workflows:

Generate base images with FLUX.1 or Midjourney
Use background removal tools for clean composites
Manual refinement for final production assets
A/B test variations before full deployment
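For the A/B testing step, a standard two-proportion z-test is one way to tell whether a new generated image really outperforms the current one rather than just getting lucky. The sketch assumes the metric is click-through rate (clicks over views); the function name and the sample numbers are hypothetical, and it uses only the standard library.

```python
from math import erf, sqrt


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in click-through rate between A and B.

    Returns (z, p_value). Uses the pooled proportion under the null
    hypothesis that both variants share the same underlying rate.
    """
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical results: variant B (new generated image) vs. variant A.
z, p = two_proportion_z_test(clicks_a=100, views_a=1000, clicks_b=150, views_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The normal approximation behind this test is reasonable when each variant has a decent number of both clicks and non-clicks; with very small samples, an exact test is the safer choice.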

The combination of quality and cost-effectiveness makes FLUX.1 particularly attractive for ongoing product photography needs.

For Marketing Campaigns

Best choice: Midjourney for hero images, Recraft for vector elements

Marketing workflow optimization:

Concept exploration with Midjourney (better at abstract interpretation)
Variation generation with FLUX.1 (more consistent, cheaper at scale)
Vector elements from Recraft (icons, illustrations, brand elements)
Text elements from Ideogram (best text rendering)

For Video Content

Best choice: Sora for quality, Pika for speed

Video content strategy:

Conceptual/short-form: Pika (fast iteration, good for social)
Professional/quality: Sora or Runway (depends on control requirements)
Hybrid approaches: AI generation + manual refinement

Video generation still requires significant post-production work for professional use. Expect to spend time refining AI outputs rather than using them directly.
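Teams that route many asset requests may want these recommendations written down in one place. The sketch below transcribes this section’s picks into a lookup table. The task labels are hypothetical keys for a team’s own request tracker, not settings in any of these tools.

```python
from typing import NamedTuple


class Pick(NamedTuple):
    tool: str
    reason: str


# Recommendations transcribed from the comparison in this issue;
# the keys are hypothetical task labels.
RECOMMENDED = {
    "product_photo_at_scale": Pick("FLUX.1", "cost-effective for ongoing production"),
    "product_photo_hero": Pick("Midjourney", "highest quality"),
    "campaign_concepts": Pick("Midjourney", "better at abstract interpretation"),
    "campaign_variations": Pick("FLUX.1", "more consistent, cheaper at scale"),
    "vector_elements": Pick("Recraft", "icons, illustrations, brand elements"),
    "text_in_image": Pick("Ideogram", "best text rendering"),
    "short_form_video": Pick("Pika", "fast iteration for social"),
    "quality_video": Pick("Sora or Runway", "depends on control requirements"),
}


def pick_tool(task: str) -> Pick:
    """Look up the recommended tool, failing loudly on unknown tasks."""
    try:
        return RECOMMENDED[task]
    except KeyError:
        raise ValueError(
            f"no recommendation for {task!r}; known tasks: {sorted(RECOMMENDED)}"
        ) from None
```

Raising on unknown tasks, rather than silently defaulting to one tool, forces a deliberate decision when a new kind of request appears. Video picks still imply a post-production step afterward.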

Practical Considerations

Prompt Engineering Still Matters

Despite improvements in model understanding, prompt engineering remains valuable:

Be specific about subject: “A woman” produces generic results. “A 40-year-old woman with silver-streaked dark hair, wearing a leather jacket, standing in morning light” produces distinctive results.

Reference styles explicitly: “Photorealistic, editorial photography, soft shadows, shallow depth of field” gives better control than “good quality.”

Describe the mood: “Contemplative, slightly melancholic, morning light, city background” helps generate images with emotional coherence.
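The three habits above (specific subject, explicit style, stated mood) can be kept consistent across a team by assembling prompts from parts instead of retyping them. A minimal sketch, assuming the target tool accepts a plain comma-separated text prompt; `build_prompt` is a hypothetical helper, and exact prompt syntax varies by tool.

```python
def build_prompt(subject: str, style: list[str], mood: list[str]) -> str:
    """Join subject, style, and mood descriptors into one comma-separated prompt.

    Blank fragments are skipped and repeats (case-insensitive) are dropped,
    keeping the first occurrence so the subject always leads.
    """
    seen: set[str] = set()
    parts: list[str] = []
    for fragment in [subject, *style, *mood]:
        fragment = fragment.strip()
        key = fragment.lower()
        if fragment and key not in seen:
            seen.add(key)
            parts.append(fragment)
    return ", ".join(parts)


print(build_prompt(
    subject="A 40-year-old woman with silver-streaked dark hair, wearing a leather jacket",
    style=["photorealistic", "editorial photography", "soft shadows", "shallow depth of field"],
    mood=["contemplative", "slightly melancholic", "morning light", "city background"],
))
```

Keeping style and mood as reusable lists makes it easy to hold them fixed while varying only the subject, which helps when a set of images needs a shared look.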

Negative Prompting Continues to Matter

Many current models benefit from explicit negative prompts where the tool supports them:

Common negative prompts:

“blurry, low quality, distorted”
“watermark, text, logo”
“extra fingers, asymmetric face”
“amateur, stock photo feel”

The specific negatives depend on the model and use case. Experimentation pays off.
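Because the right negatives vary by use case, one option is a shared base list plus per-use-case additions. The sketch below merges them into a single de-duplicated string. How that string is passed depends on the tool: some interfaces take a separate negative prompt field, Midjourney uses its `--no` parameter, and some models do not use negative prompts at all. The lists and the `negative_prompt` helper are hypothetical, built from the examples above.

```python
# Hypothetical defaults assembled from the examples above; tune per model.
BASE_NEGATIVES = ["blurry", "low quality", "distorted"]
NEGATIVES_BY_USE_CASE = {
    "portrait": ["extra fingers", "asymmetric face"],
    "product": ["watermark", "text", "logo"],
    "editorial": ["amateur", "stock photo feel"],
}


def negative_prompt(use_case: str, extra: tuple[str, ...] = ()) -> str:
    """Merge base, use-case, and ad hoc negative terms into one string.

    Terms are lowercased and de-duplicated in first-seen order;
    an unknown use case falls back to the base list.
    """
    terms = BASE_NEGATIVES + NEGATIVES_BY_USE_CASE.get(use_case, []) + list(extra)
    cleaned = (t.strip().lower() for t in terms)
    return ", ".join(dict.fromkeys(t for t in cleaned if t))
```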

Iteration Over Perfection

The highest quality images often come from iterative refinement rather than single generation:

Iteration workflow:

Generate initial images (3-5)
Select strongest or most promising elements
Refine prompts based on results
Generate variations on strongest options
Select and refine again

This approach produces better results than trying to get perfect output in one generation.
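The iteration workflow is essentially a small selection loop: generate variations, keep the strongest, and vary those again. The sketch below expresses that loop generically. `generate` and `score` are hypothetical callables standing in for a model call and a human or automated rating; the toy example uses integers so the loop can be checked by hand.

```python
from typing import Callable, Hashable, Iterable, List


def refine(seeds: Iterable[Hashable],
           generate: Callable[[Hashable], List[Hashable]],
           score: Callable[[Hashable], float],
           rounds: int = 3,
           keep: int = 2) -> List[Hashable]:
    """Iteratively generate variations and keep the strongest candidates.

    Each round expands every survivor into variations, pools them with the
    current survivors (so a weak round cannot lose earlier winners), drops
    duplicates, and keeps the `keep` highest-scoring candidates.
    """
    survivors: List[Hashable] = []
    parents = list(seeds)
    for _ in range(rounds):
        variations = [v for p in parents for v in generate(p)]
        pool = list(dict.fromkeys(survivors + variations))
        survivors = sorted(pool, key=score, reverse=True)[:keep]
        parents = survivors
    return survivors


if __name__ == "__main__":
    # Toy stand-ins: candidates are integers, "variations" nudge them,
    # and the score rewards closeness to a target of 10.
    def toy_generate(n: int) -> List[int]:
        return [n - 1, n + 1, n + 2]

    def toy_score(n: int) -> float:
        return -abs(n - 10)

    print(refine([0], toy_generate, toy_score))  # → [6, 5]
```

Carrying survivors into each new pool mirrors “select and refine again”: a promising earlier image stays in contention until something better replaces it.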

Looking Ahead to 2026

Next year should bring further improvements:

Video generation quality approaches photo quality
Real-time image generation becomes viable
3D generation makes meaningful progress
Integrated workflows across image, video, and 3D

The line between AI-generated and human-created content continues to blur. The teams that master these tools will have significant advantages in content velocity.

That’s the briefing for this week. See you next Tuesday.

Verification Note

This issue was reviewed in the April 27, 2026 content audit. Product names, model availability, pricing, and regulatory details can change quickly, so high-stakes decisions should be checked against the original provider, regulator, or research source before publication or purchase.