Why This Matters Now
The point of this issue, The State of AI Image and Video Generation: Midjourney, Sora, and What Actually Works, is not to chase every announcement. The useful signal is what changed for the builders, creators, teams, and buyers who have to make decisions with imperfect information.
For this issue, I have kept the analysis grounded in what can be acted on: which workflows are becoming more practical, which claims still need verification, and where teams should slow down before treating a polished demo as production reality.
The State of Image and Video Generation
As 2025 closes, the generative AI landscape has solidified into clear categories. The chaos of early experimentation has settled into understood tradeoffs and established workflows. Here’s where we stand.
Image Generation: The Maturation
Midjourney Maintains Quality Lead
Midjourney V7 has consolidated its position as the highest-quality image generation tool. The improvements in character consistency and prompt adherence address its last significant weaknesses. For professional content requiring maximum quality, Midjourney remains the choice.
Key improvements in V7:
- Significantly better face consistency across images
- Improved text rendering (still not perfect, but much better)
- Better style transfer from reference images
- More reliable prompt interpretation
Cost remains a factor: a subscription is required, and the per-image economics are not ideal for high-volume use. But for quality-critical work, the output justifies the expense.
FLUX.1 Changes the Economics
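Black Forest Labs’ FLUX.1 fundamentally changed the economics of AI image generation. Open weights with commercial licensing mean teams can generate images on their own infrastructure without per-image fees. Licensing terms differ between FLUX.1 variants, so confirm that the variant you deploy permits commercial use.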
FLUX.1’s position:
- Quality matches Midjourney for most use cases
- Open weights enable customization and fine-tuning
- Commercial licensing removes legal uncertainty
- Infrastructure costs much lower than API fees at scale
The tradeoffs: FLUX.1 requires more technical expertise to deploy and optimize than a hosted service such as Midjourney, and infrastructure management becomes the team’s responsibility.
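To make the cost tradeoff concrete, here is a minimal break-even sketch. Every number in it is a hypothetical placeholder, not real pricing from Black Forest Labs, Midjourney, or any cloud provider, and the function name is my own. Substitute your actual hosting bill and per-image fees.

```python
import math
from typing import Optional


def break_even_images(fixed_monthly_cost: float,
                      self_hosted_cost_per_image: float,
                      api_price_per_image: float) -> Optional[int]:
    """Return the monthly image count at which self-hosting stops costing more.

    Returns None when self-hosting never wins, i.e. when its marginal
    cost per image is not lower than the per-image API price.
    """
    saving_per_image = api_price_per_image - self_hosted_cost_per_image
    if saving_per_image <= 0:
        return None
    # Smallest whole number of images whose savings cover the fixed cost.
    return math.ceil(fixed_monthly_cost / saving_per_image)


# Hypothetical inputs: 1,000 per month for GPU hosting, 0.125 marginal
# cost per self-hosted image, 0.25 per image through a paid API.
print(break_even_images(1000.0, 0.125, 0.25))  # prints 8000
```

Below the break-even volume, a hosted service is the cheaper option even before counting the engineering time that self-hosting demands.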
The Vector Generation Breakthrough
Recraft v3 changed what’s possible with AI-generated vector content:
- Consistent icon sets across multiple images
- SVG export that works cleanly in design tools
- Style transfer maintaining brand guidelines
- Complex illustrations with multiple elements
For teams building design systems or marketing materials, Recraft v3 significantly accelerates workflows.
Video Generation: The Breakthrough Year
Video AI spent years as “almost ready.” 2025 was the year it actually became ready.
Sora’s Disney Deal Changes the Landscape
OpenAI’s Sora secured a major licensing deal with Disney, bringing AI-generated video into mainstream content production. While the full implications will take years to play out, the signal is clear: video generation has reached a level of quality that major studios take seriously.
According to recent comparisons, Sora 2 represents OpenAI’s most advanced video generation model, released in September 2025 and continuously improved since. The model excels at realistic motion and scene consistency.
Key Sora improvements:
- Consistent character and object representation over time
- Improved physics simulation (liquids, fabrics, object interactions)
- Better camera movement and scene composition
- Text-to-video with reasonable quality on simple scenes
Runway Gen-3 Development
Runway continued development of Gen-3, establishing itself as the choice for professional video production workflows:
- Better control over camera movement
- Improved consistency with provided reference images
- Stronger motion handling for complex scenes
- Better integration with traditional editing workflows
Pika and the Short-Form Revolution
Pika established dominance for short-form video content:
- Quick generation of content suited to social media
- Easier prompt interface for non-experts
- Better for abstract or conceptual content than for photorealistic content
- Lower compute requirements enabling faster iteration
Comparative Analysis: When to Use Each Tool
For Product Photography
Best choice: FLUX.1 (commercial production) or Midjourney (highest quality)
Product photography workflows:
- Generate base images with FLUX.1 or Midjourney
- Use background removal tools for clean composites
- Manual refinement for final production assets
- A/B test variations before full deployment
The combination of quality and cost-effectiveness makes FLUX.1 particularly attractive for ongoing product photography needs.
For Marketing Campaigns
Best choice: Midjourney for hero images, Recraft for vector elements
Marketing workflow optimization:
- Concept exploration with Midjourney (better at abstract interpretation)
- Variation generation with FLUX.1 (more consistent, cheaper at scale)
- Vector elements from Recraft (icons, illustrations, brand elements)
- Text elements from Ideogram (best text rendering)
For Video Content
Best choice: Sora for quality, Pika for speed
Video content strategy:
- Conceptual/short-form: Pika (fast iteration, good for social)
- Professional/quality: Sora or Runway (depends on control requirements)
- Hybrid approaches: AI generation + manual refinement
Video generation still requires significant post-production work for professional use. Expect to spend time refining AI outputs rather than using them directly.
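The recommendations in this section can be captured as a small lookup table that a team could keep alongside its content tooling. This is only a sketch: the use-case keys and the `recommend` function are hypothetical names of my own, and the tool lists simply restate the picks above.

```python
from typing import Dict, List

# Use-case keys are illustrative; tool names come from the comparison above.
RECOMMENDATIONS: Dict[str, List[str]] = {
    "product_photography": ["FLUX.1", "Midjourney"],
    "marketing_hero_image": ["Midjourney"],
    "marketing_variations": ["FLUX.1"],
    "marketing_vector": ["Recraft"],
    "marketing_text": ["Ideogram"],
    "video_short_form": ["Pika"],
    "video_professional": ["Sora", "Runway"],
}


def recommend(use_case: str) -> List[str]:
    """Return the suggested tools for a use case, or an empty list if unknown."""
    # Return a copy so callers cannot mutate the shared table.
    return list(RECOMMENDATIONS.get(use_case, []))


print(recommend("video_professional"))
```

Keeping the mapping in one place makes it easy to revise as tools and pricing change.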
Practical Considerations
Prompt Engineering Still Matters
Despite improvements in model understanding, prompt engineering remains valuable:
Be specific about subject: “A woman” produces generic results. “A 40-year-old woman with silver-streaked dark hair, wearing a leather jacket, standing in morning light” produces distinctive results.
Reference styles explicitly: “Photorealistic, editorial photography, soft shadows, shallow depth of field” gives better control than “good quality.”
Describe the mood: “Contemplative, slightly melancholic, morning light, city background” helps generate images with emotional coherence.
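The three habits above (specific subject, explicit style, described mood) can be kept consistent with a small prompt template. The sketch below is one way to do it. `PromptSpec` and its fields are hypothetical names, not part of any tool’s API, and the output is just a string to paste or send wherever your tool accepts a prompt.

```python
from dataclasses import dataclass


@dataclass
class PromptSpec:
    subject: str
    style: str = ""
    mood: str = ""

    def render(self) -> str:
        """Join the non-empty parts in subject, style, mood order."""
        parts = [self.subject, self.style, self.mood]
        return ", ".join(p.strip() for p in parts if p.strip())


spec = PromptSpec(
    subject="A 40-year-old woman with silver-streaked dark hair, "
            "wearing a leather jacket, standing in morning light",
    style="photorealistic, editorial photography, soft shadows, "
          "shallow depth of field",
    mood="contemplative, slightly melancholic, city background",
)
print(spec.render())
```

Separating the fields makes it easier to vary one dimension, such as mood, while holding the subject and style fixed across a batch.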
Negative Prompting Continues to Matter
Many current tools benefit from explicit negative prompts, though how they are supplied varies by tool:
Common negative prompts:
- “blurry, low quality, distorted”
- “watermark, text, logo”
- “extra fingers, asymmetric face”
- “amateur, stock photo feel”
The specific negatives depend on the model and use case. Experimentation pays off.
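Since negatives tend to accumulate per model and per use case, it helps to merge them without repeats. The helper below is a minimal sketch. `merge_negatives` is a hypothetical name, and how the resulting string is passed along depends entirely on the tool you use.

```python
def merge_negatives(*groups: str) -> str:
    """Merge comma-separated negative-prompt strings, dropping duplicates.

    Comparison is case-insensitive; the first spelling seen is kept,
    and the original order is preserved.
    """
    seen = set()
    merged = []
    for group in groups:
        for term in group.split(","):
            term = term.strip()
            key = term.lower()
            if term and key not in seen:
                seen.add(key)
                merged.append(term)
    return ", ".join(merged)


base = "blurry, low quality, distorted"
product_shots = "watermark, text, logo, Blurry"
print(merge_negatives(base, product_shots))
# prints: blurry, low quality, distorted, watermark, text, logo
```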
Iteration Over Perfection
The highest quality images often come from iterative refinement rather than single generation:
Iteration workflow:
- Generate initial images (3-5)
- Select strongest or most promising elements
- Refine prompts based on results
- Generate variations on strongest options
- Select and refine again
This approach produces better results than trying to get perfect output in one generation.
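The loop above can be sketched in code. This is an illustration under assumptions, not a real integration: `generate`, `score`, and `refine` are hypothetical callables you would supply, with `generate` wrapping whatever tool you use, `score` standing in for human selection, and `refine` standing in for the prompt edits you make after reviewing results.

```python
from typing import Callable, List, Tuple


def iterate(prompt: str,
            generate: Callable[[str, int], List[str]],
            score: Callable[[str], float],
            refine: Callable[[str, str], str],
            rounds: int = 2,
            batch_size: int = 4) -> Tuple[str, str]:
    """Run generate, select, refine for a fixed number of rounds.

    Returns the best image identifier seen and the prompt that produced it.
    """
    best_image, best_prompt, best_score = "", prompt, float("-inf")
    for _ in range(rounds):
        candidates = generate(prompt, batch_size)
        if not candidates:
            break
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score > best_score:
            best_image, best_prompt, best_score = top, prompt, top_score
        # Refine the prompt based on the strongest candidate of this round.
        prompt = refine(prompt, top)
    return best_image, best_prompt


# Stand-in callables so the sketch runs without any image service.
def fake_generate(p: str, n: int) -> List[str]:
    return [f"{p}#{i}" for i in range(n)]


def fake_score(image_id: str) -> float:
    return float(len(image_id))


def fake_refine(p: str, best: str) -> str:
    return p + "+"


print(iterate("a", fake_generate, fake_score, fake_refine, rounds=2, batch_size=3))
```

In practice the scoring step is usually a person looking at the batch, so treat the loop as a way to structure the review rather than to automate it.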
Looking Ahead to 2026
Next year promises further improvements:
- Video generation quality approaches photo quality
- Real-time image generation becomes viable
- 3D generation makes meaningful progress
- Integrated workflows across image, video, and 3D
The line between AI-generated and human-created content continues to blur. The teams that master these tools will have significant advantages in content velocity.
That’s the briefing for this week. See you next Tuesday.
Verification Note
This issue was reviewed in the April 27, 2026 content audit. Product names, model availability, pricing, and regulatory details can change quickly, so high-stakes decisions should be checked against the original provider, regulator, or research source before publication or purchase.