{
  "meta": {
    "date": "2026-06-25",
    "title": "Gemini Omni Flash \u2014 Scientific Video Assessment"
  },
  "executive_summary": "**Bottom line: Omni Flash is NOT ready for scientific or educational video.** Its visuals are stunning and it is often conceptually correct, but for *factual* content that polish is a liability rather than an asset. Across this 35-case study the model showed three blocking problems:\n\n- **Scientific text is unreliable.** Dense labels, equations and annotations render as convincing gibberish \u2014 the volcano's crowded labels (A8), the chalkboard 'physics' behind the presenter (A10), the blueprint annotations (G4) and the Krebs-cycle labels (G5) are all nonsense. For a teaching medium, garbled-but-authoritative-looking text is disqualifying.\n- **Physics is approximated, not simulated.** The cargo-ship ocean impact (I4) badly under-scales water displacement, with no air-cavity collapse or rebound jet and stylised 'hovering' dynamics; the Hummer novelty (I6) is physically impossible. The model renders the *look* of a phenomenon, not its mechanics.\n- **Factual accuracy and consistency are unpredictable.** The succulent time-lapse (I1) drew a veined broad-leaf seedling from a grossly oversized seed; the solar system (A3) is mis-counted and mis-ordered; and multi-turn edits under-apply requested structural changes (the DNA 'fork' edit, G2). Because correct and subtly-wrong outputs are *visually indistinguishable*, every clip needs expert review \u2014 which defeats the point for at-scale educational production.\n\nThis is not a dismissal of the model's quality. **34 of 35 generations succeeded** (the lone hard failure was a deliberate 1080p probe revealing no resolution control), visual quality and prompt adherence are consistently excellent, and several clips were genuinely accurate \u2014 the water-cycle whiteboard (A1), methane combustion (A6), the odd-harmonic Fourier series (A4), photosynthesis labels (C2), canyon incision (I2), a black hole's gravitational lensing (B3) and the image-to-video neuron and Earth-interior diagrams (E1, E2). 
But strong individual results are not the same as reliability, and **reliability is the bar for science**.\n\nThe dominant limitation is **dense on-screen text**: large 'hero' text and short labels/formulae render cleanly (A1's water-cycle labels, A4's 'n=1,3,5', A6's atom symbols, A10's giant 'E=mc\u00b2', C2's photosynthesis labels), but crowded labels and hand-written equations collapse into authentic-looking gibberish (A8, A10's chalkboard, G4's blueprint). In a teaching medium that is a real risk, because the nonsense looks convincing.\n\nTwo workarounds proved decisive for science. **(1) Image-to-video on a text-perfect diagram** generated by the image model preserves every label crisply while Omni adds motion (E1, E2) \u2014 this side-steps the text weakness entirely. **(2) Explicitly requesting no text reliably yields text-free clips** (C1). Multi-turn editing \u2014 style transfer, character swap and reference-guided restyle \u2014 is a genuine strength with strong scene and motion consistency (G3, G4, G6), though large structural content edits can be muted (G2).\n\n**The accuracy scores went through three escalating layers of review, each of which surfaced errors the prior pass had missed \u2014 while also confirming the genuinely accurate clips against cited sources.** A naive single-pass visual judge flagged ~10 problem clips. An independent **adversarial expert panel** (8-frame strips, web-grounded rubrics) downgraded several more clips that the single pass had over-credited \u2014 most strikingly mitosis 5\u21922 (the 'separating' chromosomes are still paired sister chromatids). Then a **human domain reviewer went through all 34 clips one by one** and caught still more errors that even the panel had passed: DNA base-pairing with identical bases bonded together (A2), oxygenated blood routed to the wrong side of the heart (A5), protein folding 'not grounded in any real biochemistry' (D1), and a ribosome showing 4-base 'codons' instead of triplets (I3). 
After this expert pass, **only 16 of the 34 successfully generated clips are reliably accurate (\u22654/5), 4 are mixed (3/5), and 14 contain serious scientific errors (\u22642/5)**; the 1080p probe D2 failed and is unscored. Fewer than half are dependable, and several teaching-critical labels are outright fabricated. That more scrutiny keeps surfacing more errors \u2014 and that the wrong clips look completely authoritative \u2014 is the core reason Omni is unsafe for unvetted scientific use. (See *Judging methodology & confidence*.)\n\n**Where it fits today:** decorative or ambient, science-*flavoured* B-roll, hooks and establishing shots where nothing on screen has to be factually correct; and \u2014 strictly with a human-in-the-loop and the image-to-video diagram workaround \u2014 assistive *draft* generation. **Where it does not fit yet:** unattended generation of explainer, diagram, equation or physics content for teaching, publication, or any setting where a viewer will take the depiction as fact. The combination of persuasive realism and unpredictable factual errors is precisely the wrong profile for science communication. **Recommendation: do not deploy Omni for scientific/educational video without expert human verification of every clip; re-evaluate as the text, physics and consistency issues are addressed.**",
  "scorecard": [
    {
      "dimension": "Readiness for scientific/educational video",
      "score": 1,
      "note": "**NOT READY for unattended use.** Blocked by unreliable scientific text, non-rigorous physics, and unpredictable factual/consistency errors (below) \u2014 and because every clip is a single one-shot generation (no iteration), this rates the *un-iterated floor*, not the model's ceiling. On this scale **1 = not safely usable unsupervised for factual content** (it is not 'worthless' \u2014 visual quality and prompt adherence both score 5); every clip needs expert review. Usable today only for decorative, non-factual B-roll, or as human-vetted drafts. Re-evaluate when the blockers are fixed."
    },
    {
      "dimension": "Scientific text & labels",
      "score": 2,
      "note": "Hard blocker. Short 'hero' text is fine, but dense or hand-written labels, equations and annotations render as convincing gibberish (A8, A10, G4, G5). Authoritative-looking nonsense is disqualifying for teaching."
    },
    {
      "dimension": "Physics & dynamics fidelity",
      "score": 2,
      "note": "Renders the look of a phenomenon, not its mechanics. The ship-ocean impact under-scales displacement with no cavity/rebound jet (I4); the Hummer paraglide is physically impossible (I6). Fine for vibe, wrong for mechanism."
    },
    {
      "dimension": "Factual reliability & consistency",
      "score": 2,
      "note": "Correctness is unpredictable and correct-vs-wrong clips look identical: identical DNA bases shown pairing (A2), oxygenated blood routed to the wrong side (A5), protein folding not grounded in real biochemistry (D1), 4-base 'codons' (I3), veined succulent seedling (I1), mis-ordered solar system with mis-assigned rings (A3). After expert-panel + human-reviewer scrutiny, 14 of 34 clips have serious errors and only 16 are reliably accurate (fewer than half). No clip can be trusted without expert review."
    },
    {
      "dimension": "Scientific accuracy (process-level, when it lands)",
      "score": 3,
      "note": "Often conceptually correct (canyon incision, Fourier, black-hole lensing, water cycle, methane combustion, photosynthesis labels) \u2014 but 'often' is not 'reliably'. The adversarial panel downgraded mitosis 5\u21922, Krebs labels 3\u21921, volcano 4\u21922 and the solar system 3\u21922 once the facts were actually checked; the polish hides the misses."
    },
    {
      "dimension": "Visual quality & realism",
      "score": 5,
      "note": "Consistently broadcast-grade across every style \u2014 which is exactly what makes the factual errors dangerous: they look authoritative."
    },
    {
      "dimension": "Prompt adherence (style/composition)",
      "score": 5,
      "note": "Reliably produces the requested style, framing, subject and aspect ratio."
    },
    {
      "dimension": "Image- / Reference-to-video",
      "score": 4,
      "note": "The one dependable route to correct on-screen labels: animate a text-perfect diagram from the image model. Strong, and the recommended mitigation \u2014 but a workaround, not readiness."
    },
    {
      "dimension": "Latency, reliability & developer experience",
      "score": 4,
      "note": "Clean conversational Interactions API; ~39 s/clip. Points off for opaque 400s, content-word rejections and transient chained-edit failures."
    }
  ],
  "judging": "**How these accuracy scores were produced \u2014 and why the method changed.** An earlier version of this report scored accuracy in a single pass, by one reviewer viewing four still frames per clip and judging from memory. That proved unreliable for science: it over-credited polished-but-wrong content. The clearest proof is the mitosis clip (B2), first rated 'textbook-accurate' (5/5) \u2014 yet the chromosomes moving to the poles are still X-shaped **paired sister chromatids** rather than the separated single chromatids that *define* anaphase. It was wrong, and it looked right. (The succulent error you can see in I1 was likewise missed until a domain check.)\n\nTo fix this, every clip was re-judged by an **independent, adversarial domain-expert panel** \u2014 seven reviewers across molecular & cell biology, neuroscience & physiology, chemistry & biochemistry, physics & mathematics, astronomy & astrophysics, earth & environmental science, and botany/geography. Each reviewer (1) viewed a **denser 8-frame** time-strip; (2) built a rubric of specific, *verifiable* claims and **checked the underlying facts against web sources** instead of relying on memory; (3) inspected the frames **adversarially**, trying to disprove the depiction; and (4) returned a per-claim PASS/FAIL/UNSURE verdict, a calibrated **confidence** level, and an **'expert-review-needed'** flag for anything that can't be confirmed from stills.\n\nThe panel **downgraded several clips** the first pass had over-rated: mitosis 5\u21922, the Krebs-cycle diagram 3\u21921, the volcano labels 4\u21922, the solar system 3\u21922, microscope mitosis 5\u21923, and the DNA clips 4\u21923 (the helix reads left-handed; B-DNA is right-handed). It also **confirmed** the genuinely accurate ones (water cycle, Fourier, methane combustion, black-hole lensing, photosynthesis labels, Earth interior, canyon, the animated neuron diagram) with cited sources. 
The accuracy score on each card below starts from the **panel's** value and is adjusted wherever the human reviewer's findings overrode it; the panel's confidence and the specific errors it found are shown under 'Expert-panel check'. We also enforce one consistency rule: for factual content **educational suitability cannot exceed accuracy** \u2014 a clip you can't trust isn't teachable.\n\n**Finally, a human domain reviewer went through all 34 clips one by one.** Their notes appear in the blue *Reviewer feedback* box on each card and are treated as authoritative \u2014 where the reviewer identified an error the AI layers had missed, the score was lowered accordingly (and raised where a full-video pass cleared a stills-based concern). Watching the motion (not just frames) surfaced errors neither AI layer caught: identical DNA bases bonded together (A2), oxygenated blood going to the wrong side of the heart (A5), protein folding with no real biochemical pathway (D1), 4-base 'codons' at the ribosome (I3), and a microwave-like appliance with ice cream dripping from the top (I5). Each escalating layer of review found errors the previous one had missed.\n\n**Even this is not the end state.** The assessment still judges mostly from keyframes, the AI layers remain AI (not peer-reviewed sign-off), and the human pass was one reviewer, not a panel of subject specialists. The 'expert-review-needed' flags mark where a true specialist is still required. That caveat is the report's own thesis turned on itself: AI-generated science \u2014 and AI judging of it \u2014 needs human expert verification.",
  "methodology": "All videos were generated through the Gemini **Interactions API** (`client.interactions.create`) against the Omni Flash model `bouncybohr`, synchronously (`background=false, stream=false`). Source and reference images were generated separately via **OpenRouter** using `google/gemini-3.1-flash-image` and fed to Omni as inline image inputs.\n\n- **35 cases** across 9 feature areas: a 28-case core feature matrix (text/image/reference-to-video, multi-turn edits, aspect ratio & duration control, text-rendering probes, a resolution probe, URI delivery) plus a **7-case extended set** of creative & accuracy probes (group I: a botanical time-lapse, geological, molecular-biology, fluid-dynamics, everyday-thermodynamics, a physics novelty, and an ukiyo-e tourism montage). Domains span biology, chemistry, physics, astronomy, mathematics, medicine, geology, botany, climate and cultural geography; styles from photoreal to whiteboard to woodblock.\n- Each clip was saved as MP4; for the initial scoring pass, **4 evenly-spaced keyframes** were extracted with ffmpeg and **visually inspected** to score scientific accuracy, visual quality, prompt adherence, on-screen-text legibility, motion coherence and educational suitability (1\u20135). The expert panel later re-judged each clip from a denser 8-frame strip.\n- **Observed parameters:** default duration is **10 s**; explicit `5s` and `10s` honoured; multi-turn edits inherit the parent clip's duration (8 s). Output resolution is fixed at **1280\u00d7720**. Both **inline base64** and **`delivery:'uri'`** (download via the Files API) work. Mean generation time **39 s** (range 31\u201361 s; edits run longer).\n- Everything is stored locally; nothing was uploaded, per the EAP confidentiality terms.",
  "videos": {
    "A1": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 5,
        "text": 5,
        "coherence": 5,
        "edu": 5
      },
      "verdict": "Reference-quality whiteboard explainer. A hand progressively draws the entire cycle and **every label is correctly spelled and legible** \u2014 SUN, OCEAN, EVAPORATION, CONDENSATION, PRECIPITATION, RUNOFF \u2014 with a complete, scientifically correct loop of arrows. Directly usable in a lesson as-is.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material",
      "user_feedback": "Good."
    },
    "A2": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 2
      },
      "verdict": "Gorgeous photorealistic 3D helix with a believable sugar-phosphate backbone \u2014 but, as the reviewer flagged, the colour-coded base pairing is wrong: identical (same-colour) bases are shown bonded to each other, violating complementary A\u2013T / G\u2013C pairing. Visually excellent, chemically incorrect on the one rule that matters most. Correctly produced no text.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "base-pair rungs not cleanly two-base complementary in close-ups; helix handedness unverifiable from stills",
      "user_feedback": "Reviewer caught a fundamental error: the colour-coding shows **identical bases pairing with each other** (e.g. purple bonded to purple). That's scientifically wrong \u2014 DNA base pairing is complementary (A\u2013T and G\u2013C), so a base never pairs with one of the same type. The helix looks beautiful but depicts the central base-pairing rule incorrectly."
    },
    "A3": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 4,
        "text": null,
        "coherence": 3,
        "edu": 2
      },
      "verdict": "Spectacular rendering \u2014 banded Jupiter, ringed Saturn, recognisable Earth \u2014 but it is an artistic 'solar-system poster' rather than accurate orbital mechanics: planet count, spacing and order are not reliable. Use it for wonder, not for teaching orbital facts.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "prominent rings mis-assigned to multiple non-Saturn planets; non-heliocentric order; planet count/positions inconsistent across frames; an Earth-like planet sits implausibly close to the Sun",
      "user_feedback": "Reviewer noticed a temporal-consistency glitch: one of the planets nearest the Sun **disappears** during the first part of the clip. On top of the panel's findings (wrong planet count/order, rings on the wrong planets), the scene isn't even stable frame-to-frame."
    },
    "A4": {
      "scores": {
        "accuracy": 5,
        "quality": 4,
        "adherence": 5,
        "text": 5,
        "coherence": 5,
        "edu": 5
      },
      "verdict": "Mathematically correct and a top result. Sine components sum into a square wave with visible Gibbs ringing, and it even labels **'n = 1, 3, 5\u2026' \u2014 the correct odd-harmonic series**. Legible, correct on-screen math.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material \u2014 correct odd harmonics (n=1,3,5\u2026), ~1/n amplitudes, Gibbs ringing",
      "user_feedback": "Looks visually correct and conveys the message. Not 100% sure the exact summed waveform is numerically correct, but it doesn't need to be for this explanatory purpose."
    },
    "A5": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 2
      },
      "verdict": "Anatomically detailed cross-section (four chambers, valves, chordae) with colour-coded directional flow \u2014 but the reviewer flagged that the colour-coded routing mixes the two circulations: oxygenated (red) blood correctly belongs on the left, yet red also appears on the right side (where deoxygenated blue blood should be), so the systemic and pulmonary flows are not kept separate. Looks clinical, but not reliable for teaching how blood actually moves through the heart.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "chamber/valve anatomy and the blue=deoxygenated / red=oxygenated convention look correct from stills; the full-clip domain reviewer caught the two circulations being mixed (red appears on the right side) \u2014 score reflects the reviewer",
      "user_feedback": "Reviewer (domain): the blood flow is anatomically incorrect \u2014 **the model fails to keep the two circulations separate.** Oxygenated (red) blood correctly belongs on the left, but here red also appears on the right side (where deoxygenated blue blood should be), so the colour-coded routing mixes the systemic and pulmonary circulations and is not reliable for teaching how blood moves through the heart. (The panel had only been able to call this 'unverifiable from stills'.)"
    },
    "A6": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 5,
        "text": 5,
        "coherence": 4,
        "edu": 5
      },
      "verdict": "Accurate chemistry. Reactants (CH4 + 2 O2) shown with correct atom colours and labels, a combustion flash, then products (CO2 + 2 H2O). Stoichiometry and molecular structures are right \u2014 excellent.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material \u2014 correct geometries (tetrahedral CH4, linear CO2, bent H2O) and conserved CH4+2O2\u2192CO2+2H2O",
      "user_feedback": "Checked the reaction: CH4 + 2 O2 \u2192 CO2 + 2 H2O is balanced and atoms are conserved, and the molecular geometries are right (tetrahedral methane, two O2, linear CO2, bent water). Accurate."
    },
    "A7": {
      "scores": {
        "accuracy": 4,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 4
      },
      "verdict": "Cinematic and dramatic: supergiant \u2192 core collapse \u2192 explosion \u2192 expanding remnant, a roughly correct sequence. Spectacular for intros/hooks; artistic rather than data-accurate.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "red-supergiant swelling phase under-represented (stays orange/yellow); ejecta morphology artistically floral",
      "user_feedback": "Visually appealing, but reviewer is unsure how to judge the scientific accuracy of this one (stellar evolution is hard to assess from a short clip)."
    },
    "A8": {
      "scores": {
        "accuracy": 3,
        "quality": 5,
        "adherence": 4,
        "text": 2,
        "coherence": 4,
        "edu": 2
      },
      "verdict": "Excellent cutaway (magma chamber, conduit, strata, ash plume, lava). The two large labels ('Magma Chamber', 'Conduit') render correctly, but the dense second-pass labels degrade to gibberish ('Mora Claw', 'Pyno Blath', 'Angon Chamber'). Great visuals, untrustworthy labels.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "multiple gibberish labels ('Pyro Black','Mora Glse','Aspra Chamber'); duplicate 'Magma Chamber' labels both pointing at the conduit, not a basal chamber",
      "user_feedback": "Reviewer: good enough for what it's trying to show \u2014 the structural cutaway (magma chamber, conduit, eruption, layered strata) conveys the process adequately. (The garbled/duplicate text labels the panel flagged remain a caveat for any labelled use.)"
    },
    "A9": {
      "scores": {
        "accuracy": 3,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 3
      },
      "verdict": "Beautiful stylised 3D action potential travelling soma \u2192 axon \u2192 synaptic terminal with vesicle release. Direction and sequence are correct.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "neurotransmitter 'vesicles' oversized and released diffusely without a defined synaptic cleft (stylised, not literal)",
      "user_feedback": "Reviewer: good as a visualization but not really scientifically accurate \u2014 acceptable for this conceptual/illustrative use case, not for rigorous teaching. (Direction soma\u2192axon\u2192terminal is right; the synapse/vesicle depiction is loose.)"
    },
    "A10": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 4,
        "text": 2,
        "coherence": 4,
        "edu": 2
      },
      "verdict": "Photorealistic, consistent presenter. The hero equation **'E=mc\u00b2' is perfect**, but every surrounding chalk equation is authentic-looking gibberish ('Anoul phicts', 'A energy farstie'). The realism is a double-edged sword: fake equations look convincing enough to mislead.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "every chalkboard element except 'E=mc\u00b2' is pseudo-physics word-salad / malformed equations (e.g. 'Aroul phcts', 'f=m\u00b2/2\u00b2')",
      "user_feedback": "Reviewer confirms the panel: the main formula (E=mc\u00b2) is correct, but everything else on the board is made-up, inconsistent gibberish."
    },
    "B1": {
      "scores": {
        "accuracy": 4,
        "quality": 5,
        "adherence": 5,
        "text": 4,
        "coherence": 4,
        "edu": 4
      },
      "verdict": "Correct vertical 9:16. Engaging kawaii motion-graphics; greenhouse process correct; gas labels 'CO2'/'CH4' and titles ('THE GREENHOUSE EFFECT', 'TRAPPED!') legible. Ideal for education shorts/Reels.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "greenhouse gases drawn as a discrete outer shell rather than well-mixed; cartoon 'trapping' slightly oversimplifies re-emission",
      "user_feedback": "Looks good."
    },
    "B2": {
      "scores": {
        "accuracy": 1,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 5,
        "edu": 1
      },
      "verdict": "Honoured the 5 s request and the stage *order* is right (condensation \u2192 metaphase plate \u2192 poleward movement \u2192 cytokinesis) with lovely microscopy realism \u2014 but the expert panel caught a defining error the first pass missed: the bodies pulled to the poles in 'anaphase' are still **X-shaped paired sister chromatids**, not the separated single chromatids that anaphase is *defined* by. It looks textbook-correct and is biologically wrong \u2014 the canonical generative-video mitosis error.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "anaphase shows X-shaped PAIRED sister chromatids moving to both poles instead of separated single chromatids \u2014 the defining event is misrepresented",
      "user_feedback": "Reviewer: this clip has many scientific issues beyond the anaphase error the panel flagged (chromosomes moving to the poles as paired X-shaped chromatids rather than separated single chromatids). Not trustworthy as a depiction of mitosis."
    },
    "B3": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 5,
        "edu": 5
      },
      "verdict": "Honoured the 10 s request. Interstellar-grade black hole: accretion disk, bright photon ring, and **gravitational lensing showing the disk's far side above and below the shadow \u2014 physically correct**. Outstanding.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material \u2014 correct shadow, photon ring and over-the-top gravitational lensing of the far disk",
      "user_feedback": "Reviewer defers \u2014 not their area of expertise. Score relies on the expert panel's web-verified assessment (correct event-horizon shadow, photon ring and gravitational lensing of the far disk; NASA-sourced)."
    },
    "C1": {
      "scores": {
        "accuracy": 3,
        "quality": 4,
        "adherence": 4,
        "text": 5,
        "coherence": 3,
        "edu": 2
      },
      "verdict": "KEY PROBE \u2014 explicitly asked for NO text and the model produced **none**. Text-free output is reliably achievable by prompting. The physics is only loosely right, though: E/B perpendicularity is unclear and one frame becomes chaotic.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "E and B rendered coplanar (mirror images) rather than mutually perpendicular (E\u22a5B\u22a5k); one frame collapses to a non-physical localized spike",
      "user_feedback": "Reviewer couldn't tell what the clip is actually showing \u2014 it doesn't read as a clear EM-wave visualization. Reinforces the panel's finding that the E/B field geometry is wrong/ambiguous (fields not mutually perpendicular)."
    },
    "C2": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 5,
        "text": 4,
        "coherence": 4,
        "edu": 5
      },
      "verdict": "Best 'many labels' result. Sunlight, CO2, H2O-from-roots, chloroplast, O2 and glucose are all present and mostly legible/correct \u2014 short words and chemical formulae render far better than sentences. Genuinely teachable.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material \u2014 correct inputs/outputs, organelle and label spelling",
      "user_feedback": "Looks good."
    },
    "D1": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 2
      },
      "verdict": "A convincing ribbon animation (recognizable \u03b1-helices and \u03b2-sheet arrows collapsing into a compact fold; native 1280\u00d7720) \u2014 but the reviewer (biochemistry) flags it as not grounded in real biochemistry: the folding is arbitrary morphing, not an actual sequence\u2192structure pathway. It looks like protein folding without actually depicting it \u2014 a clear case of the panel over-crediting visual motifs that a domain expert rejects.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "panel found the ribbon motifs correct from stills (extended chain \u2192 \u03b1-helices + \u03b2-sheet arrows \u2192 compact fold); the full-clip domain reviewer (biochemistry) judged the folding pathway non-physical \u2014 score reflects the reviewer",
      "user_feedback": "Reviewer (biochemistry): many issues \u2014 **not grounded in any real biochemistry.** The 'folding' is arbitrary morphing rather than a real sequence\u2192structure pathway; recognizable \u03b1-helix / \u03b2-sheet motifs appear but don't represent an actual folding process. The panel over-credited the ribbon look; a domain reviewer does not."
    },
    "D2": {
      "scores": {
        "accuracy": null,
        "quality": null,
        "adherence": null,
        "text": null,
        "coherence": null,
        "edu": null
      },
      "verdict": "**Deliberate probe that failed informatively.** Adding `\"resolution\":\"1080p\"` to response_format was rejected with *\"Unknown parameter 'resolution'\"*. Omni Flash exposes **no resolution control** in this EAP build \u2014 output is fixed at 720p. This directly answers the EAP resolution question."
    },
    "E1": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 5,
        "text": 5,
        "coherence": 4,
        "edu": 5
      },
      "verdict": "Image-to-video standout and a strategic finding. The labelled source diagram (generated via OpenRouter) is preserved pixel-faithfully \u2014 **all nine labels stay crisp** \u2014 while a bright pulse animates soma \u2192 axon \u2192 synapse. This is the workaround for Omni's native text weakness.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material \u2014 all labels correctly spelled/placed; correct soma\u2192axon\u2192terminal saltatory propagation",
      "user_feedback": "Fine as a high-level teaching tool."
    },
    "E2": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 4,
        "text": 5,
        "coherence": 4,
        "edu": 5
      },
      "verdict": "Labelled Earth cross-section preserved (CRUST / MANTLE / OUTER CORE / INNER CORE plus depth values) with subtle interior motion. Convection is gentle, but diagram fidelity is excellent.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material \u2014 correct layer order, depth values, phase states and mantle (not inner-core) convection",
      "user_feedback": "Reviewer thinks it's good but isn't an expert in this area; score relies on the panel's web-verified assessment (correct layer order, depth values and mantle convection)."
    },
    "E3": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 5,
        "edu": 2
      },
      "verdict": "Convincing light-microscopy styling and the right stage order, but the panel flagged the same anaphase problem as B2 (poleward bodies appear to remain paired, not resolved into single chromatids) plus a **chromosome count that drifts between frames** \u2014 implausible for one dividing cell. Needs expert review of the segregation step.",
      "confidence": "med",
      "expert_review": true,
      "panel_errors": "likely unresolved sister chromatids at anaphase; chromosome count drifts between frames",
      "user_feedback": "Reviewer confirms an inconsistency with the chromosomes (count/appearance drifts between frames) \u2014 the same class of error as B2; not a reliable depiction of cell division."
    },
    "F1": {
      "scores": {
        "accuracy": 4,
        "quality": 4,
        "adherence": 5,
        "text": 3,
        "coherence": 4,
        "edu": 4
      },
      "verdict": "Used BOTH reference images: a realistic lab pour of blue solution with bubbling and a colour change, plus an overlaid copper-sulfate complex matching the molecule reference. Beaker graduations are partly legible. Strong reference-to-video.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "octahedral Cu and tetrahedral sulfate overlays and blue colour are correct, but the vigorous bubbling is not chemically justified by simply mixing copper-sulfate solution",
      "user_feedback": "Reviewer: scientifically fine, but the latter part isn't that interesting and looks like cheap stock footage (an aesthetic/production critique, not an accuracy issue)."
    },
    "G1": {
      "scores": {
        "accuracy": 4,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 3
      },
      "verdict": "Clean photorealistic chain anchor \u2014 a rotating helix on a bokeh field (store=true so later edits can reference it). The panel docks it for a subtle but real error: the helix reads **left-handed** in several frames, whereas biological B-DNA is right-handed \u2014 a flaw inherited by the whole G-chain (G2\u2013G4).",
      "confidence": "med",
      "expert_review": true,
      "panel_errors": "helix appears LEFT-handed (B-DNA is right-handed); major/minor grooves not differentiated",
      "user_feedback": "Fine."
    },
    "G2": {
      "scores": {
        "accuracy": 3,
        "quality": 5,
        "adherence": 3,
        "text": null,
        "coherence": 4,
        "edu": 2
      },
      "verdict": "Content edit preserved the style perfectly but **under-applied the requested structural change** \u2014 the strands barely separate. Multi-turn keeps consistency strongly; large structural edits can be muted. (Note: the original 'unzip/split' wording was rejected by input validation and had to be reworded \u2014 see limitations.)",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "the requested unzip/replication-fork never forms \u2014 the helix stays fully zipped through all frames; also inherits the left-handed twist",
      "user_feedback": "Fine."
    },
    "G3": {
      "scores": {
        "accuracy": 4,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 3
      },
      "verdict": "Excellent style transfer: the identical helix and motion redrawn as white chalk on a green board, composition fully preserved across the turn.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "DNA structure preserved through the chalkboard restyle, but inherits the likely left-handed twist of the base clip",
      "user_feedback": "Fine."
    },
    "G4": {
      "scores": {
        "accuracy": 4,
        "quality": 5,
        "adherence": 5,
        "text": 1,
        "coherence": 4,
        "edu": 3
      },
      "verdict": "Reference-guided restyle nailed the blueprint aesthetic (grid, title block, dimension lines) over the same DNA motion. The decorative drafting annotations are gibberish, as expected for dense text.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "DNA structure preserved, but inherits the left-handed twist; decorative blueprint annotations are gibberish",
      "user_feedback": "Fine."
    },
    "G5": {
      "scores": {
        "accuracy": 1,
        "quality": 5,
        "adherence": 4,
        "text": 2,
        "coherence": 2,
        "edu": 1
      },
      "verdict": "Photorealistic, consistent presenter beside a circular Krebs-cycle diagram. 'Krebs Cycle', 'ATP' and 'Citric Acid' are correct; the remaining hand-written labels are gibberish.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "node labels are fabricated words ('Promiustat','Evention','CH2COrH2'); the real cycle intermediates (isocitrate, \u03b1-ketoglutarate, succinate, malate, oxaloacetate\u2026) and carriers (NADH/FADH2/CO2) are all absent",
      "user_feedback": "Reviewer: pretty bad text consistency \u2014 a node label flips between 'ATP' and 'Citric Acid' across frames, then back to ATP \u2014 and the cycle itself is not correct. Confirms and extends the panel's finding (fabricated labels, missing real intermediates)."
    },
    "G6": {
      "scores": {
        "accuracy": 1,
        "quality": 5,
        "adherence": 5,
        "text": 2,
        "coherence": 2,
        "edu": 1
      },
      "verdict": "Character-edit standout: the human teacher becomes a **consistent cartoon robot** while the classroom, whiteboard diagram and pointing gestures are all preserved across the clip.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "identical fabricated/garbled Krebs labels to G5; swapping the presenter to a robot changes nothing about the wrong diagram",
      "user_feedback": "Reviewer: same issues as G5 \u2014 fabricated/garbled Krebs labels, the same text-consistency flipping, and an incorrect cycle (the diagram is unchanged from G5; only the presenter was swapped)."
    },
    "H1": {
      "scores": {
        "accuracy": 4,
        "quality": 5,
        "adherence": 4,
        "text": null,
        "coherence": 4,
        "edu": 3
      },
      "verdict": "Confirms `delivery:'uri'` (downloaded via the Files API rather than inline base64). A gorgeous cinematic flythrough (Jupiter, a Mars-like surface, ringed Saturn, Earth) \u2014 but an artistic montage, not the ordered Mercury\u2192Neptune tour requested.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "Mars omitted from the Mercury\u2192Neptune order; a Saturn-like ringed giant appears in two consecutive frames; Neptune not clearly rendered",
      "user_feedback": "Fine."
    },
    "I1": {
      "scores": {
        "accuracy": 1,
        "quality": 5,
        "adherence": 3,
        "text": null,
        "coherence": 4,
        "edu": 1
      },
      "verdict": "Visually a polished, smooth seed-to-rosette time-lapse, but **not scientifically accurate** (reviewer feedback). The early seedling is shown as a broad-leafed plant with prominent leaf **veins** \u2014 wrong for a succulent, whose leaves are thick, fleshy and smooth without net venation. The **seed is far too large**: real succulent seeds are essentially dust-fine. The mature rosette looks convincingly succulent, but the germination/seedling morphology is botanically incorrect. A textbook example of the recurring pattern in this study \u2014 photorealistic polish masking subtle but important biological errors, which is exactly what makes it risky for unvetted science use.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "oversized visible 'seed' (succulent seeds are dust-fine); broad, thin, prominently net-veined non-fleshy seedling leaves (wrong for a succulent \u2014 whose cotyledons and early leaves are small, plump and fleshy with inconspicuous venation; rosette succulents are themselves dicots, so the tell is the leaf texture/venation and the oversized seed, not the dicot habit); unbridged jump from leafy seedling to glaucous rosette",
      "user_feedback": "Reviewer (reconfirm): inaccurate \u2014 the seedling has leaf **veins** and is **not a succulent** at the start; many issues. The seed and seedling morphology are wrong for a succulent (succulent leaves are fleshy and smooth, seeds dust-fine)."
    },
    "I2": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 5
      },
      "verdict": "**Geologically sound.** The cutaway block-diagram correctly shows fluvial canyon formation: a flat plateau of horizontal sedimentary strata (sandstone/mudstone/limestone) \u2192 a river that **incises downward** through successive layers \u2192 progressive deepening with **rockfall and mass wasting** widening the walls \u2192 a mature canyon with exposed horizontal strata and an incised meandering river at the base (a real phenomenon, as at the Grand Canyon / Goosenecks). The process, the layer-by-layer downcutting and the differential-erosion wall profile are all correct. Caveats: it implies erosion alone without explicitly showing the tectonic uplift / base-level fall that usually drives incision, and the timescale is obviously compressed. Strong, classroom-usable.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "none material \u2014 correct fluvial incision through strata, mass-wasting widening, meandering base channel",
      "user_feedback": "Fine."
    },
    "I3": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 2
      },
      "verdict": "**Comprehensive and accurate at the conceptual/structural level** for an unfilmable nanoscale process. Correctly depicts a two-subunit ribosome clamped on an mRNA strand with nucleotide bases, a cloverleaf tRNA delivering an amino acid, peptide-chain elongation, spent-tRNA release and the polypeptide emerging from the exit region. Simplifications a specialist would flag: the literal codon\u2013anticodon base-pairing isn't shown register-accurate, the A/P/E sites aren't differentiated, and tRNA reads as a 2D cloverleaf rather than its true L-shaped 3D tertiary fold. Excellent for teaching the central dogma; not a structural-biology reference.",
      "confidence": "med",
      "expert_review": true,
      "panel_errors": "subunit arrangement, tRNA shape and emerging peptide are correct; A/P/E-site usage and codon\u2013anticodon register can't be confirmed from stills",
      "user_feedback": "Reviewer (domain) caught a core error: a **codon is a triplet \u2014 3 nucleotides** (read three bases at a time, coding one amino acid), but the clip shows groups of **4**. Highly inaccurate. The panel had flagged codon register as 'unverifiable from stills'; the reviewer resolved it as wrong."
    },
    "I4": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 3,
        "text": null,
        "coherence": 3,
        "edu": 2
      },
      "verdict": "**Qualitatively plausible, not rigorously accurate.** It gets the splash crown, outward-radiating displacement waves, water sheeting off, and buoyant settling \u2014 but the physics is wrong where it counts: **displacement is badly under-scaled** for a ~200kt hull, there is **no air-cavity collapse or central Worthington rebound jet**, the dynamics look stylised and 'hovering' rather than impulsive, and the draft is too shallow. Confirms the study's core finding that Omni produces artistic plausibility, not CFD-grade rigid-body/fluid simulation. A strict accuracy bar is not met.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "displacement & splash energy grossly under-scaled for a laden ship; settled draft floats far too high; no Worthington rebound jet; rigid-body structural failure ignored",
      "user_feedback": "Reviewer: the physics is pretty bad \u2014 on impact the **containers should be flung off the ship and fall**, but they stay put (cargo inertia ignored), on top of the panel's under-scaled displacement / no cavity-collapse findings. Acceptable only if you care purely about the water effect; depends on the use case."
    },
    "I5": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 2,
        "text": null,
        "coherence": 3,
        "edu": 2
      },
      "verdict": "**Physically believable everyday thermodynamics.** Heat-driven melting is depicted correctly: the scoops soften, slump and collapse from solid into a glossy viscous liquid that **drips through the mesh basket** (correct \u2014 an air-fryer basket would let melt drain to the drawer) and pools below, with light bubbling that is plausible under strong convective heat. Minor omissions: over real time the puddle would likely brown/scorch and the fat/water would separate, which isn't shown. As a depiction of melting, it's accurate.",
      "confidence": "med",
      "expert_review": false,
      "panel_errors": "heat-driven melting and drip-through-mesh are believable; late-stage surface frothing somewhat overstated for a pure dairy melt",
      "user_feedback": "Reviewer: pretty bad \u2014 the appliance looks like a **microwave**, not an air fryer, and there's ice cream inexplicably **dripping from the top**. The panel rated the melt from stills; the full clip is wrong on the setting and has a nonsensical artifact."
    },
    "I6": {
      "scores": {
        "accuracy": 2,
        "quality": 5,
        "adherence": 5,
        "text": null,
        "coherence": 4,
        "edu": 1
      },
      "verdict": "**Deliberate novelty \u2014 physically implausible by design, and correctly flagged as such.** Execution is flawless (drive-off \u2192 roof chute deploys \u2192 controlled glide \u2192 soft wheels-down landing, nobody hurt). But the premise violates physics: a ~3-tonne Hummer has an enormous wing loading, so no roof-stowed paraglider canopy could realistically arrest it to a gentle, controlled, safe touchdown \u2014 the descent shown is far too slow and stable for the mass. Scored low on accuracy *as expected* for a requested novelty; it is not meant to be scientific.",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "intentional novelty, but physically impossible \u2014 a ~3 t SUV is ~12\u201315\u00d7 any real paraglider canopy's capacity and the canopy shown is far too small for the lift; gentle landing unachievable",
      "user_feedback": "It's fine \u2014 accepted as an intentional novelty (the low accuracy score reflects physical plausibility, which the stunt deliberately ignores)."
    },
    "I7": {
      "scores": {
        "accuracy": 5,
        "quality": 5,
        "adherence": 2,
        "text": null,
        "coherence": 4,
        "edu": 4
      },
      "verdict": "**Geographically and culturally accurate.** All five depicted sites are genuinely Kyoto and individually correct \u2014 Kinkaku-ji reflected in its pond, the Fushimi Inari torii tunnel, the Arashiyama bamboo grove, the stilted Kiyomizu-dera over a maple hillside, and the Yasaka Pagoda over Gion's machiya streets with a kimono figure under sakura. Crucially it **correctly omitted Mount Fuji** (a frequent Kyoto error \u2014 Fuji is ~270 km away in a straight line, far out of sight). Cultural elements are respectful and unstereotyped. The one deduction is stylistic: it renders as soft watercolour / modern landscape illustration rather than literal Edo *ukiyo-e* woodblock (no bold key-block outlines or woodgrain), and seasons are mixed (autumn maple vs spring blossom).",
      "confidence": "high",
      "expert_review": false,
      "panel_errors": "every landmark is genuinely Kyoto and Mt. Fuji is correctly absent (geography/factual accuracy is excellent); the only miss is stylistic \u2014 modern anime-watercolour rather than true Edo woodblock ukiyo-e \u2014 scored under prompt adherence, not accuracy",
      "user_feedback": "Reviewer: not the right style \u2014 it looks **very generic**, not the requested ukiyo-e woodblock (confirms the panel's style finding). The Kyoto landmarks/geography remain accurate; the miss is stylistic."
    }
  },
  "feature_findings": [
    {
      "feature": "Text-to-Video (breadth)",
      "finding": "The core capability is excellent. 10/10 domain-and-style prompts produced broadcast-grade clips with high prompt adherence. Process-level accuracy was strong; the recurring weak spot is precise quantitative layout (e.g. an accurate, ordered solar system)."
    },
    {
      "feature": "On-screen text",
      "finding": "The decisive finding for science. Large hero text and short tokens (single words, chemical formulae, 'E=mc\u00b2', 'n=1,3,5') render correctly; dense small/hand-written text becomes gibberish. Keep on-screen text to a few large labels, or add labels downstream."
    },
    {
      "feature": "Image-to-Video",
      "finding": "Best-in-class for our use case: a text-perfect labelled diagram from the image model is preserved faithfully while Omni animates it. This is the recommended pattern for any diagram that needs correct labels."
    },
    {
      "feature": "Reference-to-Video",
      "finding": "Multiple reference images are genuinely fused (setting + molecular overlay in F1; blueprint style in G4). Reliable for art-direction and style matching."
    },
    {
      "feature": "Multi-turn editing",
      "finding": "Style transfer (chalkboard, blueprint) and character swaps preserve composition and motion remarkably well via previous_interaction_id. Large structural content changes can be under-applied \u2014 iterate or be explicit."
    },
    {
      "feature": "Aspect ratio & duration",
      "finding": "16:9 and 9:16 both correct; 5s/10s honoured; default 10s. Edits inherit the parent duration. No mid-clip duration drift observed."
    },
    {
      "feature": "Resolution",
      "finding": "Fixed 720p; the `resolution` parameter is rejected. Fits a 'generate at 720p, upscale later' workflow."
    },
    {
      "feature": "Delivery & API",
      "finding": "Inline base64 and URI delivery both work. The Interactions API is clean and conversational; multi-turn chaining 'just works' once store=true. Error messaging is the weak point."
    },
    {
      "feature": "Extended accuracy probes (group I)",
      "finding": "Seven ad-hoc generations stress-tested factual reliability and sharpened the verdict. Strong, accurate results where the process is well-documented: canyon incision (I2) and the Kyoto landmarks (I7, with Mt. Fuji correctly excluded). But the failures are the decisive ones for science: physics is approximated not simulated \u2014 the cargo-ship impact (I4) badly under-scales water displacement with no cavity collapse/rebound jet; biology is unreliable \u2014 the succulent (I1) drew a veined broad-leaf seedling from a grossly oversized seed; and a requested novelty (I6) is physically impossible by design. Crucially, the accurate and the wrong clips are visually indistinguishable \u2014 confirming that no output can be trusted for factual use without expert review."
    }
  ],
  "eap_answers": [
    {
      "q": "Industry use cases \u2014 how is Omni working for education / scientific explanation?",
      "a": "**Mixed, and net not-ready for our use case.** The raw capability is exciting \u2014 it can produce visually lesson-ready material across whiteboard explainers (A1), microscopy (B2, E3), 3D molecular/cellular/anatomical animation (A2, A5, A6, D1) and 2D concept graphics (A4), and the **image-to-video diagram workaround** (E1, E2) is genuinely useful. But for *scientific* education specifically the blockers are decisive: unreliable on-screen scientific text, approximated (not simulated) physics, and unpredictable factual/biological errors that look authoritative. As of this EAP build we would **not** ship Omni-generated science explainers without expert review of every clip, which currently negates the time savings at scale."
    },
    {
      "q": "Veo workflows \u2014 does Omni support your existing Veo workflows? What are the gaps?",
      "a": "We are not production Veo users, so this is from a scientific-content standpoint rather than a migration audit. Omni's text-to-video quality and prompt adherence feel competitive with what we'd expect from Veo-class models, and the **conversational multi-turn editing is a meaningful step beyond a one-shot Veo call**. Gaps relative to a mature Veo pipeline: no resolution choice (720p only), no audio, and no per-shot duration beyond 10 s. The biggest workflow gap is reliable text rendering for labelled scientific content."
    },
    {
      "q": "Awareness & adoption \u2014 were you aware of the Interactions API before this program?",
      "a": "Only marginally. The team knew of the standard `generateContent` / Veo `generateVideos` long-poll APIs, but the **Interactions API with stateful `previous_interaction_id` multi-turn editing was new to us**. Once seen, the model is intuitive."
    },
    {
      "q": "Integration challenges \u2014 do you foresee trouble integrating the Interactions API?",
      "a": "Integration itself was easy \u2014 a single `interactions.create` call with `response_format`, plus `previous_interaction_id` for edits. The friction points we hit: (1) **opaque error messages** \u2014 a 400 `\"invalid argument\"` with no field detail (we traced one to certain prompt words like 'unzipping'/'splitting' being rejected); (2) **occasional transient 400s** on chained edits that succeed on retry; (3) `store=true` is required for chaining and the requirement is only surfaced via an error. All are surmountable with retries and prompt hygiene, but better diagnostics would save real debugging time."
    },
    {
      "q": "Ease of use \u2014 rate the developer experience for multi-turn video workflows.",
      "a": "**4/5.** The mental model (each turn references the prior interaction id) is clean and the helper pattern for extracting video/image content is simple. Multi-turn 'just works' once store=true. Points off only for error messaging and the undocumented content-word rejections."
    },
    {
      "q": "Resolution \u2014 generate all resolutions with Omni, or only 720p then upscale with a cheaper model?",
      "a": "For our use case, **720p-then-upscale is the right tradeoff** and we'd happily adopt it. Omni is currently 720p-only anyway (the `resolution` param is rejected), and generation is already the expensive/slow step at ~39 s/clip. Producing a fast 720p draft and upscaling selected keepers with a cheaper model would cut cost and let us iterate on content before paying for final resolution. We would, however, want a documented, supported upscale path and eventually a native 1080p option for hero shots."
    },
    {
      "q": "Unstructured prompting \u2014 how well does conversational prompting work vs other methods?",
      "a": "It works **well** \u2014 plain natural-language descriptions reliably produced the intended style, composition and subject without any special prompt syntax, and edit instructions ('redraw this as a chalkboard diagram', 'replace the teacher with a robot') were understood literally and applied while preserving the rest of the scene. Two lessons: (a) **substantive structural edits should be stated explicitly and may need a second turn** (the 'separate into a fork' edit was under-applied); (b) **avoid words the validator silently dislikes** \u2014 'unzipping' and 'splitting' triggered hard 400s, while 'separating into a Y-shaped fork' (semantically identical) worked. Example prompts we used are listed against each clip in the gallery."
    },
    {
      "q": "Use cases & comparisons \u2014 primary use cases, and how does Omni compare to Veo on quality and prompt adherence?",
      "a": "Our primary use cases are **scientific explainer videos, animated diagrams, microscopy/3D process animations, and short-form education content**. On raw visual quality and prompt adherence Omni is strong and subjectively competitive with Veo-class output, and its **multi-turn editing** and **image/reference-to-video** features map nicely onto an iterative content workflow. But for our *scientific* use cases that quality does not translate into readiness: text rendering, physics fidelity and factual consistency are not yet dependable enough to use unsupervised, so today Omni is a draft/B-roll tool for us, not a production explainer engine. Fixed 720p and no audio are secondary gaps by comparison."
    },
    {
      "q": "Text-free output \u2014 how important is guaranteed text-free (video-only) output, and why?",
      "a": "**Important, and our results make the case directly.** Because Omni renders dense text as convincing gibberish (A8, A10, G4), the safest pattern for science is to generate **clean, text-free footage** and overlay accurate, accessible labels/equations ourselves in post (or via the image-to-video diagram route). The C1 probe shows an explicit no-text instruction works, but a **guaranteed/flagged text-free mode** would let us trust the output without frame-by-frame QA \u2014 valuable for any factual or accessibility-sensitive application, not just ours."
    }
  ],
  "limitations": [
    "On-screen text: dense, small or hand-written text (crowded diagram labels, chalkboard equations, blueprint annotations) renders as realistic-looking gibberish. Hero text and short tokens are fine.",
    "No resolution control: response_format rejects 'resolution'; output is fixed at 1280\u00d7720. No audio track.",
    "Quantitative & morphological precision: artistic correctness beats biological/numeric correctness. The solar system isn't an accurate ordered/counted system; an ordered Mercury\u2192Neptune tour wasn't followed; and the succulent time-lapse (I1) depicted a veined broad-leaf seedling and a grossly oversized seed \u2014 both botanically wrong for a succulent. Photorealistic polish can mask subtle but important factual errors.",
    "Large structural content edits in multi-turn can be under-applied (the DNA 'separate into a fork' edit barely changed the geometry) even though style/character edits are excellent.",
    "Opaque input validation: certain innocuous science verbs ('unzipping', 'splitting') trigger a hard 400 'invalid argument' with no indication which token caused it.",
    "Occasional transient 400s on chained edits (succeed on retry); store=true is mandatory for chaining and only surfaced via an error.",
    "Duration capped at 10 s; multi-turn edits inherit the parent's duration and can't be independently lengthened."
  ],
  "recommendations": [
    "**Do not deploy Omni for scientific or educational video yet.** Until scientific-text, physics and consistency reliability improve, treat it as not production-ready for any factual teaching/publication use; restrict it to decorative B-roll or expert-vetted drafts.",
    "Treat every generated clip as **unverified** \u2014 correct and subtly-wrong outputs are visually indistinguishable, so a domain expert must review each one before any factual use; never auto-publish.",
    "For any clip that needs correct labels/equations, use the **image-to-video pattern**: render a text-perfect diagram with the image model, then animate it with Omni. This was our most reliable route to accurate, motion-rich educational content.",
    "Otherwise, **keep on-screen text to a few large labels or none at all**, and add precise labels/equations/captions in post. Treat any auto-generated dense text as decorative, not factual.",
    "Adopt a **720p draft \u2192 selective upscale** pipeline; it matches Omni's current output and keeps iteration cheap.",
    "Build a thin client wrapper with **automatic retries + prompt-hygiene** (avoid trigger words, force store=true for chains) to absorb the transient 400s and validation quirks we hit.",
    "Use **multi-turn editing for art-direction** (style transfer, character/branding swaps, reference matching) where it shines; for substantive content changes, state them explicitly and verify, possibly across two turns.",
    "Keep a **human-in-the-loop QA step** for any factual/accessibility-sensitive output until a guaranteed text-free (or reliable-text) mode exists.",
    "Feedback to the Gemini team: (1) clearer 400 error detail incl. offending field/token; (2) a documented upscale path and native 1080p for hero shots; (3) a flagged text-free output mode; (4) document the store=true chaining requirement and the content-word validation."
  ],
  "about_kdense": "**K-Dense** ([k-dense.ai](https://www.k-dense.ai)) builds **AI co-scientists** \u2014 AI systems that plan, run and interpret scientific work alongside human researchers. K-Dense is funded by **Google AI Futures Fund**.\n\nWe ran this evaluation as part of the Gemini Early Access Program to answer one concrete, commercial question: **can we integrate Omni Flash into K-Dense products** as an engine for scientific and educational explanation video? The report below is that go/no-go assessment \u2014 what works, what doesn't, and what would have to change before we put it in front of researchers and learners.\n\nBecause our domain is science, the bar here is **factual reliability**, not just visual quality \u2014 and every clip was generated **one-shot** (see the caveat below), so these findings describe the model's un-iterated floor, not its ceiling.",
  "one_shot": "**One-shot, by design.** Every clip in this report is a single first-attempt generation \u2014 one prompt, one result, with **no prompt iteration, no re-rolls, no seed selection and no cherry-picking** (the only retries were when an API error forced a reword). With prompt engineering, multi-turn refinement and a human in the loop, many of these results would very likely **improve substantially**, so the verdict below measures the *un-iterated floor* rather than the model's ceiling. The honest bottom line stands either way: **as things stand right now, evaluated one-shot, Omni Flash is not dependable for science.** That is a statement about today's out-of-the-box behaviour \u2014 not a claim that the gap can't be closed with iteration as the model improves."
}