K-Dense ran a 35-case benchmark of Google's Omni Flash EAP model to answer a narrow but important scientific question: can a frontier video model produce scientific explainer video that a researcher, educator, or technical team can trust without expert review? The answer from this run is no. The model is visually remarkable, often conceptually aligned, and fast enough to be a practical creative tool, but it is not reliable enough for unattended scientific or educational use.
We were able to run this assessment because K-Dense is part of the Omni Flash early access group. That access let us evaluate the model in a realistic scientific-media workflow before broad availability, with the goal of giving scientists and builders a practical read on where the system already helps and where it still needs expert supervision.
The negative result is more useful than a generic model ranking because the failures are diagnostic. Omni Flash frequently produces exactly the kind of clip a scientist wants at first glance: clean composition, smooth motion, sharp materials, plausible instrumentation, and confident scientific iconography. The problem is that wrong clips look just as authoritative as right ones, which makes visual polish a risk multiplier in scientific communication.
The Short Version
The benchmark used 35 one-shot prompts across biology, chemistry, physics, astronomy, geology, medicine, mathematics, and cultural geography. Thirty-four generations succeeded, one deliberate 1080p probe failed, and the successful clips were saved as 1280x720 MP4 video with no audio. Mean generation time was about 39 seconds per clip, with most clips around 8 to 10 seconds long.
The headline distribution is stark. Only 16 of 34 successful clips scored at least 4/5 for scientific accuracy after layered review. Four were mixed, and 14 contained serious scientific errors. Visual quality and prompt adherence were consistently strong, but reliability for factual content was not.
| Outcome after review | Clips | Case IDs |
|---|---|---|
| Reliable, accuracy >=4/5 | 16 | A1, A4, A6, A7, B1, B3, C2, E1, E2, F1, G1, G3, G4, H1, I2, I7 |
| Mixed, accuracy 3/5 | 4 | A8, A9, C1, G2 |
| Serious errors, accuracy <=2/5 | 14 | A2, A3, A5, A10, B2, D1, E3, G5, G6, I1, I3, I4, I5, I6 |
| Failed probe | 1 | D2 |
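As a quick consistency check, the case IDs in the table above can be tallied programmatically. This is a minimal sketch; the bucket names are our own shorthand, not fields from any tool used in the benchmark. It confirms 35 cases, 34 successful clips, and a reliable share of 16/34 (about 47%), which is where "fewer than half" comes from.

```python
# Case IDs copied from the outcome table; "failed" is the one probe with no video.
OUTCOMES = {
    "reliable": "A1 A4 A6 A7 B1 B3 C2 E1 E2 F1 G1 G3 G4 H1 I2 I7".split(),
    "mixed": "A8 A9 C1 G2".split(),
    "serious": "A2 A3 A5 A10 B2 D1 E3 G5 G6 I1 I3 I4 I5 I6".split(),
    "failed": ["D2"],
}


def summarize(outcomes):
    """Return (total cases, successful clips, per-bucket counts, share of successful clips)."""
    all_ids = [case for ids in outcomes.values() for case in ids]
    # Every case should land in exactly one bucket.
    if len(all_ids) != len(set(all_ids)):
        raise ValueError("a case ID appears in more than one bucket")
    counts = {bucket: len(ids) for bucket, ids in outcomes.items()}
    successful = len(all_ids) - counts["failed"]
    share = {b: counts[b] / successful for b in ("reliable", "mixed", "serious")}
    return len(all_ids), successful, counts, share


total, successful, counts, share = summarize(OUTCOMES)
print(f"{total} cases, {successful} clips")
for bucket in ("reliable", "mixed", "serious"):
    print(f"{bucket:9s} {counts[bucket]:2d}  {share[bucket]:.1%}")
```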
The scientific conclusion is not "video models are useless for science." Several clips are genuinely strong: the water-cycle whiteboard, odd-harmonic Fourier series, methane combustion, black-hole accretion disk, photosynthesis labels, animated neuron diagram, Earth-interior convection, canyon erosion, and Kyoto ukiyo-e montage. The conclusion is narrower and more important: in this one-shot setting, Omni Flash is a powerful drafting and visualization system, not an autonomous scientific explainer.
Why This Benchmark Matters
Scientific video is a high-trust medium. A paper can be checked line by line, a chart can expose its axes and units, and a simulation can be tied to governing equations. Video, by contrast, is often consumed as a finished depiction of mechanism. If a clip shows DNA bases pairing, chromosomes separating, oxygenated blood moving through the heart, or water displacement after a ship impact, viewers infer that the mechanism is being represented correctly.
That makes scientific video different from generic creative generation. A beautiful mistake is worse than an obvious mistake because it passes through human intuition more easily. For research teams and educators, the benchmark therefore asks a practical deployment question: not whether the model can produce impressive examples, but whether it can produce dependable examples at scale.
What We Tested
The cases were organized into nine feature areas: broad text-to-video, aspect ratio and duration control, text rendering, resolution control, image-to-video, reference-to-video, multi-turn editing, URI delivery, and extended scientific/physical accuracy probes. Source and reference images were generated separately with google/gemini-3.1-flash-image, then supplied to Omni Flash where the case required image conditioning. The video model was accessed through Google's Gemini Interactions API in a synchronous one-shot workflow.
Every successful clip was reviewed against the same operational question: would this be safe to use as scientific or educational content without expert intervention? Scores covered accuracy, visual quality, prompt adherence, text legibility, motion coherence, and educational suitability. Because this benchmark used one-shot generation, it measures the un-iterated floor, not the best possible result after prompt tuning, regeneration, editing, and manual cleanup.
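One way to keep such reviews consistent is to store each verdict as a small record whose outcome bucket depends only on accuracy, mirroring the thresholds in the outcome table (>=4 reliable, 3 mixed, <=2 serious). This is a hypothetical sketch of that structure, not the schema we actually used; the field names and the demo scores are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Dimensions every clip is scored on (1-5). Text legibility is optional
# because some clips, such as text-free probes, render no text at all.
REQUIRED_SCORES = ("accuracy", "visual_quality", "prompt_adherence",
                   "motion_coherence", "educational_suitability")


@dataclass
class ClipReview:
    case_id: str
    accuracy: int
    visual_quality: int
    prompt_adherence: int
    motion_coherence: int
    educational_suitability: int
    text_legibility: Optional[int] = None

    def __post_init__(self) -> None:
        scores = [getattr(self, name) for name in REQUIRED_SCORES]
        if self.text_legibility is not None:
            scores.append(self.text_legibility)
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"{self.case_id}: every score must be between 1 and 5")

    def outcome(self) -> str:
        """Bucket used in the outcome table; deliberately driven by accuracy alone,
        so strong visuals cannot lift a factually wrong clip."""
        if self.accuracy >= 4:
            return "reliable"
        if self.accuracy == 3:
            return "mixed"
        return "serious errors"


# Illustrative scores only, not values recorded in the benchmark.
demo = ClipReview("demo", accuracy=2, visual_quality=5, prompt_adherence=5,
                  motion_coherence=4, educational_suitability=2)
print(demo.outcome())
```

The design choice is the point: a 5/5 for visual quality has no path into the outcome, which matches how the benchmark treats polish.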
Scorecard
| Dimension | Score | Interpretation |
|---|---|---|
| Readiness for scientific/educational video | 1/5 | NOT READY for unattended use. Blocked by unreliable scientific text, non-rigorous physics, and unpredictable factual and consistency errors (see below). Because every clip is a single one-shot generation with no iteration, this rates the un-iterated floor, not the model's ceiling. On this scale, 1 means not safely usable unsupervised for factual content; it does not mean worthless, since visual quality and prompt adherence both score 5. Every clip needs expert review. Usable today only for decorative, non-factual B-roll, or as human-vetted drafts. Re-evaluate when the blockers are fixed. |
| Scientific text & labels | 2/5 | Hard blocker. Short 'hero' text is fine, but dense or hand-written labels, equations and annotations render as convincing gibberish (A8, A10, G4, G5). Authoritative-looking nonsense is disqualifying for teaching. |
| Physics & dynamics fidelity | 2/5 | Renders the look of a phenomenon, not its mechanics. The ship-ocean impact under-scales displacement with no cavity/rebound jet (I4); the Hummer paraglide is physically impossible (I6). Fine for vibe, wrong for mechanism. |
| Factual reliability & consistency | 2/5 | Correctness is unpredictable and correct-vs-wrong clips look identical: identical DNA bases shown pairing (A2), oxygenated blood routed to the wrong side (A5), protein folding not grounded in real biochemistry (D1), 4-base 'codons' (I3), veined succulent seedling (I1), mis-ordered solar system with mis-assigned rings (A3). After expert-panel + human-reviewer scrutiny, 14 of 34 clips have serious errors and only 16 are reliably accurate (fewer than half). No clip can be trusted without expert review. |
| Scientific accuracy (process-level, when it lands) | 3/5 | Often conceptually correct (canyon incision, Fourier, black-hole lensing, water cycle, methane combustion, photosynthesis labels), but 'often' is not 'reliably'. The adversarial panel downgraded mitosis 5->2, Krebs labels 3->1, volcano 4->2 and the solar system 3->2 once the facts were actually checked; the polish hides the misses. |
| Visual quality & realism | 5/5 | Consistently broadcast-grade across every style, which is exactly what makes the factual errors dangerous: they look authoritative. |
| Prompt adherence (style/composition) | 5/5 | Reliably produces the requested style, framing, subject and aspect ratio. |
| Image- / Reference-to-video | 4/5 | The one dependable route to correct on-screen labels: animate a text-perfect diagram from the image model. Strong, and the recommended mitigation, but a workaround, not readiness. |
| Latency, reliability & developer experience | 4/5 | Clean conversational Interactions API; ~39 s/clip. Points off for opaque HTTP 400 errors, content-word rejections and transient chained-edit failures. |
The split between aesthetics and reliability is the whole story. Visual quality and prompt adherence scored 5/5, which means the model usually understood the request and rendered it beautifully. Science readiness scored 1/5 because correctness did not follow reliably from beauty, and because several failure modes were hard to notice without subject-matter review.
Judging Methodology
The evaluation deliberately became stricter over time. A naive visual pass found obvious problems, but it also over-credited polished clips that were wrong in domain-specific ways. Mitosis is the clearest example: the clip first looked textbook-like, but later review found that the chromosomes moving to the poles were still paired sister chromatids rather than separated chromatids, which breaks the defining biology of anaphase.
The second layer used an adversarial AI-assisted review panel spanning molecular and cell biology, neuroscience and physiology, chemistry and biochemistry, physics and mathematics, astronomy and astrophysics, earth science, and botany/geography. Those reviewers inspected frame strips, built explicit factual rubrics, and tried to disprove each depiction rather than merely describing whether it looked plausible. This was a structured stress test, not peer-reviewed human specialist sign-off. The third layer was a human domain review over the full videos, which caught additional errors that still-frame review missed.
That escalation is central to the result. Each review layer surfaced more problems, not fewer. DNA base pairing, heart circulation, protein folding, ribosomal codons, mitosis, and everyday physics all produced failures that looked plausible until someone checked the specific claim being made on screen. The human pass was one reviewer, so the report should be read as a fast, transparent benchmark rather than final scientific adjudication.
We did our best to assess quality quickly and rigorously so the report could be useful while the model behavior was still fresh. If you notice a scientific, methodological, or media issue we missed, please email contact@k-dense.ai. We will review credible reports and update the blog as needed.
What Worked
The best clips succeeded when the prompt described a high-level process with limited dependence on dense text or fine mechanistic detail. A1's water-cycle whiteboard produced legible labels and a complete loop. A4's Fourier clip represented odd harmonics and Gibbs ringing well enough for explanation. A6's methane combustion was chemically coherent at the educational level, and B3's black-hole clip produced a convincing accretion disk with lensing-like visual behavior.
Image-to-video was the most important practical mitigation. E1 and E2 began with labeled source diagrams that were already text-perfect, and Omni Flash animated them while largely preserving the labels and layout. For science teams, that suggests a realistic workflow: generate or design a correct static diagram first, then use video generation to add motion, rather than asking the video model to invent both the diagram and the labels from scratch.






What Failed
The dominant blocker was scientific text. Short hero text and sparse labels often worked, but dense labels, equations, blueprint annotations, and chalkboard writing collapsed into authoritative-looking nonsense. In science education, this is not a cosmetic problem. A wrong label can turn a correct-looking animation into misinformation.
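Because garbled labels are only caught when someone reads them, a cheap first filter is to compare transcribed on-screen labels against a glossary of terms the clip should contain. The sketch below assumes the labels have already been transcribed from keyframes (by a person or an OCR tool, which is out of scope here); the glossary is a hypothetical example for a volcano cutaway, and the flagged strings are taken from the A8 verdict. It only catches misspellings and nonsense words: a correctly spelled but wrongly placed label still needs expert review.

```python
import difflib


def audit_labels(ocr_labels, glossary, cutoff=0.6):
    """Compare transcribed on-screen labels against an expected-term glossary.

    Returns a list of (label, status, suggestion) tuples, where status is
    "ok" for an exact (case-insensitive) glossary match and "flag" otherwise.
    """
    lookup = {term.lower(): term for term in glossary}
    report = []
    for label in ocr_labels:
        key = " ".join(label.split()).lower()  # normalise whitespace and case
        if key in lookup:
            report.append((label, "ok", lookup[key]))
            continue
        close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        report.append((label, "flag", lookup[close[0]] if close else None))
    return report


glossary = ["Magma Chamber", "Conduit", "Ash Plume", "Lava Flow", "Rock Strata"]
labels = ["Magma Chamber", "Conduit", "Mora Claw", "Angon Chamber"]
for label, status, suggestion in audit_labels(labels, glossary):
    print(f"{status:4s} {label!r:18} nearest: {suggestion}")
```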
The second blocker was physics. The ship-impact and Hummer-paraglide probes showed motion that borrowed the look of physical phenomena without obeying the mechanics. This matters for any user who wants to depict forces, fluids, deformation, orbital motion, thermodynamics, or collision dynamics. The model can render phenomenology, but it should not be mistaken for a simulator.
The third blocker was factual consistency. The DNA helix looked cinematic but paired same-colored bases. The heart animation routed oxygenated blood to the wrong side. The ribosome used four-base "codons" rather than triplets. The succulent time-lapse grew the wrong kind of seedling. These are not obscure edge cases for a scientific audience. They are the basic facts viewers expect an explainer to get right.
Practical Guidance for Scientists
Use Omni Flash today for decorative B-roll, establishing shots, draft storyboards, stylistic exploration, and human-vetted science-flavored media where no viewer will treat the depiction as authoritative. Use it cautiously for education only when a domain expert reviews every clip and when factual labels are added or checked outside the video model.
For mechanism-heavy scientific content, the safer workflow is to separate correctness from cinematography. Start with verified diagrams, equations, and labels. Use image-to-video to animate those assets. Add final text overlays in post-production. Keep generated motion under expert review, especially for biology, anatomy, physics, chemistry, and anything that depicts a sequence of causal steps.
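For the "add final text overlays in post-production" step, a tool such as ffmpeg's `drawtext` filter keeps verified labels entirely outside the video model. This is a minimal sketch that only builds the command; it assumes an ffmpeg build with `drawtext` (libfreetype) and a default font available via fontconfig, otherwise a `fontfile=` option must be added. The file names, coordinates and timings are hypothetical. To avoid filtergraph escaping pitfalls, it accepts only plain labels; text with colons, quotes or backslashes is better passed through `drawtext`'s `textfile=` option.

```python
import re
import shlex

# Conservative whitelist so labels need no filtergraph escaping.
SAFE_LABEL = re.compile(r"[A-Za-z0-9 +.\-]+")


def build_overlay_command(src, dst, labels, fontsize=36):
    """Return an ffmpeg argv that burns verified labels onto a generated clip.

    labels: iterable of (text, x, y, start_s, end_s) tuples, in pixels and seconds.
    """
    filters = []
    for text, x, y, start, end in labels:
        if not SAFE_LABEL.fullmatch(text):
            raise ValueError(f"label needs escaping or textfile=: {text!r}")
        filters.append(
            f"drawtext=text='{text}':x={x}:y={y}:fontsize={fontsize}:fontcolor=white"
            f":box=1:boxcolor=black@0.5:enable='between(t,{start},{end})'"
        )
    return ["ffmpeg", "-y", "-i", src, "-vf", ",".join(filters), dst]


cmd = build_overlay_command(
    "draft.mp4", "draft_labelled.mp4",
    [("Axon", 420, 300, 0, 10), ("Synaptic terminal", 900, 520, 3, 10)],
)
print(shlex.join(cmd))
# To render: subprocess.run(cmd, check=True)
```

Keeping the label text in a reviewed list, rather than in the prompt, means the expert signs off on the words once and the video model never gets a chance to invent them.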
Full Video Gallery
Every successful clip from the benchmark is embedded below with its poster frame, four extracted keyframes, the original test prompt, and the layered review verdict. The failed D2 probe is included as a failure card because it is part of the measured API behavior.
Group A: Text-to-video breadth
This broad text-to-video group mixed excellent explanatory clips with several biologically or physically incorrect ones. The important result is not that the model sometimes failed. It is that failures appeared with the same visual authority as successes.




Test: Text-to-Video in Earth science. Style: Whiteboard / hand-drawn. Run: 33.2 s generation, 10.0 s clip.
Prompt: A whiteboard explainer animation of the water cycle. A hand draws and labels the sun, ocean, evaporation rising as vapor, clouds forming, precipitation as rain falling on mountains, and runoff flowing back to the sea, with arrows showing the continuous loop. Clean educational style.
Verdict: Reference-quality whiteboard explainer. A hand progressively draws the entire cycle and every label is correctly spelled and legible (SUN, OCEAN, EVAPORATION, CONDENSATION, PRECIPITATION, RUNOFF), with a complete, scientifically correct loop of arrows. Directly usable in a lesson as-is.
Panel check: none material
Reviewer note: Good.




Test: Text-to-Video in Molecular biology. Style: Photorealistic 3D. Run: 38.2 s generation, 10.0 s clip.
Prompt: A hyper-realistic 3D animation of a DNA double helix slowly rotating, then gently unwinding to reveal the two antiparallel sugar-phosphate backbones and the colored base pairs (A-T, G-C) connecting them. Cinematic depth of field, soft blue scientific lighting.
Verdict: Gorgeous photorealistic 3D helix with a believable sugar-phosphate backbone, but, as the reviewer flagged, the colour-coded base pairing is wrong: identical (same-colour) bases are shown bonded to each other, violating complementary A-T / G-C pairing. Visually excellent, chemically incorrect on the one rule that matters most. Correctly produced no text.
Panel check: base-pair rungs not cleanly two-base complementary in close-ups; helix handedness unverifiable from stills
Reviewer note: Reviewer caught a fundamental error: the colour-coding shows identical bases pairing with each other (e.g. purple bonded to purple). That's scientifically wrong: DNA base pairing is complementary (A-T and G-C), so a base never pairs with one of the same type. The helix looks beautiful but depicts the central base-pairing rule...
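The rule this clip broke is simple enough to encode as an explicit rubric check, the kind of factual test the review panel built. In this minimal sketch the rungs are assumed to have been transcribed from frames as (base, base) pairs; that transcription step is hypothetical and not shown.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_complementary(a, b):
    """True if bases a and b form a Watson-Crick pair (A-T or G-C)."""
    return COMPLEMENT.get(a) == b


def partner_strand(strand):
    """Antiparallel partner of a 5'->3' strand, also written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def bad_rungs(rungs):
    """Indices of (base, base) rungs that violate complementary pairing."""
    return [i for i, (a, b) in enumerate(rungs) if not is_complementary(a, b)]


print(partner_strand("AAGCT"))
print(bad_rungs([("A", "T"), ("G", "G"), ("C", "G"), ("T", "T")]))
```

Same-type rungs such as G-G or T-T, the error seen in A2, are exactly what `bad_rungs` reports.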




Test: Text-to-Video in Astronomy. Style: 3D animation. Run: 34.4 s generation, 10.0 s clip.
Prompt: A clear 3D educational animation of the solar system: the Sun at the center with the eight planets orbiting along their elliptical paths at different speeds, inner rocky planets faster than the outer gas giants. Saturn shows its rings. Star-field background, wide view.
Verdict: Spectacular rendering (banded Jupiter, ringed Saturn, recognisable Earth), but it is an artistic 'solar-system poster' rather than accurate orbital mechanics: planet count, spacing and order are not reliable. Use it for wonder, not for teaching orbital facts.
Panel check: prominent rings mis-assigned to multiple non-Saturn planets; non-heliocentric order; planet count/positions inconsistent across frames; an Earth-like planet sits implausibly close to the Sun
Reviewer note: Reviewer noticed a temporal-consistency glitch: one of the planets nearest the Sun disappears during the first part of the clip. On top of the panel's findings (wrong planet count/order, rings on the wrong planets), the scene isn't even stable frame-to-frame.
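The prompt's claim that inner rocky planets orbit faster than the outer giants follows from Kepler's third law, which for bodies orbiting the Sun reduces to T^2 = a^3 with T in years and a in AU. This short sketch uses rounded standard semi-major axes and neglects planetary masses; it gives the correct heliocentric order and the relative speeds any teaching version of this clip would need to respect.

```python
# Approximate semi-major axes in astronomical units (rounded standard values).
SEMI_MAJOR_AXIS_AU = {
    "Mercury": 0.387, "Venus": 0.723, "Earth": 1.000, "Mars": 1.524,
    "Jupiter": 5.203, "Saturn": 9.537, "Uranus": 19.19, "Neptune": 30.07,
}


def orbital_period_years(a_au):
    """Kepler's third law for a planet orbiting the Sun: T^2 = a^3 with T in
    years and a in AU (planet mass neglected relative to the Sun's)."""
    return a_au ** 1.5


for name, a in SEMI_MAJOR_AXIS_AU.items():
    period = orbital_period_years(a)
    # Mean angular speed in degrees per Earth year: inner planets sweep faster.
    print(f"{name:8s} a = {a:6.3f} AU   T = {period:7.2f} yr   {360 / period:7.1f} deg/yr")
```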




Test: Text-to-Video in Mathematics. Style: 2D motion graphics. Run: 32.9 s generation, 10.0 s clip.
Prompt: A clean 2D motion-graphics animation showing how adding successive sine waves of increasing frequency (a Fourier series) progressively sums into a square wave. Individual sine components on the left, the growing sum on the right, minimalist flat design, grid background.
Verdict: Mathematically correct and a top result. Sine components sum into a square wave with visible Gibbs ringing, and it even labels 'n = 1, 3, 5...', the correct odd-harmonic series. Legible, correct on-screen math.
Panel check: none material; correct odd harmonics (n=1,3,5...), ~1/n amplitudes, Gibbs ringing
Reviewer note: Looks visually correct and conveys the message. Not 100% sure the exact summed waveform is numerically correct, but it doesn't need to be for this explanatory purpose.
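The panel's checks (odd harmonics only, amplitudes falling as 1/n, Gibbs ringing) can be reproduced numerically, which also addresses the reviewer's uncertainty about the exact summed waveform. This sketch sums the standard series for a unit square wave, (4/pi) times the sum of sin(n x)/n over odd n. As more harmonics are added, the peak next to the jump does not shrink toward 1; it settles near 1.179, an overshoot of roughly 9% of the jump (which is 2). That persistent overshoot is the Gibbs ringing visible in the clip.

```python
import math


def square_wave_partial_sum(x, n_max):
    """Partial Fourier sum of a unit-amplitude square wave:
    (4/pi) * sum of sin(n x) / n over odd n up to n_max."""
    return (4 / math.pi) * sum(math.sin(n * x) / n for n in range(1, n_max + 1, 2))


def peak_value(n_max, samples=5000):
    """Largest partial-sum value on a grid over (0, pi/2]; the Gibbs peak sits
    just to the right of the jump at x = 0."""
    step = (math.pi / 2) / samples
    return max(square_wave_partial_sum(k * step, n_max) for k in range(1, samples + 1))


for n_max in (1, 5, 25, 199):
    print(f"harmonics up to n={n_max:3d}: peak = {peak_value(n_max):.4f}")
```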




Test: Text-to-Video in Medicine / anatomy. Style: Anatomical realistic. Run: 35.6 s generation, 10.0 s clip.
Prompt: A realistic medical animation of a human heart beating, showing the four chambers and the flow of deoxygenated (blue) blood into the right side and oxygenated (red) blood out the left, valves opening and closing in rhythm. Anatomical illustration style on a dark background.
Verdict: Anatomically detailed cross-section (four chambers, valves, chordae) with colour-coded directional flow, but the reviewer flagged that the colour-coded routing mixes the two circulations: oxygenated (red) blood correctly belongs on the left, yet red also appears on the right side (where deoxygenated blue blood should be), so the systemic and pulmonary flows are not kept separate. Looks clinical, but not reliable for teaching how blood actually moves through the heart.
Panel check: chamber/valve anatomy and the blue=deoxygenated / red=oxygenated convention look correct from stills; the full-clip domain reviewer caught the two circulations being mixed (red appears on the right side); the score reflects the reviewer's judgment
Reviewer note: Reviewer (domain): the blood flow is anatomically incorrect: the model fails to keep the two circulations separate. Oxygenated (red) blood correctly belongs on the left, but here red also appears on the right side (where deoxygenated blue blood should be), so the colour-coded routing mixes the systemic and pulmonary...
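The error the reviewer caught can be written as a per-frame rubric: under the prompt's colour convention, the right-side chambers should only ever show blue (deoxygenated) blood and the left-side chambers red (oxygenated). This is a minimal sketch assuming a hypothetical per-frame annotation of which colour is visible in each chamber; producing those annotations is a manual or separate step.

```python
# Expected blood oxygenation by chamber in a normal adult heart: the right side
# receives systemic (deoxygenated) blood, the left side pulmonary (oxygenated) blood.
EXPECTED = {
    "right atrium": "deoxygenated",
    "right ventricle": "deoxygenated",
    "left atrium": "oxygenated",
    "left ventricle": "oxygenated",
}
# Colour convention requested in the prompt.
COLOUR = {"blue": "deoxygenated", "red": "oxygenated"}


def check_frame(annotations):
    """annotations: chamber -> colour observed in that chamber for one frame.

    Returns a list of human-readable problems (empty if the frame is consistent).
    """
    problems = []
    for chamber, colour in annotations.items():
        if chamber not in EXPECTED:
            problems.append(f"unknown chamber {chamber!r}")
            continue
        state = COLOUR.get(colour)
        if state is None:
            problems.append(f"{chamber}: unrecognised colour {colour!r}")
        elif state != EXPECTED[chamber]:
            problems.append(f"{chamber}: shows {state} ({colour}) blood, "
                            f"expected {EXPECTED[chamber]}")
    return problems


frame = {"right atrium": "blue", "right ventricle": "red", "left ventricle": "red"}
for problem in check_frame(frame):
    print(problem)
```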




Test: Text-to-Video in Chemistry. Style: Molecular 3D. Run: 32.0 s generation, 10.0 s clip.
Prompt: A 3D molecular animation of methane combustion: a methane molecule (CH4, one carbon with four hydrogens) and two oxygen molecules (O2) collide and react, rearranging into one carbon dioxide molecule (CO2) and two water molecules (H2O), releasing energy as a glow. Ball-and-stick models on black.
Verdict: Accurate chemistry. Reactants (CH4 + 2 O2) shown with correct atom colours and labels, a combustion flash, then products (CO2 + 2 H2O). Stoichiometry and molecular structures are right, excellent.
Panel check: none material; correct geometries (tetrahedral CH4, linear CO2, bent H2O) and conserved CH4+2O2->CO2+2H2O
Reviewer note: Checked the reaction: CH4 + 2 O2 -> CO2 + 2 H2O is balanced and atoms are conserved, and the molecular geometries are right (tetrahedral methane, two O2, linear CO2, bent water). Accurate.
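The reviewer's atom-conservation check is mechanical and easy to automate for any reaction a clip depicts. This minimal sketch handles only simple formulas (no parentheses, hydrates or charges) and is meant as a sanity check on stoichiometry, not a general chemistry parser. The same function also confirms the overall photosynthesis equation used in the infographic test.

```python
import re
from collections import Counter

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula):
    """Atoms in a simple formula such as 'CH4' or 'C6H12O6'
    (no parentheses, hydrates, or charges)."""
    counts = Counter()
    for element, digits in TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


def side_counts(side):
    """Total atoms on one side of an equation given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in atom_counts(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    return side_counts(reactants) == side_counts(products)


methane = ([(1, "CH4"), (2, "O2")], [(1, "CO2"), (2, "H2O")])
photosynthesis = ([(6, "CO2"), (6, "H2O")], [(1, "C6H12O6"), (6, "O2")])
print(is_balanced(*methane), is_balanced(*photosynthesis))
```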




Test: Text-to-Video in Astrophysics. Style: Cinematic photorealistic. Run: 38.5 s generation, 10.0 s clip.
Prompt: A cinematic photorealistic animation of a massive star at the end of its life: it swells into a red supergiant, its core collapses, and it detonates in a brilliant supernova explosion, blasting glowing shells of gas outward into space. Dramatic, high dynamic range, deep space background.
Verdict: Cinematic and dramatic: supergiant -> core collapse -> explosion -> expanding remnant, a roughly correct sequence. Spectacular for intros/hooks; artistic rather than data-accurate.
Panel check: red-supergiant swelling phase under-represented (stays orange/yellow); ejecta morphology artistically floral
Reviewer note: Visually appealing, but reviewer is unsure how to judge the scientific accuracy of this one (stellar evolution is hard to assess from a short clip).




Test: Text-to-Video in Geology. Style: Cutaway diagram. Run: 36.3 s generation, 10.0 s clip.
Prompt: An educational cutaway cross-section animation of a stratovolcano erupting: magma rising from a deep chamber through the central conduit, then erupting from the summit with ash plume and lava flows down the slopes. Layered rock strata visible in the cutaway. Clean infographic style.
Verdict: Excellent cutaway (magma chamber, conduit, strata, ash plume, lava). The two large labels ('Magma Chamber', 'Conduit') render correctly, but the dense second-pass labels degrade to gibberish ('Mora Claw', 'Pyno Blath', 'Angon Chamber'). Great visuals, untrustworthy labels.
Panel check: multiple gibberish labels ('Pyro Black','Mora Glse','Aspra Chamber'); duplicate 'Magma Chamber' labels both pointing at the conduit, not a basal chamber
Reviewer note: Reviewer: good enough for what it's trying to show; the structural cutaway (magma chamber, conduit, eruption, layered strata) conveys the process adequately. (The garbled/duplicate text labels the panel flagged remain a caveat for any labelled use.)




Test: Text-to-Video in Neuroscience. Style: Stylized 3D. Run: 36.4 s generation, 10.0 s clip.
Prompt: A stylized 3D animation of a single neuron firing: an electrical impulse (action potential) travels as a bright pulse down the axon from the cell body toward the synaptic terminals, where neurotransmitters are released into the synapse. Glowing blue-and-gold scientific aesthetic.
Verdict: Beautiful stylised 3D action potential travelling soma -> axon -> synaptic terminal with vesicle release. Direction and sequence are correct.
Panel check: neurotransmitter 'vesicles' oversized and released diffusely without a defined synaptic cleft (stylised, not literal)
Reviewer note: Reviewer: good as a visualization but not really scientifically accurate; acceptable for this conceptual/illustrative use case, not for rigorous teaching. (Direction soma->axon->terminal is right; the synapse/vesicle depiction is loose.)




Test: Text-to-Video in Physics. Style: Realistic presenter. Run: 36.8 s generation, 10.0 s clip.
Prompt: A realistic video of a friendly scientist standing at a large chalkboard, gesturing as they explain, with the equation E=mc^2 and supporting diagrams written in chalk behind them. Warm classroom lighting, documentary style.
Verdict: Photorealistic, consistent presenter. The hero equation 'E=mc^2' is perfect, but every surrounding chalk equation is authentic-looking gibberish ('Anoul phicts', 'A energy farstie'). The realism is a double-edged sword: fake equations look convincing enough to mislead.
Panel check: every chalkboard element except 'E=mc^2' is pseudo-physics word-salad / malformed equations (e.g. 'Aroul phcts', 'f=m^2/2^2')
Reviewer note: Reviewer confirms the panel: the main formula (E=mc^2) is correct, but everything else on the board is made-up, inconsistent gibberish.
Group B: Aspect ratio and duration control
These cases isolate a narrower control surface of the model. The same pattern persists: composition and style are usually excellent, while factual reliability depends strongly on the scientific content being depicted.




Test: Portrait 9:16 in Climate science. Style: Explainer / motion graphics. Run: 38.1 s generation, 10.0 s clip.
Prompt: A vertical short-form explainer animation of the greenhouse effect: sunlight reaches Earth's surface, infrared heat radiates upward, and greenhouse gas molecules in the atmosphere trap and re-emit some of it back down, warming the planet. Bright, social-media-friendly motion graphics.
Verdict: Correct vertical 9:16. Engaging kawaii motion-graphics; greenhouse process correct; gas labels 'CO2'/'CH4' and titles ('THE GREENHOUSE EFFECT', 'TRAPPED!') legible. Ideal for education shorts/Reels.
Panel check: greenhouse gases drawn as a discrete outer shell rather than well-mixed; cartoon 'trapping' slightly oversimplifies re-emission
Reviewer note: Looks good.
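The mechanism this clip animates has a standard back-of-envelope check: a zero-dimensional energy balance gives the temperature Earth would radiate at with no greenhouse re-emission. This sketch uses approximate textbook values for the solar constant and Bond albedo. It yields roughly 255 K, well below the observed mean surface temperature of about 288 K; that gap of roughly 33 K is the warming the clip's "trap and re-emit" step is meant to convey.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4
S0 = 1361.0             # solar irradiance at 1 AU, W m^-2 (approximate)
ALBEDO = 0.30           # Earth's Bond albedo (approximate)


def effective_temperature(s0=S0, albedo=ALBEDO):
    """Zero-dimensional energy balance: absorbed sunlight, averaged over the
    whole sphere (hence the factor 4), equals blackbody emission sigma * T^4."""
    return (s0 * (1 - albedo) / (4 * SIGMA)) ** 0.25


t_eff = effective_temperature()
print(f"Effective radiating temperature: {t_eff:.1f} K")
```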




Test: Duration 5s in Cell biology. Style: Microscopy 3D. Run: 31.2 s generation, 5.0 s clip.
Prompt: A microscope-style 3D animation of a single cell undergoing mitosis: chromosomes condense, align at the center, then separate to opposite poles as the cell pinches into two daughter cells. Realistic cellular detail.
Verdict: Honoured the 5 s request and the stage order is right (condensation -> metaphase plate -> poleward movement -> cytokinesis) with lovely microscopy realism, but the expert panel caught a defining error the first pass missed: the bodies pulled to the poles in 'anaphase' are still X-shaped paired sister chromatids, not the separated single chromatids that anaphase is defined by. It looks textbook-correct yet is biologically wrong: the canonical generative-video mitosis error.
Panel check: anaphase shows X-shaped PAIRED sister chromatids moving to both poles instead of separated single chromatids; the defining event is misrepresented
Reviewer note: Reviewer: many scientific issues with this one, beyond the anaphase error the panel flagged (chromosomes moving to the poles as paired X-shaped chromatids rather than separated single chromatids). Not trustworthy as a depiction of mitosis.
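The defining bookkeeping of anaphase can be stated in a few lines: each replicated (two-chromatid, X-shaped) chromosome at the metaphase plate splits into two single chromatids, one per pole, so each pole receives the full chromosome set as single chromatids. This is a minimal illustrative model, not a cell simulation; the 46-chromosome example assumes a human somatic cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chromosome:
    label: str
    chromatids: int  # 2 = replicated (X-shaped), 1 = single chromatid


def anaphase(metaphase):
    """Separate each replicated chromosome's sister chromatids, one to each pole."""
    pole_a, pole_b = [], []
    for chrom in metaphase:
        if chrom.chromatids != 2:
            raise ValueError(f"{chrom.label} is not replicated; nothing to separate")
        pole_a.append(Chromosome(chrom.label, 1))
        pole_b.append(Chromosome(chrom.label, 1))
    return pole_a, pole_b


def looks_like_anaphase(pole):
    """Every body moving to a pole in anaphase should be a single chromatid."""
    return all(chrom.chromatids == 1 for chrom in pole)


# A human somatic cell: 46 replicated chromosomes at the metaphase plate.
metaphase = [Chromosome(f"chr{i}", 2) for i in range(1, 47)]
pole_a, pole_b = anaphase(metaphase)
print(len(pole_a), len(pole_b), looks_like_anaphase(pole_a))
# The B2 error in these terms: X-shaped (paired) bodies arriving at a pole.
print(looks_like_anaphase([Chromosome("chr1", 2)]))
```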




Test: Duration 10s in Astrophysics. Style: Cinematic 3D. Run: 39.6 s generation, 10.0 s clip.
Prompt: A long cinematic 10-second animation of a supermassive black hole: a glowing orange accretion disk of superheated gas swirls around the black event horizon, with gravitational lensing bending the light of the disk and background stars around the sphere. Awe-inspiring, scientifically inspired.
Verdict: Honoured the 10 s request. Interstellar-grade black hole: accretion disk, bright photon ring, and gravitational lensing showing the disk's far side above and below the shadow, all physically correct. Outstanding.
Panel check: none material; correct shadow, photon ring and over-the-top gravitational lensing of the far disk
Reviewer note: Reviewer defers, as this is not their area of expertise. Score relies on the expert panel's web-verified assessment (correct event-horizon shadow, photon ring and gravitational lensing of the far disk; NASA-sourced).
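The three features the panel checked have standard sizes for a non-rotating (Schwarzschild) black hole: the event horizon at r_s = 2GM/c^2, the photon sphere at 1.5 r_s, and an apparent shadow radius of about 2.6 r_s (sqrt(27)/2 times r_s) for a distant observer. This sketch assumes no spin; real supermassive black holes rotate, which shifts and slightly flattens the shadow, so treat these as reference scales rather than a model of any particular object.

```python
import math

G = 6.67430e-11       # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8      # speed of light, m/s
M_SUN = 1.98892e30    # solar mass, kg (approximate)


def schwarzschild_radius(mass_kg):
    """Event-horizon radius of a non-rotating black hole: r_s = 2GM/c^2."""
    return 2 * G * mass_kg / C**2


def characteristic_radii(mass_kg):
    rs = schwarzschild_radius(mass_kg)
    return {
        "event_horizon": rs,
        "photon_sphere": 1.5 * rs,            # unstable circular orbits of light
        "shadow": math.sqrt(27) / 2 * rs,     # apparent shadow radius for a distant observer
    }


for name, r in characteristic_radii(M_SUN).items():
    print(f"{name:14s} {r / 1e3:8.2f} km for one solar mass")
```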
Group C: Text-rendering probes




Test: Text-free probe in Physics. Style: Abstract 3D, NO TEXT. Run: 40.8 s generation, 10.0 s clip.
Prompt: A clean 3D visualization of an electromagnetic wave propagating through space: an oscillating red electric field and a perpendicular blue magnetic field, both sinusoidal, moving together along the propagation axis. Absolutely NO text, NO letters, NO labels, NO numbers anywhere in the image, purely visual.
Verdict: KEY PROBE. The prompt explicitly asked for NO text and the model produced none, so text-free output appears achievable by prompting alone. The physics is only loosely right, though: E/B perpendicularity is unclear and one frame becomes chaotic.
Panel check: E and B rendered coplanar (mirror images) rather than mutually perpendicular (E ⊥ B ⊥ k); one frame collapses to a non-physical localized spike
Reviewer note: Reviewer couldn't tell what the clip is actually showing: it doesn't read as a clear EM-wave visualization. Reinforces the panel's finding that the E/B field geometry is wrong/ambiguous (fields not mutually perpendicular).
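The geometry the clip got wrong is fixed by Maxwell's equations for a vacuum plane wave: E and B oscillate in phase, perpendicular to each other and to the propagation direction k, with |B| = |E|/c, and E x B points along k. This minimal sketch evaluates a linearly polarised wave travelling along +z and checks those relations; the wavelength and amplitude are arbitrary illustrative values.

```python
import math

C = 2.99792458e8  # speed of light in vacuum, m/s


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_wave_fields(z, t, e0=1.0, wavelength=1.0):
    """E and B of a linearly polarised plane wave travelling along +z in vacuum.

    E oscillates along x, B along y, in phase, with |B| = |E| / c.
    """
    k = 2 * math.pi / wavelength
    omega = C * k
    phase = math.sin(k * z - omega * t)
    e = (e0 * phase, 0.0, 0.0)
    b = (0.0, e0 * phase / C, 0.0)
    return e, b


e, b = plane_wave_fields(z=0.3, t=0.0)
k_hat = (0.0, 0.0, 1.0)
print("E.B =", dot(e, b), " E.k =", dot(e, k_hat), " B.k =", dot(b, k_hat))
print("E x B points along +z:", cross(e, b)[2] > 0)
```

A clip where E and B are mirror images in one plane, as the panel found here, fails the first of these checks.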




Test: Text-rendering probe in Botany. Style: Infographic with labels. Run: 33.1 s generation, 10.0 s clip.
Prompt: An animated educational infographic explaining photosynthesis, with clearly written labels: sunlight, CO2 entering the leaf, H2O from the roots, and the outputs glucose and O2, with the chloroplast highlighted. Bright textbook infographic style with readable labels.
Verdict: Best 'many labels' result: sunlight, CO2, H2O-from-roots, chloroplast, O2 and glucose are all present and mostly legible and correct. Short words and chemical formulae render far better than sentences. Genuinely teachable.
Panel check: none material; correct inputs/outputs, organelle and label spelling
Reviewer note: Looks good.
Group D: Resolution probe




Test: Default resolution in Molecular biology. Style: 3D scientific. Run: 36.1 s generation, 10.0 s clip.
Prompt: A 3D scientific animation of a protein folding: a linear chain of amino acids twists and collapses into a compact three-dimensional folded structure with alpha-helices and beta-sheets visible. Smooth ribbon representation, soft lighting.
Verdict: A convincing ribbon animation (recognizable alpha-helices and beta-sheet arrows collapsing into a compact fold; native 1280x720), but the reviewer (biochemistry) flags it as not grounded in real biochemistry: the folding is arbitrary morphing, not an actual sequence->structure pathway. It looks like protein folding without being it, a clear case of the panel over-crediting visual motifs that a domain expert rejects.
Panel check: panel found the ribbon motifs correct from stills (extended chain -> alpha-helices + beta-sheet arrows -> compact fold); the full-clip domain reviewer (biochemistry) judged the folding pathway non-physical; the score reflects the reviewer's judgment
Reviewer note: Reviewer (biochemistry): many issues, not grounded in any real biochemistry. The 'folding' is arbitrary morphing rather than a real sequence->structure pathway; recognizable alpha-helix / beta-sheet motifs appear but don't represent an actual folding process. The panel over-credited the ribbon look; a domain reviewer does not.
No video was produced for D2. This deliberate 1080p request tested resolution control and failed because the API rejected the resolution parameter. The failure is included because it is part of the benchmark's developer-experience result.
Error: BadRequestError: Error code: 400 - {'error': {'message': "Unknown parameter 'resolution' at 'response_format'.", 'code': 'invalid_request'}}
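A practical defence against this class of failure is to retry once without an explicitly optional parameter and record that it was dropped, so a silent fall-back to 720p is never mistaken for a 1080p result. In this minimal sketch, `send` stands for whatever function submits your request, `BadRequest` is a hypothetical stand-in for your client's 400 exception, and the nested `response_format` dict is inferred only from the error text above, not from API documentation.

```python
import copy
import re

# Matches error text like the one quoted above:
# "Unknown parameter 'resolution' at 'response_format'."
UNKNOWN_PARAM = re.compile(r"Unknown parameter '([^']+)'(?: at '([^']+)')?")


class BadRequest(Exception):
    """Hypothetical stand-in for the HTTP 400 exception your client raises."""


def call_with_fallback(send, request, optional_keys=("resolution",)):
    """Call send(request); if it fails with a 400 naming an unknown *optional*
    parameter, remove that parameter and retry once.

    Returns (response, dropped) where dropped lists removed parameter paths, so
    the caller can record that the output may not honour the original request.
    """
    try:
        return send(request), []
    except BadRequest as err:
        match = UNKNOWN_PARAM.search(str(err))
        if match is None or match.group(1) not in optional_keys:
            raise
        key, parent = match.group(1), match.group(2)
        trimmed = copy.deepcopy(request)  # never mutate the caller's request
        container = trimmed.get(parent) if parent else trimmed
        if not isinstance(container, dict) or key not in container:
            raise
        del container[key]
        return send(trimmed), [f"{parent}.{key}" if parent else key]


def fake_send(req):
    """Toy sender that mimics the 400 above whenever a resolution is requested."""
    if "resolution" in req.get("response_format", {}):
        raise BadRequest("Error code: 400 - Unknown parameter 'resolution' at 'response_format'.")
    return "ok"


response, dropped = call_with_fallback(
    fake_send, {"prompt": "...", "response_format": {"resolution": "1080p"}})
print(response, dropped)
```

Retrying only for keys the caller marks as optional keeps genuine request bugs loud instead of hiding them behind an automatic retry.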
Group E: Image-to-video diagrams
The image-to-video cases were the clearest mitigation. When a text-perfect diagram was supplied up front, Omni Flash could preserve the diagram and add motion without inventing as much on-screen text.




Test: Image-to-Video in Neuroscience. Style: Animated diagram. Run: 37.5 s generation, 10.0 s clip.
Prompt: Animate this neuron diagram: send a bright electrical pulse traveling from the cell body down the axon to the synaptic terminals, keeping the labels and layout intact.
Verdict: Image-to-video standout and a strategic finding. The labelled diagram (generated via OpenRouter) is preserved pixel-faithfully, with all nine labels staying crisp while a bright pulse animates soma -> axon -> synapse. This is the workaround for Omni's native text weakness.
Panel check: none material; all labels correctly spelled/placed; correct soma->axon->terminal saltatory propagation
Reviewer note: Fine as a high-level teaching tool.




Test: Image-to-Video in Geology. Style: Animated cross-section. Run: 34.9 s generation, 10.0 s clip.
Prompt: Animate this cross-section of the Earth: show slow convection currents circulating in the mantle and subtle movement, while keeping the layered structure and labels intact.
Verdict: Labelled Earth cross-section preserved (CRUST / MANTLE / OUTER CORE / INNER CORE plus depth values) with subtle interior motion. Convection is gentle, but diagram fidelity is excellent.
Panel check: none material; correct layer order, depth values, phase states and mantle (not inner-core) convection
Reviewer note: Reviewer thinks it's good but isn't an expert in this area; the score relies on the panel's web-verified assessment (correct layer order, depth values and mantle convection).




Test: Image-to-Video in Cell biology. Style: Microscopy. Run: 42.5 s generation, 10.0 s clip.
Prompt: Animate this microscope view: show the cell beginning to divide, the chromosomes separating toward opposite ends in a realistic microscopy style.
Verdict: Convincing light-microscopy styling and the right stage order, but the panel flagged the same anaphase problem as B2 (poleward bodies appear to remain paired, not resolved into single chromatids) plus a chromosome count that drifts between frames, implausible for one dividing cell. Needs expert review of the segregation step.
Panel check: likely unresolved sister chromatids at anaphase; chromosome count drifts between frames
Reviewer note: Reviewer confirms an inconsistency with the chromosomes (count/appearance drifts between frames), the same class of error as B2; not a reliable depiction of cell division.
Group F: Reference-to-video
These cases isolate a narrower control surface of the model. The same pattern persists: composition and style are usually excellent, while factual reliability depends strongly on the scientific content being depicted.




Test: Reference-to-Video in Chemistry. Style: Realistic lab. Run: 33.7 s generation, 10.0 s clip.
Prompt: Using these reference images, create a realistic lab video: pour the liquid from the beaker so a chemical reaction occurs, with bubbling and a color change, in the style and setting suggested by the references.
Verdict: Used BOTH reference images: a realistic lab pour of blue solution with bubbling and a colour change, plus an overlaid copper-sulfate complex matching the molecule reference. Beaker graduations are partly legible. Strong reference-to-video.
Panel check: octahedral Cu and tetrahedral sulfate overlays and blue colour are correct, but the vigorous bubbling is not chemically justified by simply mixing copper-sulfate solution
Reviewer note: Reviewer: scientifically fine, but the latter part isn't that interesting and looks like cheap stock footage (an aesthetic/production critique, not an accuracy issue).
Group G: Multi-turn editing chains
The multi-turn chains show a product strength: style and character edits often preserve scene continuity. They also show why science users need verification, because structural edits can be visually persuasive while under-applying the requested scientific change.




Test: Chain base (T2V) in Molecular biology. Style: Photorealistic 3D. Run: 32.5 s generation, 8.0 s clip.
Prompt: A hyper-realistic 3D animation of a DNA double helix rotating slowly in the center of the frame, glowing softly against a dark blue background.
Verdict: Clean photorealistic chain anchor, a rotating helix on a bokeh field (store=true so later edits can reference it). The panel docks it for a subtle but real error: the helix reads left-handed in several frames, whereas biological B-DNA is right-handed, a flaw inherited by the whole G-chain (G2-G4).
Panel check: helix appears LEFT-handed (B-DNA is right-handed); major/minor grooves not differentiated
Reviewer note: Fine.
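Handedness is a geometric invariant, so in principle it can be checked rather than judged by eye. The sketch below is an illustration and not the panel's method. It samples a parametric helix at roughly B-DNA proportions in nanometres (about 1 nm radius, 3.4 nm per turn) and takes the sign of the torsion numerator (r' x r'') . r''' from finite differences. That sign is positive for a right-handed helix such as B-DNA and negative for a left-handed one. Extracting a 3D backbone trace from rendered frames is assumed and not shown.

```python
import math


def helix(turns, handed, n=400, radius=1.0, pitch=3.4):
    """Sample points along a helix advancing in +z; handed=+1 is right-handed, -1 left-handed."""
    points = []
    for i in range(n):
        t = 2 * math.pi * turns * i / (n - 1)
        points.append((radius * math.cos(t), handed * radius * math.sin(t), pitch * t / (2 * math.pi)))
    return points


def diff(seq):
    """Forward differences between consecutive 3D vectors."""
    return [tuple(b[k] - a[k] for k in range(3)) for a, b in zip(seq, seq[1:])]


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def handedness(points):
    """Return +1 (right-handed) or -1 (left-handed) from the sign of the summed
    torsion numerator (r' x r'') . r''', with derivatives taken as finite differences."""
    d1 = diff(points)
    d2 = diff(d1)
    d3 = diff(d2)
    total = sum(dot(cross(a, b), c) for a, b, c in zip(d1, d2, d3))
    return 1 if total > 0 else -1


print(handedness(helix(turns=2, handed=+1)))  # 1: right-handed, as in B-DNA
print(handedness(helix(turns=2, handed=-1)))  # -1: left-handed
```

Mirroring a helix across any single plane flips the sign. This is why a rendered helix can look almost right and still be the wrong enantiomer.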




Test: Multi-turn edit (content) in Molecular biology. Style: Photorealistic 3D. Run: 53.5 s generation, 8.0 s clip.
Prompt: Show the two strands of the double helix gradually separating down the middle into a Y-shaped fork, while keeping the same realistic 3D style.
Verdict: Content edit preserved the style perfectly but under-applied the requested structural change: the strands barely separate. Multi-turn editing keeps consistency strongly, but large structural edits can be muted. (Note: the original 'unzip/split' wording was rejected by input validation and had to be reworded; see limitations.)
Panel check: the requested unzip/replication-fork never forms, the helix stays fully zipped through all frames; also inherits the left-handed twist
Reviewer note: Fine.




Test: Multi-turn edit (style) in Molecular biology. Style: Chalkboard restyle. Run: 55.5 s generation, 8.0 s clip.
Prompt: Redraw this exact animation as a hand-drawn chalkboard diagram, white chalk lines on a dark green chalkboard, keeping the same DNA unzipping motion.
Verdict: Excellent style transfer: the identical helix and motion redrawn as white chalk on a green board, composition fully preserved across the turn.
Panel check: DNA structure preserved through the chalkboard restyle, but inherits the likely left-handed twist of the base clip
Reviewer note: Fine.




Test: Edit with reference image in Molecular biology. Style: Reference restyle. Run: 61.2 s generation, 8.0 s clip.
Prompt: Restyle this video to match the look of the reference image, a technical blueprint aesthetic, while keeping the same DNA motion.
Verdict: Reference-guided restyle nailed the blueprint aesthetic (grid, title block, dimension lines) over the same DNA motion. The decorative drafting annotations are gibberish, as expected for dense text.
Panel check: DNA structure preserved, but inherits the left-handed twist; decorative blueprint annotations are gibberish
Reviewer note: Fine.




Test: Chain base (presenter) in Biochemistry. Style: Whiteboard presenter. Run: 35.6 s generation, 8.0 s clip.
Prompt: A friendly teacher standing beside a whiteboard, pointing to a simple diagram of the Krebs cycle (a circular pathway) as they explain it. Bright classroom, documentary style.
Verdict: Photorealistic, consistent presenter beside a circular Krebs-cycle diagram. 'Krebs Cycle', 'ATP' and 'Citric Acid' are correct; the remaining hand-written labels are gibberish.
Panel check: node labels are fabricated words ('Promiustat', 'Evention', 'CH2COrH2'); the real cycle intermediates (isocitrate, α-ketoglutarate, succinate, malate, oxaloacetate...) and the carriers and products (NADH/FADH2/CO2) are all absent
Reviewer note: Reviewer: pretty bad text consistency: a node label flips from 'ATP' to 'Citric Acid' and back to 'ATP' across frames, and the cycle itself is not correct. Confirms and extends the panel's finding (fabricated labels, missing real intermediates).




Test: Multi-turn edit (character) in Biochemistry. Style: Whiteboard presenter. Run: 59.6 s generation, 8.0 s clip.
Prompt: Replace the human teacher with a friendly cartoon robot teacher, keeping the same whiteboard, Krebs cycle diagram, and pointing gestures.
Verdict: Character-edit standout: the human teacher becomes a consistent cartoon robot while the classroom, whiteboard diagram and pointing gestures are all preserved across the clip.
Panel check: identical fabricated/garbled Krebs labels to G5; swapping the presenter to a robot changes nothing about the wrong diagram
Reviewer note: Reviewer: same issues as G5, fabricated/garbled Krebs labels, the same text-consistency flipping, and an incorrect cycle (the diagram is unchanged from G5; only the presenter was swapped).
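Both chains show the same mechanism. A flaw in the base generation propagates into every edit that references it, whether or not the edit's own request lands. The sketch below is a minimal data structure for that bookkeeping. The case IDs and flaws come from the verdicts above. The `Turn` fields and the parent links (each edit referencing the previous turn) are assumptions and do not represent the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Turn:
    case_id: str
    parent: Optional["Turn"] = None
    own_flaws: list = field(default_factory=list)

    def inherited_flaws(self) -> list:
        """Flaws from every ancestor turn, base first, followed by this turn's own new flaws."""
        chain = self.parent.inherited_flaws() if self.parent else []
        return chain + [f for f in self.own_flaws if f not in chain]


# Parent links are an assumption: each edit references the previous turn.
g1 = Turn("G1", own_flaws=["left-handed helix"])
g2 = Turn("G2", g1, ["requested strand separation barely applied"])
g3 = Turn("G3", g2)  # chalkboard restyle adds no new structural flaw
g4 = Turn("G4", g3, ["gibberish blueprint annotations"])
g5 = Turn("G5", own_flaws=["fabricated Krebs-cycle labels"])
g6 = Turn("G6", g5)  # only the presenter changed

print(g4.inherited_flaws())
# ['left-handed helix', 'requested strand separation barely applied', 'gibberish blueprint annotations']
print(g6.inherited_flaws())  # ['fabricated Krebs-cycle labels']
```

When an edit succeeds at a style or character change, it offers no evidence that an inherited scientific flaw has been fixed. The base clip still has to be verified on its own.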
Group H: URI delivery
This case isolates a single control surface, the delivery mode: the clip is returned as a URI and downloaded via the Files API rather than inlined as base64. The same pattern persists: composition and style are excellent, while factual reliability depends strongly on the scientific content being depicted.




Test: URI delivery (10s) in Astronomy. Style: Cinematic 3D. Run: 35.0 s generation, 10.0 s clip.
Prompt: A cinematic 10-second flythrough tour of the solar system, gliding past the Sun and each planet in turn from Mercury outward to Neptune, with Saturn's rings highlighted. Realistic space visuals.
Verdict: Confirms delivery: 'uri' (the clip is downloaded via the Files API rather than returned inline as base64). A gorgeous cinematic flythrough (Jupiter, a Mars-like surface, ringed Saturn, Earth), but an artistic montage rather than the ordered Mercury->Neptune tour requested.
Panel check: Mars omitted from the Mercury->Neptune order; a Saturn-like ringed giant appears in two consecutive frames; Neptune not clearly rendered
Reviewer note: Fine.
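The two delivery modes differ only in where the bytes come from. The minimal sketch below accepts either shape. The `b64` and `uri` field names and the injected `fetch` callable are hypothetical placeholders and do not reflect the documented response schema. A real client would download the URI through the provider's Files API.

```python
import base64
from typing import Callable


def clip_bytes(item: dict, fetch: Callable[[str], bytes]) -> bytes:
    """Return video bytes from a hypothetical response item carrying either inline base64 or a URI."""
    if item.get("b64"):
        return base64.b64decode(item["b64"])
    if item.get("uri"):
        # In a real client this would be an authenticated Files API download.
        return fetch(item["uri"])
    raise ValueError("response item has neither inline data nor a URI")


# Offline usage: a dict lookup stands in for the download step.
fake_mp4 = b"\x00\x00\x00\x18ftypmp42"
store = {"files/clip-123": fake_mp4}
inline_item = {"b64": base64.b64encode(fake_mp4).decode("ascii")}
uri_item = {"uri": "files/clip-123"}
print(clip_bytes(inline_item, store.__getitem__) == clip_bytes(uri_item, store.__getitem__))  # True
```

Base64 encodes every 3 bytes as 4 characters, so inline delivery inflates the payload by about a third. That overhead is one plausible reason to prefer URI delivery for video.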
Group I: Extended scientific and physical accuracy probes
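These probes push directly on scientific and physical accuracy, using longer and more demanding prompts in botany, geology, molecular biology, physics, and cultural geography. The same pattern persists: composition and style are usually excellent, while factual reliability depends strongly on the scientific content being depicted.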




Test: Text-to-Video (time-lapse) in Botany. Style: Photorealistic time-lapse. Run: 35.1 s generation, 10.0 s clip.
Prompt: A time-lapse video of a succulent plant growing from a tiny seed into a mature rosette. A seed in dark soil sprouts, a small green shoot emerges and unfurls its first fleshy leaves, then the plant steadily grows larger, leaves thickening and multiplying into a full symmetrical succulent rosette. Smooth accelerated time-lapse,...
Verdict: Visually a polished, smooth seed-to-rosette time-lapse, but not scientifically accurate (reviewer feedback). The early seedling is shown as a broad-leafed plant with prominent leaf veins, wrong for a succulent, whose leaves are thick, fleshy and smooth without net venation. The seed is far too large: real succulent seeds are essentially dust-fine. The mature rosette looks convincingly succulent, but the germination/seedling morphology is botanically incorrect. A textbook example of the recurring pattern in this...
Panel check: oversized visible 'seed' (succulent seeds are dust-fine); broad, thin, prominently net-veined non-fleshy seedling leaves (wrong for a succulent, whose cotyledons and early leaves are small, plump and fleshy with inconspicuous venation; rosette succulents are themselves dicots, so the tell is the leaf texture/venation and the...
Reviewer note: Reviewer (reconfirm): inaccurate: the seedling has leaf veins and does not look like a succulent at the start; many issues. The seed and seedling morphology are wrong for a succulent (succulent leaves are fleshy and smooth, seeds dust-fine).




Test: Text-to-Video (time-lapse) in Geology. Style: Photorealistic cross-section. Run: 37.5 s generation, 10.0 s clip.
Prompt: A scientifically accurate time-lapse of canyon formation by river erosion, shown as a geological cross-section over millions of years. Start with a flat plateau of horizontal sedimentary rock layers (visible strata in tan, red and grey). A river begins flowing across the surface, then slowly cuts downward, incising a narrow...
Verdict: Geologically sound. The cutaway block-diagram correctly shows fluvial canyon formation: a flat plateau of horizontal sedimentary strata (sandstone/mudstone/limestone) -> a river that incises downward through successive layers -> progressive deepening with rockfall and mass wasting widening the walls -> a mature canyon with exposed horizontal strata and an incised meandering river at the base (a real phenomenon, as at the Grand Canyon / Goosenecks). The process, the layer-by-layer downcutting and the...
Panel check: none material, correct fluvial incision through strata, mass-wasting widening, meandering base channel
Reviewer note: Fine.




Test: Text-to-Video in Molecular biology. Style: 3D biovisualization. Run: 34.1 s generation, 10.0 s clip.
Prompt: A comprehensive, scientifically accurate 3D molecular animation of protein synthesis (mRNA translation) inside a cell. A ribosome of a large and small subunit clamps around an mRNA strand read three bases (a codon) at a time. Cloverleaf tRNAs carrying an amino acid dock so their anticodon pairs with the codon; each amino acid...
Verdict: Comprehensive and accurate at the conceptual/structural level for an unfilmable nanoscale process. Correctly depicts a two-subunit ribosome clamped on an mRNA strand with nucleotide bases, a cloverleaf tRNA delivering an amino acid, peptide-chain elongation, spent-tRNA release and the polypeptide emerging from the exit region. Simplifications a specialist would flag: the literal codon-anticodon base-pairing isn't shown register-accurate, the A/P/E sites aren't differentiated, and tRNA reads as a 2D cloverleaf...
Panel check: subunit arrangement, tRNA shape and emerging peptide are correct; A/P/E-site usage and codon-anticodon register can't be confirmed from stills
Reviewer note: Reviewer (domain) caught a core error: a codon is a triplet, 3 nucleotides (read three bases at a time, coding one amino acid), but the clip shows groups of 4. Highly inaccurate. The panel had flagged codon register as 'unverifiable from stills'; the reviewer resolved it as wrong.
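The reviewer's point concerns the reading frame. Translation consumes mRNA three nucleotides at a time, so grouping by four shifts every boundary after the first and the codons no longer correspond to amino acids. The short illustration below uses a handful of real entries from the standard genetic code. The table is deliberately partial, and the mRNA string is a toy example.

```python
# A few entries from the standard genetic code (partial, for illustration only).
CODON_TABLE = {"AUG": "Met", "GCC": "Ala", "UUU": "Phe", "GGA": "Gly", "UAA": "Stop"}


def group(mrna: str, size: int) -> list:
    """Split an mRNA string into consecutive non-overlapping groups; an incomplete tail is dropped."""
    return [mrna[i:i + size] for i in range(0, len(mrna) - size + 1, size)]


def translate(mrna: str) -> list:
    """Read codons (triplets) from the first base until a stop codon."""
    peptide = []
    for codon in group(mrna, 3):
        residue = CODON_TABLE.get(codon, "?")
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


mrna = "AUGGCCUUUGGAUAA"
print(group(mrna, 3))   # ['AUG', 'GCC', 'UUU', 'GGA', 'UAA']
print(translate(mrna))  # ['Met', 'Ala', 'Phe', 'Gly']
print(group(mrna, 4))   # ['AUGG', 'CCUU', 'UGGA'] : none of these is a codon
```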




Test: Text-to-Video in Physics (fluid dynamics). Style: Photorealistic slow-motion. Run: 34.5 s generation, 10.0 s clip.
Prompt: A photorealistic cinematic shot of a massive loaded container ship falling from the sky and crashing into a calm ocean, intended to show physically accurate water displacement: a colossal radial splash crown and vertical jets on impact, a deep air cavity punched into the sea, a huge volume of water shoved outward and upward...
Verdict: Qualitatively plausible, not rigorously accurate. It gets the splash crown, outward-radiating displacement waves, water sheeting off, and buoyant settling, but the physics is wrong where it counts: displacement is badly under-scaled for a ~200kt hull, there is no air-cavity collapse or central Worthington rebound jet, the dynamics look stylised and 'hovering' rather than impulsive, and the draft is too shallow. Confirms the study's core finding that Omni produces artistic plausibility, not CFD-grade...
Panel check: displacement & splash energy grossly under-scaled for a laden ship; settled draft floats far too high; no Worthington rebound jet; rigid-body structural failure ignored
Reviewer note: Reviewer: the physics is pretty bad. On impact the containers should be flung off the ship and fall, but they stay put (cargo inertia ignored), on top of the panel's under-scaled displacement and missing cavity-collapse findings. Acceptable only if you care purely about the water effect; it depends on the use case.
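The "settled draft floats far too high" finding follows from Archimedes' principle: a floating hull displaces its own mass of seawater. The back-of-envelope sketch below uses the ~200 kt figure cited in the verdict. The waterplane dimensions and block coefficient are assumptions chosen to resemble a large container ship and were not measured from the clip.

```python
SEAWATER_DENSITY = 1025.0  # kg/m^3, typical surface seawater


def displaced_volume(mass_kg: float) -> float:
    """Volume of seawater (m^3) a floating body of this mass must displace."""
    return mass_kg / SEAWATER_DENSITY


def estimated_draft(mass_kg: float, length_m: float, beam_m: float, block_coefficient: float) -> float:
    """Mean draft T (m) from V = L * B * T * Cb, solved for T."""
    return displaced_volume(mass_kg) / (length_m * beam_m * block_coefficient)


mass = 200_000 * 1000  # ~200 kt laden hull, in kg
# Hypothetical hull: 400 m x 60 m waterplane, block coefficient 0.65.
print(f"{displaced_volume(mass):,.0f} m^3 displaced")           # 195,122 m^3 displaced
print(f"{estimated_draft(mass, 400, 60, 0.65):.1f} m draft")    # 12.5 m draft
```

The exact figure depends on the assumed hull. The order of magnitude (well over ten metres of hull below the waterline) is the kind of scale check behind the panel's note.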




Test: Text-to-Video in Physics (thermodynamics). Style: Photorealistic food. Run: 40.3 s generation, 10.0 s clip.
Prompt: A realistic close-up video of scoops of vanilla ice cream in an air fryer basket. The running air fryer's warm air melts the ice cream: it softens, slumps and collapses, glossy creamy puddles spreading and dripping through the mesh of the basket, with droplets and condensation. Photorealistic, appetizing food-photography style,...
Verdict: Physically believable everyday thermodynamics. Heat-driven melting is depicted correctly: the scoops soften, slump and collapse from solid into a glossy viscous liquid that drips through the mesh basket (correct, an air-fryer basket would let melt drain to the drawer) and pools below, with light bubbling that is plausible under strong convective heat. Minor omissions: over real time the puddle would likely brown/scorch and the fat/water would separate, which isn't shown. As a depiction of melting, it's accurate.
Panel check: heat-driven melting and drip-through-mesh are believable; late-stage surface frothing somewhat overstated for a pure dairy melt
Reviewer note: Reviewer: pretty bad. The appliance looks like a microwave, not an air fryer, and ice cream drips inexplicably from the top. The panel rated the melt from stills; the full clip gets the setting wrong and contains a nonsensical artifact.




Test: Text-to-Video in Physics (novelty). Style: Photorealistic action. Run: 34.9 s generation, 10.0 s clip.
Prompt: A realistic cinematic novelty video of a black Hummer SUV driving off the edge of a tall desert cliff; a large colorful paragliding parachute deploys from the roof, the Hummer glides gently down beneath the canopy along the cliff face and lands softly and safely on its wheels at the base of the canyon, kicking up dust. Everyone...
Verdict: Deliberate novelty, physically implausible by design, and correctly flagged as such. Execution is flawless (drive-off -> roof chute deploys -> controlled glide -> soft wheels-down landing, nobody hurt). But the premise violates physics: a ~3-tonne Hummer would impose an enormous wing loading on the canopy, so no roof-stowed paraglider canopy could realistically arrest it to a gentle, controlled, safe touchdown; the descent shown is far too slow and stable for the mass. Scored low on accuracy as expected for a requested novelty; it is...
Panel check: intentional novelty, but physically impossible: a ~3 t SUV is ~12-15x any real paraglider canopy's capacity, and the canopy shown is far too small for the lift; a gentle landing is unachievable
Reviewer note: It's fine, accepted as an intentional novelty (the low accuracy score reflects physical plausibility, which the stunt deliberately ignores).
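The wing-loading argument can be quantified. In steady, shallow gliding flight, lift roughly balances weight, L = 0.5 * rho * v^2 * S * C_L. For a fixed canopy, the required airspeed therefore grows with the square root of the suspended mass. The sketch below assumes sea-level air density, a 40 m^2 canopy, a lift coefficient of 1.0, and a 220 kg upper load for a large tandem wing. All four values are illustrative assumptions and do not come from the benchmark.

```python
import math

RHO = 1.225  # kg/m^3, sea-level air density (assumption)
G = 9.81     # m/s^2


def trim_airspeed(mass_kg: float, area_m2: float, lift_coefficient: float) -> float:
    """Airspeed (m/s) at which lift 0.5*rho*v^2*S*CL equals weight m*g."""
    return math.sqrt(2 * mass_kg * G / (RHO * area_m2 * lift_coefficient))


CANOPY_AREA = 40.0    # m^2, large tandem paraglider (assumption)
CL = 1.0              # illustrative lift coefficient (assumption)
TANDEM_LIMIT = 220.0  # kg, assumed upper load for such a wing
HUMMER_MASS = 3000.0  # kg, the ~3 t figure used by the panel

v_pilot = trim_airspeed(TANDEM_LIMIT, CANOPY_AREA, CL)
v_hummer = trim_airspeed(HUMMER_MASS, CANOPY_AREA, CL)
print(f"overload factor: {HUMMER_MASS / TANDEM_LIMIT:.1f}x")  # overload factor: 13.6x
print(f"airspeed: {v_pilot:.1f} m/s -> {v_hummer:.1f} m/s ({v_hummer / v_pilot:.1f}x)")
# airspeed: 9.4 m/s -> 34.7 m/s (3.7x)
```

Under these assumptions, the same canopy would have to fly at roughly 125 km/h to hold the Hummer up, even if the canopy survived the load. The clip's slow, floating descent does not reflect that.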




Test: Text-to-Video in Geography / culture. Style: Ukiyo-e + watercolor. Run: 39.8 s generation, 10.0 s clip.
Prompt: A serene tourism montage of Kyoto in a traditional Japanese ukiyo-e woodblock style blended with soft watercolor, drifting through its most famous landmarks: the golden Kinkaku-ji pavilion mirrored in its pond; the vermilion torii-gate tunnel at Fushimi Inari; the Arashiyama bamboo grove; the stilted wooden Kiyomizu-dera temple...
Verdict: Geographically and culturally accurate. All five depicted sites are genuinely Kyoto and individually correct: Kinkaku-ji reflected in its pond, the Fushimi Inari torii tunnel, the Arashiyama bamboo grove, the stilted Kiyomizu-dera over a maple hillside, and the Yasaka Pagoda over Gion's machiya streets with a kimono figure under sakura. Crucially, it correctly omitted Mount Fuji (a frequent error in Kyoto imagery: Fuji is ~270 km away in a straight line, far out of sight). Cultural elements are respectful and unstereotyped....
Panel check: every landmark is genuinely Kyoto and Mt. Fuji is correctly absent (geography/factual accuracy is excellent); the only miss is stylistic (modern anime-watercolour rather than true Edo woodblock ukiyo-e), scored under prompt adherence, not accuracy
Reviewer note: Reviewer: not the right style; it looks very generic rather than the requested ukiyo-e woodblock (confirms the panel's style finding). The Kyoto landmarks/geography remain accurate; the miss is stylistic.
Reproducibility Artifacts
The post directory includes the copied MP4s, keyframes, source/reference images, and two provenance files from the benchmark run: metadata.json and assessments.json. The metadata file records prompts, generation times, delivery mode, duration, and source-image records. The assessment file records scorecards, panel findings, reviewer feedback, and the final per-clip verdicts used in this article.
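Readers who want to re-derive the headline distribution from assessments.json can start from the minimal tally sketch below. The field names (`case_id`, `accuracy`), the assumption that the file holds a list of records, and the bucket thresholds (4 or above accurate, 3 mixed, below 3 serious) are all assumptions for illustration. Check them against the actual file before relying on the counts.

```python
import json
from collections import Counter


def bucket(accuracy: float) -> str:
    """Map a 1-5 accuracy score to an outcome bucket (the 'mixed'/'serious' cutoffs are assumed)."""
    if accuracy >= 4:
        return "accurate"
    if accuracy >= 3:
        return "mixed"
    return "serious errors"


def tally(records: list) -> Counter:
    """Count outcomes, skipping records with no score (e.g. a failed generation)."""
    return Counter(bucket(r["accuracy"]) for r in records if r.get("accuracy") is not None)


def load_records(path: str) -> list:
    """Hypothetical loader: assumes the file holds a JSON list of per-clip records."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Toy records with hypothetical IDs and scores, not the real data.
sample = [
    {"case_id": "x1", "accuracy": 5},
    {"case_id": "x2", "accuracy": 3},
    {"case_id": "x3", "accuracy": 2},
    {"case_id": "x4", "accuracy": None},
]
print(tally(sample))
```

If the assumed thresholds match the ones used in the article, running `tally(load_records("assessments.json"))` on the real file should reproduce the 16 / 4 / 14 split reported earlier.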
Bottom Line
Omni Flash is already impressive as a scientific media drafting system. It can produce clips that look expensive, fluent, and on-prompt in under a minute. That is a real capability, and in human-supervised workflows it can reduce the cost of exploring scientific visuals.
It is not yet a trustworthy unattended scientific video generator. The model's strongest surface properties, visual polish and confident composition, make its factual errors harder to catch. Until text rendering, physical dynamics, and factual consistency improve, the right deployment posture is expert-vetted draft generation, not autonomous scientific explanation.
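One way to encode an expert-vetted posture in a pipeline is a release gate that treats panel scores as triage rather than approval. The sketch below is hypothetical and reflects a pattern from this run: the panel over-credited the protein-folding and air-fryer clips and could not verify codon register, while domain reviewers caught the errors. The `Review` fields and the 4.0 floor are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    panel_accuracy: float                   # 1-5 score from a generalist or automated panel
    expert_approved: Optional[bool] = None  # None until a domain expert has watched the full clip


def release_decision(review: Review, panel_floor: float = 4.0) -> str:
    """Hypothetical gate: the panel may reject a draft early, but only a domain expert can approve it."""
    if review.expert_approved is not None:
        return "publish" if review.expert_approved else "reject"
    if review.panel_accuracy < panel_floor:
        return "reject"  # cheap early rejection before spending expert time
    return "needs expert review"


print(release_decision(Review(panel_accuracy=4.5)))                         # needs expert review
print(release_decision(Review(panel_accuracy=4.5, expert_approved=False)))  # reject
print(release_decision(Review(panel_accuracy=2.0)))                         # reject
```

In this design, a high panel score alone never publishes a clip. That is the property the results above argue for.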
Related K-Dense Writing
This benchmark sits inside a broader K-Dense research program on multimodal scientific agents, verification, and evidence-grounded AI for science. For more context, these posts explain how we think about the same reliability problem from adjacent angles:
- Science is Multimodal: K-Dense and NVIDIA on Nemotron 3 Nano Omni, on why scientific work is naturally multimodal and why visual, audio, and language understanding matter for lab workflows.
- The AI Co-Scientist Is Here. The Bottleneck Is Verification., on why the hard part of scientific AI is no longer generation alone, but checking claims, evidence, and workflows.
- Reproduction, Not Generation, Is AI's Killer App for Science, on why reproducibility is the practical test for AI systems that make scientific claims.
- The Week Science Models Became Real, on the arrival of frontier science models and the shift from raw capability to evidence and product reliability.
- Benchmarking NVIDIA BioNeMo Agent Toolkit Skills for NIM microservices, on K-Dense's benchmark approach for evaluating scientific agent workflows under controlled conditions.
