
Agentic Data Scientist: An Open Source AI That Actually Does the Analysis

Introducing our free, open source multi-agent framework that plans, executes, and validates complex data science workflows, from differential expression to predictive modeling.


Agentic Data Scientist is an open source framework that doesn't just assist with data analysis — it does the analysis. Give it a research question and your data. It plans the approach, writes and runs the code, validates the results, and produces a full report. No supervision needed.

Beyond the chatbot paradigm

Most AI tools treat data science as a conversation. You ask how to do something, get code snippets, paste them into a notebook, fix the errors, and repeat. The AI helps, but you're still the one doing the work.

That workflow has a real limitation: the AI can't see when its suggested approach won't work with your actual data. It can't catch its own mistakes. It can't adapt when initial results reveal something unexpected.

Agentic Data Scientist doesn't answer questions about data science. It does data science.

How it actually works

The framework uses multiple specialized AI agents, each responsible for a distinct phase of the analytical process. That separation is why the system produces reliable results rather than confident-sounding mistakes.

Before any code runs, a planning agent creates an analysis plan with explicit stages and success criteria. A separate review agent then validates that plan, checking for gaps and confirming that it actually addresses your question. Execution starts only after the plan passes review.

This might seem like overhead. It isn't. Thorough planning prevents the cascade of errors that happens when you start coding before you've thought through the approach. The plan becomes a contract the system holds itself accountable to.
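To make the plan-then-review gate concrete, here is a minimal Python sketch. The Plan and Stage classes and the review_plan function are hypothetical illustrations, not the framework's actual API. The structural checks stand in for what is really an LLM-driven review, which also judges whether the plan addresses the research question.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """One step of an analysis plan (hypothetical structure)."""
    name: str
    description: str


@dataclass
class Plan:
    """Ordered stages plus explicit success criteria, fixed before any code runs."""
    question: str
    stages: list[Stage] = field(default_factory=list)
    success_criteria: list[str] = field(default_factory=list)


def review_plan(plan: Plan) -> list[str]:
    """Return the problems found; an empty list means execution may start.

    Stand-in for the review agent: it only catches structural gaps, whereas the
    real reviewer also judges whether the plan addresses the research question.
    """
    problems = []
    if not plan.question.strip():
        problems.append("no research question")
    if not plan.stages:
        problems.append("no stages")
    if not plan.success_criteria:
        problems.append("no success criteria")
    names = [s.name for s in plan.stages]
    if len(names) != len(set(names)):
        problems.append("duplicate stage names")
    return problems


if __name__ == "__main__":
    plan = Plan(
        question="Which genes respond to treatment?",
        stages=[Stage("normalize", "Normalize raw counts"),
                Stage("test", "Test each gene for differential expression")],
        success_criteria=["Adjusted p-values reported for every gene"],
    )
    issues = review_plan(plan)
    # Execution is gated on a clean review.
    print("execute" if not issues else f"revise: {issues}")
```

The design point is that the gate returns a list of reasons rather than a bare yes/no, so a rejected plan comes back with something the planning agent can act on.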

A coding agent then implements each stage, with access to scientific computing tools like BioPython, RDKit, PyDESeq2, scanpy, and over a hundred other packages. When a stage finishes, a review agent checks that the implementation accomplished what it was supposed to, not just that it ran without errors.

A reflection agent then looks at what was accomplished and what was discovered. If the data reveals something unexpected, such as a batch effect, an outlier population, or a confounding variable, the remaining plan gets updated. Real analysis doesn't follow a fixed script. The plan is a starting point, not a commitment.
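The execute-review-reflect cycle can be sketched as a loop over a queue of stages. Everything here is hypothetical: run_plan and the three agent callables stand in for LLM-backed agents, and the retry limit (max_attempts) is our own assumption, not a documented setting of the framework.

```python
from collections import deque
from typing import Callable

# Hypothetical agent interfaces; in the real framework these are LLM-backed agents.
Coder = Callable[[str], str]                        # stage -> result summary
Reviewer = Callable[[str, str], bool]               # (stage, result) -> goal accomplished?
Reflector = Callable[[str, list[str]], list[str]]   # (result, remaining) -> revised remaining


def run_plan(stages: list[str], code: Coder, review: Reviewer,
             reflect: Reflector, max_attempts: int = 3) -> list[tuple[str, str]]:
    """Run stages in order; review each result and let reflection revise what is left."""
    remaining = deque(stages)
    log: list[tuple[str, str]] = []
    while remaining:
        stage = remaining.popleft()
        for _ in range(max_attempts):
            result = code(stage)
            # Review asks "did this accomplish the stage?", not just "did it run?"
            if review(stage, result):
                break
        else:
            raise RuntimeError(f"stage {stage!r} failed review {max_attempts} times")
        log.append((stage, result))
        # Reflection may add, drop, or reorder the stages that have not run yet.
        remaining = deque(reflect(result, list(remaining)))
    return log


if __name__ == "__main__":
    # Stub agents: QC "discovers" a batch effect, and reflection inserts a correction step.
    def code(stage: str) -> str:
        return "batch effect detected" if stage == "qc" else f"{stage} done"

    def review(stage: str, result: str) -> bool:
        return bool(result)

    def reflect(result: str, remaining: list[str]) -> list[str]:
        if "batch effect" in result and "batch correction" not in remaining:
            return ["batch correction", *remaining]
        return remaining

    for stage, result in run_plan(["qc", "normalize", "test"], code, review, reflect):
        print(f"{stage}: {result}")
```

With these stubs, the run order becomes qc, batch correction, normalize, test: the unexpected finding in the first stage reshapes the rest of the plan instead of being ignored.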

Throughout all of this, the success criteria established during planning are tracked continuously, so at any point the system knows what has been completed and what is still open. That is an objective measure of whether the analysis is answering your research question, not just a progress bar.
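A small tracker illustrates the difference between measuring against the plan's success criteria and merely counting finished tasks. CriteriaTracker and its methods are hypothetical names for illustration only, and the criterion strings in the example are made up.

```python
from dataclasses import dataclass, field


@dataclass
class CriteriaTracker:
    """Hypothetical tracker for the success criteria fixed at planning time."""
    criteria: list[str]
    met: set[str] = field(default_factory=set)

    def mark_met(self, criterion: str) -> None:
        # Only criteria from the plan count; an unknown string signals drift from the plan.
        if criterion not in self.criteria:
            raise KeyError(f"not a planned criterion: {criterion!r}")
        self.met.add(criterion)

    def open_items(self) -> list[str]:
        # Keep the plan's order so the status reads like the plan itself.
        return [c for c in self.criteria if c not in self.met]

    def all_met(self) -> bool:
        return not self.open_items()


if __name__ == "__main__":
    tracker = CriteriaTracker([
        "counts normalized",
        "DE genes at FDR < 0.05 reported",
        "volcano plot produced",
    ])
    tracker.mark_met("counts normalized")
    print("open:", tracker.open_items())
```

Rejecting unknown criteria is a deliberate choice in this sketch: work that doesn't map to a planned criterion doesn't count as progress toward the question.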

Why specialized agents matter

A single AI doing everything hits a wall quickly. Planning requires broad thinking. Coding requires precise implementation. Review requires asking whether something actually worked, not just whether it compiled. Each phase needs a different mode of reasoning.

The planning agents think strategically about what needs to happen and in what order, with clear success criteria before any implementation begins. The coding agent has access to the full scientific Python ecosystem — over 120 specialized skills for genomics, proteomics, cheminformatics, and general data science — and writes real code, not pseudocode. The review agents check whether implementations accomplished their purpose. The reflection agent synthesizes progress and adjusts the plan as new information comes in.

Real examples

Upload RNA-seq count data along with your experimental design, and the system normalizes the counts, runs differential expression tests, corrects for multiple comparisons, and generates volcano plots and heatmaps.

Point it at a dataset and ask for a predictive model, and it handles feature engineering, model selection, cross-validation, and interpretation, including which features actually matter.

For a business problem like customer churn prediction, it runs the full pipeline from messy raw data to business-readable output, keeping context across every stage.
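Multiple-comparison correction matters here because thousands of genes are tested at once. The post doesn't say which correction the agent applies; Benjamini-Hochberg false discovery rate control is a common choice (DESeq2 reports BH-adjusted p-values by default), so this is a minimal pure-Python sketch of that procedure, not the framework's own code.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative p-values only, not real results.
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

In practice you would more likely call statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh") than hand-roll it; the sketch just shows what "corrects for multiple comparisons" means step by step.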

Why open source

Agentic Data Scientist is available under the MIT license. The framework is on GitHub and installable via pip.

The code is meant to be modified. Customize the prompts for your domain, add tools for specialized analyses, reshape the workflow to match how your team actually works. It's a foundation, not a closed box.

All of this in K-Dense Web

Everything in Agentic Data Scientist is also integrated into K-Dense Web. K-Dense Web adds a managed cloud environment, enhanced scientific skills, persistent project sessions, and enterprise features. The open source framework gives you the core engine; K-Dense Web is for teams that need more around it.

Getting started

Install with:

pip install agentic-data-scientist

Then run your first analysis:

agentic-data-scientist "Analyze this dataset and identify key patterns" --mode orchestrated --files data.csv

The framework requires API keys for Anthropic and OpenRouter, so configure those before your first run. Once they are set, you have a full data science workflow at the command line.

For simpler tasks that don't need the full planning-execution-validation cycle, use simple mode to skip that overhead:

agentic-data-scientist "Write a script to merge these CSV files" --mode simple --files data1.csv data2.csv
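If you want to launch runs from a Python script, for example to batch several analyses, a thin wrapper around the CLI is enough. This sketch uses only the command name and the --mode and --files flags shown above; build_command and run_analysis are hypothetical helpers, and limiting mode to "orchestrated" and "simple" reflects only the two modes this post mentions.

```python
import shutil
import subprocess


def build_command(prompt: str, files: list[str], mode: str = "orchestrated") -> list[str]:
    """Assemble an argv list using only the flags shown in this post."""
    # Assumption: only the two modes mentioned in the post are allowed here.
    if mode not in ("orchestrated", "simple"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = ["agentic-data-scientist", prompt, "--mode", mode]
    if files:
        cmd += ["--files", *files]
    return cmd


def run_analysis(prompt: str, files: list[str], mode: str = "orchestrated") -> int:
    """Run the CLI and return its exit code (needs the package installed and API keys set)."""
    if shutil.which("agentic-data-scientist") is None:
        raise FileNotFoundError("agentic-data-scientist is not on PATH; pip install it first")
    # Passing a list (no shell=True) avoids quoting problems with the prompt text.
    return subprocess.run(build_command(prompt, files, mode), check=False).returncode
```

For example, run_analysis("Analyze this dataset and identify key patterns", ["data.csv"]) reproduces the orchestrated command above, and passing mode="simple" reproduces the CSV-merge example.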

Community

We have a Slack community where researchers share workflows, troubleshoot, and generally stress-test what agentic analysis can handle. If you're doing something interesting with it, come share it.


Get started with Agentic Data Scientist or try the full platform at K-Dense Web.
