Agentic Data Scientist is an open source framework that doesn't just assist with data analysis—it performs it. Give it a research question and your data, and it will plan the analysis, write and execute the code, validate the results, and deliver a comprehensive report. No hand-holding required.
Beyond the Chatbot Paradigm
Most AI tools treat data science as a conversation. You ask how to do something, get code snippets, paste them into a notebook, fix the errors, and repeat. The AI helps, but you're still the one doing the work.
This approach has a fundamental problem: the AI doesn't know what it doesn't know. It can't see when its suggested approach won't work with your data. It can't catch its own mistakes. It can't adapt when initial results reveal something unexpected.
Agentic Data Scientist takes a different approach entirely. Instead of answering questions about data science, it does data science.
How It Actually Works
The framework uses multiple specialized AI agents, each responsible for a distinct phase of the analytical process. This separation matters. Planning, coding, and checking are done by different agents, so a confident-sounding mistake made in one phase has a dedicated reviewer to catch it before it reaches your results.
Planning comes first. Before any code runs, a planning agent creates a comprehensive analysis plan with explicit stages and success criteria. A separate review agent validates this plan, checking for gaps and ensuring it addresses your actual question. Only when the plan passes review does execution begin.
This might look like overhead, but it pays for itself. Thorough planning up front prevents the cascade of errors that happens when you start coding before you've thought through the approach. The plan becomes a contract that the system holds itself accountable to.
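The plan-then-review gate can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the framework's actual API. `Stage`, `Plan`, `review_plan`, and `plan_until_approved` are hypothetical names. `draft_plan` stands in for the planning agent, which in practice would be an LLM call. The checks in `review_plan` are simple rule-based stand-ins for what a review agent would judge.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical data model; the real framework's plan format may differ.


@dataclass
class Stage:
    name: str
    success_criteria: list[str] = field(default_factory=list)


@dataclass
class Plan:
    question: str
    stages: list[Stage] = field(default_factory=list)


def review_plan(plan: Plan) -> list[str]:
    """Rule-based stand-in for a review agent: returns problems, empty means approved."""
    problems = []
    if not plan.question.strip():
        problems.append("plan does not state the research question")
    if not plan.stages:
        problems.append("plan has no stages")
    for stage in plan.stages:
        if not stage.success_criteria:
            problems.append(f"stage '{stage.name}' has no success criteria")
    return problems


def plan_until_approved(draft_plan: Callable[[str, list[str]], Plan],
                        question: str, max_rounds: int = 3) -> Plan:
    """Request drafts (draft_plan stands in for the planning agent) until one passes review."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        plan = draft_plan(question, feedback)  # planner sees the last review's feedback
        feedback = review_plan(plan)
        if not feedback:
            return plan  # only an approved plan is handed on to execution
    raise RuntimeError(f"plan still failing review after {max_rounds} rounds: {feedback}")
```

The key design point is the return path. Execution only ever receives a plan that has passed review, and any review feedback is passed back to the planner on the next attempt.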
Execution happens stage by stage. A coding agent implements each stage of the plan, with full access to scientific computing tools such as Biopython, RDKit, PyDESeq2, scanpy, and over a hundred other specialized packages. After each stage, a review agent validates that the implementation actually accomplished what it was supposed to.
The system adapts as it learns. Here's where things get interesting. After each stage, a reflection agent analyzes what was accomplished and what was discovered. If the data reveals something unexpected—a batch effect, an outlier population, a confounding variable—the remaining plan is adapted accordingly.
This is how experienced data scientists actually work. You start with a plan, but you adjust it based on what the data tells you. Rigid adherence to an initial plan produces bad science. Agentic Data Scientist builds this adaptive capacity into the framework itself.
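The stage loop described above (execute, review, then reflect on what remains) might look roughly like this. This is a hypothetical sketch rather than the framework's code. `run_stages` and the `execute`, `review`, and `reflect` callables are illustrative stand-ins for the coding, review, and reflection agents. Stages are plain strings for brevity.

```python
from typing import Callable

# Hypothetical signatures standing in for the coding, review and reflection agents.
Execute = Callable[[str, dict], dict]
Review = Callable[[str, dict], bool]
Reflect = Callable[[list[str], dict], list[str]]


def run_stages(stages: list[str], execute: Execute, review: Review,
               reflect: Reflect, max_attempts: int = 2) -> dict:
    """Run stages in order, reviewing each and letting reflection rewrite what remains."""
    context: dict = {}          # results carried forward from stage to stage
    remaining = list(stages)
    while remaining:
        stage = remaining.pop(0)
        for _ in range(max_attempts):
            result = execute(stage, context)
            if review(stage, result):   # did the stage do what it was meant to?
                break
        else:
            # Loop finished without a passing review on any attempt.
            raise RuntimeError(f"stage '{stage}' failed review {max_attempts} times")
        context[stage] = result
        # Reflection sees what was found and may add, drop or reorder later stages.
        remaining = reflect(remaining, result)
    return context
```

Consider a `reflect` function that sees a batch effect reported by a QC stage. It could return `["batch_correct"] + remaining`, which inserts a correction step before normalization without restarting the analysis.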
Validation is continuous. Success criteria established during planning are tracked throughout execution. At any point, the system knows exactly what has been accomplished and what remains. This isn't just progress tracking. It's a running check, against criteria fixed before any code ran, of whether the analysis is actually answering your research question.
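One way to picture this kind of tracking is a ledger of criteria fixed at planning time, which later stages can only tick off. `CriteriaTracker` below is a hypothetical illustration, not part of the framework. It rejects criteria that were not declared up front, so progress is always measured against the stated plan.

```python
from dataclasses import dataclass, field


@dataclass
class CriteriaTracker:
    """Hypothetical ledger of success criteria declared during planning."""
    criteria: dict[str, list[str]]                     # stage name -> its criteria
    met: set[tuple[str, str]] = field(default_factory=set)

    def mark_met(self, stage: str, criterion: str) -> None:
        # Only criteria declared in the plan can be marked as satisfied.
        if criterion not in self.criteria.get(stage, []):
            raise KeyError(f"unknown criterion for {stage!r}: {criterion!r}")
        self.met.add((stage, criterion))

    def outstanding(self) -> list[tuple[str, str]]:
        """Criteria not yet satisfied, in plan order."""
        return [(s, c) for s, cs in self.criteria.items() for c in cs
                if (s, c) not in self.met]

    def progress(self) -> float:
        """Fraction of declared criteria met so far (1.0 if none were declared)."""
        total = sum(len(cs) for cs in self.criteria.values())
        return len(self.met) / total if total else 1.0
```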
The Power of Specialized Agents
A single AI trying to do everything faces an impossible task. Planning requires broad thinking about analytical approaches. Coding requires precise implementation details. Review requires critical evaluation of whether something actually worked. Reflection requires stepping back to see the bigger picture.
By using specialized agents for each role, Agentic Data Scientist brings appropriate capabilities to each phase:
The planning agents think strategically about what needs to be done and in what order, establishing clear criteria for success before any implementation begins.
The coding agent has access to the full scientific Python ecosystem—over 120 specialized skills for genomics, proteomics, cheminformatics, and general data science. It writes and executes real code, not pseudocode or suggestions.
The review agents evaluate implementations critically, checking whether the code actually accomplished its intended purpose rather than just whether it ran without errors.
The reflection agent synthesizes progress and adapts the plan based on discoveries, ensuring the analysis remains aligned with the original research question even as the approach evolves.
Real Examples
Differential expression analysis. Provide your RNA-seq count data and describe your experimental design. The system will normalize the counts, perform statistical testing, correct for multiple comparisons, and generate publication-ready volcano plots and heatmaps.
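The article does not say which multiple-comparison correction the system applies. A common choice in differential expression work is the Benjamini-Hochberg procedure. It controls the false discovery rate, and DESeq2 uses it by default for its adjusted p-values. The minimal pure-Python version below only illustrates the step. For real work, `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")` provides the same adjustment.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives approximately `[0.02, 0.04, 0.04, 0.02]`.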
Predictive modeling. Point it at a dataset and ask for a model predicting your outcome of interest. The system will handle feature engineering, model selection, cross-validation, and interpretation—delivering not just a model but an understanding of what drives its predictions.
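Cross-validation is the part of that workflow most worth seeing concretely. Each observation is held out exactly once, and the model is scored only on data it was not fitted to. The sketch below is a generic, dependency-free illustration with hypothetical names (`k_fold_splits`, `cross_val_mse`), not the framework's code. In practice you would shuffle the rows first, or use a library splitter such as scikit-learn's `KFold`, so folds are not biased by row order.

```python
def k_fold_splits(n: int, k: int) -> list[tuple[list[int], list[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is held out once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_val_mse(xs, ys, k, fit, predict):
    """Average held-out mean squared error across k folds."""
    errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errs = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test]
        errors.append(sum(fold_errs) / len(fold_errs))
    return sum(errors) / len(errors)
```

Take a mean-only "model" as an example. `cross_val_mse(list(range(6)), [0, 0, 1, 1, 2, 2], 3, fit=lambda xs, ys: sum(ys) / len(ys), predict=lambda m, x: m)` returns `1.5`. The middle fold is predicted perfectly, while the outer two folds each miss by 1.5.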
Complex multi-step workflows. Real analyses rarely fit into single-step frameworks. Customer churn prediction requires data cleaning, feature engineering, model training, and business interpretation. Agentic Data Scientist handles the full pipeline, maintaining context across stages.
Why Open Source Matters
Agentic Data Scientist is available under the MIT license because we believe powerful analytical tools should be accessible to researchers everywhere. The framework is on GitHub and installable via pip.
The code is designed to be extensible. Customize the prompts to fit your domain. Add new tools for specialized analyses. Modify the workflow to match how your team works. The framework provides the foundation; you can build on it however you need.
All of This—And More—In K-Dense Web
Everything in Agentic Data Scientist is also integrated into K-Dense Web, our comprehensive AI platform. K-Dense Web extends these capabilities with a managed cloud environment, enhanced scientific skills, persistent project sessions, and enterprise features for research teams.
The open source framework gives you the core analytical engine. K-Dense Web wraps it in a production-ready platform with additional capabilities for teams that need them.
Getting Started
Install with a single command:
pip install agentic-data-scientist
Then run your first analysis:
agentic-data-scientist "Analyze this dataset and identify key patterns" --mode orchestrated --files data.csv
The framework requires API keys for Anthropic and OpenRouter—the AI providers that power the specialized agents. Once configured, you have a complete data science workflow at your command line.
For simpler tasks that don't need the full planning-execution-validation cycle, simple mode skips the overhead:
agentic-data-scientist "Write a script to merge these CSV files" --mode simple --files data1.csv data2.csv
Join the Community
We've built a Slack community where researchers share workflows, troubleshoot issues, and push the boundaries of what's possible with agentic data science. Whether you're analyzing genomics data, building financial models, or exploring entirely new domains, we'd love to have you.
Ready to let AI do the analysis? Get started with Agentic Data Scientist or try the full platform at K-Dense Web.
