Training machine learning models well requires a rare combination of skills: theoretical knowledge of algorithms, practical experience with what actually works, and the patience to run countless experiments. What if an AI agent could handle all of that for you?

Karpathy is our open source agentic machine learning engineer. Give it a dataset and a goal, and it will design experiments, write training code, tune hyperparameters, and iterate until it achieves state-of-the-art results. It is named as a tribute to Andrej Karpathy, whose educational work has shaped how a generation thinks about deep learning, and it embodies the kind of methodical, experiment-driven approach that defines great ML engineering.

The ML Engineering Bottleneck

Machine learning has a labor problem. The algorithms are well-documented. The frameworks are mature. GPUs are available on demand. Yet training a model that actually performs well on your specific problem still requires extensive manual effort.

You start with a baseline. It underperforms. You try different architectures, adjust learning rates, experiment with regularization, debug data pipeline issues, and run the same experiment with different random seeds to make sure your results are real. Each iteration takes hours or days. Most don't improve anything.
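
To make the random-seed point concrete, here is a minimal sketch of one way to check whether a change is a real improvement or just seed noise. It is an illustration only: `run_experiment` is a hypothetical stand-in for a real training run (it returns a toy score), and the decision rule in `is_real_improvement` is a deliberately crude assumption, not a statistical test and not something taken from Karpathy itself.

```python
import random
import statistics


def run_experiment(learning_rate: float, seed: int) -> float:
    """Hypothetical stand-in for a training run; returns a validation score."""
    rng = random.Random(seed)
    # Toy score: peaks near learning_rate = 0.01, plus seed-dependent noise.
    signal = 0.90 - 5.0 * abs(learning_rate - 0.01)
    return signal + rng.gauss(0.0, 0.01)


def summarize(learning_rate: float, seeds: list[int]) -> tuple[float, float]:
    """Mean and sample standard deviation of the score across seeds."""
    scores = [run_experiment(learning_rate, s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


def is_real_improvement(baseline: tuple[float, float],
                        candidate: tuple[float, float]) -> bool:
    """Crude rule (an assumption): the gap in means must exceed the combined spread."""
    (base_mean, base_std), (cand_mean, cand_std) = baseline, candidate
    return cand_mean - base_mean > base_std + cand_std


if __name__ == "__main__":
    seeds = [0, 1, 2, 3, 4]
    baseline = summarize(0.05, seeds)
    candidate = summarize(0.01, seeds)
    print("baseline mean/std:", baseline)
    print("candidate mean/std:", candidate)
    print("real improvement:", is_real_improvement(baseline, candidate))
```

Every configuration costs one full run per seed, which is a large part of why each iteration takes so long when done by hand.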

This isn't glamorous work, but it's where models are actually made. The difference between a paper result and a production model often comes down to hundreds of small decisions made during training, decisions that require both expertise and patience.

Karpathy automates this entire loop.

How It Works

Karpathy combines the Claude Code SDK with Google's Agent Development Kit to create an AI that doesn't just suggest ML approaches. It implements and executes them.

The agent has access to the full scientific Python ecosystem: PyTorch, transformers, scikit-learn, and specialized libraries for everything from computer vision to natural language processing. When you give it a task, it writes real training scripts, runs them, analyzes the results, and decides what to try next.

This is more than code generation. The agent maintains context across experiments, remembers what worked and what didn't, and builds on previous results rather than starting fresh each time. It's the difference between getting code snippets from a chatbot and having an experienced engineer iterate on your problem.
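
The idea of building on previous results can be sketched as a loop that consults a log of past runs before proposing the next one. This is a toy illustration under stated assumptions, not Karpathy's actual implementation: the names `Experiment`, `ExperimentLog`, `propose`, and `run_loop` are hypothetical, and the proposal rule (try half and double the best learning rate seen so far) stands in for the much richer reasoning an agent would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Experiment:
    config: dict
    score: float


@dataclass
class ExperimentLog:
    """Hypothetical record of past runs that each new proposal can consult."""
    history: list = field(default_factory=list)

    def record(self, config: dict, score: float) -> None:
        self.history.append(Experiment(config, score))

    def best(self) -> Experiment:
        return max(self.history, key=lambda e: e.score)


def propose(log: ExperimentLog) -> Optional[dict]:
    """Suggest an untried learning rate next to the best one seen so far."""
    if not log.history:
        return {"lr": 0.1}  # arbitrary starting point for the sketch
    best_lr = log.best().config["lr"]
    tried = {e.config["lr"] for e in log.history}
    for lr in (best_lr / 2, best_lr * 2):
        if lr not in tried:
            return {"lr": lr}
    return None  # both neighbours already tried: nothing new to learn here


def run_loop(evaluate: Callable[[dict], float], max_runs: int = 10) -> ExperimentLog:
    """Propose, run, record, repeat, until the budget or the ideas run out."""
    log = ExperimentLog()
    for _ in range(max_runs):
        config = propose(log)
        if config is None:
            break
        log.record(config, evaluate(config))
    return log


if __name__ == "__main__":
    # Toy objective (an assumption): the score is highest at lr = 0.0125.
    log = run_loop(lambda cfg: -abs(cfg["lr"] - 0.0125))
    print("runs:", len(log.history), "best:", log.best().config)
```

Because every proposal reads the log, no configuration is run twice and each step starts from the best result so far, which is the property that separates an iterating agent from one-off code generation.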

Scientific Skills Built In

What makes Karpathy particularly powerful is its integration with Claude Scientific Skills, a comprehensive collection of specialized tools and workflows for scientific computing.

When the agent encounters a problem in genomics, it has access to BioPython and specialized bioinformatics workflows. For cheminformatics, it can leverage RDKit. For single-cell analysis, scanpy is available. Over a hundred specialized skills are automatically loaded, giving the agent deep capabilities across scientific domains.

This matters because real ML problems rarely exist in isolation. You're not just training a classifier. You're training a classifier on protein sequences, or molecular structures, or clinical time series. Domain-specific tooling makes the difference between a generic model and one that actually captures the structure of your problem.

A Starting Point for Agentic ML

Karpathy is intentionally simple in its architecture. It demonstrates what's possible when you combine modern AI capabilities with scientific computing tools, but it's designed as a foundation rather than a complete solution.

The codebase is clean and extensible. Want to add support for a new ML framework? Straightforward. Need to customize the experimentation logic for your specific workflow? The agent's behavior is configurable. Building something more complex on top? The architecture supports it.

We've kept the implementation minimal because we believe the best tools are ones you can understand and modify. Karpathy isn't a black box. It's a starting point for building agentic ML systems tailored to your needs.

What's Coming

We're actively developing additional capabilities:

Modal sandbox integration will let you choose any compute configuration, from a single GPU for quick experiments to multi-node clusters for large-scale training. The agent will manage resource allocation automatically based on what the experiment requires.

Additional K-Dense Web features may become available in the open source version based on community interest. We're listening to what researchers actually need.

More Power in K-Dense Web

Everything Karpathy can do is also available in K-Dense Web, where it's part of a more comprehensive multi-agent system for end-to-end machine learning workflows.

K-Dense Web extends these capabilities with managed compute infrastructure, persistent experiment tracking, team collaboration features, and tighter integration with our other tools for scientific writing and data analysis. If you need production-grade ML engineering at scale, that's where to look.

The open source Karpathy gives you the core agent. K-Dense Web wraps it in everything else you need for serious ML work.

Get Started

Clone the repository and you can be running experiments in minutes:

git clone https://github.com/K-Dense-AI/karpathy.git
cd karpathy
uv sync
python start.py

The start.py script creates a sandboxed environment with all necessary dependencies, loads the scientific skills, and starts a web interface where you can interact with the agent.

Add your datasets to the sandbox directory, describe what you want to achieve, and let Karpathy handle the engineering.

Join the Community

We've built a Slack community where researchers share their experiments, discuss what's working, and help each other push the boundaries of agentic ML. Whether you're training models on genomics data, building recommender systems, or exploring new architectures, we'd love to have you.

Ready to automate your ML engineering? Get started with Karpathy or try the full platform at K-Dense Web.