This guide covers every serious tool across cloud IDEs, CLI agents, autonomous agents, and self-hosted infrastructure — with a deep focus on the agentic capabilities, protocols, and context systems reshaping how software gets built.
In 2024, AI coding was autocomplete. In 2026, it's orchestration — agents that plan, write, test, run terminal commands, spawn sub-agents, and merge pull requests while you sleep.
The question developers ask has changed. It's no longer "can AI help me write this function?" It's "which tool do I trust to own this feature end-to-end?" That shift has fractured the market into four distinct archetypes that solve different problems and serve different workflows. The tools are no longer interchangeable.
According to GitHub's Octoverse report, 92% of US developers now use AI coding tools. But a RAND study found that 80–90% of products labeled "AI agent" are still chatbot wrappers underneath. The 15 tools in this guide are the real deal — they can genuinely plan, execute, iterate, and in some cases close the entire loop from ticket to merged PR.
Agentic tools are only as powerful as what they can connect to. Three open standards now define that infrastructure — and every serious tool has either adopted them or is racing to.
Before MCP, every AI tool integration was a bespoke one-off. A GitHub connector in Cursor needed completely different code than the same connector in Copilot. MCP solved the N×M problem: build one server, and every MCP-compatible agent can use it. By March 2026 the ecosystem had crossed 97 million monthly SDK downloads and over 10,000 active public servers — faster adoption than any developer protocol since GraphQL.
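To make the N×M collapse concrete, here is a sketch of a project-scoped MCP configuration in the `mcpServers` JSON shape used by Claude Code and other MCP clients. The server name, package, and env-var wiring are illustrative assumptions, not taken from the original:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}" }
    }
  }
}
```

Because the server speaks MCP rather than a tool-specific plugin API, the same block can be registered unchanged in any MCP-compatible agent — one server, N clients, instead of one bespoke connector per client.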
The real skill in 2026 isn't prompt writing — it's context engineering. How you structure the persistent instructions, memories, and rules that shape every agent interaction determines whether your AI behaves like a senior engineer or a confused intern.
Every major tool now has a configuration file system. But the implementations differ substantially. Understanding which file does what — and how they interact — is critical for teams that want consistent, codebase-aware AI behaviour across their entire engineering organisation.
On the Copilot side, repository instruction files are scoped with `applyTo` frontmatter, personal user-level instructions take highest priority, and organisation-level instructions became GA in April 2026. For most teams, the pragmatic default is a CLAUDE.md (for Claude Code users) plus an AGENTS.md (universal fallback). If your team uses multiple tools — or if you run an open-source project where contributors bring their own agents — AGENTS.md gives you broader coverage with one file. For Claude Code power users, CLAUDE.md's path scoping and hook system is worth the extra effort.
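AGENTS.md is free-form markdown, so its structure is up to you. As a sketch — the sections and rules below are illustrative, not prescribed by any tool — a minimal file might look like:

```markdown
# AGENTS.md

## Build & test
- Install dependencies with `pnpm install`
- Run `pnpm test` before every commit

## Conventions
- TypeScript strict mode; avoid `any`
- Conventional Commits for all commit messages

## Boundaries
- Never edit files under `generated/`
- Ask before adding new dependencies
```

Any agent that reads AGENTS.md picks these rules up at session start, which is what makes the single-file fallback workable across mixed-tool teams.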
"The real skill in working with coding agents is no longer prompt design. It's context engineering — how you structure the persistent instructions and specifications that shape every agent interaction across every session."
— Medium, State of AI Coding Agents, March 2026

The five tools most developers reach for every day — each with a distinct philosophy about where AI lives in your workflow.
This is the matrix that actually matters. Beyond autocomplete quality, here's how every tool handles the capabilities that define truly agentic behaviour.
SWE-bench is the gold standard — real GitHub issues, not synthetic puzzles. But scaffolding matters as much as the model. The same model can score 10+ points apart depending on how context is retrieved.
Two open-weight models released in early 2026 that are reshaping what's possible in self-hosted agentic workflows. Different families, different strengths — both worth knowing.
35B total parameters, only 3B active (MoE). Released February 24, 2026. Beats the previous-generation Qwen3-235B-A22B (22B active) — better architecture, not bigger scale. Runs on a MacBook Pro 24GB. General-purpose with exceptional coding: strong SWE-bench, BFCL tool use, and 1M token context. The practical sweet spot for most self-hosted agentic workflows.
Context: 262K native · 1M extended · License: Apache 2.0 · Run via: ollama run qwen3.5:35b-a3b
Released April 2, 2026. Built on Gemini 3 research. Currently #3 open model on Arena AI leaderboard. Native function calling, multimodal (text + vision + audio), Apache 2.0. No SWE-bench score published — Google evaluated against LiveCodeBench, τ2-bench, and GPQA Diamond instead.
Context: 256K tokens · License: Apache 2.0 · Run via: ollama run gemma4:31b
The gap between open-source and proprietary AI coding has closed dramatically. With the right model, self-hosted setups now match proprietary tools on everyday tasks — at zero recurring cost.
"The practical setup: local models for 80% of routine coding, Claude Code for the 20% of hard problems requiring frontier reasoning. The gap is real but narrowing every month."
— InsiderLLM, Best Local Alternatives to Claude Code, February 2026

| Tool | Free tier | Entry paid | Pro / Max | Model access | Notable |
|---|---|---|---|---|---|
| Cursor | Limited trial | $20/mo | $40–200/mo | Claude · GPT · Gemini | SOC 2 Type II · unpredictable credit burn on heavy agent use |
| Windsurf | Free (SWE-1.5 3mo promo) | $20/mo | $40–200/mo | SWE-1.5 · Claude · GPT | Was $15; raised to $20 March 2026. Quota-based. |
| GitHub Copilot | 50 premium req/mo | $10/mo | $19–39/mo | GPT-5 · Claude · Gemini | Cheapest entry. Best for GitHub-native teams. |
| Kiro | Preview (free) | — | Enterprise custom | Claude Sonnet 4.5 + Auto | AWS GovCloud. Post-GA pricing unknown. |
| Antigravity | Generous* | $20/mo | $249.99/mo | Gemini 3.1 · Claude 4.6 · GPT-OSS | ⚠ Opaque credits; hidden weekly cap; 5 unpatched CVEs |
| Claude Code | Free (Sonnet 4.6, limited) | $20/mo | $100–200/mo | Opus 4.6 · Sonnet 4.6 | Token-based. 95% savings possible with caching + batch. |
| Codex CLI | Free (open source) | Bundled with ChatGPT Plus $20 | $200/mo (Pro) | GPT-5.3 Codex | MIT. Best Terminal-Bench scores. 1M devs in month one. |
| Gemini CLI | Free (open source) | — | — | Gemini 2.5 Pro | MIT. Voice input. Best free CLI entry point. |
| Devin 2.2 | — | $20/mo + $2.25/ACU | Enterprise custom | Proprietary | 67% PR merge on defined tasks. Down from $500/mo. ACU = ~15 min active work. |
| Cline | Free (VS Code ext) | API costs only | Teams plan (SSO) | Any via BYOK | $5–200/mo in API costs depending on usage and model. |
| Kilo Code | Free + $20 credits | $20/user/mo (post-Q1) | Enterprise custom | 500+ via BYOK (no markup) | First 10 seats permanently free. |
| OpenHands | Free (self-host) | SaaS plan | On-prem custom | Any via BYOK | $18.8M Series A. Enterprise SDK. Docker sandbox. |
| Aider | Free (Apache 2.0) | API costs only | — | Any via BYOK / local | $0 with Ollama + local model. Best for air-gapped setups. |
| Tabby | Free (self-host) | — | Enterprise custom | Local GPU | Hardware cost only. After initial pull: fully air-gap. |
| Continue.dev | Free (all features) | Hub team plan | — | Any via BYOK | Only OSS with native VS Code + JetBrains. Free tier complete. |
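The $0 Aider-plus-Ollama pairing from the table above can be wired up with a `.aider.conf.yml` at the repo root. This is a sketch under two assumptions — aider's `ollama_chat/` model prefix and the `OLLAMA_API_BASE` environment variable, both per aider's documentation — with the model tag taken from the Qwen3.5 section earlier:

```yaml
# .aider.conf.yml — point aider at a local Ollama model
# Assumes `ollama serve` is running and the model has been pulled
# (e.g. `ollama run qwen3.5:35b-a3b` once, to download it).
model: ollama_chat/qwen3.5:35b-a3b
```

Before launching aider, export the endpoint so it knows where Ollama listens: `export OLLAMA_API_BASE=http://127.0.0.1:11434`. After the initial model pull, nothing in this loop touches the network, which is why this combination keeps coming up for air-gapped setups.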
No single tool wins across all dimensions. The right choice depends on where you sit on three axes: how much autonomy you want to delegate, how much your code can leave your infrastructure, and how much you want to spend.
| Your situation | Best pick | Why |
|---|---|---|
| Polished daily coding, familiar VS Code feel, reliability first | Cursor | SOC 2 certified, fastest completions, 8 parallel agents, zero data-loss record, most mature agentic IDE. |
| Best context memory and model comparison across sessions | Windsurf | Cascade Memories learns your patterns. Arena Mode helps you find which model actually works for your codebase. SWE-1.5 is fastest at 950 tok/sec. |
| Hardest reasoning, largest codebase, maximum MCP integration | Claude Code | 80.9% SWE-bench, 1M token context, deepest MCP, Agent Skills, /loop, hooks. The escalation tool when others fail. |
| Cheapest entry, GitHub-centric team, just getting started | GitHub Copilot | $10/mo, broadest IDE support, GitHub-native. Ceiling is real but for inline work it's the best value per dollar. |
| Auditable, spec-driven workflows (regulated industries, compliance) | Kiro | Only tool with mandatory requirement → design → task checkpoints. Hooks for automation. AWS GovCloud available. |
| Defined backlog tasks: migrations, test writing, documentation | Devin 2.2 | 67% PR merge rate on scoped tasks. Assign via Slack, review the PR. Don't use on ambiguous or exploratory work. |
| Maximum OSS features, want autocomplete + agents in one tool | Kilo Code | Superset of Cline + Roo. 500+ models, Orchestrator mode, Memory Bank, inline autocomplete, JetBrains. Free BYOK. |
| Git precision, surgical multi-file edits, clean commit history | Aider | Every edit is a commit. Every session is a branch. $0 with Ollama + Qwen3.5-35B-A3B. Air-gap capable. |
| Code must never leave your infrastructure | Tabby + Cline | Tabby on GPU: true air-gap, SSO, RBAC, usage analytics. Point Cline at it. 60% lower cost than SaaS at scale. |
| Enterprise autonomous agents, need custom agent deployment | OpenHands | $18.8M backed. SDK for custom agents. Docker sandbox. 50%+ GitHub issue resolution. Deployable on-prem. |
| Multi-agent browser automation, prototyping, Google ecosystem | Antigravity (with caveats) | Manager View + Chrome sub-agent are unique. Accept quota opacity and CVEs as preview tax. Not for production repos. |