Stop Paying for LLM API Calls: Build Your Own API with OpenCode Serve

Every side project I build needs LLM capabilities — chatbots, content generators, automation scripts. But paying per token for API calls to OpenAI, Anthropic, and Google adds up fast, especially when you're prototyping and iterating. I already pay for a coding subscription that includes AI access, so I built a way to stop double-paying: a self-hosted LLM API powered by OpenCode's built-in server mode.

OpenCode is a terminal-based AI coding assistant (an alternative to Claude Code) — but hidden inside it is a full HTTP API server. When you run opencode serve, it exposes a RESTful API with session management, multi-provider support, and synchronous LLM responses — all using the AI access you already have through your coding subscription. You can send a prompt and get a complete response back in a single HTTP call — no WebSocket juggling, no streaming complexity. It supports Anthropic, OpenAI, Google, OpenRouter, and any provider you configure in your opencode.json.

Starting the Server

Everything starts with one command:

opencode serve --hostname 0.0.0.0 --port 4096

That's it. OpenCode launches a headless HTTP server with a full REST API. No TUI, no terminal interface — just an API server ready to accept requests. You can configure the port, hostname, and CORS through flags or in your opencode.json. The server initializes all your configured providers and MCP servers once at startup, so there's no cold-start penalty on each request.

Wrapping the API in My Go Backend

The OpenCode server exposes a straightforward API: create a session, send a message, read the response, delete the session. My Go service wraps this into a single SendPrompt call that handles the full lifecycle:

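// SendPrompt runs one stateless exchange: create a session, send the prompt,
// return the text, and delete the session afterwards.
func (s *OpenCodeAPIService) SendPrompt(ctx context.Context, prompt, model, agent string) (string, error) {
    sessionID, err := s.createSession()
    if err != nil {
        return "", fmt.Errorf("failed to create opencode session: %w", err)
    }

    // Clean up on every exit path, including errors from sendMessage.
    defer s.deleteSession(sessionID)

    // Blocks until the model has finished responding.
    response, err := s.sendMessage(ctx, sessionID, prompt, model, agent)
    if err != nil {
        return "", fmt.Errorf("failed to send message: %w", err)
    }

    return response, nil
}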

Creating a session is a single POST to /session:

func (s *OpenCodeAPIService) createSession() (string, error) {
    data, err := json.Marshal(map[string]string{"title": "API Request"})
    if err != nil {
        return "", err
    }

    resp, err := s.client.Post(s.serverURL+"/session", "application/json", bytes.NewReader(data))
    if err != nil {
        return "", err
    }
    defer resp.Body.Close() // always release the connection
    // ... parse response to get session ID
}

Sending a message POSTs to /session/{id}/message with the prompt, optional model, and agent selection. The call is synchronous — it blocks until the LLM finishes responding, so the full answer is ready when the HTTP response comes back. I extract the text parts from the response, join them, and return a clean string.
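
To make the "extract the text parts and join them" step concrete, here is a small sketch. It assumes the message response carries a "parts" array whose entries have a "type" and, for text entries, a "text" field. Those field names, the types, and the extractText helper are assumptions for illustration, so check them against the response your server actually returns.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// part is one piece of a response. Assumption: text parts use type "text".
type part struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// messageResponse models only the parts list; other fields are ignored.
type messageResponse struct {
	Parts []part `json:"parts"`
}

// extractText keeps the non-empty text parts and joins them with newlines.
func extractText(body []byte) (string, error) {
	var msg messageResponse
	if err := json.Unmarshal(body, &msg); err != nil {
		return "", fmt.Errorf("decode message response: %w", err)
	}

	var texts []string
	for _, p := range msg.Parts {
		if p.Type == "text" && p.Text != "" {
			texts = append(texts, p.Text)
		}
	}
	return strings.Join(texts, "\n"), nil
}

func main() {
	// Sample payload made up for illustration; the "tool" part is skipped.
	body := []byte(`{"parts":[{"type":"tool"},{"type":"text","text":"Hello"},{"type":"text","text":"World"}]}`)
	text, err := extractText(body)
	fmt.Printf("%q %v\n", text, err)
}
```

Filtering by type means tool calls and other non-text parts never leak into the string handed back to API clients.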

One drawback of this approach: OpenCode doesn't have a single-shot "send prompt and get response" endpoint. You always need to create a session first, then send your message to that session. For a stateless API wrapper like mine, this means an extra HTTP call on every request. It's a minor overhead, but worth knowing.

The HTTP handler exposes this as POST /api/opencode with JWT authentication, prompt size validation (100K character cap), and proper error handling:

func (h *OpenCodeAPIHandler) SendPrompt(c *gin.Context) {
    var req openCodeAPIRequest
    if err := c.ShouldBindJSON(&req); err != nil {
        utils.ValidationError(c, "prompt is required")
        return
    }

    response, err := h.service.SendPrompt(c.Request.Context(), req.Prompt, req.Model, req.Agent)
    if err != nil {
        utils.InternalErrorResponse(c, "Failed to get response: "+err.Error())
        return
    }

    utils.SuccessResponse(c, http.StatusOK, gin.H{"response": response})
}

A single cURL call gives me access to any model configured in OpenCode:

curl -X POST http://localhost:3001/api/opencode \
  -H "Authorization: Bearer YOUR_JWT" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain closures in Go", "model": "anthropic/claude-sonnet-4-20250514"}'

I also built a session-based variant (OpenCodeServerService) that keeps sessions alive for multi-turn conversations, which is useful for my Telegram bot where I chat with the AI interactively. A third approach uses tmux for full terminal-level interaction when I need OpenCode's file editing and tool execution capabilities. The stateless API bridge covers 90% of my use cases, though.
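
For the session-based variant, the core idea is to remember which session belongs to which conversation. The sketch below is not the actual OpenCodeServerService. It is a hypothetical in-memory store keyed by a chat ID, with the session-creation call passed in as a function so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionStore maps a chat ID to the session that holds its history.
type sessionStore struct {
	mu       sync.Mutex
	sessions map[int64]string
}

func newSessionStore() *sessionStore {
	return &sessionStore{sessions: make(map[int64]string)}
}

// getOrCreate returns the existing session for chatID, or calls create
// once and remembers the result. The lock is held during create, which
// serializes session creation; fine for a sketch, not for heavy traffic.
func (s *sessionStore) getOrCreate(chatID int64, create func() (string, error)) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	if id, ok := s.sessions[chatID]; ok {
		return id, nil
	}
	id, err := create()
	if err != nil {
		return "", err
	}
	s.sessions[chatID] = id
	return id, nil
}

// forget drops the mapping so the next message starts a new conversation.
func (s *sessionStore) forget(chatID int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.sessions, chatID)
}

func main() {
	store := newSessionStore()
	calls := 0
	// Stand-in for a real call that creates a session on the server.
	create := func() (string, error) {
		calls++
		return fmt.Sprintf("ses_%d", calls), nil
	}

	first, _ := store.getOrCreate(42, create)
	second, _ := store.getOrCreate(42, create)
	fmt.Println(first, second, calls)
}
```

Each follow-up message from the same chat then goes to the stored session, so the model sees the earlier turns without the wrapper resending them.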

The result is one API endpoint, one auth layer, and zero per-token API bills for my side projects and automation. I pay for my coding subscription, which I already need, and route all my LLM needs through it. The entire stack runs in a single Docker container, with OpenCode running as a sidecar process and Go handling the API surface. I use this daily through my Telegram bot, automation scripts, and web dashboard, all hitting the same unified endpoint without a separate API billing meter running.

If you want to go deeper into wrapping AI tools as APIs, check out my guide on turning Claude Code into an LLM API or building custom MCP servers to extend AI capabilities further.