How to run OpenCode with LM Studio locally (Qwen3.5 on a 6-year-old MacBook M1)

Qwen3.5 models are reeeeaally good.

I ran an agentic coding session with OpenCode and lmstudio-community/Qwen3.5-9B-GGUF (Q4_K_M). The model produced a working Telegram bot, then updated it to forward user messages to a local LM Studio OpenAI-compatible server.

The reply model was lmstudio-community/Qwen3.5-0.8B-GGUF running on localhost.

All of this ran on my soon-to-be 6-year-old MacBook M1.

It is not fast. But for small, sensitive, offline tasks, it is absolutely usable.

What I tested

  • OpenCode: 1.2.10
  • LM Studio: 0.4.6
  • Inference backend: Metal llama.cpp 2.5.1
  • Coding model in OpenCode: Qwen3.5 9B Q4_K_M
  • Reply model behind local API: Qwen3.5 0.8B

Quick architecture

  • You prompt OpenCode
  • OpenCode uses local Qwen3.5 9B to plan/edit code
  • Telegram bot receives messages
  • Bot forwards user text to LM Studio (http://localhost:1234/v1/chat/completions)
  • LM Studio runs Qwen3.5 0.8B and returns a reply
  • Bot sends the reply back to Telegram

That split worked well: larger model for coding, tiny model for chat round-trips.

From prompt to working bot (session notes)

The flow was straightforward:

  1. I asked for a simple /ip Telegram bot using jsonip.com and env-based token loading.
  2. The model explored the folder, detected the existing venv, read vars.env, and generated bot.py.
  3. I then asked for a bridge: Telegram <-> LM Studio local OpenAI server on localhost:1234.
  4. It rewrote the bot to forward chat messages to lmstudio-community/Qwen3.5-0.8B-GGUF.

In practice, it took one short session with a couple of follow-ups to go from idea to working flow.

Install the stack

1) Install OpenCode

Use one of these:

bash
curl -fsSL https://opencode.ai/install | bash
# or
npm i -g opencode-ai
# or
brew install anomalyco/tap/opencode

Check install:

bash
opencode --version

1.1) Configure OpenCode to use LM Studio locally

Create ~/.config/opencode/opencode.json:

json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (local)",
      "options": {
        "baseURL": "http://localhost:1234/v1"
      },
      "models": {
        "lmstudio-community/Qwen3.5-9B-GGUF": {
          "name": "qwen3.5-9b"
        },
        "lmstudio-community/Qwen3.5-0.8B-GGUF": {
          "name": "qwen3.5-0.8b"
        }
      }
    }
  }
}

Quick check:

bash
cat ~/.config/opencode/opencode.json
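If you want a check that goes beyond `cat`, here is a small sketch that parses an opencode.json-style config and lists the model ids per provider. The default config path is an assumption taken from the step above:

```python
import json
import pathlib


def list_models(config: dict) -> dict:
    """Map each provider id in an opencode.json-style config to its model ids."""
    return {
        provider_id: sorted(provider.get("models", {}))
        for provider_id, provider in config.get("provider", {}).items()
    }


if __name__ == "__main__":
    # Assumes the default config location used in this guide.
    path = pathlib.Path("~/.config/opencode/opencode.json").expanduser()
    print(json.dumps(list_models(json.loads(path.read_text())), indent=2))
```

With the config above, this should print both Qwen3.5 model ids under the `lmstudio` provider.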

2) Install LM Studio

  • Install LM Studio desktop (0.4.6) from the official website.
  • Open LM Studio and install/load:
    • lmstudio-community/Qwen3.5-9B-GGUF (for coding sessions)
    • lmstudio-community/Qwen3.5-0.8B-GGUF (for fast bot replies)

If you prefer CLI, LM Studio also provides lms tooling.

3) Start LM Studio local server

Use LM Studio Developer tab, or CLI:

bash
lms server start --port 1234

Make sure the target model is loaded and available through OpenAI-compatible endpoints.
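A quick way to confirm the server answers before wiring up the bot: query the OpenAI-compatible `/v1/models` endpoint. This is a stdlib-only sketch, assuming the default port 1234 from the previous step:

```python
import json
import urllib.error
import urllib.request


def list_loaded_models(base_url="http://localhost:1234/v1"):
    """Return the model ids LM Studio exposes, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError):
        return None


if __name__ == "__main__":
    models = list_loaded_models()
    print("server not reachable" if models is None else models)
```

If the Qwen3.5 model ids do not show up in the output, load them in LM Studio before starting the bot.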

4) Create a small Telegram bot project

bash
mkdir telegram-bot && cd telegram-bot
python3 -m venv venv
source venv/bin/activate
pip install python-telegram-bot requests

Create vars.env:

bash
export TELEGRAM_BOT_TOKEN="<your-bot-token>"

5) Use OpenCode to generate and refine the bot

Inside the project:

bash
source venv/bin/activate
source vars.env
opencode

Then prompt it with something like:

text
Write a simple telegram bot. It should forward incoming messages to
http://localhost:1234/v1/chat/completions using model
lmstudio-community/Qwen3.5-0.8B-GGUF, then return the response to Telegram.
Use TELEGRAM_BOT_TOKEN from environment variables.

In my run, the 9B model produced working code in one session and iterated quickly on follow-up requests.

Minimal bot example

python
import os
import requests
from telegram import Update
from telegram.ext import Application, CommandHandler, MessageHandler, filters

MODEL_NAME = "lmstudio-community/Qwen3.5-0.8B-GGUF"
LLM_URL = "http://localhost:1234/v1/chat/completions"


async def forward_to_llm(update: Update, context) -> None:
    """Forward message to local LLM and send response back"""
    user_message = update.message.text if update.message else ""

    if not user_message:
        return

    try:
        payload = {
            "model": MODEL_NAME,
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_message},
            ],
            "temperature": 0.7,
            "max_tokens": 512,
        }

        response = requests.post(LLM_URL, json=payload, timeout=30)
        response.raise_for_status()
        data = response.json()

        if "choices" in data and len(data["choices"]) > 0:
            await update.message.reply_text(data["choices"][0]["message"]["content"])
        else:
            await update.message.reply_text("No response from LLM")

    except requests.exceptions.ConnectionError:
        await update.message.reply_text(
            "Cannot connect to local LLM server at localhost:1234"
        )
    except Exception as e:
        await update.message.reply_text(f"Error: {str(e)}")


async def start(update: Update, context) -> None:
    """Welcome message"""
    await update.message.reply_text(
        "I'm connected to a local LLM!\n\nSend me any message and I'll forward it to the model.\n\nModel: Qwen3.5-0.8B-GGUF"
    )


def main():
    application = (
        Application.builder().token(os.environ.get("TELEGRAM_BOT_TOKEN") or "").build()
    )

    application.add_handler(CommandHandler("start", start))
    application.add_handler(
        MessageHandler(filters.TEXT & ~filters.COMMAND, forward_to_llm)
    )

    application.run_polling(allowed_updates=Update.ALL_TYPES)


if __name__ == "__main__":
    main()

Run:

bash
source venv/bin/activate
source vars.env
python bot.py

Reality check on M1 performance

It is slow on my M1. No debate.

But for short coding loops, script generation, local automations, and sensitive/offline workloads, it works.

The practical win is this: I can keep data local, still use an agentic workflow, and still get real output.

Also, no, this does not replace OpenClaw. Not yet. OpenClaw can sleep peacefully tonight.

For this run, I configured a 32k context window and basically maxed out RAM usage on the 16 GB MacBook M1. In practice, this specific coding session used roughly 16k tokens of that window.
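As a back-of-envelope check on why 16 GB is tight: Q4_K_M quantization averages roughly 4.5 bits per weight (an approximation; the exact figure varies by model), and this ignores the KV cache for the 32k context and runtime overhead entirely:

```python
def q4_k_m_weights_gib(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate GGUF Q4_K_M weight footprint in GiB, assuming ~4.5 bits/weight."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30


# Roughly 4.7 GiB for the 9B coding model alone,
# plus roughly 0.4 GiB for the 0.8B reply model.
print(q4_k_m_weights_gib(9), q4_k_m_weights_gib(0.8))
```

Add the OS, OpenCode, LM Studio itself, and the KV cache on top, and maxing out 16 GB is unsurprising.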

Why this matters (for small teams)

If your team does not want every coding task leaving your machine/network, this setup is a serious option right now.

Not perfect. Not blazing fast. But operationally simple and private enough for a lot of day-to-day tasks.

I am now very curious to rerun the exact same flow on:

  • Apple Silicon M4/M5 class hardware
  • A stronger desktop setup (for example AMD 395+ tier)

Because if this is already usable on an old M1, the next hardware jump should be interesting.

FAQ: OpenCode + LM Studio local setup

Is OpenCode + LM Studio usable on a MacBook M1 with 16 GB RAM?

Yes, for small tasks. It is slow, but usable for short coding loops and private/offline work.

Which Qwen3.5 model should I use for local agentic coding?

In this setup, Qwen3.5-9B-GGUF handled coding inside OpenCode, while Qwen3.5-0.8B-GGUF handled fast Telegram chat replies.

What context window did you use?

I configured a 32k context window and this session consumed about 16k tokens.

Can this replace a higher-end coding stack today?

Not fully. It is practical for local, sensitive tasks, but still slower than stronger hardware and cloud-assisted setups.
