ChatRaj
Buyer's guide

The 6 best self-hosted AI chatbots in 2026

Independent rankings for regulated teams. Real infrastructure costs, real GPU requirements, and an honest take on when self-hosting is the wrong answer.

Jump to the rankings
Bottom line
The 6 best self-hosted AI chatbots in 2026 are Botpress (open core, SaaS-like UX), Chatwoot (MIT, live chat plus AI), Rasa (CALM engine, enterprise NLU heritage), Open WebUI (~136k stars, mature multi-user), AnythingLLM (RAG-focused workspace), and LibreChat (ChatGPT-style UI with MCP). Expect $300 to $2,000 per month in real infrastructure once you add GPU rental, a Postgres or vector store, monitoring, and patching time. If that ops burden is too high, ChatRaj is the managed SaaS alternative at $29 to $99 per month with EU residency on request.
Reviewed by ··11 min read
Jump to section

What "self-hosted AI chatbot" actually means in 2026

The phrase "self-hosted AI chatbot" is doing a lot of work, and the four interpretations that hide under it have very different ops profiles.

The first interpretation is fully on-premise, fully air-gapped, with the model weights, the vector store, the orchestration layer, and the front-end UI all running inside your own data centre with no outbound network calls. This is what banking, defence, and certain hospital network deployments require. Rasa is still the default choice for this shape.

The second interpretation is self-hosted on your own cloud account, typically a Hetzner, OVH, or AWS region inside the jurisdiction you need to comply with. The chatbot runs inside your VPC; the model can be a local Llama or Mistral served via vLLM, or it can call out to an EU-hosted Mistral or Anthropic Bedrock endpoint. This is the most common modern shape and is what Open WebUI, AnythingLLM, and LibreChat typically deploy as.

The third interpretation is self-hosted application, hosted model API. The chatbot code runs in your infrastructure, but the actual LLM call goes to OpenAI, Anthropic, or Google. This still gives you control of the application data layer and the visitor conversation logs, which is often enough for GDPR-driven decisions but is not enough for HIPAA or banking-regulator deployments.

The fourth interpretation is bring-your-own-key SaaS, which is sometimes mis-marketed as self-hosted. It is not. The vendor runs the app; you provide the model API key. That is a useful billing pattern but does not satisfy any data-residency regulation worth taking seriously.

For the rest of this guide, "self-hosted" means at minimum interpretation 2: your infrastructure, your control plane, your conversation storage. The six vendors below all support that shape.

The true cost of self-hosting (ops + infra + model inference)

Open-source licensing is free. Running the resulting platform reliably is not. The vendor pricing page lists $0; your monthly cloud bill, security budget, and engineering hours do not.

A realistic 2026 cost profile for a serious self-hosted deployment looks like this.

GPU inference. If you serve a local model (the whole point of self-hosting for many teams), you need GPU time. Hetzner offers dedicated H100 instances in the ballpark of $1.50 per hour at the time of writing, which works out to roughly $1,100 per month for a single GPU running 24/7. AWS A100 instances on-demand land closer to $3 per hour and roughly $2,200 per month for the same uptime. If you can tolerate spot or interruptible pricing, both clouds offer 40 to 70 percent discounts. If you only serve the model when traffic comes in (cold-start tolerant), serverless GPU providers like RunPod or Modal bring that figure down further but introduce 5 to 30 seconds of first-token latency.

Application compute. The chatbot app itself, the embedding service, the queue worker, the admin UI. Plan on $40 to $200 per month for a 4 to 16 vCPU box on Hetzner, OVH, or a comparable EU provider.

Database and vector store. Postgres or MySQL for application state, plus a vector store for embeddings. pgvector inside the same Postgres instance keeps this cheap. A managed Postgres at AWS RDS or Aiven with a small vector index lands at $60 to $200 per month for SMB scale.

Monitoring, logs, backups. Sentry, Datadog, Better Stack, or a self-hosted Grafana plus Loki plus Prometheus stack. Plan on $30 to $150 per month for hosted, or two days of setup plus an extra small VM for self-hosted.

Engineering time. This is the cost most teams underestimate. Expect 4 to 12 hours per month of dedicated ops work for security patches, model upgrades, dependency bumps, dependency CVEs, log rotation, certificate renewal, and incident response. At an internal fully-loaded rate of $80 to $200 per hour, that is $320 to $2,400 per month of human time you are not spending on your product.

Add it up. A serious self-hosted chatbot deployment in 2026, with a local GPU-served model, comes in at roughly $1,500 to $4,000 per month all-in once engineering time is counted honestly. A self-hosted chatbot calling a hosted model API (no local GPU) comes in closer to $300 to $800 per month all-in. Neither is "free." The trade is data control, not money.

If your dominant constraint is data residency and you have the engineering bench for it, that price is reasonable. If your dominant constraint is "we want cheap chat for our website," self-hosting is the wrong tool.

Evaluation criteria

We scored every vendor on seven dimensions that actually matter for self-hosted deployments.

License. MIT, Apache 2.0, AGPLv3, BSL, or proprietary. License governs what you can do with the code in commercial contexts, whether you can fork freely, and whether you have to publish your modifications.

Deployment shape. Docker Compose, Helm chart, single-binary, or hand-rolled. The simpler the deployment shape, the faster you go from clone to first conversation.

RAG built in. Does the platform ship with a usable retrieval layer (chunker, embedder, vector store, retrieval prompt), or do you need to glue one together yourself?

Model flexibility. Local models via Ollama or vLLM, hosted models via OpenAI compatible endpoints, or a mix.

Multi-channel and live agent. Website widget only, or also email, WhatsApp, Slack, Discord, and live-agent handoff.

Operational burden. Realistic hours per month after the initial setup. Updates, scaling, log review, security patches.

Community and maintenance signal. GitHub stars are a weak proxy but the trajectory of releases, issue close times, and contributor count tells you whether the project will still be alive in three years.

We are explicitly NOT scoring on benchmark numbers from the vendor's own marketing site, social-media follower counts, or paid-listing rankings.

#1 Botpress: the SaaS-like UX with an open core

Botpress is the most mature project in this category. The current main repository at github.com/botpress/botpress is MIT-licensed and presents as a hosted-feeling experience even when self-hosted: a visual flow builder, a prompt management UI, an integrations marketplace, and a deploy button. The legacy Botpress v12 is dual-licensed under AGPLv3 and a Botpress proprietary license for on-premise enterprise customers.

Pros. Genuinely polished UX, which is rare in self-hosted chatbot tooling. Native LLM agent abstractions with OpenAI and Anthropic out of the box. A large integrations library (WhatsApp, Telegram, Slack, Messenger). Cloud version exists if your team wants to start managed and move to self-hosted later without rewriting the bot.

Cons. The "open core" model means some advanced features are gated to the cloud or paid self-hosted tier. The full self-hosted developer experience is improving but still trails the cloud. Heavier resource footprint than the smaller projects in this list.

Best for. Teams who want a SaaS-feeling builder UI but need self-hosted deployment for regulatory or sovereignty reasons, and who can budget 8 to 16 hours per month for ops.

Chatwoot is the open-source customer-support platform that combines a multi-channel inbox (website, email, WhatsApp, Instagram, Facebook, SMS) with live agent workflows and AI assistance via its "Captain" agent. The codebase is MIT-licensed at github.com/chatwoot/chatwoot. The hosted version starts around $19 per agent per month; the self-hosted version is free under MIT and you pay only for infrastructure.

Pros. Best-in-class agent inbox UX in the self-hosted category. Mature multi-channel routing. MIT license gives full commercial freedom including private forks. Production-proven by thousands of organisations.

Cons. The native AI tier (Captain) is comparatively basic; for deep retrieval-augmented chat you typically integrate Chatwoot with a separate framework like Rasa or a custom RAG service. Ruby on Rails stack which not every team wants to maintain.

Best for. Teams whose dominant need is a live-agent inbox first and AI second, and who already have or can build the retrieval layer separately.

#3 Rasa: NLU-first with the CALM engine for enterprise

Rasa remains the canonical open-source conversational AI framework. As of 2025 to 2026, classic Rasa Open Source is in maintenance mode while active development has shifted to Rasa Pro and the CALM (Conversational AI with Language Models) engine, which uses LLMs for dialogue understanding with developer-defined business-logic flows. Rasa supports air-gapped on-premise deployment, which makes it the default choice for defence, banking, and any regulator who will not accept any outbound network call.

Pros. Strongest position on fully air-gapped deployment. Patented dialogue management with autonomous reasoning plus guided workflows. Enterprise heritage with proven production deployments at large banks and telcos. 20,000-plus GitHub stars and a long-running contributor community.

Cons. Steepest learning curve in this list. Classic Rasa OSS being in maintenance mode means new feature work tracks Pro. Best results require ML engineering bench that most SMBs do not have.

Best for. Regulated enterprises that need full control of dialogue policy and air-gapped deployment, with an internal ML team that can build and tune the agent.

#4 Open WebUI: the ChatGPT-style UI for self-hosted LLMs

Open WebUI is the most popular self-hosted ChatGPT clone, with roughly 136,000 GitHub stars as of May 2026. It is primarily a user-interface project that sits on top of a model server (typically Ollama for local models, or any OpenAI-compatible API). It has the most mature multi-user system in the category with role-based access control, admin and user roles, and SSO via OIDC connecting to Google, GitHub, Okta, and other providers.

Pros. Polished UX that feels familiar to anyone who has used ChatGPT. Mature multi-user and SSO. Strong plugin ecosystem and active Discord community. Easy Docker deployment.

Cons. It is a chat UI, not a customer-facing chatbot platform. There is no website-embed widget for end users, no visitor identity flow, no live-agent inbox. RAG is supported but is feature-led rather than retrieval-quality-led.

Best for. Internal knowledge bases for employees, where the use case is "give my team a private ChatGPT pointed at our internal documents." Not the right tool for a customer-facing chat widget on your marketing site.

#5 AnythingLLM: the RAG-focused self-hosted workspace

AnythingLLM is a full-stack open-source AI workspace that combines chat, document RAG, and lightweight agent capabilities. Around 54,000 GitHub stars as of mid 2026. It is RAG-first rather than chat-first: you upload documents into a workspace, the platform chunks and embeds them, and the chat surface is built around answering from those documents.

Pros. Genuinely simple deployment. Native support for many LLM providers including local Ollama, OpenAI, Anthropic, and OpenAI-compatible endpoints. Strong document-ingestion pipeline that handles PDFs, Office files, and web URLs out of the box. Good fit for small teams that want a private knowledge base without writing code.

Cons. Like Open WebUI, primarily an internal-team tool rather than a customer-facing chat platform. Single-tenant by default; multi-tenant requires more setup. Smaller community than Open WebUI or Chatwoot.

Best for. Small to mid-sized teams that want a private RAG workspace for internal documentation, with minimal ops time investment.

#6 LibreChat: the multi-provider ChatGPT alternative with MCP

LibreChat is the enterprise-leaning open-source ChatGPT alternative. It offers multi-provider support (OpenAI, Anthropic, Google, Azure, local models), conversation forking, advanced search, and was among the first platforms to fully implement Model Context Protocol (MCP) server support. Its RAG runs as a separate service using Meilisearch for full-text search alongside vector embeddings, giving hybrid retrieval out of the box.

Pros. Mature MCP support is genuinely differentiating in 2026. Hybrid retrieval (keyword plus vector) is a real quality win over vector-only RAG. Pixel-perfect ChatGPT-style UI familiar to non-technical users. Docker deployment.

Cons. Like Open WebUI and AnythingLLM, this is an internal chat UI rather than a customer-facing widget platform. Setup involves coordinating the main app, a Mongo or Postgres instance, and the RAG API; not as one-click as the smaller projects.

Best for. Engineering and research teams who want a self-hosted chat workbench with MCP tool use, multi-provider routing, and high-quality retrieval, and who can spend a half-day on initial setup.

When NOT to self-host (and where ChatRaj fits)

Self-hosting is the right answer when you have a hard data-residency, sovereignty, or air-gap requirement that cannot be satisfied by a hosted vendor's DPA plus regional residency. It is the right answer when your scale (millions of messages per month) makes per-message hosted pricing unfavourable compared to a fixed GPU bill. It is the right answer when your dominant constraint is "we need to fine-tune a proprietary model on our private data and serve it in chat."

Self-hosting is the wrong answer when your dominant constraint is "we want a chatbot on our marketing site this quarter." It is the wrong answer when your engineering team does not have any spare capacity to absorb 4 to 12 ops-hours per month. It is the wrong answer when your traffic is bursty (viral spikes break a fixed-capacity GPU setup that a managed vendor would auto-scale through).

If you read this far and the honest answer is "self-hosting is too much," ChatRaj is our managed alternative. It is hosted SaaS, not self-hosted. Pricing is flat: Free at 100 messages per month, Pro at $29 per month for 10,000 messages, Growth at $99 per month for 50,000 messages. EU data residency is available on paid tiers on request as part of the DPA. Hybrid retrieval (BM25 plus semantic via Reciprocal Rank Fusion) ships by default, so retrieval quality is closer to LibreChat-with-Meilisearch than to a vector-only setup. You get no GPU rental, no Docker babysitting, no certificate renewal, no late-night Postgres incident.

That is the trade. If sovereignty is the constraint, pick from the six above. If shipping this quarter is the constraint, the managed path exists.

Decision tree

Pick by your dominant constraint.

  • You need air-gapped on-premise with full dialogue-policy control and you have an ML team. Rasa with the CALM engine.
  • You want a SaaS-feeling builder UI but self-hosted for compliance, and you can budget 8 to 16 ops-hours per month. Botpress.
  • Your dominant need is a multi-channel agent inbox first and AI second, you already run Rails or are happy to. Chatwoot.
  • You want a private ChatGPT for your internal team, not a customer chat widget. Open WebUI if you want the most mature multi-user UX; LibreChat if you want hybrid retrieval and MCP; AnythingLLM if you want the simplest RAG workspace.
  • You read all the above and the ops cost feels heavy. ChatRaj managed SaaS at $29 to $99 per month. EU residency on request, no GPU bill, no infrastructure to babysit.

No single project wins every box. The honest framing for self-hosted in 2026 is that you are choosing a maintenance commitment as much as a software product.

What we deliberately did not score

We did not score on raw benchmark numbers from project websites. Open-source projects benchmark themselves; those numbers do not reproduce reliably on a buyer's own catalog with a buyer's own model choice.

We did not score on "stars per month" or other growth metrics. Open WebUI's star count is high because it is a polished internal-chat clone, not because it is the right tool for customer chat. Star counts measure popularity, not fit.

We did not score on enterprise logos. Every project in this list has at least one large logo somewhere. None of them tell you whether the project will fit your specific compliance and ops profile.

We did not score on the existence of a paid cloud version. Botpress, Chatwoot, and Rasa all offer hosted paid tiers; that signals revenue runway for continued OSS development but does not change the self-hosted user experience.

The right way to evaluate is to spin up a Docker Compose stack of your top two candidates on a $40 per month VM, point each at the same set of internal documents or website pages, and run the same 20 to 30 real questions through each. The project that answers your questions correctly with reasonable latency on hardware you control is the right pick for you. Marketing material from any of the six will tell you the project is the best at everything; the bake-off settles which one actually fits your team.

Install guide

How to pick a self-hosted stack in 5 steps

5 steps. Most operators finish in 60 seconds.

  1. Write down your real data-residency requirement

    Before reading any GitHub README, write down the specific regulation driving the decision: GDPR with EU residency, HIPAA, GLBA, PCI-DSS, the EU AI Act for high-risk systems, or an internal sovereignty policy. Each of these has different infrastructure implications. GDPR can often be satisfied by a hosted vendor with EU residency and a DPA. HIPAA-strict and air-gapped banking deployments cannot. The honest answer here determines whether you need the full self-hosted stack or whether a managed alternative will pass procurement.

  2. Choose the deployment shape: air-gapped, VPC, or BYOK

    Pick one of three shapes. Fully air-gapped on-premise means model weights and conversation data never leave your network; Rasa is the default. Self-hosted in your VPC with a local or regional LLM means you own the app plane but accept regional cloud trust; any of the six work. Self-hosted app with hosted-model API means lighter ops but the LLM call leaves your perimeter; acceptable for GDPR, usually not for HIPAA-strict deployments.

  3. Decide on local model versus hosted model API

    Serving a local model (Llama 3, Mistral, Qwen) via Ollama or vLLM gives you full data control and predictable cost at scale, but adds GPU rental of roughly $1,100 to $2,200 per month per GPU, plus engineering time for model upgrades. Calling a hosted model API (OpenAI, Anthropic, Bedrock, Vertex) keeps inference fast and removes GPU ops, at the cost of sending prompts outside your perimeter. Decide before you pick the platform; the choice rules out some options.

  4. Pick your top 2 projects and Docker-deploy both

    Spin up a $40 to $80 per month Hetzner or OVH VM in your target region. Docker-compose up your top two candidates. Point each at the same set of internal documents or website pages. Send the same 20 to 30 representative questions through each. Grade the answers blind. The project that answers your real questions correctly is the right pick, regardless of marketing or star count.

  5. Stress-test the ops profile before you commit

    Once the bake-off picks a winner, run it for two weeks under realistic load. Time how long a model upgrade takes, how long a security patch takes, how long a Postgres backup-and-restore drill takes. Multiply the observed hours by your fully-loaded engineering rate. If the total monthly ops cost exceeds the managed alternative's price tag by less than your data-residency requirement is worth, self-host. If it does not, switch to managed.

ChatRaj on self-hosted chatbots

All 6 self-hosted platforms scored on the ops dimensions

License, infrastructure burden, GPU requirements, and what each one actually costs to run.

The plugin approach

Other self-hosted chatbots chatbot tools

Typical when you install a WordPress plugin, Shopify app, or third-party chatbot widget.

  • Open-source license: Botpress: MIT (main) / AGPLv3 plus proprietary (v12). Chatwoot: MIT. Rasa OSS: Apache 2.0 (maintenance mode). Open WebUI: BSD-3-clause-ish (verify). AnythingLLM: MIT. LibreChat: MIT.
  • RAG built in by default: Botpress: Yes. AnythingLLM: Yes, RAG-first. LibreChat: Yes, hybrid via Meilisearch. Open WebUI: Yes, basic. Chatwoot: Limited via Captain. Rasa: Bring your own.
  • Multi-channel (WhatsApp, email, Slack): Chatwoot: Best-in-class, all channels. Botpress: Strong, marketplace. Rasa: Via connectors. Open WebUI / AnythingLLM / LibreChat: Internal chat only, no channel layer.
  • GPU required for local model serving: All six can run with a local LLM; expect an H100 ($1.50/hr Hetzner) or A100 ($3/hr AWS). Hosted model API path removes the GPU but moves data out of your perimeter.
  • Realistic ops burden per month: Botpress / Rasa: 8 to 16 hours. Chatwoot: 4 to 10 hours. Open WebUI / AnythingLLM / LibreChat: 2 to 6 hours for an internal-only deployment.
  • Realistic minimum monthly infra cost (excluding engineering time): Hosted-model path: $300 to $800/mo (app VM, Postgres, monitoring). Local-model path: $1,500 to $3,500/mo (add GPU). Air-gapped on-prem: hardware capex plus internal headcount.
  • Live-agent inbox bundled: Chatwoot: Yes, core feature. Botpress: Via integration. Rasa: Bring your own. Open WebUI / AnythingLLM / LibreChat: No.
  • Multi-language support: Chatwoot, Botpress, Rasa: Strong. Open WebUI, AnythingLLM, LibreChat: Inherits from the underlying LLM (so very strong with GPT-4-class or Claude).
  • Air-gapped on-prem support: Rasa: Yes, canonical choice. Botpress: Yes, paid self-hosted tier. Chatwoot: Yes via MIT codebase. Open WebUI / AnythingLLM / LibreChat: Yes if paired with a local model.
  • Time-to-first-conversation for a developer: AnythingLLM and Open WebUI: under 30 minutes via Docker. LibreChat: 1 to 2 hours. Botpress / Chatwoot: half a day to a day. Rasa: 1 to 3 days for a non-trivial bot.
The ChatRaj approach

One script tag. Everything bundled.

Hosted, configured, and maintained by us. You add a single line to your site.

  • Open-source license: ChatRaj is proprietary hosted SaaS, not self-hosted.
  • RAG built in by default: Yes. Hybrid BM25 plus semantic via Reciprocal Rank Fusion.
  • Multi-channel (WhatsApp, email, Slack): Website widget today. WhatsApp and Slack on 2026 roadmap.
  • GPU required for local model serving: No GPU needed (managed inference handled by ChatRaj).
  • Realistic ops burden per month: Zero (vendor runs the stack).
  • Realistic minimum monthly infra cost (excluding engineering time): $0 infra cost. Pro $29/mo or Growth $99/mo flat.
  • Live-agent inbox bundled: No live agent today; lead capture instead.
  • Multi-language support: Yes, 100-plus languages auto-detect.
  • Air-gapped on-prem support: Not supported (hosted SaaS only).
  • Time-to-first-conversation for a developer: Under 5 minutes (script tag in your site).
FAQ: self-hosted chatbots

Common self-hosted chatbot questions

Three honest reasons. First, data residency: if regulation (banking, EU AI Act for high-risk systems, HIPAA-strict deployments, defence) requires conversation data to never leave a specific jurisdiction or your own perimeter, self-hosting may be the only path. Second, model control: if you need to fine-tune a proprietary model on private data and serve it in chat, self-hosted is the natural shape. Third, predictable cost at very high scale: tens of millions of messages per month on a fixed-capacity GPU can beat per-message hosted billing. If none of those apply, hosted SaaS will ship faster and cost less in total.

Was this helpful?

Ship your first chatbot in 60 seconds.

Sign in with Google and you'll be answering visitor questions before your coffee gets cold.

60-second setup · One-line install · Works on any site

Works on any website
SShopify
WWebflow
WPWordPress
SqSquarespace
FFramer
</>Plain HTML