Methodology and a note on honesty
This is a composite case study, not a named customer success story. We say this clearly at the top because the difference matters. A named case study tells you that one specific company saw a specific result. A composite case study like this one synthesises feedback across many operators and reports the typical range. Both formats have a role; only one is honest when no single customer has agreed to be the public face of a marketing page.
The analysis below is drawn from feedback across 14 indie SaaS founders who deployed ChatRaj on their public docs sites between January and May 2026. The 14 sites run on a mix of Docusaurus (six sites), VitePress (four sites), Mintlify (three sites), and Astro Starlight (one site). All 14 founders agreed to share aggregated metrics on the condition that no single deployment would be identifiable from the published write-up. We have honoured that condition by reporting every number as a range across the group rather than as a single point estimate.
The persona used as a narrative anchor in this case study, called Marcus in the sections below, is a composite as well. Marcus is not a real founder. He is a stand-in that combines the most common attributes across the 14 operators we drew from: a small team, a single bootstrapped product, a docs site built on a static-site generator, and a top-of-funnel that depends on Google sending evaluators to the right page. If a section reads like a single person's story, that is the narrative shape working as intended; the numbers behind every claim are the aggregated 14-operator ranges.
Two more transparency notes. First, we are the vendor. This is published on chatraj.com, and we have an obvious interest in the analysis looking favourable. We have tried to counter that bias by including a what-did-not-work section that surfaces the real failure modes founders reported, including one mode that nearly caused a deployment to be rolled back. Second, the 14 operators are a self-selected sample. They opted into the feedback round, which means they were still paying customers at the 60-day mark. Founders who churned before then, including those who left in the first 30 days, are not represented in this composite, and that is a real selection effect worth naming.
The composite persona setup
Marcus is the composite founder this analysis walks through. He runs a SaaS product for database migrations, helping backend teams move data between Postgres versions without writing migration scripts by hand. This is deliberately the same persona used on the /use-cases/saas-documentation page on this site, so that the use-case page and this case-study page reinforce each other rather than describing different operators.
Marcus's team is three people total: him as founding engineer, a part-time designer, and a part-time DevRel. The company is bootstrapped and thin-margin profitable. The docs site is the entire top-of-funnel; there is no paid acquisition channel, no sales team, no outbound. Google sends evaluators to docs pages, those evaluators decide whether to start a trial, and the conversion rate from that single funnel is the only metric that matters.
The docs site averaged 800 visitors per week from Google search in the 60 days leading up to the ChatRaj deployment. The trial conversion rate from docs homepage visits to a started trial sat at 8 percent. The docs themselves were built on Docusaurus and hosted on Vercel, with Algolia DocSearch as the search box. Marcus had written every page himself; the docs were comprehensive for what they covered but organised around how he thought about the product rather than around how strangers searched for answers.
Across the 14 founders in this composite, the typical operator profile clustered tightly around Marcus's shape: two to four people on the team, 500 to 1,500 weekly docs visitors, a trial conversion rate between 5 and 12 percent, and a docs site on one of four static-site generators. If your shape is meaningfully different from this, the numbers in the later sections will need adjusting before you treat them as predictive.
The before-state
Before deploying ChatRaj, the typical operator in this composite had three observable problems and one invisible one.
The first observable problem was bounce rate on Google-landed docs sessions. Across the 14 sites, single-page-no-click sessions averaged between 55 and 70 percent of organic docs traffic. Visitors landed, scanned for under a minute, and closed the tab. The second observable problem was that average time on page for a Google-landed visitor sat between 30 and 50 seconds across the group, which is consistent with the visitor not finding the answer they came for. The third observable problem was trial conversion: across the group, 5 to 12 percent of docs homepage visitors started a trial, which felt low to most founders given the intent shape of their incoming traffic.
The invisible problem was the one that mattered most. None of the 14 founders had visibility into what those bouncing visitors had actually been looking for. Plausible, Google Analytics, and Search Console could tell each founder that visitors left; none of those tools could tell them what specific question the visitor came with and failed to find an answer to. Across the group, every founder named this lack of visibility as the most painful gap in their analytics stack. The docs were a black box at exactly the point in the funnel where understanding was most valuable.
What changed in the first 30 days
Deployment shape was consistent across the 14 operators. Most got the bot live on the docs site within a single afternoon. The composite week-one pattern is documented in the seven-step section below; it covers signing up, ingesting the sitemap, customising the widget, embedding the script tag, and verifying answers against a hand-written question list.
By the end of week one, the typical bot was answering questions like "what database versions are supported", "how do I rollback a migration", "does this work with logical replication", and "what happens if a migration fails halfway through". The first three categories of question, version compatibility, rollback semantics, and integration with adjacent infrastructure, made up the bulk of the early traffic across all 14 operators. Founders reported that the bot reliably answered roughly two-thirds of these questions correctly on the default settings.
By the end of week four, the Unanswered tab had filled up. Across the group, founders reported between 12 and 28 unique unanswered questions per week, which translated into a steady stream of editorial backlog items. The most common pattern was a question the docs partially answered but with terminology the visitor did not use. A visitor asking "PG 17 support" was failing to find a docs page titled "PostgreSQL 17 compatibility", because semantic retrieval was not catching the abbreviation. Founders fixed these by adding an explicit FAQ entry or an alias section to the relevant docs page, after which the bot started answering correctly.
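The terminology mismatch described above can be spotted mechanically when triaging the Unanswered tab. The sketch below is a triage aid only, not a description of how ChatRaj's retrieval works: the alias table, the function names, and the token-overlap heuristic are all assumptions made for illustration. It flags docs page titles that match a question better once known abbreviations are expanded, which suggests the page exists but uses different wording.

```python
import re

# Hypothetical alias table; an operator would maintain their own.
ALIASES = {"pg": "postgresql", "postgres": "postgresql"}


def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def expand(toks: set[str]) -> set[str]:
    """Replace each token with its canonical form when an alias is known."""
    return {ALIASES.get(t, t) for t in toks}


def alias_gaps(question: str, page_titles: list[str]) -> list[str]:
    """Return titles whose token overlap with the question grows after alias expansion."""
    q_raw = tokens(question)
    q_exp = expand(q_raw)
    gaps = []
    for title in page_titles:
        t_raw = tokens(title)
        t_exp = expand(t_raw)
        if len(q_exp & t_exp) > len(q_raw & t_raw):
            gaps.append(title)
    return gaps


titles = ["PostgreSQL 17 compatibility", "Rollback semantics"]
print(alias_gaps("PG 17 support", titles))
```

A flagged title is a candidate for the alias section or FAQ entry fix described above; the heuristic says nothing about whether the page actually answers the question.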
The shape of the first 30 days, across the composite, was less about a sudden conversion lift and more about a new feedback loop. Founders were now seeing the questions their docs were not answering, and they were closing those gaps one entry at a time.
60-day metric ranges, typical not single-founder
By the 60-day mark, the 14 operators had enough signal to report outcome metrics. Every number in this section is a range across the group. Treat them as the typical band a similar operator might land in, not as a guarantee or a single-founder result.
Docs-question deflection rate, defined as the share of incoming visitor questions the bot answered confidently enough that the visitor did not subsequently navigate to a contact form or close the tab, landed between 25 and 35 percent across the group. The lower end of the range correlated with smaller docs sites where retrieval had less ground truth to anchor to; the higher end correlated with operators who had spent meaningful time in the Customize tab tuning the system prompt and suggested questions.
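The deflection definition above can be made concrete with a small calculation. The sketch below is illustrative only: the `BotSession` record and its field names are hypothetical, not a ChatRaj export format, and it assumes each visitor question can be labelled with whether the bot answered confidently and what the visitor did next.

```python
from dataclasses import dataclass


@dataclass
class BotSession:
    """One visitor question. All field names are hypothetical."""
    answered_confidently: bool
    went_to_contact_form: bool
    closed_tab: bool


def deflection_rate(sessions: list[BotSession]) -> float:
    """Share of questions answered confidently with no contact-form visit or tab close afterwards."""
    if not sessions:
        return 0.0
    deflected = sum(
        1
        for s in sessions
        if s.answered_confidently
        and not s.went_to_contact_form
        and not s.closed_tab
    )
    return deflected / len(sessions)


sample = [
    BotSession(True, False, False),   # deflected
    BotSession(True, True, False),    # answered, but the visitor still used the contact form
    BotSession(False, False, True),   # unanswered, visitor left
    BotSession(True, False, False),   # deflected
]
print(f"{deflection_rate(sample):.0%}")  # 2 of the 4 sample sessions count as deflected
```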
Evaluator-to-trial conversion lift, measured against each operator's 60-day pre-deployment baseline, landed between 8 and 15 percent. This number has the largest variance in the dataset and should be treated with the most caution; founders attributed the variance partly to their existing baseline (operators starting from a 5 percent conversion rate saw larger relative lifts than operators starting from 11 percent) and partly to the quality of their docs (better-organised docs produced better-grounded bot answers, which produced better evaluator confidence).
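As a worked example of what this band means in absolute terms, the sketch below applies it to the composite persona's 8 percent baseline. It assumes the lift is relative to the baseline rather than an addition of percentage points, which is how the parenthetical above reads; the function name is illustrative. Under that assumption, an 8 percent baseline lands between roughly 8.6 and 9.2 percent.

```python
def apply_relative_lift(baseline_rate: float, relative_lift: float) -> float:
    """Return the post-deployment conversion rate for a given relative lift."""
    return baseline_rate * (1 + relative_lift)


baseline = 0.08  # the composite persona's pre-deployment trial conversion rate
for lift in (0.08, 0.15):  # lower and upper end of the reported band
    after = apply_relative_lift(baseline, lift)
    print(f"lift {lift:.0%}: {baseline:.2%} -> {after:.2%}")
```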
Surfaced content gaps via the Unanswered tab, measured per quarter, landed between 20 and 40 per operator. Founders treated this number as the most valuable single output of the deployment. Even operators whose deflection rate sat at the low end of the range reported that the editorial backlog signal alone justified the subscription cost.
Captured emails from visitors who interacted with the bot before bouncing landed between 4 and 9 per week for the typical operator. This is a smaller channel than the trial conversion lift, but it is incremental to it; these are evaluators who would have closed the tab without leaving a trace, and now leave a trace.
Two costs stayed roughly flat across the group: the bot's per-answer cost (covered by the flat ChatRaj quota and therefore predictable) and the operator's time spent maintaining the deployment, which after the initial setup week averaged less than 30 minutes per week.
What did not work, honest failures
No deployment in the composite ran cleanly from day one. Three failure modes recurred across the group; one nearly caused a rollback.
The first failure mode was technical terminology the bot misunderstood. Several operators had product-specific terms that the bot's retrieval treated as generic vocabulary. One founder's product used the word "tenant" to mean a logically isolated database namespace, but the bot kept answering tenant questions with content about multi-tenant SaaS architecture in general. The fix was to add an explicit FAQ entry defining the term and to extend the system prompt with a short glossary block. After the fix, the bot answered tenant questions correctly. The lesson is that any product with a non-standard usage of a common word will need a glossary nudge.
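A glossary nudge of the kind described above can be kept as data and rendered into the system prompt. The sketch below is a minimal illustration: the block wording, the `glossary_block` helper, and the placeholder base prompt are all assumptions, not a format ChatRaj requires. The tenant definition is taken from the example above.

```python
def glossary_block(terms: dict[str, str]) -> str:
    """Render product-specific definitions as a plain-text block for a system prompt."""
    lines = ["Glossary (use these product-specific meanings when answering):"]
    for term, definition in sorted(terms.items()):
        lines.append(f"- {term}: {definition}")
    return "\n".join(lines)


# Placeholder prompt text; a real deployment would use its own.
base_prompt = "Answer only from the product documentation."

terms = {
    "tenant": "a logically isolated database namespace, "
              "not a customer account in a multi-tenant SaaS architecture",
}

system_prompt = base_prompt + "\n\n" + glossary_block(terms)
print(system_prompt)
```

Keeping the terms in one dictionary means the same definitions can also be pasted into the matching FAQ entry, so the prompt and the docs do not drift apart.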
The second failure mode was cold-start latency that ran higher than expected during the first deployment week. Several founders reported first-question response times in the 4 to 7 second range during the first 48 hours, against a steady-state expectation closer to 1.5 seconds. The cause was a combination of cold serverless functions and the bot's initial retrieval index warming up. After the first 48 hours, response times settled into the expected band. Operators who launched their bot publicly on day one and watched the first wave of visitors hit cold paths were uncomfortable with this, and we have since added clearer documentation that the first 48 hours run slower than the steady state.
The third failure mode, and the one that nearly caused a rollback, was a sitemap that missed half the actual published pages. One operator's Docusaurus build was generating a sitemap that only included top-level docs sections, leaving most subpages out of the file the bot ingested. The bot therefore could not answer questions about anything below the section level, and the operator's first impression was that retrieval quality was poor. The diagnosis took about two hours; the fix was to re-point the bot at the canonical sitemap rather than the auto-generated truncated one. After the fix, retrieval quality matched the rest of the cohort. The lesson is that verifying the source list against the actual published page count is worth doing on day one.
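The day-one check suggested above is a set comparison between the URLs in the sitemap and the URLs the operator knows are published. The sketch below uses only the Python standard library and the standard sitemap XML namespace; the `docs.example.com` URLs and the function names are hypothetical, and the list of published pages is assumed to come from the operator's own build output.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Extract every <loc> URL from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_sitemap(sitemap_xml: str, published_urls: set[str]) -> set[str]:
    """Return published URLs that the sitemap does not list."""
    return published_urls - sitemap_urls(sitemap_xml)


# Hypothetical truncated sitemap that lists only top-level sections.
sample_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://docs.example.com/</loc></url>
  <url><loc>https://docs.example.com/migrations/</loc></url>
</urlset>"""

published = {
    "https://docs.example.com/",
    "https://docs.example.com/migrations/",
    "https://docs.example.com/migrations/rollback/",
}

missing = missing_from_sitemap(sample_sitemap, published)
print(f"{len(missing)} of {len(published)} published pages missing from sitemap")
for url in sorted(missing):
    print("  ", url)
```

A non-empty result before ingestion is the signal to point the bot at a different source list, as the operator in this failure mode eventually did.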
The build vs buy dimension
Across the 14 operators, 11 had seriously considered building an in-house RAG chatbot on top of OpenAI or Anthropic APIs before deciding to deploy ChatRaj. The typical thought process was: "I am an engineer, the building blocks are well-documented, the marginal cost of inference is the only true cost, so why pay a vendor?" The honest answer is that building can win for some teams and lose for others.
Building wins when the team has dedicated capacity to maintain the bot. The bot is not a one-week project; it is the retrieval pipeline, the streaming response handler, the conversation memory, the analytics surface, the abuse-prevention layer, the lead capture UI, the email delivery for captured leads, the dashboard for editing prompts, the embed widget, the iframe isolation, the rate limiting, the model failover, the multi-language detection, and the ongoing tuning. Founders who already have, or plan to hire, a full-time engineer to own this stack can build a credible in-house system, often with better integration into their existing infrastructure than any vendor will give them. Compliance requirements that mandate on-prem deployment also push toward building.
Buying wins when the founder's time is better spent on the product itself. The composite operators in this analysis were three-person teams where the founding engineer was the only person who could meaningfully extend the core product. Spending six to ten weeks of that engineer's time on a chatbot, plus ongoing maintenance, was a worse use of time than paying ChatRaj's flat monthly quota. None of the 11 founders who considered building regretted the buy decision at the 60-day mark, but several said the calculation would have flipped if they had a second backend engineer on the team.
The 6-month outlook
By month six, the typical operator in the composite had expanded the deployment in two ways and was considering a third. The first expansion was adding more source URLs to the bot's knowledge base, often including a public changelog and the public blog so the bot could answer questions about recent feature ships. The second expansion was enabling lead capture more aggressively on the bot, with a soft prompt to leave an email after a substantive answer rather than only after a visitor explicitly asked about pricing or a trial.
The third expansion under consideration for several operators was upgrading from the Pro plan to the Growth plan as traffic scaled past the 10,000-message-per-month cap. Two of the 14 founders had made that upgrade by month six; the rest were close enough to the threshold that they expected to upgrade within the following quarter. None of the founders reported feature gaps in the Growth tier that blocked the upgrade decision.