Answering is Easy—When You Have the Data

Summary

In this article, I show how mainstream AI assistants connected to your apps stumble when asked simple questions about the data in those apps. Because they only ever see fragments of that data, their answers are slow, error-prone, and sometimes outright hallucinated.

I compare this to Superego, the open-source personal database I develop. It takes a different path: it syncs the full dataset locally and lets the model write and run code over it, an approach that yields much better, much faster results.

Self-Promotion Disclosure

I wrote this article to promote myself as a consultant and the products I build. Of course, I also think the topic is genuinely interesting and valuable from a technical perspective, but my primary goal is self-promotion.

Try this experiment: go to ChatGPT, enable the Google Calendar connector, and ask this relatively simple question:

In terms of the number of events, what was my busiest week last year?

After a minute or two (depending on how many events you have in your calendar), it will come back with an answer that looks something like this:

ChatGPT (thought for 1m 17s)
Your busiest week last year was Mon, Aug 12 → Sun, Aug 18, 2024 (ISO week 33), with 14 events.

Correct! But definitely slow.

Under the hood, ChatGPT uses the connector to fetch—page by page—last year’s events from your calendar. It then writes them into a Python script that calculates the busiest week. Finally, it executes the script and returns the answer.

This is the best-case scenario, but it’s easy for the model to trip somewhere along the way:

  • It doesn’t fetch all pages from the calendar API.
  • It hallucinates events when writing them into the Python script.
  • It decides it doesn’t need a Python script at all, tries to come up with the answer from the data in its context, and hallucinates one.
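The first failure mode is easy to see in code. Fetching "all" events from a paginated API means looping until the server stops returning a continuation token; stop one iteration early and the dataset is silently incomplete. A minimal sketch, assuming a hypothetical `fetchPage` function whose responses carry a `nextPageToken` field (loosely modeled on the field of the same name in Google Calendar's `events.list` responses):

```typescript
// Hypothetical page shape: a batch of items plus an optional token
// pointing at the next page (loosely modeled on Google Calendar's events.list).
interface Page<T> {
  items: T[];
  nextPageToken?: string;
}

// Hypothetical fetcher: returns the page identified by `pageToken`,
// or the first page when `pageToken` is undefined.
type FetchPage<T> = (pageToken?: string) => Promise<Page<T>>;

// Follows continuation tokens until the server stops returning one,
// accumulating every item along the way. An absent (or empty) token
// marks the last page; stopping any earlier yields a partial dataset.
async function fetchAll<T>(fetchPage: FetchPage<T>): Promise<T[]> {
  const all: T[] = [];
  let pageToken: string | undefined = undefined;
  do {
    const page: Page<T> = await fetchPage(pageToken);
    all.push(...page.items);
    pageToken = page.nextPageToken;
  } while (pageToken);
  return all;
}
```

A model driving the connector has to reproduce this loop through a sequence of separate tool calls, which is exactly where pages get skipped.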

Try Claude and you’ll see it follow a very similar playbook. Try Gemini and… well, it can’t even paginate:

Gemini
Important Note on Data Limitation: The calendar tool is still limited to returning only the first 60 events it finds. Since there is no mechanism to retrieve more events (pagination), this result is based only on the subset of events provided by the API. If you have significantly more than 60 events in 2024, your actual busiest week may be different.
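Why does a 60-event cap matter so much? If the tool happens to return events in chronological order, the first 60 cover only the start of the year, and any busy week later on is simply invisible. The sketch below counts events per ISO week (keyed by the week's Monday, computed in UTC for simplicity rather than in the calendar's time zone) and compares the full dataset with its first 60 events. The data is synthetic, made up purely for illustration, and `isoWeekStart` and `busiestWeek` are hypothetical helpers, not part of any of the tools discussed.

```typescript
// Returns the Monday (YYYY-MM-DD, UTC) of the ISO week containing `date`.
function isoWeekStart(date: Date): string {
  const daysSinceMonday = (date.getUTCDay() + 6) % 7; // Sunday (0) -> 6, Monday (1) -> 0
  const monday = new Date(
    Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate() - daysSinceMonday),
  );
  return monday.toISOString().slice(0, 10);
}

// Counts events per ISO week and returns the week with the most events.
// On ties, the week encountered first wins.
function busiestWeek(events: Date[]): { weekStart: string; count: number } | undefined {
  const counts: Record<string, number> = {};
  for (const event of events) {
    const key = isoWeekStart(event);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  let best: { weekStart: string; count: number } | undefined;
  for (const weekStart of Object.keys(counts)) {
    const count = counts[weekStart];
    if (best === undefined || count > best.count) best = { weekStart, count };
  }
  return best;
}

// Synthetic, chronologically sorted data: 20 events on Wed 2024-01-10,
// one event per day for 40 days from 2024-02-01, and 30 events on Wed 2024-08-14.
const events: Date[] = [];
for (let i = 0; i < 20; i++) events.push(new Date(Date.UTC(2024, 0, 10, 9)));
for (let i = 0; i < 40; i++) events.push(new Date(Date.UTC(2024, 1, 1 + i, 9)));
for (let i = 0; i < 30; i++) events.push(new Date(Date.UTC(2024, 7, 14, 9)));

const full = busiestWeek(events); // { weekStart: "2024-08-12", count: 30 }
const truncated = busiestWeek(events.slice(0, 60)); // { weekStart: "2024-01-08", count: 20 }
```

With only the first 60 events, the computation itself is flawless and the answer is still wrong.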

Now try with Superego, the open-source personal database I’m building:

Superego
Your busiest week in 2024 was the week starting 2024-08-12, with 14 events.
[Chart: Weekly Event Counts (2024)]

Spot on, and in less than four seconds!

If you retry, you’ll sometimes get an uglier chart (here I picked the best-looking one, of course), and sometimes you’ll get no chart at all. The answer, however, is nearly always correct and fast. What’s going on?

A Different Approach

It’s not a smarter model (Superego used openai/gpt-oss-120b here), an ad-hoc prompt, or some calendar-specific optimization. It’s simply having local, direct access to all the data in the calendar.

Connectors in Superego sync external data sources (e.g., Google Calendar) to database collections stored on your device. When you ask a question, Superego:

  1. Asks the LLM to “write a TypeScript function that takes in all the documents of the collection and outputs the answer”.
  2. Reads all documents from the database (SQLite).
  3. Executes the function in a secure sandbox (QuickJS).
  4. Passes the result back to the LLM, which writes out the final answer.
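The four steps above can be sketched as a small orchestration function. This is a minimal sketch, not Superego's actual code: the `QaDeps` interface and its four methods are hypothetical stand-ins for the LLM calls, the SQLite read, and the QuickJS sandbox, passed in as parameters so the control flow stands on its own.

```typescript
// Shape of a stored document, mirroring the parameter type of the generated function.
interface StoredDocument<T> {
  id: string;
  versionId: string;
  content: T;
}

// Hypothetical dependencies; none of these names come from Superego.
interface QaDeps<T> {
  // Step 1: ask the LLM for the TypeScript source of a `main(documents)` function.
  generateFunction(question: string): Promise<string>;
  // Step 2: read every document of the collection from the local database.
  readAllDocuments(collectionId: string): Promise<StoredDocument<T>[]>;
  // Step 3: run the generated source in an isolated sandbox and return its result.
  runInSandbox(source: string, documents: StoredDocument<T>[]): Promise<unknown>;
  // Step 4: hand the result back to the LLM, which phrases the final answer.
  writeAnswer(question: string, result: unknown): Promise<string>;
}

async function answerQuestion<T>(
  deps: QaDeps<T>,
  question: string,
  collectionId: string,
): Promise<string> {
  const source = await deps.generateFunction(question);
  const documents = await deps.readAllDocuments(collectionId);
  // In this sketch the LLM never sees the raw documents, only the function's output.
  const result = await deps.runInSandbox(source, documents);
  return deps.writeAnswer(question, result);
}
```

Steps 1 and 2 don't depend on each other, so an implementation could run them concurrently (e.g. with `Promise.all`); the sketch keeps them sequential to mirror the list.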
Here is the TypeScript function the LLM generated to answer the question in the example:

```typescript
// Import the Event type from the collection schema
import type * as Collection_CUit8tEyBaJVkdsGGfFGL from "./Collection_CUit8tEyBaJVkdsGGfFGL.ts";

/**
 * Returns an array of weekly event counts for the year 2024.
 * Each entry contains the week start date (ISO PlainDate) and the count.
 */
export default function main(
  documents: {
    id: string;
    versionId: string;
    content: Collection_CUit8tEyBaJVkdsGGfFGL.Event;
  }[],
): { weekStart: string; count: number }[] {
  // Map from week start ISO date to count
  const weekCounts: Record<string, number> = {};
  for (const doc of documents) {
    const startInstant = globalThis.LocalInstant.fromInstant(doc.content.start);
    // Filter events that start in 2024
    const year = startInstant.toJSDate().getFullYear();
    if (year !== 2024) continue;
    // Get Monday of the week (ISO week start)
    const weekStartInstant = startInstant.startOf("week");
    const weekStart = weekStartInstant.toPlainDate(); // e.g., "2024-03-04"
    weekCounts[weekStart] = (weekCounts[weekStart] ?? 0) + 1;
  }
  // Convert to array
  const result: { weekStart: string; count: number }[] = [];
  for (const [weekStart, count] of Object.entries(weekCounts)) {
    result.push({ weekStart, count });
  }
  // Sort by weekStart ascending
  result.sort((a, b) => a.weekStart.localeCompare(b.weekStart));
  return result;
}
```

The approach is basically a variant of text-to-SQL (the model generates a TypeScript function instead of a SQL query), so there’s nothing really innovative there, but it only works if you have the data. Having the data is what makes Superego so good at answering complex questions about your life, and it is exactly what ChatGPT and friends are sorely missing.

For the time being, at least. Sam Altman envisions a future where you put your entire life into ChatGPT. Rather not? Try Superego! It’s open-source, encrypted, and completely local. Oh, and of course it exists today. :)