This tutorial demonstrates how to build a 100% local microeconomics chatbot using Google's open-source Gemma 3 model and Retrieval-Augmented Generation (RAG). By leveraging Kolosal AI for local inference and BM25 for document retrieval, users can create a private, cost-effective AI assistant that provides context-aware answers based on a local economics knowledge base.
What You'll Build
A microeconomics Q&A chatbot that:
- Runs 100% locally on your machine using Gemma 3 with Kolosal AI
- Retrieves economics context from your custom documents using BM25
- Uses RAG (Retrieval-Augmented Generation) to generate accurate, grounded answers
- Is deployed via Streamlit inside a Docker container
System Architecture
The chatbot pipeline is composed of:
- User Query: The user types a question (e.g. "What is price elasticity?")
- Query Optimizer: The local LLM rewrites the query into optimized search terms
- BM25 Retriever: Finds the top 3 relevant documents from your economics notes
- Answer Synthesizer: The LLM generates an answer using the question and retrieved docs
- Response: Answer is streamed back to the UI with sources shown
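The pipeline above can be sketched in a few dozen lines of Python. This is an illustrative outline, not the repository's actual code: the function names are hypothetical, BM25 is implemented inline for self-containment (the repo may use a library retriever instead), and the LLM is passed in as a plain callable so any Kolosal AI chat function can be plugged in.

```python
# Illustrative sketch of the four-stage RAG pipeline described above.
# Names (bm25_scores, answer) are hypothetical, not the repo's identifiers.
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized doc against the query with classic Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                      # document frequency per term
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def answer(question, corpus, llm, top_k=3):
    """RAG loop: rewrite the query, retrieve top-k docs, synthesize an answer.

    `llm` is any callable prompt -> str (e.g. a Kolosal AI chat call).
    """
    # 1. Query optimizer: ask the LLM for better search terms
    search_terms = llm(f"Rewrite as search keywords: {question}")
    # 2. BM25 retriever over the tokenized corpus
    tokenized = [doc.lower().split() for doc in corpus]
    scores = bm25_scores(search_terms.lower().split(), tokenized)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    context = [corpus[i] for i in ranked[:top_k]]
    # 3. Answer synthesizer: ground the LLM in the retrieved docs
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return llm(prompt), context
```

Passing the LLM as a callable keeps the retrieval logic testable without a running model server.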
Key Technologies
- Gemma 3: Google's lightweight open-weight LLM family; the 1B variant runs locally via Kolosal AI
- Kolosal AI: Local inference engine with OpenAI-compatible API
- BM25 Retriever: Classic sparse retriever for fast document lookup
- Streamlit: Web UI to chat with the bot
- Docker: Isolated deployment environment
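Because Kolosal AI exposes an OpenAI-compatible API, the app can talk to it with a plain HTTP POST. The sketch below uses only the standard library; the base URL, port, and model name are assumptions, so check your Kolosal AI configuration for the actual values.

```python
# Minimal sketch of a chat completion call against Kolosal AI's
# OpenAI-compatible endpoint. URL and model name are assumed values.
import json
import urllib.request

KOLOSAL_URL = "http://localhost:8084/v1/chat/completions"  # assumed port

def build_chat_request(messages, model="gemma-3-1b", stream=True):
    """Assemble the JSON body for an OpenAI-style chat completion call."""
    return {"model": model, "messages": messages, "stream": stream}

def chat(messages, model="gemma-3-1b"):
    """POST the request and return the assistant's reply (non-streaming)."""
    body = json.dumps(build_chat_request(messages, model, stream=False)).encode()
    req = urllib.request.Request(
        KOLOSAL_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

In the actual app, `stream=True` lets the Streamlit UI render tokens as they arrive instead of waiting for the full answer.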
Run the Chatbot
To run the chatbot on your machine:
```shell
# Clone the repo
git clone https://github.com/FarrelRamdhani/Microeconomic-Chatbot.git
cd Microeconomic-Chatbot

# Build and run the container
docker build -t microeconomic-chatbot .
docker run -p 8501:8501 microeconomic-chatbot
```

Then open http://localhost:8501 in your browser.
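The repository ships its own Dockerfile; a minimal equivalent for a Streamlit app might look like the following (the Python version, file names, and entry point are assumptions, not the repo's actual contents):

```dockerfile
# Illustrative Dockerfile for a Streamlit app; paths and versions assumed
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

Binding to `0.0.0.0` is what makes the app reachable from outside the container once port 8501 is published with `-p 8501:8501`.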
Try It Yourself
Once deployed, try asking the chatbot questions like:
- "What is the difference between monopoly and perfect competition?"
- "How does price elasticity affect consumer behavior?"
- "Can you give an example of opportunity cost?"