Self-hosted API gateway for Ollama with multi-user support, API key management, rate limiting, and real-time usage tracking.
LLamaPass is open-source software you install on your own server to manage access to Ollama.
The admin of this server has shared LLamaPass with you. Register with your invite code to get access.
Install LLamaPass with Docker to manage multi-user access to your own Ollama instance.
Install the CLI and start chatting with models instantly
# Install the CLI
$ pip install llamapass

# Configure your API key
$ llamapass config set-key oah_your_key

# Chat with any model
$ llamapass run gemma3
LlamaPass - gemma3 (type /bye to quit)
>>> Hello!
Hi there! How can I help you today?
Clone, configure, and run with Docker
# Clone and run with Docker
$ git clone https://github.com/edoardoted99/llamapass.git
$ cd llamapass
$ cp .env.example .env
$ docker compose up --build

# Create your first admin user
$ docker compose exec web python manage.py createsuperuser

# Ready at http://localhost:8000
A complete gateway between your users and Ollama
Create, revoke, and manage keys with expiration, per-key model restrictions, and rate limits.
30-day analytics with charts for requests, tokens, latency, errors, and model breakdown per key.
Configurable per-key limits backed by Redis. Live monitoring shows how close each key is to its limit.
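For client code, the practical consequence of per-key limits is the occasional throttled request. A minimal retry sketch, assuming the gateway signals throttling with HTTP 429 (the endpoint path is taken from the curl example below; the backoff helper is illustrative, not part of LLamaPass):

```python
# Sketch: retry with exponential backoff when the gateway throttles a key.
# Assumes a 429 status on rate-limit hits; adjust to your deployment.
import time
import urllib.request
import urllib.error

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff delay in seconds: 1, 2, 4, ... capped at `cap`."""
    return min(cap, base * (2 ** attempt))

def post_with_retry(url, payload, api_key, max_attempts=5):
    """POST a JSON payload, retrying on 429 with growing delays."""
    for attempt in range(max_attempts):
        req = urllib.request.Request(
            url,
            data=payload,
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
        )
        try:
            return urllib.request.urlopen(req)
        except urllib.error.HTTPError as e:
            if e.code == 429 and attempt < max_attempts - 1:
                time.sleep(backoff_delay(attempt))
                continue
            raise
```

Capping the delay keeps a long-throttled client from sleeping unboundedly while still spacing out retries.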
Transparent async proxy to Ollama. Full streaming support for chat and generate endpoints.
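When streaming is enabled, Ollama's chat endpoint emits newline-delimited JSON chunks, and a transparent proxy forwards them as-is. A sketch of reassembling the reply on the client side (the helper name is illustrative; the chunk shape follows Ollama's streaming format, with a `message.content` fragment per line and `done` on the final chunk):

```python
# Sketch: reassemble a streamed /api/chat reply from NDJSON chunks.
import json

def extract_stream_text(ndjson_lines):
    """Concatenate message content from Ollama-style stream chunks."""
    parts = []
    for line in ndjson_lines:
        if not line.strip():
            continue  # skip keep-alive blanks
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final chunk carries timing stats, no more content
    return "".join(parts)
```

In practice you would feed this from an HTTP response iterated line by line (e.g. `requests.post(..., stream=True).iter_lines()`).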
Test Chat, Generate, and Embeddings endpoints directly from the browser. No curl needed.
Works with OpenAI SDKs out of the box. Just point the base URL at your gateway and use your LLamaPass API key.
Use any HTTP client or the OpenAI SDK
curl https://llamapass.org/ollama/api/chat \
-H "Authorization: Bearer oah_your_key" \
-d '{
"model": "gemma3:1b",
"messages": [
{"role": "user", "content": "Hello!"}
],
"stream": false
}'
from openai import OpenAI
client = OpenAI(
base_url="https://llamapass.org/ollama/v1",
api_key="oah_your_key",
)
response = client.chat.completions.create(
model="gemma3:1b",
messages=[
{"role": "user", "content": "Hello!"}
],
)
print(response.choices[0].message.content)