Skip to content

Architecture

This page describes the high-level architecture of the OpenMIA platform.


System Overview

graph TB
    subgraph Client
        A[Web Browser / Mobile App]
    end

    subgraph API Gateway
        B[FastAPI Server]
    end

    subgraph Core Services
        C[Auth Service]
        D[AI Inference Engine]
        E[Memory Manager]
    end

    subgraph Data Layer
        F[(PostgreSQL)]
        G[(Redis Cache)]
        H[(Object Storage)]
    end

    A -->|HTTPS| B
    B --> C
    B --> D
    B --> E
    D --> F
    D --> G
    E --> F
    E --> H

Component Descriptions

API Gateway

The FastAPI-based gateway handles all incoming HTTP requests, performs authentication via JWT tokens, and routes requests to the appropriate internal service.

AI Inference Engine

The core intelligence layer. It loads and serves transformer-based models for:

  • Natural Language Understanding — intent classification, entity extraction
  • Multimodal Processing — vision + language fusion
  • Memory-Augmented Generation — context-aware response generation

Memory Manager

Handles long-term and short-term memory storage:

  • Short-term: Redis-backed session context
  • Long-term: PostgreSQL with vector similarity search

Deployment Topology

graph LR
    subgraph Production
        LB[Load Balancer] --> W1[Worker 1]
        LB --> W2[Worker 2]
        LB --> W3[Worker 3]
        W1 --> DB[(Database)]
        W2 --> DB
        W3 --> DB
    end
Component Scaling Strategy
API Workers Horizontal (Gunicorn)
Inference GPU-bound, vertical
Database Primary + Read Replicas
Cache Redis Cluster

Security

  • All external communication over TLS 1.3
  • API keys rotated every 90 days
  • Rate limiting: 100 req/min per API key
  • Input sanitization on all endpoints

Production Checklist

Before deploying to production, ensure all environment variables are set and secrets are stored in a secure vault (e.g., GitHub Secrets, HashiCorp Vault).