I have recently been answering a bunch of questions from agency swarm users looking to leverage Astra Assistants.

They fall into the following two categories:

  • I want to run agency swarm with other model providers via API (Anthropic, Google, Groq, etc.)
  • I want to run agency swarm with open source models on my local machine / infrastructure

Agency what?

Agency swarm is a popular multi-agent framework built on top of OpenAI's Assistants API.

Folks have been voicing a desire to use it with other model providers and with open source models, and the agency-swarm docs even recommend Astra Assistants.

I watched some of VRSEN's videos on YouTube, and some of them remind me of the phrase "the future is here, it's just not evenly distributed".

He is out there talking about automating business processes not only with individual AI agents but with groups of them (which he calls agencies), and yet all of his work is very grounded in reality.

There are a couple of things about his framework that I deeply agree with:

  1. No hard-coded framework prompts
  2. Pydantic / Instructor powered type checking for tool creation (sketched below)
  3. Commitment to OpenAI's Assistants API as the right level of abstraction for both setting up and scaling agents
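
To make point 2 concrete, here is a rough sketch of a typed tool in agency-swarm. The tool name, fields, and logic are my own illustration (check the agency-swarm docs for the exact API); the pattern is a Pydantic model subclassing BaseTool, where the docstring and Field descriptions become the tool's schema and run() holds the logic:

from agency_swarm.tools import BaseTool
from pydantic import Field

class SummarizeText(BaseTool):
    """Summarize a block of text."""  # the docstring serves as the tool description, as I understand agency-swarm's conventions

    text: str = Field(..., description="The text to summarize")
    max_words: int = Field(50, description="Upper bound on the summary length")

    def run(self):
        # Placeholder logic to keep the sketch self-contained; a real tool would do real work here.
        return " ".join(self.text.split()[:self.max_words])

The class itself (not an instance) is then passed to an agent, e.g. tools=[SummarizeText].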

If you haven't checked out agency-swarm, I recommend having a look at the github repo and some of VRSEN's videos.

If you're here to find out how to set up Agency Swarm to work with Astra Assistants, read on:

Other providers via API

If all you are looking to do is use agency-swarm with other providers (for example Anthropic), simply set up your .env file with the API keys for the provider and wrap your OpenAI client as in the following sample code:

from openai import OpenAI
from astra_assistants import patch
from agency_swarm import Agent, Agency, set_openai_client
from dotenv import load_dotenv

load_dotenv("./.env")
load_dotenv("../../../.env")

client = patch(OpenAI())

set_openai_client(client)

ceo = Agent(name="CEO",
            description="Responsible for client communication, task planning, and management.",
            instructions="Please communicate with users and other agents.",
            model="anthropic/claude-3-haiku-20240307",
            # model="gpt-3.5-turbo",
            files_folder="./examples/python/agency-swarm/files",
            tools=[])

agency = Agency([ceo])

assistant = client.beta.assistants.retrieve(ceo.id)
print(assistant)

completion = agency.get_completion("What's something interesting about language models?")
print(completion)

Your .env file may look like this:

#!/bin/bash

# AstraDB -> https://astra.datastax.com/ --> tokens --> administrator user --> generate
export ASTRA_DB_APPLICATION_TOKEN=""

# OpenAI Models - https://platform.openai.com/api-keys --> create new secret key
export OPENAI_API_KEY="fake"

# Anthropic claude models - https://console.anthropic.com/settings/keys
export ANTHROPIC_API_KEY="<insert key here>"

A note on architecture: you do not have to run the Astra Assistants backend yourself; by default, the client library points you at the astra-assistants API hosted by DataStax. However, the code is open source (Apache 2 licensed), and you can self-host if you prefer.
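
If you do go the self-hosted route, I would expect the client wiring to look roughly like the sketch below. This is an assumption on my part: I'm guessing that patch() respects an explicitly configured base_url pointing at your own server, and the host, port, and path here are placeholders (the port matches the docker-compose mapping shown later in this post). Verify the exact mechanism against the astra-assistants README:

from openai import OpenAI
from astra_assistants import patch

# Assumption: a self-hosted astra-assistants server is reachable on localhost:8080
# and patch() leaves an explicitly provided base_url in place.
# The exact host/port/path depend on your deployment -- check the astra-assistants README.
client = patch(OpenAI(base_url="http://localhost:8080/v1"))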

Local Models

If you're running inference locally or in your own private infrastructure, you will have to run the Astra Assistants backend yourself so that it can point to your inference server for completions.

The simplest approach is to use ollama and leverage the docker-compose yamls in the Astra Assistants repo.

There are two versions, with and without GPU support. We'll look at the GPU version since it's more performant and slightly more complex; the CPU-only version essentially drops the GPU-specific settings (the deploy, runtime, and NVIDIA_VISIBLE_DEVICES entries below). See the docker-compose.yaml below:

version: '3.8'

services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
    networks:
      - my_network
    volumes:
      - ~/.ollama:/root/.ollama  # map to a local volume so models persist across container restarts
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [ gpu ]
    environment:
      NVIDIA_VISIBLE_DEVICES: "all"  # or specify the GPU IDs
    runtime: nvidia  # Specify the runtime for NVIDIA GPUs

  assistants:
    image: datastax/astra-assistants
    ports:
      - "8080:8000"
    networks:
      - my_network
    depends_on:
      - ollama


networks:
  my_network:
    driver: bridge

Notice the networks section, which ensures that your containers can talk to each other. You can verify this is working by exec'ing into the assistants container and running:

curl http://ollama:11434

Note: in this setup you need to point to ollama in your application code using the LLM-PARAM-base-url header when you wrap the client, as in this example:

from openai import OpenAI
from astra_assistants import patch
from agency_swarm import Agent, Agency, set_openai_client
from dotenv import load_dotenv

load_dotenv("./.env")
load_dotenv("../../../.env")

# client = patch(OpenAI(default_headers={"LLM-PARAM-base-url": "http://localhost:11434"}))
# if using docker-compose, pass custom header to point to the ollama container instead of localhost
client = patch(OpenAI(default_headers={"LLM-PARAM-base-url": "http://ollama:11434"}))

set_openai_client(client)

ceo = Agent(name="CEO",
            description="Responsible for client communication, task planning, and management.",
            instructions="Please communicate with users and other agents.",
            model="ollama_chat/deepseek-coder-v2", # ensure that the model has been pulled in ollama
            files_folder="./examples/python/agency-swarm/files",
            tools=[])

agency = Agency([ceo])

assistant = client.beta.assistants.retrieve(ceo.id)
print(assistant)

completion = agency.get_completion("What's something interesting about language models?")
print(completion)

If you were running ollama and astra-assistants directly on your host (or with Docker using host networking), you would point to localhost instead:

default_headers={"LLM-PARAM-base-url": "http://localhost:11434"}

UPDATE: in astra-assistants 2.0.13 I added support for an `OLLAMA_API_BASE_URL` environment variable, which replaces the LLM-PARAM-base-url setting. Not only is the env var more convenient, it also allows complex agencies that mix ollama models with API-provider models to work.
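
Concretely, with 2.0.13 or later, the header-based wrapping above reduces to something like the sketch below. I'm setting the variable from Python purely for illustration; in practice it would live in your .env file alongside the other keys:

import os
from openai import OpenAI
from astra_assistants import patch

# Point astra-assistants at the ollama container (or http://localhost:11434 when running on the host).
os.environ["OLLAMA_API_BASE_URL"] = "http://ollama:11434"

# No custom LLM-PARAM-base-url header needed anymore.
client = patch(OpenAI())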

Note on LiteLLM

I'm adding this note because I have been asked this question multiple times: LiteLLM proxy with Astra Assistants will be supported when this PR gets merged.

That said, Astra Assistants uses litellm as a library to route LLM completions, so the proxy is not strictly necessary to get agency swarm working with other models.
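
In practice, that routing means switching providers from agency swarm is just a matter of changing the Agent's model string and supplying the matching API key in your .env. The Anthropic and ollama strings below come from the examples above; the Groq and Gemini ones follow litellm's provider/model naming convention and are my own illustration, so double-check the exact model and key names against the litellm docs:

from agency_swarm import Agent

ceo = Agent(name="CEO",
            description="Responsible for client communication, task planning, and management.",
            instructions="Please communicate with users and other agents.",
            model="anthropic/claude-3-haiku-20240307",  # Anthropic (ANTHROPIC_API_KEY)
            # model="ollama_chat/deepseek-coder-v2",    # local ollama model
            # model="groq/llama3-70b-8192",             # Groq (GROQ_API_KEY) -- name per litellm conventions
            # model="gemini/gemini-1.5-flash",          # Google (GEMINI_API_KEY) -- name per litellm conventions
            tools=[])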

Note that other LiteLLM Proxy features, like cost tracking, will not be available with this method.