
How to build a FREE realtime voice assistant with Gemini

· 8 min read
Gareth Andrew
Chief Auto Coder

Google's new Gemini 2.0 Flash (experimental) model features an API for building realtime, bidirectional voice assistants. Since experimental models are free (and Google will not use your data for training if you use a paid account), this is an extremely slept-on opportunity to build a realtime voice assistant for free.
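To give a feel for the client-side plumbing, here is a minimal sketch of the messages a browser client might send. It assumes the experimental v1alpha `BidiGenerateContent` WebSocket protocol as documented around the Gemini 2.0 Flash launch: a `setup` message naming `models/gemini-2.0-flash-exp` with an `AUDIO` response modality, followed by `realtime_input` messages carrying base64-encoded 16-bit little-endian PCM sampled at 16 kHz. Field names in an experimental API may have changed since. The helpers `setupMessage`, `floatTo16BitPcm`, `toBase64` and `audioChunkMessage` are hypothetical, and opening the WebSocket and playing back the audio replies are left out.

```typescript
// Sketch of client-side messages for the Gemini Live (BidiGenerateContent) API.
// ASSUMPTION: field names follow the experimental v1alpha protocol at launch;
// check the current documentation before relying on them.

const MODEL = "models/gemini-2.0-flash-exp";

// First message after the WebSocket opens: pick the model and request audio replies.
function setupMessage(): string {
  return JSON.stringify({
    setup: {
      model: MODEL,
      generation_config: { response_modalities: ["AUDIO"] },
    },
  });
}

// Convert Web Audio float samples in [-1, 1] to 16-bit signed little-endian PCM.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid overflow
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true); // true = little-endian
  }
  return new Uint8Array(buffer);
}

// Base64-encode raw bytes (btoa is available in browsers and Node 16+).
function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Wrap one chunk of microphone audio (assumed to be sampled at 16 kHz) for streaming.
function audioChunkMessage(samples: Float32Array): string {
  return JSON.stringify({
    realtime_input: {
      media_chunks: [
        { mime_type: "audio/pcm;rate=16000", data: toBase64(floatTo16BitPcm(samples)) },
      ],
    },
  });
}
```

In a real client, `setupMessage()` would be sent once on connect and `audioChunkMessage()` called for each buffer coming out of the microphone's audio processing node.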

OpenAI Swarm

· 3 min read
Gareth Andrew
Chief Auto Coder

OpenAI released Swarm:

An educational framework exploring ergonomic, lightweight multi-agent orchestration.

It's implemented in Python, but there's also a TypeScript port, SwarmJS, and the concepts are simple enough to implement in any language.
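Swarm's two central ideas are agents (a name, instructions and a set of functions) and handoffs, where a function returns another agent and the conversation continues with that agent. The sketch below illustrates those ideas in TypeScript, with a keyword-matching stub standing in for the LLM. `Agent`, `ToolCall`, `Model` and `run` are hypothetical names chosen for illustration; this is not SwarmJS's actual API.

```typescript
// Minimal sketch of agents + handoffs. A real version would send the active
// agent's instructions and functions (as tools) to an LLM; here a stub decides.

interface Agent {
  name: string;
  instructions: string;
  functions: Record<string, (args: Record<string, string>) => string | Agent>;
}

interface ToolCall {
  name: string;
  args: Record<string, string>;
}

// Stand-in for an LLM: decides which function (if any) the active agent calls.
type Model = (agent: Agent, userMessage: string) => ToolCall | null;

function isAgent(value: string | Agent): value is Agent {
  return typeof value !== "string";
}

// Handle one user message, following handoffs until an agent returns text.
function run(start: Agent, userMessage: string, model: Model, maxHops = 5): { agent: Agent; reply: string } {
  let active = start;
  for (let hop = 0; hop < maxHops; hop++) {
    const call = model(active, userMessage);
    if (call === null) return { agent: active, reply: `${active.name}: no tool needed` };
    const fn = active.functions[call.name];
    if (!fn) throw new Error(`${active.name} has no function ${call.name}`);
    const result = fn(call.args);
    if (isAgent(result)) {
      active = result; // handoff: the returned agent takes over
      continue;
    }
    return { agent: active, reply: result };
  }
  throw new Error("too many handoffs");
}

// Example: a triage agent hands refund requests over to a refunds agent.
const refunds: Agent = {
  name: "Refunds",
  instructions: "Process refunds.",
  functions: { refund: (args) => `Refunded order ${args.orderId}` },
};

const triage: Agent = {
  name: "Triage",
  instructions: "Route the user to the right agent.",
  functions: { transferToRefunds: () => refunds },
};

// Keyword-matching stub in place of an LLM.
const stubModel: Model = (agent, msg) => {
  if (agent.name === "Triage" && msg.includes("refund")) return { name: "transferToRefunds", args: {} };
  if (agent.name === "Refunds") return { name: "refund", args: { orderId: "42" } };
  return null;
};

const result = run(triage, "I want a refund", stubModel);
console.log(result.agent.name, "-", result.reply); // Refunds - Refunded order 42
```

The handoff needs no special machinery: it is just a function whose return value happens to be an agent, which is a large part of why the concepts port so easily.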

AWS Multi-Agent Orchestrator

· 7 min read
Gareth Andrew
Chief Auto Coder

AWS Labs have released a Multi-Agent Orchestrator Framework which they bill as a

Flexible and powerful framework for managing multiple AI agents and handling complex conversations.

It's implemented in Python and TypeScript, so let's take a look and see what it does.

(TL;DR: the whole framework can be reduced to a single classification prompt and a bit of supporting machinery.)
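The core idea of classifier-based routing fits in a few lines: build one prompt that lists each agent's description, ask an LLM to name the best agent, and forward the request to it, falling back to a default when the answer doesn't match. This is a minimal sketch of that idea only; `AgentInfo`, `buildClassifierPrompt`, `Llm` and `route` are hypothetical names, not the framework's API, and the LLM is replaced by a stub.

```typescript
// Sketch of routing via a single classification prompt.
// All names are hypothetical; this is not the orchestrator library's API.

interface AgentInfo {
  name: string;
  description: string;
  handle: (input: string) => string;
}

// Build the one classification prompt from the agents' descriptions.
function buildClassifierPrompt(agents: AgentInfo[], input: string): string {
  const list = agents.map((a) => `- ${a.name}: ${a.description}`).join("\n");
  return [
    "You route user requests to the most suitable agent.",
    "Available agents:",
    list,
    "Reply with the agent name only.",
    `User request: ${input}`,
  ].join("\n");
}

// The LLM call is abstracted as a function from prompt to text.
type Llm = (prompt: string) => string;

function route(agents: AgentInfo[], fallback: AgentInfo, llm: Llm, input: string): string {
  const choice = llm(buildClassifierPrompt(agents, input)).trim();
  const agent = agents.find((a) => a.name === choice) ?? fallback; // unrecognised answer -> fallback
  return agent.handle(input);
}

const agents: AgentInfo[] = [
  { name: "WeatherAgent", description: "Answers questions about the weather.", handle: (q) => `weather: ${q}` },
  { name: "MathAgent", description: "Solves arithmetic problems.", handle: (q) => `math: ${q}` },
];
const general: AgentInfo = { name: "GeneralAgent", description: "Fallback for anything else.", handle: (q) => `general: ${q}` };

// Stub LLM: picks MathAgent whenever the user request contains a digit.
const stubLlm: Llm = (prompt) => (/\d/.test(prompt.split("User request: ")[1]) ? "MathAgent" : "Unknown");

console.log(route(agents, general, stubLlm, "What is 2 + 2?")); // math: What is 2 + 2?
```

Everything else, such as storing conversation history per agent, would be the "supporting machinery" built around this core.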

What is an agent?

· 6 min read
Gareth Andrew
Chief Auto Coder

Simon Willison is fond of asking the question "What is an agent?". There appear to be many possible definitions of what an agent is. Does this make the term useless?

Testing the Todo App with Stagehand

· 4 min read
Gareth Andrew
Chief Auto Coder

Browserbase have just launched a new tool based on Playwright, called Stagehand.

Stagehand is the AI-powered successor to Playwright, offering three simple APIs (act, extract, and observe) that provide the building blocks for natural language driven web automation.

Stagehand offers three API methods:

  • act - Perform an action on the page
  • extract - Extract data from the page
  • observe - Show the possible actions that can be performed on the page

Each Stagehand command takes a natural-language string as input, uses a language model (currently from Anthropic or OpenAI) to convert it into a Playwright script, and then executes that script.

One advantage of Stagehand is that we don't need any knowledge of how the DOM is structured to interact with the page, which makes it ideal for testing the auto-generated React Todo Apps.
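A DOM-agnostic Todo test might be shaped like the sketch below. Rather than tie it to a specific Stagehand release (method signatures have changed between versions), it codes against a hypothetical `NaturalLanguageBrowser` interface mirroring the `act`, `extract` and `observe` verbs, and uses an in-memory `FakeTodoBrowser` in place of a real browser and language model so the test logic can run on its own. Both names are illustrative assumptions, not Stagehand's API.

```typescript
// Hypothetical interface mirroring Stagehand's three verbs; consult the
// Stagehand docs for the real signatures in your version.
interface NaturalLanguageBrowser {
  act(instruction: string): Promise<void>;
  extract(instruction: string): Promise<string[]>;
  observe(instruction?: string): Promise<string[]>;
}

// The test speaks only natural language, so it needs no knowledge of the
// Todo app's DOM (class names, ids or element hierarchy).
async function testAddTodo(browser: NaturalLanguageBrowser): Promise<string[]> {
  await browser.act('Type "Buy milk" into the new todo input and press Enter');
  const todos = await browser.extract("List the text of every todo item");
  if (todos.indexOf("Buy milk") === -1) {
    throw new Error(`Expected "Buy milk" in ${JSON.stringify(todos)}`);
  }
  return todos;
}

// In-memory fake standing in for browser + language model, so the test logic
// can run offline. It only understands the single phrasing used above.
class FakeTodoBrowser implements NaturalLanguageBrowser {
  private todos: string[] = [];

  async act(instruction: string): Promise<void> {
    const match = instruction.match(/Type "(.+)" into the new todo input/);
    if (match) this.todos.push(match[1]);
  }

  async extract(_instruction: string): Promise<string[]> {
    return this.todos.slice(); // return a copy so callers can't mutate state
  }

  async observe(_instruction?: string): Promise<string[]> {
    return ["Type into the new todo input", "Toggle a todo", "Delete a todo"];
  }
}

testAddTodo(new FakeTodoBrowser()).then((todos) => console.log("passed:", todos));
```

Swapping the fake for a real Stagehand-backed implementation would leave `testAddTodo` unchanged, which is what lets the same test run against every generated app.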

Testing the new Sonnet

· 2 min read
Gareth Andrew
Chief Auto Coder

Anthropic have just released their latest model, confusingly still called Claude 3.5 Sonnet (technically claude-3-5-sonnet-20241022); we'll call it "new Sonnet". This model seems to have had a major update to its coding abilities, so let's see how it compares on our React Todo Apps.