Blog | Auto Code

How to build a FREE realtime voice assistant with Gemini

January 20, 2025 · 8 min read

Chief Auto Coder

Google's new Gemini 2.0 Flash (experimental) model features an API for building a realtime bidirectional voice assistant. Since experimental models are free (and Google will not use your data for training if you use a paid account), this is an extremely slept-on opportunity to build a realtime voice assistant for free.

OpenAI Swarm

December 3, 2024 · 3 min read

Gareth Andrew

Chief Auto Coder

OpenAI released Swarm:

An educational framework exploring ergonomic, lightweight multi-agent orchestration.

It's implemented in Python, but there's a TypeScript version, SwarmJS, and the concepts are simple enough to implement anywhere.

AWS Multi-Agent Orchestrator

December 2, 2024 · 7 min read

Gareth Andrew

Chief Auto Coder

AWS Labs have released a Multi-Agent Orchestrator Framework which they bill as a

Flexible and powerful framework for managing multiple AI agents and handling complex conversations.

It's implemented in Python and TypeScript, so let's take a look and see what it does.

(TL;DR: The whole framework can be reduced to a single classification prompt and a bit of supporting machinery.)
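To make the "single classification prompt" idea concrete, here is a minimal sketch of that pattern. It is not the AWS framework's API: `Agent`, `build_classification_prompt`, `route`, and `fake_classifier` are all hypothetical names made up for illustration, and the keyword-matching classifier is only a stand-in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    # Hypothetical agent record: a name, a description the classifier sees,
    # and a handler that produces the reply.
    name: str
    description: str
    handle: Callable[[str], str]


def build_classification_prompt(agents: List[Agent], user_input: str) -> str:
    # The one prompt the pattern revolves around: list the agents,
    # show the request, ask for a single agent name back.
    lines = [f"- {a.name}: {a.description}" for a in agents]
    return (
        "Pick the single best agent for the user's request.\n"
        "Agents:\n" + "\n".join(lines) + "\n"
        f"Request: {user_input}\n"
        "Answer with the agent name only."
    )


def route(agents: List[Agent], user_input: str, classify: Callable[[str], str]) -> str:
    # The "supporting machinery": classify, look up the agent, dispatch.
    by_name = {a.name: a for a in agents}
    choice = classify(build_classification_prompt(agents, user_input)).strip()
    agent = by_name.get(choice, agents[0])  # unknown answer: fall back to the first agent
    return agent.handle(user_input)


def fake_classifier(prompt: str) -> str:
    # Stand-in for an LLM call (assumption): crude keyword matching on the request line.
    request = prompt.split("Request: ", 1)[1].splitlines()[0].lower()
    return "weather" if "rain" in request or "weather" in request else "general"


AGENTS = [
    Agent("general", "Handles anything not covered elsewhere", lambda q: f"[general] {q}"),
    Agent("weather", "Answers weather questions", lambda q: f"[weather] {q}"),
]

if __name__ == "__main__":
    print(route(AGENTS, "Will it rain tomorrow?", fake_classifier))
```

Swapping `fake_classifier` for a function that sends the prompt to a real model would be the only change needed to try this against an LLM; conversation history and streaming are left out of the sketch.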

What is an agent?

December 1, 2024 · 6 min read

Gareth Andrew

Chief Auto Coder

Simon Willison is fond of asking the question "What is an agent?". There appear to be many possible definitions of what an agent is. Does this make the term useless?

Announcing speech-to-text: CLI Audio Transcription Using Gemini

November 17, 2024 · 2 min read

Gareth Andrew

Chief Auto Coder

I've just released @autocode2/speech-to-text, a Node.js library and CLI tool that makes it easy to transcribe speech using Google's Gemini API.

Introducing Ada

November 15, 2024 · 3 min read

Gareth Andrew

Chief Auto Coder

I was recently inspired by this tweet from Erik Bjäreholt, creator of gptme:

Testing the Todo App with Stagehand

November 3, 2024 · 4 min read

Gareth Andrew

Chief Auto Coder

Browserbase have just launched a new tool based on Playwright, called Stagehand.

Stagehand is the AI-powered successor to Playwright, offering three simple APIs (act, extract, and observe) that provide the building blocks for natural language driven web automation.

Stagehand offers three API methods:

act - Perform an action on the page
extract - Extract data from the page
observe - Show the possible actions that can be performed on the page

Each Stagehand command takes a natural language string as input and uses a language model (currently Anthropic or OpenAI models) to convert the natural language into a Playwright script and then executes it.

One advantage of using Stagehand is that we don't need any knowledge of how the DOM is structured to interact with the page. This makes it perfect for testing the auto-generated React Todo Apps.

Automatic Coding Tools

October 29, 2024 · One min read

Gareth Andrew

Chief Auto Coder

For now this is just an unsorted, incomplete list of tools related to AI-driven coding. I'll hopefully come back to this later:

Testing the new Sonnet

October 24, 2024 · 2 min read

Gareth Andrew

Chief Auto Coder

Anthropic have just released their latest model, confusingly still called sonnet-3.5 (or technically claude-3-5-sonnet-20241022); we'll call it new sonnet. This model seems to have had quite a major update to its coding abilities, so let's see how it compares on our react-todo apps.

Understanding LangGraph Types

October 15, 2024 · 5 min read

Gareth Andrew

Chief Auto Coder

I often run into difficulties trying to build generic APIs on top of LangGraph. The types are quite complex, so I needed a reference.