
LiveKit - Powering Real-Time Audio, Video, and Data

LiveKit is an open-source platform built to simplify and scale real-time audio, video, and data applications. It provides a robust WebRTC infrastructure with built-in features like SFU (Selective Forwarding Unit), NAT traversal, media routing, and adaptive bandwidth handling so developers don’t have to build all the plumbing themselves. LiveKit also includes an Agents system: programmable server-side logic that can listen to events, transcribe audio, moderate content, or act as virtual participants, enabling advanced use cases like AI moderation or automated streaming workflows.

Figure 1: LiveKit

LiveKit is an open-source project that provides scalable, multi-user conferencing based on WebRTC.

LiveKit's core features are:

  1. WebRTC Infrastructure

LiveKit provides a robust and production-ready WebRTC infrastructure that handles low-latency audio and video communication at scale. It takes care of media routing, bandwidth adaptation, NAT traversal, and SFU (Selective Forwarding Unit) logic, allowing developers to focus on building engaging real-time applications without worrying about the underlying complexity.

  2. Agents System

The Agents system in LiveKit enables programmable, server-side automation and intelligence within a room. Agents can listen to events, interact with participants, transcribe audio, analyze media streams, or even serve as virtual participants, opening the door to advanced use cases like AI moderation, bot-driven interactions, or automated live streaming.


LiveKit has both a client and a server. They work together to enable real-time audio/video/data communication.

The LiveKit Server is the backend component that manages rooms, participants, media routing, and signaling. It includes features like:

The Client SDKs are used by our app (web, mobile, or desktop) to connect to the LiveKit Server. They handle:

Workflow Example

  1. The client authenticates (via a token from our backend) and joins a room.
  2. The server manages room state, participants, and routes media streams via SFU.
  3. Both communicate using WebRTC under the hood for real-time performance.

Participants, Tracks, Rooms, Egress and Ingress

  1. Participants:

End-users or processes that connect to a room. Participants can:

  2. Tracks:

Media streams shared by participants. Types include:

  3. Rooms:

Logical spaces that group participants together. Rooms:

  4. Egress / Ingress

Note: RTMP (Real-Time Messaging Protocol) is a communication protocol used to stream audio, video, and data over the internet, originally developed by Adobe for Flash. It uses a persistent TCP connection to deliver low-latency streams.
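
One way to picture how rooms, participants, and tracks relate is a small data model. The sketch below is purely illustrative: the class names, fields, and the `tracks_visible_to` helper are hypothetical and are not the LiveKit SDK's types.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class TrackKind(Enum):
    AUDIO = "audio"
    VIDEO = "video"


@dataclass
class Track:
    """A media stream published by one participant."""
    sid: str
    kind: TrackKind


@dataclass
class Participant:
    """An end-user or process connected to a room."""
    identity: str
    tracks: List[Track] = field(default_factory=list)

    def publish(self, track: Track) -> None:
        self.tracks.append(track)


@dataclass
class Room:
    """A logical space that groups participants together."""
    name: str
    participants: Dict[str, Participant] = field(default_factory=dict)

    def join(self, participant: Participant) -> None:
        self.participants[participant.identity] = participant

    def tracks_visible_to(self, identity: str) -> List[Track]:
        """Tracks published by everyone in the room except `identity`."""
        return [
            track
            for other in self.participants.values()
            if other.identity != identity
            for track in other.tracks
        ]


if __name__ == "__main__":
    room = Room("demo-room")
    alice, bob = Participant("alice"), Participant("bob")
    room.join(alice)
    room.join(bob)
    alice.publish(Track("track-a1", TrackKind.AUDIO))
    print(room.tracks_visible_to("bob"))
```

The point of the model is the ownership chain: a room holds participants, each participant owns the tracks it publishes, and other participants in the same room are the ones who can subscribe to them.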

LiveKit is open-source, which means we can self-host our own LiveKit server. This is a great option if we want to have full control over our infrastructure, or if we want to customize the LiveKit server to our specific needs.

There is also LiveKit Cloud, a hosted version of LiveKit that is managed by the LiveKit team. It makes it easy for us to get up and running quickly and is free for small applications.

LiveKit Flow

import logging

from typing import Any
from livekit import agents
from livekit.agents import Agent, AgentSession, function_tool, JobContext, RunContext, ChatContext, RoomInputOptions
from livekit.plugins.aws.experimental.realtime import RealtimeModel

from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger(__name__)

@function_tool()
async def lookup_weather(
    context: RunContext,
    location: str,
) -> dict[str, Any]:
    """Look up weather information for a given location.
    
    Args:
        location: The location to look up weather information for.
    """
    return {"weather": f"{location} king", "temperature_f": 70}

class Assistant(Agent):
    def __init__(self, chat_ctx: ChatContext | None = None):
        super().__init__(
            instructions = "You are helpful, creative, and friendly Assistant.",
            chat_ctx = chat_ctx,
            tools = [lookup_weather]
        )

async def entrypoint(ctx: JobContext):
    await ctx.connect()

    # Chat History
    initial_ctx = ChatContext()
    initial_ctx.add_message(role="user", content="My name is Birat.")

    session = AgentSession(llm=RealtimeModel(
        voice="tiffany",
        temperature=0.7,
        top_p=0.9,
        max_tokens=2048,
        region="us-east-1",
    ))

    agent = Assistant(initial_ctx)

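    # Start the session in this room. The return value is not the session,
    # so we do not reassign `session` here.
    await session.start(agent, room=ctx.room,
                        room_input_options=RoomInputOptions())

if __name__ == "__main__":
    # Run the program and register it as a worker with the LiveKit server.
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

agent.py
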
  1. When the agent program is run, it registers itself as a worker with the associated LiveKit server.
  2. When a room is started for our application, the LiveKit server sends a job request to the worker, causing the worker to initiate a job.
  3. The job is initiated by the entrypoint function.
  4. When our program (worker) receives a job request, it connects to the room, automatically subscribing to all audio tracks.
  5. It then creates an AgentSession, which orchestrates the input/output and the components required to create an AI agent.
  6. We start the session, passing in an instance of our agent, the room to which the session is bound, and the room input options, which control what the session receives from the room.
  7. Finally, the `__main__` block calls run_app, which runs our program and registers it as a worker with the LiveKit server (step 1).
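
The register-then-dispatch flow in the steps above can be sketched with a toy model in plain asyncio. Everything here (`ToyServer`, `ToyJob`, `register_worker`, `create_room`) is hypothetical and exists only to show the order of events; it is not how LiveKit implements workers or job dispatch.

```python
import asyncio
from typing import Awaitable, Callable, List


class ToyJob:
    """Stands in for the job context handed to an entrypoint."""

    def __init__(self, room_name: str) -> None:
        self.room_name = room_name


Entrypoint = Callable[[ToyJob], Awaitable[None]]


class ToyServer:
    """Keeps registered workers and dispatches one job per new room."""

    def __init__(self) -> None:
        self._workers: List[Entrypoint] = []

    def register_worker(self, entrypoint: Entrypoint) -> None:
        self._workers.append(entrypoint)

    async def create_room(self, room_name: str) -> None:
        if not self._workers:
            raise RuntimeError("no worker registered")
        # Simplification: always hand the job to the first registered worker.
        await self._workers[0](ToyJob(room_name))


events: List[str] = []


async def entrypoint(job: ToyJob) -> None:
    # Steps 4-6: connect to the room, then create and start the session.
    events.append(f"connected to {job.room_name}")
    events.append("session started")


async def main() -> None:
    server = ToyServer()
    server.register_worker(entrypoint)      # step 1: worker registers
    await server.create_room("demo-room")   # steps 2-3: job request runs the entrypoint


if __name__ == "__main__":
    asyncio.run(main())
    print(events)
```

The key design point is inversion of control: our program never joins a room on its own initiative. It only declares an entrypoint, and the server decides when to invoke it.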

LiveKit turns the complexity of real-time conferencing (audio, video, and data) into a developer-friendly, scalable foundation built on WebRTC, STUN/TURN, and an SFU architecture.

By defining clear abstractions (Participants, Tracks, and Rooms) along with features like RTMP Ingress/Egress and Data Channels, LiveKit offers a comprehensive toolkit for building everything from virtual classrooms and AI-powered bots to high-stakes live-streaming platforms.

What makes LiveKit stand out is its flexibility:

In essence, with LiveKit we don’t just spin up video/audio calls; we gain the freedom to architect interactive, intelligent, and event-driven experiences that scale to hundreds or thousands of users.


