Overview
This section explains the fundamental concepts of the Codeset platform. Understanding these concepts is essential for effectively using our tools and services.
Samples
A Sample represents a specific problem or task that an AI agent can work on. It's a self-contained unit that includes:
A codebase: A snapshot of a git repository containing the code the agent will interact with.
A task description: A clear set of instructions for the agent, outlining the goal it needs to achieve (e.g., "fix the bug that causes the authentication to fail" or "implement a new feature to export user data").
A verification script: A set of tests or checks that can be run to automatically determine if the agent has successfully completed the task.
Each sample has a unique sample_id that you use to create a new session.
Sessions
A Session is a live, interactive, and sandboxed environment created from a specific sample. When you create a session, we spin up a secure container with the sample's codebase and all required dependencies, ready for an agent to connect and start working.
Key characteristics of a session:
Isolated: Each session is completely isolated from others, ensuring that an agent's actions in one session do not affect any other.
Stateful: A session maintains the state of the filesystem. Any changes an agent makes—creating, modifying, or deleting files—are preserved for the duration of the session.
Interactive: Agents can execute shell commands within the session's environment, allowing them to explore the codebase, run tests, and apply changes.
Executable: A session contains all required dependencies, including third-party libraries, build tools, SDKs, and runnable scripts (e.g., for executing tests).
You interact with a session using its unique session_id.
Verifications
A Verification is the process of running the sample's verification script to check if the agent has successfully completed the task. You can trigger a verification at any point during a session.
The verification process will return a result indicating success or failure. This provides a clear and objective measure of the agent's performance on the given task. This is the primary mechanism for evaluating an agent's capabilities.
Last updated