ParallelSotopiaEnv Documentation

Description

ParallelSotopiaEnv is a custom environment designed for parallel agents interacting within a defined simulation. This environment integrates various profiles, relationships, and actions that agents can perform, leveraging large language models (like "gpt-3.5-turbo") to model behaviors and interactions.

Dependencies

import asyncio
import copy
import itertools
import random
from typing import Any, Literal, Optional, Type, TypeVar
from gin import configurable
from gymnasium.spaces.dict import Dict
from gymnasium.spaces.discrete import Discrete
from gymnasium.spaces.text import Text
from pettingzoo.utils.env import ParallelEnv
from redis_om.model.model import NotFoundError
from sotopia.agents.llm_agent import Agents
from sotopia.database import EnvironmentProfile
from sotopia.database.persistent_profile import AgentProfile, RelationshipType
from sotopia.messages import ActionType, AgentAction, MessengerMixin, Observation, ScriptBackground, SimpleMessage
from sotopia.renderers import RenderContext, XMLRenderer
from .evaluators import Evaluator, unweighted_aggregate_evaluate

Class

`ParallelSotopiaEnv`

class ParallelSotopiaEnv(ParallelEnv[str, Observation, AgentAction], MessengerMixin):

Parameters

available_action_types (set[ActionType], optional): The action types available to the agents. Defaults to {"none", "speak", "non-verbal communication", "action", "leave"}.
action_order (Literal["simultaneous", "round-robin", "random"], optional): How agents take actions. Defaults to "simultaneous".
model_name (str, optional): Name of the language model. Defaults to "gpt-3.5-turbo".
evaluators (list[Evaluator], optional): List of evaluators for responses. Defaults to [].
terminal_evaluators (list[Evaluator], optional): List of evaluators for terminal states. Defaults to [].
uuid_str (str | None, optional): UUID to load the environment profile from the database.
env_profile (EnvironmentProfile | None, optional): Profile of the environment.
background_class (Optional[Type[TBackground]], optional): Class for background scenarios, defaults to ScriptBackground.

Usage Example

env = ParallelSotopiaEnv(
    available_action_types={"none", "speak", "action"},
    action_order="round-robin",
    model_name="gpt-3.5-turbo"
)

Methods

`reset`

def reset(
    self,
    seed: int | None = None,
    options: dict[str, str] | None = None,
    agents: Agents | None = None,
    omniscient: bool = False,
    lite: bool = False
) -> dict[str, Observation]

Initializes a new episode.

Parameters

seed (int, optional): Seed for random number generator.
options (dict[str, str], optional): Additional options.
agents (Agents, optional): Agent profiles for the environment.
omniscient (bool, optional): If agents know the other agent's goals. Defaults to False.
lite (bool, optional): If True, background information is minimal.

Returns

dict[str, Observation]: Initial observations for each agent.

Usage Example

observations = env.reset(
    seed=42,
    agents=agents,
    omniscient=True
)

`step`

def step(
    self, actions: dict[str, AgentAction] | dict[str, dict[str, int | str]]
) -> tuple[
    dict[str, Observation],
    dict[str, float],
    dict[str, bool],
    dict[str, bool],
    dict[str, dict[Any, Any]]
]

Executes actions and returns new states.

Parameters

actions (dict[str, AgentAction] | dict[str, dict[str, int | str]]): Actions taken by agents.

Returns

tuple[ dict[str, Observation], dict[str, float], dict[str, bool], dict[str, bool], dict[str, dict[Any, Any]] ]: Next state information, including observations, rewards, terminals, truncations, and additional info.

Usage Example

next_obs, rewards, done, truncations, info = env.step(actions)

`astep`

async def astep(
    self, actions: dict[str, AgentAction] | dict[str, dict[str, int | str]]
) -> tuple[
    dict[str, Observation],
    dict[str, float],
    dict[str, bool],
    dict[str, bool],
    dict[str, dict[Any, Any]]
]

Asynchronous version of step.

Parameters

actions (dict[str, AgentAction] | dict[str, dict[str, int | str]]): Actions taken by agents.

Returns

tuple[ dict[str, Observation], dict[str, float], dict[str, bool], dict[str, bool], dict[str, dict[Any, Any]] ]: Next state information, including observations, rewards, terminals, truncations, and additional info.

Usage Example

next_obs, rewards, done, truncations, info = await env.astep(actions)

`render`

def render(self, mode: str = "human") -> None

Render the environment (not implemented).

`close`

def close(self) -> None

Close the environment (not implemented).

Utility Functions

`_actions_to_natural_language`

def _actions_to_natural_language(actions: dict[str, AgentAction]) -> str

Converts agent actions to human-readable language.

`_map_gender_to_adj`

def _map_gender_to_adj(gender: str) -> str

Maps gender to its adjective equivalent.

`get_bio`

def get_bio(
    relationship: RelationshipType, profile: AgentProfile, agent_id: int
) -> str

Generates a description of the agent based on their relationship type.

Parameters

relationship (RelationshipType): Type of relationship.
profile (AgentProfile): Agent profile information.
agent_id (int): Agent identifier.

Returns

str: Rendered biography string for the profile.

`render_text_for_agent`

@configurable
def render_text_for_agent(
    raw_text: str,
    agent_id: int,
    tags_to_render: list[str] = [
        "extra_info",
        "clarification_hint",
        "strategy_hint",
    ],
) -> str

Renders text viewable by a specific agent using XMLRenderer.

Usage Example

Here's a typical usage example starting an episode in the environment:

# Initialize environment
env = ParallelSotopiaEnv(
    available_action_types={"none", "speak", "non-verbal communication", "action"},
    action_order="simultaneous"
)
 
# Reset the environment
observations = env.reset(
    seed=42,
    agents={
        "agent_1": Agent(...),
        "agent_2": Agent(...)
    },
    omniscient=True
)
 
# Perform actions
actions = {
    "agent_1": AgentAction(action_type="speak", argument="Hello!"),
    "agent_2": AgentAction(action_type="action", argument="waved"),
}
 
next_obs, rewards, done, truncations, info = env.step(actions)

Evaluators Generate