Running Experiments
We use gin-config to configure the experiments. You don't need to be an expert to use it. The basic syntax is:
python <code_file.py> --gin_file <gin_file1> --gin_file <gin_file2> '--gin.PARAM1=value1' '--gin.PARAM2=value2'
The --gin_file flag loads and composes the default configuration, and each --gin.PARAM=value flag overwrites one parameter of that configuration. Configurations are applied in order, so a later one always overwrites an earlier one.
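The override rule can be pictured as merging configuration layers left to right. The sketch below is a plain-Python illustration of that ordering only: it does not parse gin syntax or call gin-config, `compose_config` is a hypothetical helper, and the parameter values are made up.

```python
from __future__ import annotations

from typing import Any


def compose_config(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge configuration layers in order; later layers win.

    Each layer stands in for one --gin_file or one --gin.PARAM=value flag.
    This mimics only the override order, not gin's parser.
    """
    merged: dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)  # keys set by a later layer overwrite earlier ones
    return merged


# Hypothetical values: two "gin files" followed by one command-line binding.
generate_defaults = {"BATCH_SIZE": 5, "AGENT1_MODEL": "gpt-3.5-turbo"}
server_defaults = {"BATCH_SIZE": 10, "PUSH_TO_DB": True}
cli_bindings = {"BATCH_SIZE": 20}

config = compose_config(generate_defaults, server_defaults, cli_bindings)
print(config)  # {'BATCH_SIZE': 20, 'AGENT1_MODEL': 'gpt-3.5-turbo', 'PUSH_TO_DB': True}
```

BATCH_SIZE ends up as 20 because the command-line binding comes last, while keys set only once keep their values.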
Here is an example of running an experiment:
python examples/experiment_eval.py --gin_file sotopia_conf/generation_utils_conf/generate.gin --gin_file sotopia_conf/server_conf/server.gin --gin_file sotopia_conf/run_async_server_in_batch.gin '--gin.ENV_IDS=["01H7VFHPDZVVCDZR3AARA547CY"]' '--gin.AGENT1_MODEL="gpt-4"' '--gin.BATCH_SIZE=20' '--gin.PUSH_TO_DB=False' '--gin.TAG="test"'
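The shell quotes in that command exist because the values contain brackets and double quotes. If you launch runs from Python instead, a sketch like the following builds the same argument list. The values are copied from the example above; the `subprocess.run` call is left commented out because it needs the sotopia repository and its dependencies.

```python
import shlex

# Gin files in the same order as the example command; later ones override earlier ones.
gin_files = [
    "sotopia_conf/generation_utils_conf/generate.gin",
    "sotopia_conf/server_conf/server.gin",
    "sotopia_conf/run_async_server_in_batch.gin",
]
# Parameter overrides from the example; string values keep their double quotes.
bindings = {
    "ENV_IDS": '["01H7VFHPDZVVCDZR3AARA547CY"]',
    "AGENT1_MODEL": '"gpt-4"',
    "BATCH_SIZE": "20",
    "PUSH_TO_DB": "False",
    "TAG": '"test"',
}

argv = ["python", "examples/experiment_eval.py"]
for path in gin_files:
    argv += ["--gin_file", path]
for name, value in bindings.items():
    argv.append(f"--gin.{name}={value}")

# Print a shell-safe version of the command for copy-pasting.
print(shlex.join(argv))
# subprocess.run(argv, check=True)  # a list argument bypasses the shell, so no quoting is needed
```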
For the complete set of parameters, see the sotopia_conf folder.
To run a large batch of environments, set the ENV_IDS parameter in sotopia_conf/run_async_server_in_batch.gin to a list of environment IDs. When gin.ENV_IDS==[], all environments in the database are used.
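The empty-list rule for ENV_IDS can be sketched as below. `select_env_ids` is a hypothetical helper, not a sotopia function, and the IDs are placeholders standing in for the environments stored in the database.

```python
from __future__ import annotations


def select_env_ids(env_ids: list[str], all_env_ids: list[str]) -> list[str]:
    """Hypothetical helper: an empty ENV_IDS list means "use every environment"."""
    if not env_ids:
        return list(all_env_ids)
    return list(env_ids)


# Placeholder IDs standing in for the environments stored in the database.
all_envs = ["env_a", "env_b", "env_c"]

print(select_env_ids([], all_envs))         # every environment
print(select_env_ids(["env_b"], all_envs))  # only the listed one
```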
Getting access to your simulation
After running experiments, open the examples/redis_stats.ipynb notebook to browse the existing episodes (Episode Log section) and to compute performance.
To obtain the original Sotopia simulations used in our paper's experiments, see the Q&A section in the ./docs folder.
Hyperparameters used in the simulation
Tags
TAG
: The tag of the simulation. It is used to identify the simulation in the database.
TAG_TO_CHECK_EXISTING_EPISODES
: Scripts like examples/experiment_eval.py check whether episodes with this tag already exist in the database. If they do, the simulation is not run, which avoids running the same simulation twice. To run it again, change the tag or set TAG_TO_CHECK_EXISTING_EPISODES to None.
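The skip rule described for TAG_TO_CHECK_EXISTING_EPISODES amounts to a simple guard. The sketch below uses a hypothetical `should_run` helper and an in-memory list of tags in place of the real database query in examples/experiment_eval.py.

```python
from __future__ import annotations

from typing import Iterable, Optional


def should_run(tag_to_check: Optional[str], existing_tags: Iterable[str]) -> bool:
    """Hypothetical guard: skip the run if episodes with tag_to_check already exist.

    Passing None disables the check, so the simulation always runs.
    """
    if tag_to_check is None:
        return True
    return tag_to_check not in set(existing_tags)


# Placeholder for the tags of episodes already stored in the database.
stored_tags = ["test", "baseline"]

print(should_run("test", stored_tags))   # False: episodes with this tag exist
print(should_run("test2", stored_tags))  # True: new tag
print(should_run(None, stored_tags))     # True: check disabled
```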