Table of Contents
Simulators: Text-game Engines
Environments: Specific Interactive Games/Environments/Benchmarks
World Generation: Automatic World Generation
Agents: Agents/Agent Architectures
Data: Data/Resources
Position Papers
Shared Tasks
Social Agents: Agent-user or agent-agent dialog
Surveys
Other
Simulators
Papers that describe simulators, which are like game engines that specific games/environments can be created in.
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian — ACL 2024
TLDR: Introduces the AppWorld Engine, a high-fidelity execution environment of 9 day-to-day apps, operable via 457 APIs, populated with digital activities of 106 people living in a simulated world, and an associated benchmark of natural, diverse, and challenging day-to-day tasks requiring rich and interactive coding. (ACL 2024 Best Resource Paper Award)
Project Website
agent data environment simulator
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, Peter Clark — Arxiv 2024
TLDR: Presents DiscoveryWorld, a simulator/environment with 24 scientific discovery tasks solvable by human scientists with PhDs, but not current LLMs. Multi-modal simulator like AlfWorld, with simultaneous text (JSON) and/or 2D (top-down RPG) views.
Project Website
environment simulator
Can Language Models Serve as Text-Based World Simulators?
Ruoyao Wang, Graham Todd, Ziang Xiao, Eric Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen — ACL 2024
TLDR: Examines whether game simulators need to be built manually, or whether LLMs can simply do the simulation themselves. Proposes a state-prediction task/dataset (BYTESIZED-State-Prediction), showing GPT-4 is only ~60% accurate at this task.
Project Website
data simulator worldgeneration
ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games
Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, Peter Jansen — EMNLP 2023
TLDR: Presents ByteSized32, a corpus of 32 text games expressed as approximately 1000 lines of Python each. Shows that a GPT-4 model can perform the world-building task using in-context learning, using a single ByteSized32 game as a reference template. Presents automated metrics for evaluating generated games on runnability, specification compliance, winnability, and physical reality alignment.
Project Website
environment simulator worldgeneration
TextWorldExpress: Simulating Text Games at One Million Steps Per Second
Peter Jansen, Marc-Alexandre Côté — EACL 2023
TLDR: TextWorldExpress: An extremely fast text game simulator that is orders of magnitude faster than other simulators (up to 1 million frames per second). Includes implementations of CookingWorld, Coin Collector, Text-World Common Sense, and a number of other benchmarks. Pip-installable for Python.
Project Website
environment simulator
ScienceWorld: Is your Agent Smarter than a 5th Grader?
Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EMNLP 2022
TLDR: Presents ScienceWorld, a high-fidelity simulator with 30 different elementary science tasks (such as boiling water, building electrical circuits, or determining whether an inherited characteristic is dominant or recessive), powered by simplified simulation engines for thermodynamics, electricity, chemistry, forces, and life processes. Shows that a behavior cloning agent based on a question answering model that can answer most elementary science questions correctly gets extremely low performance on those same tasks when expressed as a virtual environment, highlighting the importance of interactivity when evaluating reasoning.
Project Website
agent environment simulator
Interactive Fiction Games: A Colossal Adventure
Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan — AAAI 2020
TLDR: The Jericho simulator provides an OpenAI Gym-like interface for Z-Machine based interactive fiction games (like Zork). Presents a benchmark for training and evaluating on specific interactive fiction games.
Project Website
environment simulator
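To make "Gym-like interface" concrete, here is a minimal sketch of the reset/step agent loop such simulators expose. The `ToyTextEnv` class, its two rooms, and the `valid_actions` info key are hypothetical stand-ins invented for illustration; this is not Jericho's actual API.

```python
class ToyTextEnv:
    """Hypothetical two-room text game exposing a Gym-style reset/step API."""

    def reset(self):
        self.room = "kitchen"
        self.done = False
        return self._observation(), self._info()

    def step(self, action):
        reward = 0
        if action == "go north" and self.room == "kitchen":
            self.room = "garden"
        elif action == "take coin" and self.room == "garden":
            reward = 1
            self.done = True
        return self._observation(), reward, self.done, self._info()

    def _observation(self):
        if self.done:
            return "You take the coin. You have won."
        if self.room == "kitchen":
            return "You are in a kitchen. A door leads north."
        return "You are in a garden. A coin lies on the ground."

    def _info(self):
        if self.done:
            actions = []
        elif self.room == "kitchen":
            actions = ["go north", "look"]
        else:
            actions = ["take coin", "look"]
        return {"valid_actions": actions}


def run_episode(env, policy, max_steps=10):
    """Agent loop: observe, act, and collect reward until the game ends."""
    observation, info = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(observation, info["valid_actions"])
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def first_action_policy(observation, valid_actions):
    # Trivial policy (an assumption for the demo): pick the first valid action.
    return valid_actions[0]


total = run_episode(ToyTextEnv(), first_action_policy)
print(total)
```

Any agent that maps an observation string to an action string can be dropped in as `policy`, which is what makes this interface style convenient for benchmarking.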
TextWorld: A Learning Environment for Text-based Games
Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Ruo Yu Tao, Layla El Asri, Mahmoud Adada, Wendy Tay, Adam Trischler — GC 2018
TLDR: The popular TextWorld simulator, which allows expressing text games as a high-level knowledge base. Easily allows parametric game variations. Compiles to Inform7 (then Z-machine).
Project Website
environment simulator
(Z-Machine) The Z-Machine Standards Document
Graham Nelson — Website 2014
TLDR: A document that infers the standards for Z-Machine, the popular formalism and virtual machine for representing text games developed by Infocom in the 1980s.
simulator
(Inform7) Natural Language, Semantic Analysis, and Interactive Fiction
Graham Nelson — IF Theory Reader 2006
TLDR: A write-up of Inform7, a popular programming language for creating interactive fiction games whose syntax resembles natural language. Compiles to Z-machine.
Project Website
simulator
Environments
Papers that describe specific interactive environments, such as a specific game, or benchmark of games.
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian — ACL 2024
TLDR: Introduces the AppWorld Engine, a high-fidelity execution environment of 9 day-to-day apps, operable via 457 APIs, populated with digital activities of 106 people living in a simulated world, and an associated benchmark of natural, diverse, and challenging day-to-day tasks requiring rich and interactive coding. (ACL 2024 Best Resource Paper Award)
Project Website
agent data environment simulator
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
Peter Jansen, Marc-Alexandre Côté, Tushar Khot, Erin Bransom, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Oyvind Tafjord, Peter Clark — Arxiv 2024
TLDR: Presents DiscoveryWorld, a simulator/environment with 24 scientific discovery tasks solvable by human scientists with PhDs, but not current LLMs. Multi-modal simulator like AlfWorld, with simultaneous text (JSON) and/or 2D (top-down RPG) views.
Project Website
environment simulator
ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games
Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, Peter Jansen — EMNLP 2023
TLDR: Presents ByteSized32, a corpus of 32 text games expressed as approximately 1000 lines of Python each. Shows that a GPT-4 model can perform the world-building task using in-context learning, using a single ByteSized32 game as a reference template. Presents automated metrics for evaluating generated games on runnability, specification compliance, winnability, and physical reality alignment.
Project Website
environment simulator worldgeneration
TextWorldExpress: Simulating Text Games at One Million Steps Per Second
Peter Jansen, Marc-Alexandre Côté — EACL 2023
TLDR: TextWorldExpress: An extremely fast text game simulator that is orders of magnitude faster than other simulators (up to 1 million frames per second). Includes implementations of CookingWorld, Coin Collector, Text-World Common Sense, and a number of other benchmarks. Pip-installable for Python.
Project Website
environment simulator
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks — ICML 2023
TLDR: Introduces the MACHIAVELLI benchmark, a set of 134 choose-your-own-adventure games from choiceofgames.com that have been automatically annotated with a large (2.8M) set of labels from GPT-4 detailing whether chosen actions represent harmful behaviors. Evaluates performance of baseline agents (Random, DRRN, and GPT-3.5/4) on a large set of ethical measures.
Project Website
data environment
ADaPT: As-Needed Decomposition and Planning with Language Models
Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot — Arxiv 2023
TLDR: Introduces TextCraft, a text-based crafting environment similar to WordCraft, but for crafting Minecraft items. Introduces the ADaPT agent, which iteratively decomposes tasks into subtasks and their plans, as needed. ADaPT shows large performance gains on ALFWorld, WebShop, and TextCraft compared to baselines.
Project Website
agent environment
Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights Using Generative AI
Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour — AIIDE 2023
TLDR: The paper presents “1001 Nights,” a novel interactive game leveraging Generative AI (GPT-4 & Stable Diffusion) for co-creative storytelling, where players’ spoken words dynamically transform the game world, advancing the concept of AI-native games in interactive entertainment.
Project Website
agent environment worldgeneration
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charles Burton Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine — Arxiv 2023
TLDR: LMRL-Gym is a multi-turn RL benchmark including several conventional (e.g. Chess, Wordle, Maze), conversational, and text-game tasks. The text game is a navigation game that appears similar to Coin Collector (TextWorld) or Map Reader (TextWorldExpress), and is implemented in TextWorld.
Project Website
environment
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap — Arxiv 2023
TLDR: SOTOPIA is an open-ended dialog task for role playing, similar to LIGHT, with a set of specific characters and scenarios to role play. Evaluates human and GPT-4 baselines on the social interaction task. Provides a set of 7 automatic metrics for evaluating these social interactions, showing moderate agreement with human judgments.
Project Website
data environment social
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer — ICML 2023
TLDR: Introduces BabyAI-Text, a text-only adaptation of the BabyAI environment. Introduces GLAM (Grounding LLMs with online RL) method, an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals.
Project Website
agent environment
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson — Arxiv 2023
TLDR: Introduces Auction Arena, a novel simulation environment for evaluating LLMs within auctions, a setting chosen for being highly unpredictable and involving many skills related to resource and risk management, while also being easy to evaluate. Also designs an LLM agent with a Planning-Bidding-Belief_Update-Replanning mechanism to navigate auction scenarios, which contributes a structured approach to strategic reasoning and adaptability. Demo available online at the project page.
Project Website
agent environment
ScienceWorld: Is your Agent Smarter than a 5th Grader?
Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EMNLP 2022
TLDR: Presents ScienceWorld, a high-fidelity simulator with 30 different elementary science tasks (such as boiling water, building electrical circuits, or determining whether an inherited characteristic is dominant or recessive), powered by simplified simulation engines for thermodynamics, electricity, chemistry, forces, and life processes. Shows that a behavior cloning agent based on a question answering model that can answer most elementary science questions correctly gets extremely low performance on those same tasks when expressed as a virtual environment, highlighting the importance of interactivity when evaluating reasoning.
Project Website
agent environment simulator
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht — ICLR 2021
TLDR: Creates a mirror environment for the 3D ALFRED simulator as a TextWorld. Shows that training on the 7 ALFRED tasks in a TextWorld transfers to the 3D environment.
environment
Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines
Murugesan, K., Atzeni, M., Kapanipathi, P., Shukla, P., Kumaravel, S., Tesauro, G., Talamadupula, K., Sachan, M., & Campbell, M. — AAAI 2021
TLDR: Implements a TextWorld environment for household cleaning tasks. (a) Includes 8 rooms, 56 containers, 190 objects. (b) Task is to pick up objects from the floor and put them in their correct (canonical common-sense) locations. (c) They train two models, an RL agent and an RL agent + ConceptNet, and show the one with more knowledge is more efficient.
Project Website
environment
Process-Level Representation of Scientific Protocols with Interactive Annotation
Ronen Tamari, Fan Bai, Alan Ritter, Gabriel Stanovsky — EACL 2021
TLDR: An environment for simulating text-based biology (wet-lab) experiment protocols.
Project Website
environment
Interactive Fiction Games: A Colossal Adventure
Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan — AAAI 2020
TLDR: The Jericho simulator provides an OpenAI Gym-like interface for Z-Machine based interactive fiction games (like Zork). Presents a benchmark for training and evaluating on specific interactive fiction games.
Project Website
environment simulator
WordCraft: An Environment for Benchmarking Commonsense Agents
Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip HS Torr, Shimon Whiteson, Tim Rocktäschel — LRLW 2020
TLDR: A simple environment to evaluate commonsense reasoning, based on the game Alchemy 2. This game requires the agent to identify concepts out of an available set that can likely be combined based on commonsense relationships, in order to create new concepts.
Project Website
environment
First TextWorld Problems
Marc-Alexandre Côté, Xingdi Yuan, Adam Trischler, Wendy Tay — CoG 2020
TLDR: Blog post about the result of the First TextWorld Problems competition (including the popular CookingWorld environment)
Project Website
environment sharedtask
Playing by the Book: An Interactive Game Approach for Action Graph Extraction from Text
Ronen Tamari, Hiroyuki Shindo, Dafna Shahaf, Yuji Matsumoto — ESKSP 2019
TLDR: Essentially the beginnings of a simulated environment for some materials science experiments, in the context of procedure understanding, where the goal is to read text and replicate the gold steps/graph. Implemented using the Microsoft TextWorld environment.
Project Website
environment
Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston — EMNLP 2019
TLDR: Introduces LIGHT: A dataset and model for dialog interactions in a text-based role playing environment. Includes a large set of crowdsourced dialog about rooms/objects/etc.
Project Website
data environment social
TextWorld: A Learning Environment for Text-based Games
Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Ruo Yu Tao, Layla El Asri, Mahmoud Adada, Wendy Tay, Adam Trischler — GC 2018
TLDR: The popular TextWorld simulator, which allows expressing text games as a high-level knowledge base. Easily allows parametric game variations. Compiles to Inform7 (then Z-machine).
Project Website
environment simulator
World Generation
Papers that explore automatically creating interactive environments.
Can Language Models Serve as Text-Based World Simulators?
Ruoyao Wang, Graham Todd, Ziang Xiao, Eric Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen — ACL 2024
TLDR: Examines whether game simulators need to be built manually, or whether LLMs can simply do the simulation themselves. Proposes a state-prediction task/dataset (BYTESIZED-State-Prediction), showing GPT-4 is only ~60% accurate at this task.
Project Website
data simulator worldgeneration
ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games
Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, Peter Jansen — EMNLP 2023
TLDR: Presents ByteSized32, a corpus of 32 text games expressed as approximately 1000 lines of Python each. Shows that a GPT-4 model can perform the world-building task using in-context learning, using a single ByteSized32 game as a reference template. Presents automated metrics for evaluating generated games on runnability, specification compliance, winnability, and physical reality alignment.
Project Website
environment simulator worldgeneration
Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights Using Generative AI
Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour — AIIDE 2023
TLDR: The paper presents “1001 Nights,” a novel interactive game leveraging Generative AI (GPT-4 & Stable Diffusion) for co-creative storytelling, where players’ spoken words dynamically transform the game world, advancing the concept of AI-native games in interactive entertainment.
Project Website
agent environment worldgeneration
Learning Knowledge Graph-based World Models of Textual Environments
Prithviraj Ammanabrolu, Mark O. Riedl — NeurIPS 2021
TLDR: Presents Worldformer: a model that learns to predict both state changes in a knowledge graph that represents a world model, as well as predict valid actions from the current state.
worldgeneration
Bringing Stories Alive: Generating Interactive Fiction Worlds
Prithviraj Ammanabrolu, Wesley Cheung, Dan Tu, William Broniec, Mark O. Riedl — AIIDE 2020
TLDR: Extracts a knowledge graph from a story using OpenIE triples, then conditions generation on that knowledge graph.
Project Website
worldgeneration
Generating Interactive Worlds with Text
Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktäschel, Arthur Szlam, Jason Weston — AAAI 2020
TLDR: Works to build-out more complex game environments based on the LIGHT dataset. Includes user studies that evaluate on cohesiveness, diversity, and human preference.
worldgeneration
An Online Authoring Tool for Interactive Fiction
Bryan Temprado-Battad, José-Luis Sierra, Antonio Sarasa-Cabezuelo — IV 2019
TLDR: A web-based interactive fiction authoring tool.
worldgeneration
Toward Automated Quest Generation in Text-Adventure Games
Prithviraj Ammanabrolu, William Broniec, Alex Mueller, Jeremy Paul, Mark O. Riedl — CCNLG 2019
TLDR: High-level examination of using Markov chains and Neural Generative Models for creating text games, studied in the context of the CookingWorld example. Includes user studies that evaluate generated work on creativity and coherence.
worldgeneration
Agents
Papers that describe agents, or agent architectures.
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian — ACL 2024
TLDR: Introduces the AppWorld Engine, a high-fidelity execution environment of 9 day-to-day apps, operable via 457 APIs, populated with digital activities of 106 people living in a simulated world, and an associated benchmark of natural, diverse, and challenging day-to-day tasks requiring rich and interactive coding. (ACL 2024 Best Resource Paper Award)
Project Website
agent data environment simulator
PDDLEGO: Iterative Planning in Textual Environments
Li Zhang, P. Jansen, Tianyi Zhang, Peter Clark, Chris Callison-Burch, Niket Tandon — *SEM 2024
TLDR: Investigates combining LLMs with formal planners (PDDL) to solve partially-observable environments like text games, by iteratively constructing a planning representation with subgoals. Evaluates on Coin Collector and CookingWorld from TextWorldExpress.
Project Website
agent
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents
Petr Anokhin, Nikita Semenov, Artyom Sorokin, Dmitry Evseev, Mikhail Burtsev, Evgeny Burnaev — Arxiv 2024
TLDR: Builds a “semantic and episodic memory” knowledge graph from exploring the environment, and shows this is helpful for an LLM agent. Evaluates (and largely solves) remakes of games similar to CookingWorld, TextWorld Common Sense, and Coin Collector.
Project Website
agent
ADaPT: As-Needed Decomposition and Planning with Language Models
Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot — Arxiv 2023
TLDR: Introduces TextCraft, a text-based crafting environment similar to WordCraft, but for crafting Minecraft items. Introduces the ADaPT agent, which iteratively decomposes tasks into subtasks and their plans, as needed. ADaPT shows large performance gains on ALFWorld, WebShop, and TextCraft compared to baselines.
Project Website
agent environment
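The "decompose only as needed" idea can be sketched as a recursive loop: try the task directly, and fall back to planning subtasks only when direct execution fails. Everything below is a simplified assumption for illustration: the `RECIPES` table, the rule-based `execute` stand-in (the paper uses LLM components), the all-subtasks-must-succeed logic, and the fact that ingredients are never consumed. It is not the authors' implementation.

```python
# Hypothetical crafting recipes: item -> ingredients it decomposes into.
RECIPES = {
    "wooden pickaxe": ["planks", "stick"],
    "stick": ["planks"],
    "planks": ["log"],
}


def execute(task, inventory):
    """Stand-in executor: succeeds only if every direct ingredient is in hand."""
    ingredients = RECIPES.get(task)
    if ingredients is None:
        return task in inventory  # raw items must already be present
    if all(item in inventory for item in ingredients):
        inventory.add(task)  # simplification: ingredients are not consumed
        return True
    return False


def adapt(task, inventory, depth=0, max_depth=3, trace=None):
    """Try the task directly; decompose into subtasks only when that fails."""
    trace = [] if trace is None else trace
    trace.append((depth, task))
    if execute(task, inventory):
        return True, trace
    if depth >= max_depth or task not in RECIPES:
        return False, trace
    for subtask in RECIPES[task]:  # stand-in planner: the recipe ingredients
        ok, _ = adapt(subtask, inventory, depth + 1, max_depth, trace)
        if not ok:
            return False, trace
    return execute(task, inventory), trace


success, trace = adapt("wooden pickaxe", {"log"})
print(success, trace)
```

The `trace` records which tasks were attempted at which depth, so it shows that subtasks are planned only after the top-level attempt fails.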
Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights Using Generative AI
Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour — AIIDE 2023
TLDR: The paper presents “1001 Nights,” a novel interactive game leveraging Generative AI (GPT-4 & Stable Diffusion) for co-creative storytelling, where players’ spoken words dynamically transform the game world, advancing the concept of AI-native games in interactive entertainment.
Project Website
agent environment worldgeneration
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer — ICML 2023
TLDR: Introduces BabyAI-Text, a text-only adaptation of the BabyAI environment. Introduces GLAM (Grounding LLMs with online RL) method, an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals.
Project Website
agent environment
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson — Arxiv 2023
TLDR: Introduces Auction Arena, a novel simulation environment for evaluating LLMs within auctions, a setting chosen for being highly unpredictable and involving many skills related to resource and risk management, while also being easy to evaluate. Also designs an LLM agent with a Planning-Bidding-Belief_Update-Replanning mechanism to navigate auction scenarios, which contributes a structured approach to strategic reasoning and adaptability. Demo available online at the project page.
Project Website
agent environment
Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games
Ruoyao Wang, Peter Jansen — EMNLP 2023
TLDR: Shows that pathcrawling all possible trajectories in a game (up to reward) is a viable agent strategy. While pathcrawling normally produces good (generalizable) and bad (ungeneralizable) trajectories, here it’s shown that the generalizable trajectories can be distilled by training a small LLM agent on candidate trajectories, and using its performance on a development set as a proxy score for generalizability. Evaluated on TextWorldExpress games.
Project Website
agent
Remember what you did so you know what to do next
Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, R. Weischedel — Arxiv 2023
TLDR: Shows that modestly-sized LLM agent models (e.g. GPT-J 6B) can achieve substantially higher performance when including large action histories in the model context. Evaluates on ScienceWorld.
agent
CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark — Arxiv 2023
TLDR: Language agents have shown some ability to interact with an external environment, e.g., a virtual world such as ScienceWorld, to perform complex tasks, e.g., growing a plant, without the startup costs of reinforcement learning. However, despite their zero-shot capabilities these agents to date do not continually improve over time, beyond performance refinement on a specific task. Here we present CLIN, the first language-based agent to achieve this, so that it continually improves over multiple trials, including when both the environment and task are varied, and without requiring parameter updates. Our approach is to use a persistent, dynamic, textual memory, centered on causal abstractions (rather than general “helpful hints”), that is regularly updated after each trial so that the agent gradually learns useful knowledge for new trials.
Project Website
agent
Story Shaping: Teaching Agents Human-like Behavior with Stories
Xiangyu Peng, Christopher Cui, Wei Zhou, Renee Jia, Mark Riedl — AIIDE 2023
TLDR: Introduces a technique, Story Shaping, in which a reinforcement learning agent infers tacit knowledge of how to accomplish a task from an example story. The agent intrinsically rewards itself for performing actions that make its current environment adhere to the inferred story world.
agent
Augmenting Autotelic Agents with Large Language Models
Cédric Colas, Laetitia Teodorescu, Pierre-Yves Oudeyer, Xingdi Yuan, Marc-Alexandre Côté — CoLLas 2023
TLDR: A language model augmented autotelic agent (LMA3) that leverages a pretrained language model (LM) to support the representation, generation and learning of diverse, abstract, human-relevant goals and is shown to master a large diversity of skills in a task-agnostic text-based environment.
agent
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, Xiang Ren — NeurIPS 2023
TLDR: Introduces SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition and designed to excel in action planning for complex interactive reasoning tasks. Develops a heuristic method to integrate its two modules (fast and slow thinking), resulting in a more efficient and robust problem-solving process.
Project Website
agent
Behavior Cloned Transformers are Neurosymbolic Reasoners
Ruoyao Wang, Peter Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EACL 2023
TLDR: Shows that neurosymbolic agents that combine behavior cloning with symbolic modules (calculator, GPS, knowledge-base lookup, sorter) can essentially completely solve 4 benchmark games from TextWorldExpress.
agent
Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions
Chen Feng Tsai, Xiaochen Zhou, Sierra S. Liu, Jing Li, Mo Yu, Hongyuan Mei — Arxiv 2023
TLDR: The experiments show that ChatGPT performs competitively compared to all the existing systems but still exhibits a low level of intelligence, which opens up new research questions at the intersection of artificial intelligence, machine learning, and natural language processing.
agent
A Minimal Approach for Natural Language Action Space in Text-based Games
Dongwon Kelvin Ryu, Meng Fang, Shirui Pan, Gholamreza Haffari, Ehsan Shareghi — Arxiv 2023
TLDR: Revisits the challenge of exploring the action space in text-based games (TGs) and proposes a minimal approach that uses admissible actions during the training phase. Presents a text-based actor-critic (TAC) agent that produces textual commands for the game solely from game observations, without requiring any knowledge graph (KG) or language model (LM).
agent
Asking Before Action: Gather Information in Embodied Decision Making with Language Models
Xiaoyu Chen, Shenao Zhang, Pushi Zhang, Li Zhao, Jianyu Chen — Arxiv 2023
TLDR: This work proposes ABA, a method that empowers the agent to proactively query external sources for pertinent information using natural language during its interactions with the environment, and demonstrates that, through imitation learning, ABA effectively retains and reuses queried and known information in subsequent tasks, mitigating the need for repetitive inquiries.
agent
Plan, Eliminate, and Track – Language Models are Good Teachers for Embodied Agents
Yue Wu, So Yeon Min, Yonatan Bisk, R. Salakhutdinov, A. Azaria, Yuan-Fang Li, Tom M. Mitchell, Shrimai Prabhumoye — Arxiv 2023
TLDR: Proposes a framework that uses the knowledge in LLMs to simplify the control problem rather than solve it, which leads to a significant 15% improvement over SOTA for generalization to human goal specifications.
agent
Knowledge-enhanced Agents for Interactive Text Games
P. Chhikara, Jiarui Zhang, Filip Ilievski, Jonathan M Francis, Kaixin Ma — Arxiv 2023
TLDR: This paper proposes a framework for enabling improved functional grounding of agents in text-based games, and considers two forms of domain knowledge that are injected into learning-based agents: memory of previous correct actions and affordances of relevant objects in the environment.
agent
ScienceWorld: Is your Agent Smarter than a 5th Grader?
Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EMNLP 2022
TLDR: Presents ScienceWorld, a high-fidelity simulator with 30 different elementary science tasks (such as boiling water, building electrical circuits, or determining whether an inherited characteristic is dominant or recessive), powered by simplified simulation engines for thermodynamics, electricity, chemistry, forces, and life processes. Shows that a behavior cloning agent based on a question answering model that can answer most elementary science questions correctly gets extremely low performance on those same tasks when expressed as a virtual environment, highlighting the importance of interactivity when evaluating reasoning.
Project Website
agent environment simulator
Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
Shunyu Yao, Karthik Narasimhan, Matthew Hausknecht — NAACL 2021
TLDR: A really interesting paper that shows TextWorld RL agents likely aren’t really using the text. (a) They replace text with hashcodes and show *improved* task performance, (b) They propose a new representation method “inverse dynamics” that performs better on Zork, where t-SNE plots show that (i) the baseline model groups state observations by whether they have been seen/are unseen, while (ii) the inverse-dynamics method causes states to be grouped by semantic similarity.
Project Website
agent
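The hashcode ablation can be illustrated with a small sketch: each observation string is replaced by an opaque identifier, so identical texts still map to the same state while all word-level semantics are discarded. The `hash_observation` helper and the choice of SHA-256 are assumptions for illustration, not the hashing scheme used in the paper.

```python
import hashlib


def hash_observation(text):
    """Map an observation string to an opaque ID that carries no semantics."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


obs_a = "You are in the kitchen. There is a knife here."
obs_b = "You are in the kitchen. There is a knife here."
obs_c = "You are in the kitchen. There is a fork here."

id_a, id_b, id_c = (hash_observation(o) for o in (obs_a, obs_b, obs_c))

# Identical observations share an ID, so an agent can still recognize states.
print(id_a == id_b)
# Near-identical text gets an unrelated ID, so similarity cannot be exploited.
print(id_a == id_c)
```

An agent that performs as well on these IDs as on the raw text is effectively memorizing states rather than reading them, which is the concern the paper raises.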
Language Models are Few-Shot Butlers
Vincent Micheli, François Fleuret — EMNLP 2021
TLDR: Shows GPT2 can do well at AlfWorld after pretraining. Very similar to Jansen (“Visually-grounded planning without vision”, EMNLP 2020).
agent
How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds
Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston — NAACL 2021
TLDR: Introduces (a) LIGHT-Quests, a crowdsourced dataset of quests based on the LIGHT environment, (b) ATOMIC-LIGHT, a common-sense knowledge graph with related knowledge. Also trains a transformer agent to act in this environment.
agent data social
Learning Dynamic Belief Graphs to Generalize on Text-Based Games
Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton — NeurIPS 2020
TLDR: GATA (Graph-aided Transformer Agent) model: Infers and updates latent beliefs during planning. Evaluated on CookingWorld. (Note: Contains huge, interesting visualizations in Appendix).
Project Website
agent
Keep CALM and Explore: Language Models for Action Generation in Text-based Games
Shunyu Yao, Rohan Rao, Matthew Hausknecht, Karthik Narasimhan — EMNLP 2020
TLDR: CALM model: One of the first demonstrations that language models could be used to generate next valid actions for text games, after being trained on gold playthroughs, to reduce the reliance on the “valid action handicap”. Trains a GPT2 agent to generate a shortlist of valid actions, then uses a model similar to a DRRN to choose one action from this shortlist. Evaluated on Jericho games.
Project Website
agent
Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions
Peter Jansen — EMNLP 2020
TLDR: Converted the 3D ALFRED challenge to a text-only trajectory task (pre-AlfWorld), showing GPT2 could recover nearly 50% of gold trajectories. Empirically demonstrated that language models contain a variety of common-sense/pick-and-place information that can be queried with text alone (i.e. no need for visual information).
Project Website
agent
How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
Prithviraj Ammanabrolu, Ethan Tien, Matthew Hausknecht, Mark O. Riedl — Arxiv 2020
TLDR: Q*BERT/MC!Q*BERT: Shows that asking simple questions at each step (e.g. Where am I? What do I have? What do I see?) helps improve performance on Zork and other Jericho games.
Project Website
agent
Playing Text-Based Games with Common Sense
Sahith Dambekodi, Spencer Frazier, Prithviraj Ammanabrolu, Mark O. Riedl — Arxiv 2020
TLDR: Shows that using knowledge from the COMET knowledge base helps an agent learn faster in the 9:05 interactive fiction game, which is part of the Jericho benchmark. Uses the Q*BERT model.
agent
Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge
K. Murugesan, Mattia Atzeni, Pushkar Shukla, Mrinmaya Sachan, Pavan Kapanipathi, Kartik Talamadupula — Arxiv 2020
TLDR: Shows that knowledge from ConceptNet can be combined with a GATA to improve performance/sample efficiency on two environments: KitchenCleanup and CookingWorld.
agent
Graph Constrained Reinforcement Learning For Natural Language Action Spaces
Prithviraj Ammanabrolu, Matthew Hausknecht — ICLR 2020
TLDR: KG-A2C: Builds a knowledge graph of a text world environment using OpenIE triples, and uses this for next-action selection through an Advantage Actor Critic model. Evaluated on Jericho games.
Project Website
agent
Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning
Xiaoxiao Guo, M. Yu, Yupeng Gao, Chuang Gan, Murray Campbell, S. Chang — EMNLP 2020
TLDR: Reframes playing interactive fiction games as a multi-paragraph reading comprehension (MPRC) problem. Evaluates on Jericho games.
Project Website
agent
Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction
Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare — AAAI 2020
TLDR: Proposes two algorithmic improvements to Deep Relevance Networks for interactive fiction. Evaluates on SaladWorld (a TextWorld environment) and Zork.
agent
Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Y. Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, C. Zhang — NeurIPS 2020
TLDR: Describes a reinforcement learning method that uses stacked hierarchical attention on knowledge graphs. Evaluates on Jericho games.
Project Website
agent
Exploration Based Language Learning For Text-Based Games
Andrea Madotto, M. Namazifar, J. Huizinga, Piero Molino, Adrien Ecoffet, Huaixiu Zheng, A. Papangelis, Dian Yu, Chandra Khatri, G. Tur — IJCAI 2020
TLDR: Shows that the Go-Explore algorithm can outperform various baseline models (such as a DRRN) in terms of accuracy and sample efficiency. Evaluated on two TextWorld games: Coin Collector and CookingWorld.
agent
Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games
Subhajit Chaudhury, Daiki Kimura, Kartik Talamadupula, Michiaki Tatsubori, Asim Munawar, Ryuki Tachibana — EMNLP 2020
TLDR: Shows that RL agents trained on one TextWorld game generally do not generalize well to other TextWorld games. Proposes a pruning method that appears to help speed generalization.
Project Website
agent
Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics
Xusen Yin, Jonathan May — Arxiv 2020
TLDR: Proposes a method of using a Siamese Neural Network with Deep Q Learning to promote transfer learning across environments. Evaluates on CookingWorld, and demonstrates transfer learning to a treasure hunting game (both TextWorld).
agent
LeDeepChef Deep Reinforcement Learning Agent for Families of Text-Based Games
Leonard Adolphs, T. Hofmann — AAAI 2020
TLDR: LeDeepChef: 2nd place winner in shared task. Uses a combination of ranking, actor-critic, and feudal learning.
agent sharedtask
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam — Arxiv 2020
TLDR: Proposes a new task in the LIGHT environment: The player must say something that causes an agent (computer) to perform a specific action (e.g. put on chain mail) or use a specific emotive. Creates an RL agent that is able to succeed at this task about half the time.
agent social
Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning
Prithviraj Ammanabrolu, Mark O. Riedl — NAACL 2019
TLDR: KG-DQN: Builds a knowledge graph of a text world environment using OpenIE triples, and uses this for next-action selection through a DQN. Evaluates on TextWorld.
Project Website
agent
Transfer in Deep Reinforcement Learning using Knowledge Graphs
Prithviraj Ammanabrolu, Mark O. Riedl — Textgraphs 2019
TLDR: Shows that using knowledge graphs as domain (or genre)-specific priors helps improve transfer learning to different same-genre textworlds (e.g. horror, sci-fi, soap-opera). Refers to this as “knowledge graph seeding”. Evaluates on TextWorld, and several Jericho games.
agent
Comprehensible Context-driven Text Game Playing
Xusen Yin, Jonathan May — CoG 2019
TLDR: Shows that using a “fast CNN” in place of an LSTM in a DQN can provide speed and accuracy improvements. Evaluates on Zork.
agent
Learn How to Cook a New Recipe in a New House: Using Map Familiarization, Curriculum Learning, and Bandit Feedback to Learn Families of Text-Based Adventure Games
Xusen Yin, Jonathan May — Arxiv 2019
TLDR: Proposes a number of methods to promote agent generalization. Evaluates on CookingWorld.
agent
NAIL: A General Interactive Fiction Agent
M. Hausknecht, R. Loynd, Greg Yang, Adith Swaminathan, J. Williams — Arxiv 2019
TLDR: NAIL: Winner of 2018 shared task. Uses a variety of “decision modules” (e.g. hoarder, examiner, interactor), which interact with a knowledge graph representation.
agent sharedtask
RDF* Graph Database as Interlingua for the TextWorld Challenge
Guntis Barzdins, D. Gosko, Paulis F. Barzdins, Uldis Lavrinovics, Gints Bernans, E. Celms — CoG 2019
TLDR: Shared task agent. Uses a split “Actor and Observer” architecture, where these two modules communicate through an “RDF* database”. Database serves as the world model, and is updated in part by using FrameNet to interpret observations.
agent sharedtask
Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor — Arxiv 2019
TLDR: Suggests it can solve Zork with 10 million interactions through a combination of compressive sensing and imitation learning.
agent
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor — NeurIPS 2018
TLDR: Action-Elimination Deep Q-Network (AE-DQN): Shows that learning to predict and discard irrelevant actions can improve agent performance. Evaluated on Zork.
agent
Language Expansion In Text-Based Games
Ghulam Ahmed Ansari, P. SagarJ., A. P. S. Chandar, Balaraman Ravindran — Arxiv 2018
TLDR: Works towards creating more generic RL text game agents using policy distillation. Evaluates on Home World.
agent
Counting to Explore and Generalize in Text-based Games
Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler — ERL 2018
TLDR: Uses an “episodic count-based exploration scheme” that counts visited states, to encourage an RL model to explore states it hasn’t been to. Evaluates on Coin Collector.
agent
What can you do with a rock? Affordance extraction via word embeddings
Nancy Fulda, Daniel Ricks, Ben Murdoch, David Wingate — IJCAI 2017
TLDR: Attempts to distill commonsense affordances from Wikipedia embeddings using the analogy embedding paradigm (e.g. king:queen::man:woman), then uses these to improve performance of an RL agent on Z-Machine games (including Zork).
agent
Text-based Adventures of the Golovin AI Agent
Bartosz Kostka, Jarosław Kwiecien, Jakub Kowalski, Paweł Rychlikowski — CIG 2017
TLDR: Golovin model: An agent that combines a variety of methods (language models, LSTM, rules) to perform comparably to the 2016 Shared Task Winner on Zork and other similar games.
agent
Deep Reinforcement Learning with a Natural Language Action Space
Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf — ACL 2016
TLDR: DRRN (Deep Reinforcement Relevance Network). Learns separate representations of the state and action space, which are combined with an interaction function to approximate a Q function for deep reinforcement learning. Strong early RL baseline for most games, frequently compared against.
agent
Language Understanding for Text-based Games using Deep Reinforcement Learning
Karthik Narasimhan, Tejas D. Kulkarni, R. Barzilay — EMNLP 2015
TLDR: LSTM-DQN: Early paper showing RL can be applied to text-based adventure games. Learns state and action representations jointly. Evaluates on two games (HomeWorld and FantasyWorld).
Project Website
agent
Data
Papers that describe data or resources.
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents
Harsh Trivedi, Tushar Khot, Mareike Hartmann, Ruskin Manku, Vinty Dong, Edward Li, Shashank Gupta, Ashish Sabharwal, Niranjan Balasubramanian — ACL 2024
TLDR: Introduces the AppWorld Engine, a high-fidelity execution environment of 9 day-to-day apps, operable via 457 APIs, populated with digital activities of 106 people living in a simulated world, and an associated benchmark of natural, diverse, and challenging day-to-day tasks requiring rich and interactive coding. (ACL 2024 Best Resource Paper Award)
Project Website
agent data environment simulator
Can Language Models Serve as Text-Based World Simulators?
Ruoyao Wang, Graham Todd, Ziang Xiao, Eric Yuan, Marc-Alexandre Côté, Peter Clark, Peter Jansen — ACL 2024
TLDR: Examines the question of whether we need to build game simulators manually, or can just use LLMs to do the simulation. Proposes a state task/dataset (BYTESIZED-State-Prediction), showing GPT-4 is only ~60% accurate at this task.
Project Website
data simulator worldgeneration
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks — ICML 2023
TLDR: Introduces the MACHIAVELLI benchmark, a set of 134 choose-your-own-adventure games from choiceofgames.com that have been automatically annotated with a large (2.8M) set of labels from GPT-4 detailing whether chosen actions represent harmful behaviors. Evaluates performance of baseline agents (Random, DRRN, and GPT-3.5/4) on a large set of ethical measures.
Project Website
data environment
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap — Arxiv 2023
TLDR: SOTOPIA is an open-ended dialog task for role playing, similar to LIGHT, with a set of specific characters and scenarios to role play. Evaluates human and GPT-4 baselines on the social interaction task. Provides a set of 7 automatic metrics for evaluating these social interactions, showing moderate agreement with human judgments.
Project Website
data environment social
Ontologically Faithful Generation of Non-Player Character Dialogues
Nathaniel Weir, Ryan Thomas, Randolph D’Amore, Kellie Hill, Benjamin Van Durme, Harsh Jhamtani — Arxiv 2023
TLDR: Introduces KNUDGE, a knowledge-constrained NPC dialogue dataset, where models must author a complex dialogue tree between players and NPCs according to a large set of persona/lore and quest specification passages. The data is drawn from a real RPG (The Outer Worlds). Introduces a series of automatic and human evaluation protocols for the task.
data social
FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information
Andrew Zhu, Karmanya Aggarwal, Alexander Feng, Lara Martin, Chris Callison-Burch — ACL 2023
TLDR: This paper presents FIREBALL, a large dataset of sessions from real Dungeons and Dragons (D&D) gameplay on Discord with true game state information. The true game states are intended to help large language models generate better game rounds.
Project Website
data
Modeling Worlds in Text
Prithviraj Ammanabrolu, Mark O. Riedl — Arxiv 2021
TLDR: JerichoWorld: Large dataset for generating knowledge graphs from interactive fiction games. Collected from 27 interactive fiction games (from the Jericho dataset).
data
How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds
Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston — NAACL 2021
TLDR: Introduces (a) LIGHT-Quests, a crowdsourced dataset of quests based on the LIGHT environment, (b) ATOMIC-LIGHT, a common-sense knowledge graph with related knowledge. Also trains a transformer agent to act in this environment.
agent data social
Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston — EMNLP 2019
TLDR: Introduces LIGHT: A dataset and model for dialog interactions in a text-based role playing environment. Includes a large set of crowdsourced dialog about rooms/objects/etc.
Project Website
data environment social
Position Papers
Position papers.
Language (Re)modelling: Towards Embodied Language Understanding
Ronen Tamari, Chen Shani, Tom Hope, Miriam R. L. Petruck, Omri Abend, Dafna Shahaf — ACL 2020
TLDR: Argues why and how text-based games play a key role in training NLU systems with more human-like inductive biases of grounded mental simulation and metaphoric inference.
position
Dungeons and DQNs: Toward Reinforcement Learning Agents that Play Tabletop Roleplaying Games
Lara J. Martin, Srijan Sood, Mark O. Riedl — INT-WICED 2018
TLDR: Position paper that proposes using table-top role-playing games (like Dungeons and Dragons) as a challenge task.
position
Shared Tasks
Papers that describe shared tasks (either by participating teams, or organizer summaries).
LeDeepChef Deep Reinforcement Learning Agent for Families of Text-Based Games
Leonard Adolphs, T. Hofmann — AAAI 2020
TLDR: LeDeepChef: 2nd place winner in shared task. Uses a combination of ranking, actor-critic, and feudal learning.
agent sharedtask
First TextWorld Problems
Marc-Alexandre Côté, Xingdi Yuan, Adam Trischler, Wendy Tay — CoG 2020
TLDR: Blog post about the results of the First TextWorld Problems competition (including the popular CookingWorld environment).
Project Website
environment sharedtask
NAIL: A General Interactive Fiction Agent
M. Hausknecht, R. Loynd, Greg Yang, Adith Swaminathan, J. Williams — Arxiv 2019
TLDR: NAIL: Winner of 2018 shared task. Uses a variety of “decision modules” (e.g. hoarder, examiner, interactor), which interact with a knowledge graph representation.
agent sharedtask
RDF* Graph Database as Interlingua for the TextWorld Challenge
Guntis Barzdins, D. Gosko, Paulis F. Barzdins, Uldis Lavrinovics, Gints Bernans, E. Celms — CoG 2019
TLDR: Shared task agent. Uses a split “Actor and Observer” architecture, where these two modules communicate through an “RDF* database”. Database serves as the world model, and is updated in part by using FrameNet to interpret observations.
agent sharedtask
The Text-Based Adventure AI Competition
Timothy Atkinson, Hendrik Baier, Tara Copplestone, Sam Devlin, Jerry Swan — IEEE Trans. Games 2019
TLDR: Summary Paper for the 2016, 2017, and 2018 Shared Tasks on Text-based Adventure AI
sharedtask
Social Agents
Papers that describe dialog, such as agent-user communication, or agent-agent communication.
SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap — Arxiv 2023
TLDR: SOTOPIA is an open-ended dialog task for role playing, similar to LIGHT, with a set of specific characters and scenarios to role play. Evaluates human and GPT-4 baselines on the social interaction task. Provides a set of 7 automatic metrics for evaluating these social interactions, showing moderate agreement with human judgments.
Project Website
data environment social
Ontologically Faithful Generation of Non-Player Character Dialogues
Nathaniel Weir, Ryan Thomas, Randolph D’Amore, Kellie Hill, Benjamin Van Durme, Harsh Jhamtani — Arxiv 2023
TLDR: Introduces KNUDGE, a knowledge-constrained NPC dialogue dataset, where models must author a complex dialogue tree between players and NPCs according to a large set of persona/lore and quest specification passages. The data is drawn from a real RPG (The Outer Worlds). Introduces a series of automatic and human evaluation protocols for the task.
data social
Towards Socially Intelligent Agents with Mental State Transition and Human Utility
Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu — SIGDIAL 2022
TLDR: Builds a “mental state parser” for representing the mental states of agents as a graph. Evaluated on LIGHT.
social
How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds
Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston — NAACL 2021
TLDR: Introduces (a) LIGHT-Quests, a crowdsourced dataset of quests based on the LIGHT environment, (b) ATOMIC-LIGHT, a common-sense knowledge graph with related knowledge. Also trains a transformer agent to act in this environment.
agent data social
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam — Arxiv 2020
TLDR: Proposes a new task in the LIGHT environment: The player must say something that causes an agent (computer) to perform a specific action (e.g. put on chain mail) or use a specific emotive. Creates an RL agent that is able to succeed at this task about half the time.
agent social
Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktäschel, Douwe Kiela, Arthur Szlam, Jason Weston — EMNLP 2019
TLDR: Introduces LIGHT: A dataset and model for dialog interactions in a text-based role playing environment. Includes a large set of crowdsourced dialog about rooms/objects/etc.
Project Website
data environment social
Surveys
Survey papers.
A Systematic Survey of Text Worlds as Embodied Natural Language Environments
Peter Jansen — Wordplay 2022
TLDR: Presents an in-depth survey of text game agents including motivations/position (why use text games?), simulators (and how they compare to 2D/3D simulators), environments, agents, and contemporary/future directions.
survey
A Survey of Text Games for Reinforcement Learning Informed by Natural Language
Philip Osborne, Heido Nõmm, André Freitas — TACL 2022
TLDR: A survey of text games as they relate to being modelled as reinforcement learning problems.
survey
Other
Other papers.
Skill Check: Some Considerations on the Evaluation of Gamemastering Models for Role-Playing Games
Santiago Góngora, Luis Chiruzzo, Gonzalo Méndez, Pablo Gervás — GALA 2023
TLDR: Proposes three test categories (interaction, item tracking, map design) to evaluate different gamemastering skills. Shows ChatGPT, Bard, and OpenAssistant LLMs struggle with common-sense reasoning and keeping the fictional world consistent.
Project Website
other
Interactive Language Learning by Question Answering
Xingdi Yuan, Marc-Alexandre Côté, Jie Fu, Zhouhan Lin, Christopher Pal, Yoshua Bengio, Adam Trischler — EMNLP 2019
TLDR: Introduces QAit (Question Answering with Interactive Text), a question answering task where answers must be gathered by interacting with a text game (CookingWorld). Includes 3 question types centered around properties: locationOf, existanceOf, and getProperty.
other
Ceptre: A Language for Modeling Generative Interactive Systems
Chris Martens — AIIDE 2015
TLDR: Presents Ceptre, a language based on linear logic, a formalism for modeling interactive fiction (and used by TextWorld).
other
Last updated: Aug 22, 2024