Interactive Virtual Environments

A list of research articles published on interactive text-based virtual environments, agents, and related areas.

New to text-based interactive virtual environments?

Table of Contents

   Simulators: Text-game Engines    
   Environments: Specific Interactive Games/Environments/Benchmarks    
   World Generation: Automatic World Generation    
   Agents: Agents/Agent Architectures    
   Data: Data/Resources    
   Position Papers    
   Shared Tasks    
   Social Agents: Agent-user or agent-agent dialog    
   Surveys    
   Other    

Simulators

Papers that describe simulators: game-engine-like frameworks in which specific games or environments can be created.

ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games
Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, Peter Jansen — EMNLP 2023
TLDR: Presents ByteSized32, a corpus of 32 text games, each expressed as approximately 1000 lines of Python. Shows that GPT-4 can perform the world-building task through in-context learning, using a single ByteSized32 game as a reference template. Presents automated metrics for evaluating generated games on runnability, specification compliance, winnability, and physical reality alignment.
Project Website
environment simulator worldgeneration

TextWorldExpress: Simulating Text Games at One Million Steps Per Second
Peter Jansen, Marc-Alexandre Côté — EACL 2023
TLDR: TextWorldExpress: An extremely fast text game simulator that is orders of magnitude faster than other simulators (up to 1 million steps per second). Includes implementations of CookingWorld, Coin Collector, TextWorld Common Sense, and a number of other benchmarks. pip-installable for Python.
Project Website
environment simulator

ScienceWorld: Is your Agent Smarter than a 5th Grader?
Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EMNLP 2022
TLDR: Presents ScienceWorld, a high-fidelity simulator with 30 different elementary science tasks (such as boiling water, building electrical circuits, or determining whether an inherited characteristic is dominant or recessive), powered by simplified simulation engines for thermodynamics, electricity, chemistry, forces, and life processes. Shows that a behavior cloning agent based on a question answering model, one that can correctly answer most elementary science questions, performs extremely poorly on those same tasks when they are expressed as a virtual environment, highlighting the importance of interactivity when evaluating reasoning.
Project Website
agent environment simulator

Interactive Fiction Games: A Colossal Adventure
Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan — AAAI 2020
TLDR: The Jericho simulator provides an OpenAI Gym-like interface to Z-Machine-based interactive fiction games (such as Zork). Presents a benchmark for training and evaluating agents on a specific set of interactive fiction games.
Project Website
environment simulator
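To make the “Gym-like interface” concrete, here is a minimal sketch of the interaction loop such simulators expose. It is not Jericho's API: the `ToyTextEnv` class, its two-room map, and its reward are hypothetical, and the sketch only assumes the common Gym convention in which `reset()` returns an observation plus an info dictionary, and `step(action)` returns an observation, a reward, a done flag, and an info dictionary.

```python
from typing import Dict, List, Tuple


class ToyTextEnv:
    """Hypothetical two-room text game with a Gym-like reset/step interface."""

    def __init__(self) -> None:
        self.location = "kitchen"
        self.has_key = False

    def reset(self) -> Tuple[str, Dict]:
        # Restore the initial state and return the first observation plus an info dict.
        self.location = "kitchen"
        self.has_key = False
        return self._describe(), {"valid_actions": self.valid_actions()}

    def valid_actions(self) -> List[str]:
        # Commands the game would accept in the current state.
        actions = ["go north"] if self.location == "kitchen" else ["go south"]
        if self.location == "kitchen" and not self.has_key:
            actions.append("take key")
        if self.location == "hallway" and self.has_key:
            actions.append("unlock door")
        return actions

    def step(self, action: str) -> Tuple[str, float, bool, Dict]:
        # Apply one text command; return (observation, reward, done, info).
        reward, done = 0.0, False
        if action not in self.valid_actions():
            obs = "Nothing happens."
        elif action == "take key":
            self.has_key = True
            obs = "You take the key."
        elif action in ("go north", "go south"):
            self.location = "hallway" if action == "go north" else "kitchen"
            obs = self._describe()
        else:  # "unlock door" is the goal action
            obs, reward, done = "The door swings open. You win!", 1.0, True
        return obs, reward, done, {"valid_actions": self.valid_actions()}

    def _describe(self) -> str:
        if self.location == "kitchen":
            return "You are in the kitchen." + ("" if self.has_key else " A key lies here.")
        return "You are in a hallway with a locked door."


if __name__ == "__main__":
    env = ToyTextEnv()
    obs, info = env.reset()
    for command in ["take key", "go north", "unlock door"]:
        obs, reward, done, info = env.step(command)
    print(obs, reward, done)  # The door swings open. You win! 1.0 True
```

Returning the valid actions in `info` is a choice made for illustration; an agent that generates free-form commands would simply ignore it.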

TextWorld: A Learning Environment for Text-based Games
Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Ruo Yu Tao, Layla El Asri, Mahmoud Adada, Wendy Tay, Adam Trischler — GC 2018
TLDR: The popular TextWorld simulator, which expresses text games as a high-level knowledge base and makes it easy to generate parametric variations of games. Compiles to Inform7 (then Z-machine).
Project Website
environment simulator

(Z-Machine) The Z-Machine Standards Document
Graham Nelson — Website 2014
TLDR: A document that specifies the standards for the Z-Machine, the popular formalism and virtual machine for representing text games, developed by Infocom in the 1980s.
simulator

(Inform7) Natural Language, Semantic Analysis, and Interactive Fiction
Graham Nelson — IF Theory Reader 2006
TLDR: A write-up of Inform7, a popular programming language for creating interactive fiction games whose syntax resembles natural language. Compiles to Z-machine.
Project Website
simulator

 

Environments

Papers that describe specific interactive environments, such as a specific game, or benchmark of games.

ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games
Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, Peter Jansen — EMNLP 2023
TLDR: Presents ByteSized32, a corpus of 32 text games, each expressed as approximately 1000 lines of Python. Shows that GPT-4 can perform the world-building task through in-context learning, using a single ByteSized32 game as a reference template. Presents automated metrics for evaluating generated games on runnability, specification compliance, winnability, and physical reality alignment.
Project Website
environment simulator worldgeneration

TextWorldExpress: Simulating Text Games at One Million Steps Per Second
Peter Jansen, Marc-Alexandre Côté — EACL 2023
TLDR: TextWorldExpress: An extremely fast text game simulator that is orders of magnitude faster than other simulators (up to 1 million steps per second). Includes implementations of CookingWorld, Coin Collector, TextWorld Common Sense, and a number of other benchmarks. pip-installable for Python.
Project Website
environment simulator

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks — ICML 2023
TLDR: Introduces the MACHIAVELLI benchmark, a set of 134 choose-your-own-adventure games from choiceofgames.com that have been automatically annotated with a large (2.8M) set of labels from GPT-4 detailing whether chosen actions represent harmful behaviors. Evaluates performance of baseline agents (Random, DRRN, and GPT-3.5/4) on a large set of ethical measures.
Project Website
data environment

ADaPT: As-Needed Decomposition and Planning with Language Models
Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot — Arxiv 2023
TLDR: Introduces TextCraft, a text-based crafting environment similar to WordCraft, but for crafting Minecraft items. Also introduces the ADaPT agent, which recursively decomposes tasks into subtasks and plans for them only as needed. ADaPT shows large performance gains over baselines on ALFWorld, WebShop, and TextCraft.
Project Website
agent environment

Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights Using Generative AI
Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour — AIIDE 2023
TLDR: The paper presents “1001 Nights,” a novel interactive game leveraging Generative AI (GPT-4 and Stable Diffusion) for co-creative storytelling, in which players’ spoken words dynamically transform the game world, advancing the concept of AI-native games in interactive entertainment.
Project Website
agent environment worldgeneration

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charles Burton Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine — Arxiv 2023
TLDR: LMRL-Gym is a multi-turn RL benchmark including several conventional (e.g. Chess, Wordle, Maze), conversational, and text-game tasks. The text game is a navigation game that appears similar to Coin Collector (TextWorld) or Map Reader (TextWorldExpress), and is implemented in TextWorld.
Project Website
environment

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap — Arxiv 2023
TLDR: SOTOPIA is an open-ended dialog task for role playing, similar to LIGHT, with a set of specific characters and scenarios to role play. Evaluates human and GPT-4 baselines on the social interaction task. Provides a set of 7 automatic metrics for evaluating these social interactions, showing moderate agreement with human judgments.
Project Website
data environment social

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer — ICML 2023
TLDR: Introduces BabyAI-Text, a text-only adaptation of the BabyAI environment. Also introduces the GLAM (Grounding LLMs with online RL) method, in which an LLM serves as the agent's policy and is progressively updated through online reinforcement learning as the agent interacts with the environment, improving its ability to achieve goals.
Project Website
agent environment

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson — Arxiv 2023
TLDR: Introduces Auction Arena, a novel simulation environment for evaluating LLMs in auctions, a setting chosen because it is highly unpredictable and exercises many skills related to resource and risk management, while also being easy to evaluate. Also designs an LLM agent with a Planning-Bidding-Belief-Update-Replanning mechanism for navigating auction scenarios, contributing a structured approach to strategic reasoning and adaptability. A demo is available online at the project page.
Project Website
agent environment

ScienceWorld: Is your Agent Smarter than a 5th Grader?
Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EMNLP 2022
TLDR: Presents ScienceWorld, a high-fidelity simulator with 30 different elementary science tasks (such as boiling water, building electrical circuits, or determining whether an inherited characteristic is dominant or recessive), powered by simplified simulation engines for thermodynamics, electricity, chemistry, forces, and life processes. Shows that a behavior cloning agent based on a question answering model, one that can correctly answer most elementary science questions, performs extremely poorly on those same tasks when they are expressed as a virtual environment, highlighting the importance of interactivity when evaluating reasoning.
Project Website
agent environment simulator

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht — ICLR 2021
TLDR: Creates a text-based mirror of the 3D ALFRED environment in TextWorld. Shows that training on the 6 ALFRED-derived task types in TextWorld transfers to the embodied 3D environment.
environment

Text-based RL Agents with Commonsense Knowledge: New Challenges, Environments and Baselines
K. Murugesan, M. Atzeni, P. Kapanipathi, P. Shukla, S. Kumaravel, G. Tesauro, K. Talamadupula, M. Sachan, M. Campbell — AAAI 2021
TLDR: Implements a TextWorld environment for household cleanup tasks. (a) Includes 8 rooms, 56 containers, and 190 objects. (b) The task is to pick up objects from the floor and put them in their correct (canonical, common-sense) locations. (c) Trains two models, an RL agent and an RL agent + ConceptNet, and shows that the one with more knowledge is more efficient.
Project Website
environment

Process-Level Representation of Scientific Protocols with Interactive Annotation
Ronen Tamari, Fan Bai, Alan Ritter, Gabriel Stanovsky — EACL 2021
TLDR: An environment for simulating text-based biology (wet-lab) experiment protocols.
Project Website
environment

Interactive Fiction Games: A Colossal Adventure
Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan — AAAI 2020
TLDR: The Jericho simulator provides an OpenAI Gym-like interface to Z-Machine-based interactive fiction games (such as Zork). Presents a benchmark for training and evaluating agents on a specific set of interactive fiction games.
Project Website
environment simulator

WordCraft: An Environment for Benchmarking Commonsense Agents
Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip HS Torr, Shimon Whiteson, Tim Rocktäschel — LRLW 2020
TLDR: A simple environment for evaluating commonsense reasoning, based on the game Little Alchemy 2. The agent must identify which concepts from an available set can likely be combined, based on commonsense relationships, to create new concepts.
Project Website
environment

First TextWorld Problems
Marc-Alexandre Côté, Xingdi Yuan, Adam Trischler, Wendy Tay — CoG 2020
TLDR: Blog post about the results of the First TextWorld Problems competition (which includes the popular CookingWorld environment).
Project Website
environment sharedtask

Playing by the Book: An Interactive Game Approach for Action Graph Extraction from Text
Ronen Tamari, Hiroyuki Shindo, Dafna Shahaf, Yuji Matsumoto — ESKSP 2019
TLDR: Essentially the beginnings of a simulated environment for materials science experiments, framed as procedure understanding: the agent reads a text and must replicate its gold steps/graph. Implemented using Microsoft's TextWorld environment.
Project Website
environment

Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktaschel, Douwe Kiela, Arthur Szlam, Jason Weston — EMNLP 2019
TLDR: Introduces LIGHT: a dataset and model for dialog interactions in a text-based role-playing environment. Includes a large set of crowdsourced dialogs, along with crowdsourced descriptions of rooms, objects, characters, etc.
Project Website
data environment social

TextWorld: A Learning Environment for Text-based Games
Marc-Alexandre Côté, Ákos Kádár, Xingdi Yuan, Ben Kybartas, Tavian Barnes, Emery Fine, James Moore, Matthew Hausknecht, Ruo Yu Tao, Layla El Asri, Mahmoud Adada, Wendy Tay, Adam Trischler — GC 2018
TLDR: The popular TextWorld simulator, which expresses text games as a high-level knowledge base and makes it easy to generate parametric variations of games. Compiles to Inform7 (then Z-machine).
Project Website
environment simulator

 

World Generation

Papers that explore automatically creating interactive environments.

ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games
Ruoyao Wang, Graham Todd, Eric Yuan, Ziang Xiao, Marc-Alexandre Côté, Peter Jansen — EMNLP 2023
TLDR: Presents ByteSized32, a corpus of 32 text games, each expressed as approximately 1000 lines of Python. Shows that GPT-4 can perform the world-building task through in-context learning, using a single ByteSized32 game as a reference template. Presents automated metrics for evaluating generated games on runnability, specification compliance, winnability, and physical reality alignment.
Project Website
environment simulator worldgeneration

Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights Using Generative AI
Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour — AIIDE 2023
TLDR: The paper presents “1001 Nights,” a novel interactive game leveraging Generative AI (GPT-4 and Stable Diffusion) for co-creative storytelling, in which players’ spoken words dynamically transform the game world, advancing the concept of AI-native games in interactive entertainment.
Project Website
agent environment worldgeneration

Learning Knowledge Graph-based World Models of Textual Environments
Prithviraj Ammanabrolu, Mark O. Riedl — NeurIPS 2021
TLDR: Presents Worldformer: a model that learns to predict both the state changes in a knowledge graph representing the world model and the valid actions available from the current state.
worldgeneration

Bringing Stories Alive: Generating Interactive Fiction Worlds
Prithviraj Ammanabrolu, Wesley Cheung, Dan Tu, William Broniec, Mark O. Riedl — AIIDE 2020
TLDR: Extracts a knowledge graph from a story using OpenIE triples, then conditions generation on that knowledge graph.
Project Website
worldgeneration

Generating Interactive Worlds with Text
Angela Fan, Jack Urbanek, Pratik Ringshia, Emily Dinan, Emma Qian, Siddharth Karamcheti, Shrimai Prabhumoye, Douwe Kiela, Tim Rocktaschel, Arthur Szlam, Jason Weston — AAAI 2020
TLDR: Builds out more complex game environments based on the LIGHT dataset. Includes user studies that evaluate the generated worlds on cohesiveness, diversity, and human preference.
worldgeneration

An Online Authoring Tool for Interactive Fiction
Bryan Temprado-Battad, José-Luis Sierra, Antonio Sarasa-Cabezuelo — IV 2019
TLDR: A web-based interactive fiction authoring tool.
worldgeneration

Toward Automated Quest Generation in Text-Adventure Games
Prithviraj Ammanabrolu, William Broniec, Alex Mueller, Jeremy Paul, Mark O. Riedl — CCNLG 2019
TLDR: High-level examination of using Markov chains and neural generative models to create text games, studied in the context of CookingWorld. Includes user studies that evaluate the generated quests on creativity and coherence.
worldgeneration

 

Agents

Papers that describe agents, or agent architectures.

ADaPT: As-Needed Decomposition and Planning with Language Models
Archiki Prasad, Alexander Koller, Mareike Hartmann, Peter Clark, Ashish Sabharwal, Mohit Bansal, Tushar Khot — Arxiv 2023
TLDR: Introduces TextCraft, a text-based crafting environment similar to WordCraft, but for crafting Minecraft items. Also introduces the ADaPT agent, which recursively decomposes tasks into subtasks and plans for them only as needed. ADaPT shows large performance gains over baselines on ALFWorld, WebShop, and TextCraft.
Project Website
agent environment
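The as-needed decomposition in this entry can be sketched as a recursive controller: attempt a task directly, and only if that fails, ask a planner to split it into subtasks and recurse, up to a depth limit. In ADaPT both roles are played by an LLM; here `execute` and `plan` are hypothetical stand-ins (a set of primitive skills and a hand-written recipe table with made-up crafting tasks), and the sketch assumes every plan is a conjunction in which all subtasks must succeed.

```python
from typing import Callable, Dict, List, Optional


def adapt(task: str,
          execute: Callable[[str], bool],
          plan: Callable[[str], Optional[List[str]]],
          depth: int = 0,
          max_depth: int = 3,
          trace: Optional[List[str]] = None) -> bool:
    """Decompose `task` recursively, but only when direct execution fails."""
    if trace is None:
        trace = []
    indent = "  " * depth
    if execute(task):  # first try the task as-is
        trace.append(f"{indent}executed: {task}")
        return True
    if depth >= max_depth:  # give up instead of decomposing forever
        trace.append(f"{indent}failed (max depth): {task}")
        return False
    subtasks = plan(task)
    if not subtasks:
        trace.append(f"{indent}failed (no plan): {task}")
        return False
    trace.append(f"{indent}decomposed: {task} -> {subtasks}")
    # AND logic: every subtask must succeed; all() stops at the first failure.
    return all(adapt(s, execute, plan, depth + 1, max_depth, trace) for s in subtasks)


# Hypothetical stand-ins for the LLM executor and planner.
PRIMITIVES = {"get 1 oak log", "craft 4 oak planks", "craft 4 sticks",
              "craft wooden pickaxe from planks and sticks"}
RECIPES: Dict[str, List[str]] = {
    "make wooden pickaxe": ["get planks", "get sticks",
                            "craft wooden pickaxe from planks and sticks"],
    "get planks": ["get 1 oak log", "craft 4 oak planks"],
    "get sticks": ["craft 4 sticks"],
}

if __name__ == "__main__":
    log: List[str] = []
    ok = adapt("make wooden pickaxe", PRIMITIVES.__contains__, RECIPES.get, trace=log)
    print(ok)  # True
    print("\n".join(log))
```

Decomposition happens only on failure, so a task the executor can already handle never costs a planning call.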

Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights Using Generative AI
Yuqian Sun, Zhouyi Li, Ke Fang, Chang Hee Lee, Ali Asadipour — AIIDE 2023
TLDR: The paper presents “1001 Nights,” a novel interactive game leveraging Generative AI (GPT-4 and Stable Diffusion) for co-creative storytelling, in which players’ spoken words dynamically transform the game world, advancing the concept of AI-native games in interactive entertainment.
Project Website
agent environment worldgeneration

Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, Pierre-Yves Oudeyer — ICML 2023
TLDR: Introduces BabyAI-Text, a text-only adaptation of the BabyAI environment. Also introduces the GLAM (Grounding LLMs with online RL) method, in which an LLM serves as the agent's policy and is progressively updated through online reinforcement learning as the agent interacts with the environment, improving its ability to achieve goals.
Project Website
agent environment

Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson — Arxiv 2023
TLDR: Introduces Auction Arena, a novel simulation environment for evaluating LLMs in auctions, a setting chosen because it is highly unpredictable and exercises many skills related to resource and risk management, while also being easy to evaluate. Also designs an LLM agent with a Planning-Bidding-Belief-Update-Replanning mechanism for navigating auction scenarios, contributing a structured approach to strategic reasoning and adaptability. A demo is available online at the project page.
Project Website
agent environment

Self-Supervised Behavior Cloned Transformers are Path Crawlers for Text Games
Ruoyao Wang, Peter Jansen — EMNLP 2023
TLDR: Shows that pathcrawling all possible trajectories in a game (up to reward) is a viable agent strategy. Pathcrawling produces both good (generalizable) and bad (ungeneralizable) trajectories; the paper shows that the generalizable ones can be distilled by training a small LLM agent on candidate trajectories and using its performance on a development set as a proxy score for generalizability. Evaluated on TextWorldExpress games.
Project Website
agent
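The crawling half of this strategy can be sketched as a depth-limited search that enumerates every action sequence ending at the first reward. The `GAME` transition table is a hypothetical deterministic toy, not a TextWorldExpress game, and the paper's distillation step (training a model on the candidate trajectories and scoring them on a development set) is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical deterministic toy game: state -> {action: (next_state, reward)}.
GAME: Dict[str, Dict[str, Tuple[str, float]]] = {
    "start": {"go east": ("hall", 0.0), "look": ("start", 0.0)},
    "hall": {"go west": ("start", 0.0), "take coin": ("won", 1.0)},
    "won": {},
}


def crawl_paths(state: str, max_steps: int) -> List[List[str]]:
    """Return every action sequence of length <= max_steps that ends on a reward."""
    winning: List[List[str]] = []

    def dfs(s: str, path: List[str]) -> None:
        if len(path) == max_steps:  # step budget exhausted
            return
        for action, (nxt, reward) in GAME[s].items():
            new_path = path + [action]
            if reward > 0:
                winning.append(new_path)  # crawl "up to reward": stop this path here
            else:
                dfs(nxt, new_path)

    dfs(state, [])
    return winning


if __name__ == "__main__":
    for path in crawl_paths("start", max_steps=3):
        print(path)
```

Even this toy game yields more than one winning path, including a detour through `look`, which hints at why a separate filter for generalizable trajectories is needed.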

Remember what you did so you know what to do next
Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin, Marjorie Freedman, R. Weischedel — Arxiv 2023
TLDR: Shows that modestly-sized LLM agent models (e.g. GPT-J 6B) can achieve substantially higher performance when including large action histories in the model context. Evaluates on ScienceWorld.
agent
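One simple way to put “large action histories in the model context” is to keep the longest suffix of past steps that fits a context budget. This is a sketch under assumptions: `build_prompt` is a hypothetical helper, the prompt layout is invented, and the budget is counted in whitespace-separated words rather than model tokens.

```python
from typing import List, Tuple


def build_prompt(task: str, history: List[Tuple[str, str]], current_obs: str,
                 max_words: int = 200) -> str:
    """Include as many of the most recent (action, observation) steps as fit the budget."""
    header = f"Task: {task}"
    footer = f"Observation: {current_obs}\nNext action:"
    budget = max_words - len(header.split()) - len(footer.split())
    kept: List[str] = []
    # Walk backwards so the most recent steps are kept first.
    for action, obs in reversed(history):
        step = f"> {action}\n{obs}"
        cost = len(step.split())
        if cost > budget:
            break
        kept.append(step)
        budget -= cost
    kept.reverse()  # restore chronological order
    return "\n".join([header, *kept, footer])


if __name__ == "__main__":
    hist = [("look", "You see a stove."), ("activate stove", "The stove is on.")]
    print(build_prompt("boil water", hist, "The stove is on."))
```

Truncating from the oldest end keeps the steps most likely to matter for the next action, at the cost of forgetting early context once the budget is reached.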

CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization
Bodhisattwa Prasad Majumder, Bhavana Dalvi Mishra, Peter Jansen, Oyvind Tafjord, Niket Tandon, Li Zhang, Chris Callison-Burch, Peter Clark — Arxiv 2023
TLDR: Language agents have shown some ability to interact with an external environment (e.g., a virtual world such as ScienceWorld) to perform complex tasks (e.g., growing a plant) without the startup costs of reinforcement learning. However, despite their zero-shot capabilities, such agents have so far not continually improved over time beyond refining their performance on a specific task. CLIN is presented as the first language-based agent to do so: it continually improves over multiple trials, including when both the environment and the task are varied, without requiring parameter updates. It uses a persistent, dynamic, textual memory centered on causal abstractions (rather than general “helpful hints”), which is updated after each trial so that the agent gradually learns knowledge useful for new trials.
Project Website
agent

Story Shaping: Teaching Agents Human-like Behavior with Stories
Xiangyu Peng, Christopher Cui, Wei Zhou, Renee Jia, Mark Riedl — AIIDE 2023
TLDR: Introduces a technique, Story Shaping, in which a reinforcement learning agent infers tacit knowledge of how to accomplish a task from an example story. The agent intrinsically rewards itself for performing actions that make its current environment adhere to the inferred story world.
agent
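A heavily simplified version of this intrinsic reward can be sketched by representing both the game state and the story's inferred world as sets of (subject, relation, object) triples, and rewarding an action by how many additional story facts it makes true. The triple representation and `story_shaping_reward` are assumptions for illustration, not the paper's exact reward.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def story_shaping_reward(prev_world: Set[Triple], new_world: Set[Triple],
                         story_world: Set[Triple], scale: float = 1.0) -> float:
    """Reward an action by how much it moves the world state toward the story's world."""
    before = len(prev_world & story_world)  # story facts satisfied before acting
    after = len(new_world & story_world)    # story facts satisfied after acting
    return scale * (after - before)


if __name__ == "__main__":
    story = {("you", "have", "sword"), ("you", "in", "castle")}
    prev = {("you", "in", "forest")}
    new = {("you", "in", "forest"), ("you", "have", "sword")}
    print(story_shaping_reward(prev, new, story))  # 1.0
```

Because the reward is a difference, undoing a story fact is penalized as much as achieving one is rewarded.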

Augmenting Autotelic Agents with Large Language Models
Cédric Colas, Laetitia Teodorescu, Pierre-Yves Oudeyer, Xingdi Yuan, Marc-Alexandre Côté — CoLLas 2023
TLDR: Introduces LMA3, a language-model-augmented autotelic agent that leverages a pretrained language model (LM) to support the representation, generation, and learning of diverse, abstract, human-relevant goals. LMA3 is shown to master a large diversity of skills in a task-agnostic text-based environment.
agent

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Bill Yuchen Lin, Yicheng Fu, Karina Yang, Prithviraj Ammanabrolu, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Yejin Choi, Xiang Ren — NeurIPS 2023
TLDR: Introduces SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition and designed for action planning in complex interactive reasoning tasks. A heuristic method integrates its fast-thinking and slow-thinking modules, resulting in more efficient and robust problem solving.
Project Website
agent

Behavior Cloned Transformers are Neurosymbolic Reasoners
Ruoyao Wang, Peter Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EACL 2023
TLDR: Shows that neurosymbolic agents combining behavior cloning with symbolic modules (calculator, GPS, knowledge-base lookup, sorter) can essentially solve 4 benchmark games from TextWorldExpress.
agent

Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions
Chen Feng Tsai, Xiaochen Zhou, Sierra S. Liu, Jing Li, Mo Yu, Hongyuan Mei — Arxiv 2023
TLDR: Shows that ChatGPT performs competitively with existing systems but still exhibits a low level of intelligence, opening up new research questions at the intersection of artificial intelligence, machine learning, and natural language processing.
agent

A Minimal Approach for Natural Language Action Space in Text-based Games
Dongwon Kelvin Ryu, Meng Fang, Shirui Pan, Gholamreza Haffari, Ehsan Shareghi — Arxiv 2023
TLDR: Revisits the challenge of exploring the action space in text games (TGs) and proposes a minimal approach that uses admissible actions only during training. Presents a text-based actor-critic (TAC) agent that produces textual game commands solely from game observations, without requiring any KG or LM.
agent

Asking Before Action: Gather Information in Embodied Decision Making with Language Models
Xiaoyu Chen, Shenao Zhang, Pushi Zhang, Li Zhao, Jianyu Chen — Arxiv 2023
TLDR: Proposes ABA (Asking Before Action), a method that lets the agent proactively query external sources in natural language for pertinent information while interacting with the environment. Demonstrates that, with imitation learning, ABA retains and reuses queried and known information in subsequent tasks, reducing the need for repetitive inquiries.
agent

Plan, Eliminate, and Track – Language Models are Good Teachers for Embodied Agents
Yue Wu, So Yeon Min, Yonatan Bisk, R. Salakhutdinov, A. Azaria, Yuan-Fang Li, Tom M. Mitchell, Shrimai Prabhumoye — Arxiv 2023
TLDR: Proposes a framework that uses the knowledge in LLMs to simplify the control problem rather than to solve it, yielding a 15% improvement over the state of the art in generalization to human goal specifications.
agent

Knowledge-enhanced Agents for Interactive Text Games
P. Chhikara, Jiarui Zhang, Filip Ilievski, Jonathan M Francis, Kaixin Ma — Arxiv 2023
TLDR: Proposes a framework for improving the functional grounding of agents in text-based games, injecting two forms of domain knowledge into learning-based agents: memory of previous correct actions, and affordances of relevant objects in the environment.
agent

ScienceWorld: Is your Agent Smarter than a 5th Grader?
Ruoyao Wang, Peter Alexander Jansen, Marc-Alexandre Côté, Prithviraj Ammanabrolu — EMNLP 2022
TLDR: Presents ScienceWorld, a high-fidelity simulator with 30 different elementary science tasks (such as boiling water, building electrical circuits, or determining whether an inherited characteristic is dominant or recessive), powered by simplified simulation engines for thermodynamics, electricity, chemistry, forces, and life processes. Shows that a behavior cloning agent based on a question answering model, one that can correctly answer most elementary science questions, performs extremely poorly on those same tasks when they are expressed as a virtual environment, highlighting the importance of interactivity when evaluating reasoning.
Project Website
agent environment simulator

Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
Shunyu Yao, Karthik Narasimhan, Matthew Hausknecht — NAACL 2021
TLDR: A really interesting paper showing that RL agents for text games likely aren’t really using the text. (a) Replacing the text with hash codes *improves* task performance. (b) Proposes a new representation method based on “inverse dynamics” that performs better on Zork; t-SNE plots show that (i) the baseline model groups state observations by whether they have been seen or are unseen, while (ii) the inverse-dynamics method groups states by semantic similarity.
Project Website
agent
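The hash ablation can be sketched as follows: each observation string is mapped to a fixed pseudo-random vector, so an agent can still tell states apart but cannot exploit word meaning. The function names, the SHA-256 digest, and the vector size are assumptions, not the paper's implementation.

```python
import hashlib
import random
from typing import List


def hash_observation(obs: str, num_buckets: int = 2 ** 20) -> int:
    """Map observation text to an opaque integer ID, discarding its semantics."""
    digest = hashlib.sha256(obs.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hashed_state_vector(obs: str, dim: int = 8) -> List[float]:
    """Fixed pseudo-random vector per observation, used in place of a text encoder."""
    rng = random.Random(hash_observation(obs))  # same text -> same seed -> same vector
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


if __name__ == "__main__":
    v1 = hashed_state_vector("West of House. There is a small mailbox here.")
    v2 = hashed_state_vector("West of House. There is a small mailbox here.")
    print(v1 == v2)  # True: identical text always maps to the identical vector
```

`hashlib` is used instead of Python's built-in `hash()`, which is salted per process for strings and would give different vectors across runs.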

Language Models are Few-Shot Butlers
Vincent Micheli, François Fleuret — EMNLP 2021
TLDR: Shows GPT2 can do well on ALFWorld after pretraining. Very similar to Jansen (“Visually-grounded planning without vision”, EMNLP 2020).
agent

How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds
Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston — NAACL 2021
TLDR: Introduces (a) LIGHT-Quests, a crowdsourced dataset of quests based on the LIGHT environment, (b) ATOMIC-LIGHT, a common-sense knowledge graph with related knowledge. Also trains a transformer agent to act in this environment.
agent data social

Learning Dynamic Belief Graphs to Generalize on Text-Based Games
Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, William L. Hamilton — NeurIPS 2020
TLDR: GATA (Graph-Aided Transformer Agent) model: Infers and updates latent belief graphs during planning. Evaluated on CookingWorld. (Note: contains huge, interesting visualizations in the appendix.)
Project Website
agent

Keep CALM and Explore: Language Models for Action Generation in Text-based Games
Shunyu Yao, Rohan Rao, Matthew Hausknecht, Karthik Narasimhan — EMNLP 2020
TLDR: CALM model: One of the first demonstrations that language models, after being trained on gold playthroughs, can generate candidate next actions for text games, reducing reliance on the “valid action handicap”. Trains a GPT2 model to generate a shortlist of candidate actions, then uses a DRRN-like model to choose one action from this shortlist. Evaluated on Jericho games.
Project Website
agent

Visually-Grounded Planning without Vision: Language Models Infer Detailed Plans from High-level Instructions
Peter Jansen — EMNLP 2020
TLDR: Converts the 3D ALFRED challenge into a text-only trajectory task (pre-ALFWorld), showing that GPT2 could recover nearly 50% of gold trajectories. Empirically demonstrates that language models contain a variety of common-sense/pick-and-place information that can be queried with text alone (i.e., without visual information).
Project Website
agent

How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
Prithviraj Ammanabrolu, Ethan Tien, Matthew Hausknecht, Mark O. Riedl — Arxiv 2020
TLDR: Q*BERT/MC!Q*BERT: Shows that asking simple questions at each step (e.g., Where am I? What do I have? What do I see?) helps improve performance on Zork and other Jericho games.
Project Website
agent

Playing Text-Based Games with Common Sense
Sahith Dambekodi, Spencer Frazier, Prithviraj Ammanabrolu, Mark O. Riedl — Arxiv 2020
TLDR: Shows that using knowledge from the COMET knowledge base helps an agent learn faster in the 9:05 interactive fiction game, which is part of the Jericho benchmark. Uses the Q*BERT model.
agent

Enhancing Text-based Reinforcement Learning Agents with Commonsense Knowledge
K. Murugesan, Mattia Atzeni, Pushkar Shukla, Mrinmaya Sachan, Pavan Kapanipathi, Kartik Talamadupula — Arxiv 2020
TLDR: Shows that knowledge from ConceptNet can be combined with GATA to improve performance and sample efficiency on two environments: KitchenCleanup and CookingWorld.
agent

Graph Constrained Reinforcement Learning For Natural Language Action Spaces
Prithviraj Ammanabrolu, Matthew Hausknecht — ICLR 2020
TLDR: KG-A2C: Builds a knowledge graph of a text world environment using OpenIE triples, and uses it to constrain next-action selection in an Advantage Actor-Critic (A2C) model. Evaluated on Jericho games.
Project Website
agent
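The graph-building and graph-constraining steps can be sketched with a toy graph. Assumptions: the triples are given directly (standing in for OpenIE output), the only update rule is that an `in` relation replaces its previous value, and actions are constrained by filling `OBJ` template slots only with objects the graph places in the agent's current room. The paper's actual update rules and A2C model are not reproduced.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """Toy world-state graph built from (subject, relation, object) triples."""

    # Relations assumed to be functional: a subject holds one value at a time.
    EXCLUSIVE = {"in"}

    def __init__(self) -> None:
        self.edges: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)

    def update(self, triples: Iterable[Triple]) -> None:
        for subj, rel, obj in triples:
            if rel in self.EXCLUSIVE:
                self.edges[(subj, rel)].clear()  # "you in kitchen" replaces "you in cellar"
            self.edges[(subj, rel)].add(obj)

    def triples(self) -> Set[Triple]:
        return {(s, r, o) for (s, r), objs in self.edges.items() for o in objs}


def candidate_actions(kg: KnowledgeGraph, templates: Iterable[str]) -> List[str]:
    """Fill OBJ slots only with objects the graph places in the agent's current room."""
    rooms = kg.edges.get(("you", "in"), set())
    objects = {o for room in rooms for o in kg.edges.get((room, "has"), set())}
    return sorted(t.replace("OBJ", o) for t in templates for o in objects)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    # Hypothetical triples an extractor might return for two consecutive observations.
    kg.update([("you", "in", "cellar"), ("cellar", "has", "lamp")])
    kg.update([("you", "in", "kitchen"), ("kitchen", "has", "knife")])
    print(candidate_actions(kg, ["take OBJ", "examine OBJ"]))
    # ['examine knife', 'take knife']
```

The lamp stays in the graph as remembered knowledge about the cellar, but it is excluded from the action candidates because the agent is no longer there.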

Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning
Xiaoxiao Guo, M. Yu, Yupeng Gao, Chuang Gan, Murray Campbell, S. Chang — EMNLP 2020
TLDR: Reframes playing interactive fiction games as a multi-paragraph reading comprehension (MPRC) problem. Evaluates on Jericho games.
Project Website
agent

Algorithmic Improvements for Deep Reinforcement Learning applied to Interactive Fiction
Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare — AAAI 2020
TLDR: Proposes two algorithmic improvements to Deep Relevance Networks for interactive fiction. Evaluates on SaladWorld (a TextWorld environment) and Zork.
agent

Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Y. Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, C. Zhang — NeurIPS 2020
TLDR: Describes a reinforcement learning method that uses stacked hierarchical attention over knowledge graphs. Evaluates on Jericho games.
Project Website
agent

Exploration Based Language Learning For Text-Based Games
Andrea Madotto, M. Namazifar, J. Huizinga, Piero Molino, Adrien Ecoffet, Huaixiu Zheng, A. Papangelis, Dian Yu, Chandra Khatri, G. Tur — IJCAI 2020
TLDR: Shows that the Go-Explore algorithm can outperform various baseline models (such as a DRRN) in terms of accuracy and sample efficiency. Evaluated on two TextWorld games: Coin Collector and CookingWorld.
agent
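
The exploration phase of Go-Explore (keep an archive of "cells", return to a chosen cell by replaying its trajectory, then explore randomly from it) can be sketched as below. The environment is a hypothetical deterministic toy (a line of rooms with a reward in the last one), cells are keyed on the room index, and cell selection prefers rarely chosen cells; the paper's cell representation, selection heuristic, and its later policy-training phase may differ.

```python
import random
from typing import Dict, List, Tuple

ACTIONS = ["go east", "go west"]
GOAL = 6  # hypothetical toy world: rooms 0..6 in a line, reward while in room 6


def step(room: int, action: str) -> Tuple[int, float]:
    """Deterministic toy environment standing in for a text game."""
    room = min(room + 1, GOAL) if action == "go east" else max(room - 1, 0)
    return room, (1.0 if room == GOAL else 0.0)


def replay(trajectory: List[str]) -> Tuple[int, float]:
    """'Return' step: restore a cell by replaying its actions from the start."""
    room, score = 0, 0.0
    for action in trajectory:
        room, reward = step(room, action)
        score += reward
    return room, score


def go_explore(iterations: int = 50, explore_steps: int = 3,
               seed: int = 0) -> Dict[int, Tuple[float, List[str]]]:
    rng = random.Random(seed)
    archive: Dict[int, Tuple[float, List[str]]] = {0: (0.0, [])}  # cell -> (score, trajectory)
    chosen: Dict[int, int] = {0: 0}                              # times each cell was selected
    for _ in range(iterations):
        # Select: prefer cells chosen least often (random tie-break).
        cell = min(archive, key=lambda c: (chosen[c], rng.random()))
        chosen[cell] += 1
        score, traj = archive[cell]
        room, _ = replay(traj)
        traj = list(traj)
        # Explore: a few random actions from the restored cell.
        for _ in range(explore_steps):
            action = rng.choice(ACTIONS)
            room, reward = step(room, action)
            score += reward
            traj.append(action)
            best = archive.get(room)
            # Keep the highest-scoring, then shortest, trajectory for each cell.
            if best is None or (score, -len(traj)) > (best[0], -len(best[1])):
                archive[room] = (score, list(traj))
                chosen.setdefault(room, 0)
    return archive


for cell, (score, traj) in sorted(go_explore().items()):
    print(cell, score, len(traj))
```

Replaying stored trajectories only works because the toy environment is deterministic, which is also the setting in which Go-Explore's "return, then explore" step is simplest.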

Bootstrapped Q-learning with Context Relevant Observation Pruning to Generalize in Text-based Games
Subhajit Chaudhury, Daiki Kimura, Kartik Talamadupula, Michiaki Tatsubori, Asim Munawar, Ryuki Tachibana — EMNLP 2020
TLDR: Shows that RL agents trained on one TextWorld game generally do not generalize well to other TextWorld games. Proposes an observation-pruning method that appears to improve generalization.
Project Website
agent

Zero-Shot Learning of Text Adventure Games with Sentence-Level Semantics
Xusen Yin, Jonathan May — Arxiv 2020
TLDR: Proposes a method of using a Siamese Neural Network with Deep Q Learning to promote transfer learning across environments. Evaluates on CookingWorld and demonstrates transfer learning to a treasure hunting game (both TextWorld games).
agent

LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games
Leonard Adolphs, T. Hofmann — AAAI 2020
TLDR: LeDeepChef: 2nd place in the shared task. Uses a combination of ranking, actor-critic, and feudal learning.
agent sharedtask

I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam — Arxiv 2020
TLDR: Proposes a new task in the LIGHT environment: the player must say something that causes a computer-controlled agent to perform a specific action (e.g. put on chain mail) or use a specific emote. Creates an RL agent that succeeds at this task about half the time.
agent social

Playing Text-Adventure Games with Graph-Based Deep Reinforcement Learning
Prithviraj Ammanabrolu, Mark O. Riedl — NAACL 2019
TLDR: KG-DQN: Builds a knowledge graph of a text world environment using OpenIE triples, and uses this for next-action selection through a DQN. Evaluates on TextWorld.
Project Website
agent
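
The two ideas in this TLDR, maintaining a knowledge graph of (subject, relation, object) triples and using it to narrow the candidate actions, can be sketched as follows. The triples are supplied by hand here (OpenIE extraction is assumed to happen elsewhere), the "player is in one room at a time" update rule is a hypothetical example, and ranking actions by how many graph entities they mention is a simplified stand-in for the paper's pruning step, not its exact procedure.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """Toy graph of (subject, relation, object) triples about the game world."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def update(self, new_triples: Iterable[Triple]) -> None:
        for s, r, o in new_triples:
            if s == "you" and r == "in":
                # Hypothetical rule: the player is in only one room at a time.
                self.triples = {t for t in self.triples if not (t[0] == "you" and t[1] == "in")}
            self.triples.add((s, r, o))

    def entities(self) -> Set[str]:
        return {s for s, _, _ in self.triples} | {o for _, _, o in self.triples}


def prune_actions(graph: KnowledgeGraph, candidates: List[str], k: int = 3) -> List[str]:
    """Keep up to k actions that mention known entities, most overlap first."""
    known = graph.entities()

    def overlap(action: str) -> int:
        return sum(1 for token in action.lower().split() if token in known)

    ranked = sorted(candidates, key=overlap, reverse=True)
    return [a for a in ranked if overlap(a) > 0][:k]


kg = KnowledgeGraph()
kg.update([("you", "in", "kitchen"), ("knife", "in", "kitchen")])
kg.update([("you", "in", "hallway")])
print(prune_actions(kg, ["take knife", "take sword", "open door"]))
```

The DQN would then score only the pruned candidates, which is cheaper than scoring every possible command.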

Transfer in Deep Reinforcement Learning using Knowledge Graphs
Prithviraj Ammanabrolu, Mark O. Riedl — Textgraphs 2019
TLDR: Shows that using knowledge graphs as domain (or genre)-specific priors helps transfer learning to other text worlds of the same genre (e.g. horror, sci-fi, soap opera). Refers to this as “knowledge graph seeding”. Evaluates on TextWorld and several Jericho games.
agent

Comprehensible Context-driven Text Game Playing
Xusen Yin, Jonathan May — CoG 2019
TLDR: Shows that using a “fast CNN” in place of an LSTM in a DQN can provide speed and accuracy improvements. Evaluates on Zork.
agent

Learn How to Cook a New Recipe in a New House: Using Map Familiarization, Curriculum Learning, and Bandit Feedback to Learn Families of Text-Based Adventure Games
Xusen Yin, Jonathan May — Arxiv 2019
TLDR: Proposes several methods (map familiarization, curriculum learning, and bandit feedback) to promote agent generalization. Evaluates on CookingWorld.
agent

NAIL: A General Interactive Fiction Agent
M. Hausknecht, R. Loynd, Greg Yang, Adith Swaminathan, J. Williams — Arxiv 2019
TLDR: NAIL: Winner of the 2018 shared task. Uses a variety of “decision modules” (e.g. hoarder, examiner, interactor), which interact with a knowledge graph representation.
agent sharedtask

RDF* Graph Database as Interlingua for the TextWorld Challenge
Guntis Barzdins, D. Gosko, Paulis F. Barzdins, Uldis Lavrinovics, Gints Bernans, E. Celms — CoG 2019
TLDR: Shared task agent. Uses a split “Actor and Observer” architecture, where these two modules communicate through an “RDF* database”. Database serves as the world model, and is updated in part by using FrameNet to interpret observations.
agent sharedtask

Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces
Chen Tessler, Tom Zahavy, Deborah Cohen, Daniel J. Mankowitz, Shie Mannor — Arxiv 2019
TLDR: Suggests it can solve Zork with 10 million interactions through a combination of compressed sensing and imitation learning.
agent

Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning
Tom Zahavy, Matan Haroush, Nadav Merlis, Daniel J. Mankowitz, Shie Mannor — NeurIPS 2018
TLDR: Action-Elimination Deep Q-Network (AE-DQN): Shows that learning to predict and discard irrelevant actions can improve agent performance. Evaluated on Zork.
agent
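
The core decision rule here, discarding actions judged irrelevant and taking the best remaining Q-value, can be sketched as below. The eliminator is a hypothetical, deliberately simplified stand-in for AE-DQN's learned elimination network: it counts how often the game rejected each action string and eliminates an action only when a Hoeffding-style lower confidence bound on its failure rate exceeds a threshold, so actions are not thrown away on thin evidence.

```python
import math
from collections import defaultdict
from typing import Dict


class CountingEliminator:
    """Hypothetical count-based stand-in for a learned action-elimination model."""

    def __init__(self, threshold: float = 0.5, delta: float = 0.05) -> None:
        self.threshold = threshold  # eliminate if failure rate is confidently above this
        self.delta = delta          # confidence parameter of the bound
        self.tries: Dict[str, int] = defaultdict(int)
        self.failures: Dict[str, int] = defaultdict(int)

    def update(self, action: str, was_invalid: bool) -> None:
        self.tries[action] += 1
        self.failures[action] += int(was_invalid)

    def is_eliminated(self, action: str) -> bool:
        n = self.tries[action]
        if n == 0:
            return False  # never tried: keep it
        p_hat = self.failures[action] / n
        margin = math.sqrt(math.log(1.0 / self.delta) / (2.0 * n))
        return p_hat - margin > self.threshold


def select_action(q_values: Dict[str, float], eliminator: CountingEliminator) -> str:
    """Greedy action over the actions that survive elimination."""
    allowed = [a for a in q_values if not eliminator.is_eliminated(a)]
    if not allowed:  # never leave the agent without a move
        allowed = list(q_values)
    return max(allowed, key=q_values.__getitem__)


elim = CountingEliminator()
for _ in range(10):
    elim.update("eat sword", was_invalid=True)
print(select_action({"eat sword": 5.0, "go north": 1.0}, elim))
```

The confidence margin is the design choice that matters: after one failed attempt the bound is still negative, so the action stays available; only repeated failures remove it.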

Language Expansion In Text-Based Games
Ghulam Ahmed Ansari, Sagar J. P., A. P. S. Chandar, Balaraman Ravindran — Arxiv 2018
TLDR: Works towards creating more generic RL text game agents using policy distillation. Evaluates on HomeWorld.
agent

Counting to Explore and Generalize in Text-based Games
Xingdi Yuan, Marc-Alexandre Cote, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler — ERL 2018
TLDR: Uses an “episodic count-based exploration scheme” that counts visits to states, to encourage an RL model to explore states it hasn’t been to. Evaluates on Coin Collector.
agent
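
A minimal sketch of an episodic count-based exploration bonus. It assumes the bonus takes the common form beta / sqrt(n), where n is the number of visits to a state in the current episode and counts are reset at every episode start; the paper's exact bonus formulation may differ. The class name and the use of normalized observation text as the state key are illustrative assumptions.

```python
import math
from collections import Counter


class EpisodicCountBonus:
    """Hypothetical helper: intrinsic reward that shrinks as a state is revisited.

    Counts are cleared at the start of every episode, so the agent is rewarded
    for reaching states it has not yet seen in the current episode.
    """

    def __init__(self, beta: float = 1.0) -> None:
        self.beta = beta         # assumed scale of the exploration bonus
        self.counts = Counter()  # state key -> visits this episode

    def reset(self) -> None:
        # Call at the beginning of each episode.
        self.counts.clear()

    def bonus(self, observation: str) -> float:
        # The normalized observation text stands in for the game state.
        key = observation.strip().lower()
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


explorer = EpisodicCountBonus(beta=1.0)
for obs in ["Kitchen", "Hallway", "Kitchen", "Kitchen"]:
    print(obs, round(explorer.bonus(obs), 3))
# Kitchen 1.0, Hallway 1.0, Kitchen 0.707, Kitchen 0.577
```

During training the agent would optimize a shaped reward such as extrinsic_reward + explorer.bonus(observation), which is what pushes it toward unvisited rooms in a game like Coin Collector.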

What can you do with a rock? Affordance extraction via word embeddings
Nancy Fulda, Daniel Ricks, Ben Murdoch, David Wingate — IJCAI 2017
TLDR: Attempts to distill commonsense affordances from word embeddings trained on Wikipedia using the analogy paradigm (e.g. king:queen::man:woman), then uses these to improve the performance of an RL agent on Z-Machine games (including Zork).
agent
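
The analogy idea can be sketched as vector arithmetic: given example (verb, noun) pairs such as eat/apple, add the average verb-minus-noun offset to a new noun's vector and return the nearest verbs by cosine similarity. The four-dimensional embeddings below are hand-made toy values chosen for illustration, not the Wikipedia-trained vectors the paper used, and the function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]

# Hand-made toy embeddings (dimensions: food, weapon, text, verb-ness).
EMB: Dict[str, Vector] = {
    "apple": [1.0, 0.0, 0.0, 0.0],
    "sword": [0.0, 1.0, 0.0, 0.0],
    "book":  [0.0, 0.0, 1.0, 0.0],
    "eat":   [1.0, 0.0, 0.0, 1.0],
    "wield": [0.0, 1.0, 0.0, 1.0],
    "read":  [0.0, 0.0, 1.0, 1.0],
}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def afforded_verbs(noun: str, pairs: Sequence[Tuple[str, str]], verbs: Sequence[str],
                   emb: Dict[str, Vector], k: int = 1) -> List[str]:
    """Solve 'canonical noun : canonical verb :: noun : ?' with a vector offset."""
    dims = len(emb[noun])
    # Average (verb - noun) offset over the canonical example pairs.
    offset = [sum(emb[v][i] - emb[n][i] for v, n in pairs) / len(pairs) for i in range(dims)]
    query = [x + d for x, d in zip(emb[noun], offset)]
    return sorted(verbs, key=lambda v: cosine(emb[v], query), reverse=True)[:k]


print(afforded_verbs("sword", [("eat", "apple"), ("read", "book")], ["eat", "wield", "read"], EMB))
# ['wield']
```

An agent could then restrict or reorder its candidate commands to the top-k verbs returned for each object it sees.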

Text-based Adventures of the Golovin AI Agent
Bartosz Kostka, Jarosław Kwiecien, Jakub Kowalski, Paweł Rychlikowski — CIG 2017
TLDR: Golovin model: An agent that combines a variety of methods (language models, LSTMs, rules) and performs comparably to the 2016 shared task winner on Zork and other similar games.
agent

Deep Reinforcement Learning with a Natural Language Action Space
Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf — ACL 2016
TLDR: DRRN (Deep Reinforcement Relevance Network). Learns separate representations of the state and action spaces, which are combined with an interaction function to approximate a Q function for deep reinforcement learning. A strong early RL baseline for most games, and frequently compared against.
agent
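
A minimal, untrained sketch of the DRRN scoring structure: one encoder for the state text, a separate encoder for each candidate action's text, and an inner product as the interaction function producing Q(s, a). The bag-of-words inputs, single tanh layer, random weights, and class name are illustrative assumptions; training (Q-learning updates) is omitted.

```python
import math
import random
from typing import Dict, List


def bag_of_words(text: str, vocab: Dict[str, int]) -> List[float]:
    """Count vector over a fixed vocabulary; unknown words are ignored."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec


class DRRNSketch:
    """Untrained toy DRRN: separate state/action encoders, inner-product interaction."""

    def __init__(self, vocab: Dict[str, int], hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab = vocab
        n = len(vocab)
        # Two independent weight matrices: one for state text, one for action text.
        self.w_state = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(hidden)]
        self.w_action = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(hidden)]

    def _encode(self, weights: List[List[float]], text: str) -> List[float]:
        x = bag_of_words(text, self.vocab)
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    def q_value(self, state: str, action: str) -> float:
        h_s = self._encode(self.w_state, state)
        h_a = self._encode(self.w_action, action)
        return sum(a * b for a, b in zip(h_s, h_a))  # interaction function: dot product

    def best_action(self, state: str, candidates: List[str]) -> str:
        # The valid action set may change every turn; each candidate is scored on its own.
        return max(candidates, key=lambda a: self.q_value(state, a))


words = "you are in a kitchen there is an apple take eat go north".split()
vocab = {w: i for i, w in enumerate(dict.fromkeys(words))}
model = DRRNSketch(vocab)
print(model.best_action("you are in a kitchen there is an apple", ["take apple", "go north"]))
```

Encoding actions separately is what lets this kind of model score an arbitrary, changing list of text commands rather than a fixed output layer.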

Language Understanding for Text-based Games using Deep Reinforcement Learning
Karthik Narasimhan, Tejas D. Kulkarni, R. Barzilay — EMNLP 2015
TLDR: LSTM-DQN: Early paper showing RL can be applied to text-based adventure games. Learns state and action representations jointly. Evaluates on two games (HomeWorld and FantasyWorld).
Project Website
agent
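
In LSTM-DQN-style agents, commands are commonly factored into a verb and an object with separate Q-values, and a command's score is taken as the average of the two. The sketch below assumes that averaging scheme (the Q-values are made-up inputs rather than outputs of an LSTM) and shows why greedy selection then reduces to picking the best verb and the best object independently.

```python
from itertools import product
from typing import Dict, Tuple


def combined_q(q_verb: Dict[str, float], q_obj: Dict[str, float]) -> Dict[Tuple[str, str], float]:
    """Score every (verb, object) command as the mean of its two component Q-values."""
    return {(v, o): (q_verb[v] + q_obj[o]) / 2.0 for v, o in product(q_verb, q_obj)}


def greedy_command(q_verb: Dict[str, float], q_obj: Dict[str, float]) -> str:
    # Because the score is an average, the best pair is just the best verb plus
    # the best object; no need to enumerate all pairs at decision time.
    verb = max(q_verb, key=q_verb.__getitem__)
    obj = max(q_obj, key=q_obj.__getitem__)
    return f"{verb} {obj}"


qv = {"eat": 0.2, "take": 0.9}
qo = {"apple": 0.5, "door": 0.1}
print(greedy_command(qv, qo))  # take apple
```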

 

Data

Papers that describe data or resources.

Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks — ICML 2023
TLDR: Introduces the MACHIAVELLI benchmark, a set of 134 choose-your-own-adventure games from choiceofgames.com that have been automatically annotated with a large (2.8M) set of labels from GPT-4 detailing whether chosen actions represent harmful behaviors. Evaluates performance of baseline agents (Random, DRRN, and GPT-3.5/4) on a large set of ethical measures.
Project Website
data environment

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap — Arxiv 2023
TLDR: SOTOPIA is an open-ended dialog task for role playing, similar to LIGHT, with a set of specific characters and scenarios to role play. Evaluates human and GPT-4 baselines on the social interaction task. Provides a set of 7 automatic metrics for evaluating these social interactions, showing moderate agreement with human judgments.
Project Website
data environment social

Ontologically Faithful Generation of Non-Player Character Dialogues
Nathaniel Weir, Ryan Thomas, Randolph D’Amore, Kellie Hill, Benjamin Van Durme, Harsh Jhamtani — Arxiv 2023
TLDR: Introduces KNUDGE, a knowledge-constrained NPC dialogue dataset, where models must author a complex dialogue tree between players and NPCs according to a large set of persona/lore and quest specification passages. The data is drawn from a real RPG (The Outer Worlds). Introduces a series of automatic and human evaluation protocols for the task.
data social

FIREBALL: A Dataset of Dungeons and Dragons Actual-Play with Structured Game State Information
Andrew Zhu, Karmanya Aggarwal, Alexander Feng, Lara Martin, Chris Callison-Burch — ACL 2023
TLDR: This paper presents FIREBALL, a large dataset of sessions from real Dungeons and Dragons (D&D) gameplay on Discord with true game state information. The true game states are intended to help large language models generate better game rounds.
Project Website
data

Modeling Worlds in Text
Prithviraj Ammanabrolu, Mark O. Riedl — Arxiv 2021
TLDR: JerichoWorld: Large dataset for generating knowledge graphs from interactive fiction games. Collected from 27 interactive fiction games (from the Jericho dataset).
data

How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds
Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston — NAACL 2021
TLDR: Introduces (a) LIGHT-Quests, a crowdsourced dataset of quests based on the LIGHT environment, (b) ATOMIC-LIGHT, a common-sense knowledge graph with related knowledge. Also trains a transformer agent to act in this environment.
agent data social

Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktaschel, Douwe Kiela, Arthur Szlam, Jason Weston — EMNLP 2019
TLDR: Introduces LIGHT: A dataset and model for dialog interactions in a text-based role playing environment. Includes a large set of crowdsourced dialog about rooms/objects/etc.
Project Website
data environment social

 

Position Papers

Position papers.

Language (Re)modelling: Towards Embodied Language Understanding
Ronen Tamari, Chen Shani, Tom Hope, Miriam R. L. Petruck, Omri Abend, Dafna Shahaf — ACL 2020
TLDR: Argues why and how text-based games can play a key role in training NLU systems with more human-like inductive biases of grounded mental simulation and metaphoric inference.
position

Dungeons and DQNs: Toward Reinforcement Learning Agents that Play Tabletop Roleplaying Games
Lara J. Martin, Srijan Sood, Mark O. Riedl — INT-WICED 2018
TLDR: Position paper that proposes using table-top role-playing games (like Dungeons and Dragons) as a challenge task.
position

 

Shared Tasks

Papers that describe shared tasks (either by participating teams, or organizer summaries).

LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games
Leonard Adolphs, T. Hofmann — AAAI 2020
TLDR: LeDeepChef: 2nd place in the shared task. Uses a combination of ranking, actor-critic, and feudal learning.
agent sharedtask

First TextWorld Problems
Marc-Alexandre Côté, Xingdi Yuan, Adam Trischler, Wendy Tay — CoG 2020
TLDR: Blog post about the results of the First TextWorld Problems competition (including the popular CookingWorld environment).
Project Website
environment sharedtask

NAIL: A General Interactive Fiction Agent
M. Hausknecht, R. Loynd, Greg Yang, Adith Swaminathan, J. Williams — Arxiv 2019
TLDR: NAIL: Winner of the 2018 shared task. Uses a variety of “decision modules” (e.g. hoarder, examiner, interactor), which interact with a knowledge graph representation.
agent sharedtask

RDF* Graph Database as Interlingua for the TextWorld Challenge
Guntis Barzdins, D. Gosko, Paulis F. Barzdins, Uldis Lavrinovics, Gints Bernans, E. Celms — CoG 2019
TLDR: Shared task agent. Uses a split “Actor and Observer” architecture, where these two modules communicate through an “RDF* database”. Database serves as the world model, and is updated in part by using FrameNet to interpret observations.
agent sharedtask

The Text-Based Adventure AI Competition
Timothy Atkinson, Hendrik Baier, Tara Copplestone, Sam Devlin, Jerry Swan — IEEE Trans. Games 2019
TLDR: Summary Paper for the 2016, 2017, and 2018 Shared Tasks on Text-based Adventure AI
sharedtask

 

Social Agents

Papers that describe dialog, such as agent-user communication or agent-agent communication.

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents
Xuhui Zhou, Hao Zhu, Leena Mathur, Ruohong Zhang, Haofei Yu, Zhengyang Qi, Louis-Philippe Morency, Yonatan Bisk, Daniel Fried, Graham Neubig, Maarten Sap — Arxiv 2023
TLDR: SOTOPIA is an open-ended dialog task for role playing, similar to LIGHT, with a set of specific characters and scenarios to role play. Evaluates human and GPT-4 baselines on the social interaction task. Provides a set of 7 automatic metrics for evaluating these social interactions, showing moderate agreement with human judgments.
Project Website
data environment social

Ontologically Faithful Generation of Non-Player Character Dialogues
Nathaniel Weir, Ryan Thomas, Randolph D’Amore, Kellie Hill, Benjamin Van Durme, Harsh Jhamtani — Arxiv 2023
TLDR: Introduces KNUDGE, a knowledge-constrained NPC dialogue dataset, where models must author a complex dialogue tree between players and NPCs according to a large set of persona/lore and quest specification passages. The data is drawn from a real RPG (The Outer Worlds). Introduces a series of automatic and human evaluation protocols for the task.
data social

Towards Socially Intelligent Agents with Mental State Transition and Human Utility
Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu — SIGDIAL 2022
TLDR: Builds a “mental state parser” for representing the mental states of agents as a graph. Evaluated on LIGHT.
social

How to Motivate Your Dragon: Teaching Goal-Driven Agents to Speak and Act in Fantasy Worlds
Prithviraj Ammanabrolu, Jack Urbanek, Margaret Li, Arthur Szlam, Tim Rocktäschel, Jason Weston — NAACL 2021
TLDR: Introduces (a) LIGHT-Quests, a crowdsourced dataset of quests based on the LIGHT environment, (b) ATOMIC-LIGHT, a common-sense knowledge graph with related knowledge. Also trains a transformer agent to act in this environment.
agent data social

I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents
Shrimai Prabhumoye, Margaret Li, Jack Urbanek, Emily Dinan, Douwe Kiela, Jason Weston, Arthur Szlam — Arxiv 2020
TLDR: Proposes a new task in the LIGHT environment: the player must say something that causes a computer-controlled agent to perform a specific action (e.g. put on chain mail) or use a specific emote. Creates an RL agent that succeeds at this task about half the time.
agent social

Learning to Speak and Act in a Fantasy Text Adventure Game
Jack Urbanek, Angela Fan, Siddharth Karamcheti, Saachi Jain, Samuel Humeau, Emily Dinan, Tim Rocktaschel, Douwe Kiela, Arthur Szlam, Jason Weston — EMNLP 2019
TLDR: Introduces LIGHT: A dataset and model for dialog interactions in a text-based role playing environment. Includes a large set of crowdsourced dialog about rooms/objects/etc.
Project Website
data environment social

 

Surveys

Survey papers.

A Systematic Survey of Text Worlds as Embodied Natural Language Environments
Peter Jansen — Wordplay 2022
TLDR: Presents an in-depth survey of text game agents including motivations/position (why use text games?), simulators (and how they compare to 2D/3D simulators), environments, agents, and contemporary/future directions.
survey

A Survey of Text Games for Reinforcement Learning Informed by Natural Language
Philip Osborne, Heido Nõmm, André Freitas — TACL 2022
TLDR: A survey of text games as they relate to being modelled as reinforcement learning problems.
survey

 

Other

Other papers.

Interactive Language Learning by Question Answering
Xingdi Yuan, Marc-Alexandre Cote, Jie Fu, Zhouhan Lin, Christopher Pal, Yoshua Bengio, Adam Trischler — EMNLP 2019
TLDR: Introduces QAit (Question Answering with Interactive Text), a question answering task where answers must be gathered by interacting with a text game (CookingWorld). Includes 3 question types centered around properties: locationOf, existenceOf, and getProperty.
other

Ceptre: A Language for Modeling Generative Interactive Systems
Chris Martens — AIIDE 2015
TLDR: Presents Ceptre, a language based on linear logic for modeling interactive fiction (the linear-logic formalism is also used by TextWorld).
other
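
The key linear-logic idea, that game facts are resources a rule can consume and produce, can be sketched with a multiset of ground facts. The Rule structure, fact strings, and apply function below are hypothetical Python illustrations; Ceptre's own syntax and semantics are richer than this. A rule fires only if all its required facts are present; consumed facts are removed, preserved facts stay, and produced facts are added.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    consumed: FrozenSet[str]   # linear resources used up by the rule
    preserved: FrozenSet[str]  # facts that must hold but are not consumed
    produced: FrozenSet[str]   # new facts added to the state


def apply(rule: Rule, state: Counter) -> Optional[Counter]:
    """Fire `rule` on a multiset of ground facts; return None if it does not apply."""
    needed = Counter(rule.consumed) + Counter(rule.preserved)
    if any(state[fact] < n for fact, n in needed.items()):
        return None
    new_state = state.copy()
    new_state.subtract(Counter(rule.consumed))
    new_state.update(Counter(rule.produced))
    return +new_state  # unary + drops facts whose count fell to zero


open_chest = Rule(
    name="open chest",
    consumed=frozenset({"closed(chest)"}),
    preserved=frozenset({"at(player, room)", "at(chest, room)"}),
    produced=frozenset({"open(chest)"}),
)
state = Counter({"at(player, room)": 1, "at(chest, room)": 1, "closed(chest)": 1})
after = apply(open_chest, state)
print(sorted(after))
print(apply(open_chest, after))  # None: closed(chest) was consumed
```

Because "closed(chest)" is consumed rather than merely checked, the same rule cannot fire twice, which is how linear logic captures state change without explicit delete lists.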

 

Last updated: Feb 06, 2024

Submit your paper to the Text Game Research List