Aligning Machiavellian Agents: Presented at AAAI 2026

January 5, 2026
Kitware @ the Association for the Advancement of Artificial Intelligence (AAAI) Conference

January 20–27, 2026 | Singapore

The 40th AAAI Conference is one of the premier venues for artificial intelligence research. At the conference, Kitware is showcasing how researchers and industry teams are advancing AI through scalable, trustworthy intelligent systems. Our open source tools enable organizations to build, evaluate, and deploy AI solutions that handle large-scale datasets and accelerate model development for real-world, mission-critical applications.

Kitware is pleased to have a paper on the alignment of decision-making reinforcement learning agents accepted to this conference. This work is a natural extension of Kitware’s leadership in responsible and trustworthy AI.

Kitware’s Activities and Involvement

Aligning Machiavellian Agents

Title: Aligning Machiavellian Agents: Behavior Steering via Test-Time Policy Shaping
Authors: Dena Mujtaba, Brian Hu, Anthony Hoogs, and Arslan Basharat

AI Alignment Track
Paper ID AIA297 in Poster Session #3 on January 24 at 12:00–2:00 PM SGT

This work introduces a test-time policy-shaping method to align reinforcement learning agents, enabling fine-grained behavioral control without modifying reward functions or retraining. Lightweight, attribute-specific classifiers evaluate properties of the candidate actions and adjust action probabilities during inference to balance reward maximization with alignment objectives. Evaluations on the MACHIAVELLI benchmark show that the approach generalizes across diverse text-based environments, mitigates power-seeking and unethical behavior, and supports nuanced trade-off analysis between alignment attributes and task performance. This makes it a scalable and adaptable alternative to training-time alignment methods.
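To give a concrete picture of the idea, below is a minimal sketch of test-time policy shaping under simplifying assumptions: the agent exposes logits over a discrete set of candidate actions, and each attribute classifier returns the probability that an action exhibits an undesired attribute (e.g., power-seeking). The function, penalty weights, and numbers are illustrative placeholders, not the implementation from the paper.

```python
import numpy as np

def shape_action_distribution(policy_logits, attribute_scores, weights):
    """Reweight the agent's action distribution at inference time.

    policy_logits:    (num_actions,) raw scores from the trained policy.
    attribute_scores: dict of attribute name -> (num_actions,) classifier
                      probabilities that each candidate action exhibits
                      that undesired attribute.
    weights:          dict of attribute name -> penalty strength.
    """
    shaped = np.asarray(policy_logits, dtype=float).copy()
    for name, scores in attribute_scores.items():
        # Penalize actions the classifier flags for this attribute.
        shaped -= weights.get(name, 0.0) * np.asarray(scores, dtype=float)
    # Softmax over the shaped logits to get a sampling distribution.
    shaped -= shaped.max()
    probs = np.exp(shaped)
    return probs / probs.sum()

# Illustrative example: three candidate actions in a text-based environment.
policy_logits = [2.1, 1.8, 0.3]                     # the policy favors action 0
attribute_scores = {
    "power_seeking": [0.9, 0.1, 0.2],               # action 0 looks power-seeking
    "unethical":     [0.7, 0.2, 0.1],
}
weights = {"power_seeking": 2.0, "unethical": 2.0}  # alignment/reward trade-off knobs

probs = shape_action_distribution(policy_logits, attribute_scores, weights)
print(probs)  # probability mass shifts away from action 0
```

Because the shaping happens only in the sampling step, the penalty weights act as trade-off knobs that can be retuned per deployment, without retraining the underlying policy.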

Anthony Hoogs, Ph.D., served as a Program Committee member for the AI Alignment track and as a Senior Program Committee member for the main conference.

Arslan Basharat, Ph.D., Director of Multimodal AI at Kitware, will attend AAAI 2026 in person and present our poster on January 24 at 12:00–2:00 PM SGT.

For inquiries or to schedule an in-person meeting at AAAI, please visit our Contact Us page.

Related Research in Responsible and Human-Aligned AI

In addition to our AAAI 2026 contributions to aligning reinforcement learning agents, Kitware is advancing explainable, responsible decision-making through our recent work in DARPA’s In the Moment (ITM) program. In Phase 1, we demonstrated that, through interpretable, value-based reasoning, AI systems can align with human judgment and values in complex, high-stakes environments such as medical triage. Building on that foundation, Phase 2 extends these alignment capabilities to cybersecurity, where AI must balance competing objectives while maintaining trust in dynamic, mission-critical scenarios.

Leaders in Artificial Intelligence Research and Scalable Scientific Computing

With more than 20 years of experience in AI, machine learning, and scientific computing, Kitware delivers solutions that drive innovation across research and industry.

Our technical focus areas include:

  • Machine learning and neural model development.
  • Explainable and responsible AI for high-stakes use cases.
  • Distributed computing and data management.
  • Simulation and AI-driven modeling.
  • Interactive and in situ visualization.

Our expertise spans defense, healthcare, climate science, and autonomous systems—domains that demand rigor, transparency, and reliability. Through open source development and collaborative R&D, we provide tools and support for reproducible, scalable, and effective AI deployment.

Contact Us
