Link : https://arxiv.org/pdf/2411.00114
(Parallel Information Aggregation via Neural Orchestration) architecture
Notes
- Using PIANO, author shows’ that agents form their own professional identities, obey collective rules, transmit cultural information and exert religious influence, and use sophisticated infrastructures, such as legal systems.
- Just as a pianist coordinates multiple notes to create a harmony, the PIANO architecture selectively and concurrently executes various modules in parallel to enable agents to interact with the environment in real-time.
Why is it hard to build AI civilizations
- Single agents don’t make progress: LLM-powered agents often struggle to maintain a grounded sense of reality in their actions and reasoning. Even a small rate of hallucinations can poison downstream agent behavior when agents continuously interact with the environment via LM calls
- Groups of agent’s don’t make progress: Agents that miscommunicate their thoughts and intents can mislead other agents, causing them to propagate further hallucinations and loop.
- A lack of benchmarks for civilizational progress: There is a lack of large-scale benchmarks that can attributed to how technically difficult it is to perform simulations of hundreds or thousands of agents in a single world.
PIANO Architecture
- Author propose two brain-inspired design principles for the composite architecture of human-like AI agents.
- There are two main problems with agent architecture
- Concurrency
- Coherence
-
Concurrency
- P: Agent should be able to think and act concurrently. Agent be able to interact in real time with low-latency, but also have the capacity to slowly deliberate and plan.
- CS: Vast majority of LLM-based agents today primarily use single-threaded, sequential functions. None are designed for concurrent programming.
- S: The brain solves this problem by running different modules concurrently and at different time scales.
- Author designed modules, such as
- cognition,
- planning,
- motor execution,
- and speech
- to run concurrently in agent brain.
- Each module can be seen as a stateless function that reads and writes to a shared Agent State.
- Author designed modules, such as
-
Coherence
- P: An immediate challenge with concurrent modules is that they can produce independent outputs, making the agent incoherent. For instance, agents say one thing but actually do something else.
- Incoherence also scales exponentially as the number of independent output modules increases
- CS: The incoherence problem is usually not obvious for sequential architectures.
- But a significant problem when multiple output modules can interface with the environment.
- S: To ensure that the multiple outputs produced by agent are coherent, author introduced a Cognitive Controller (CC) module that is solely responsible for making high-level deliberate decisions.
- P: An immediate challenge with concurrent modules is that they can produce independent outputs, making the agent incoherent. For instance, agents say one thing but actually do something else.
Cognitive Controller and Core modules
- Cognitive Controller is solely responsible for making high-level deliberate decisions.
- The decisions are then translated downstream to produce appropriate output in each motor module
- The Cognitive Controller synthesizes information across the Agent State through a bottleneck.
- This design of a bottlenecked decision-maker that broadcasts its outputs has been suggested as a core ingredient for human consciousness and is used in some neural network architectures.
-
Core Modules
- Memory: Stores and retrieves conversations, actions, and observations across various timescales.
- Action Awareness: Allows agents to assess their own state and performance, enabling for moment-by-moment adjustments.
- Goal Generation: Facilitates the creation of new objectives based on the agent’s experiences and environmental interactions.
- Social Awareness: Enables agents to interpret and respond to social cues from other agents, supporting cooperation and communication.
- Talking: Interprets and generates speech.
- Skill Execution: Performs specific skills or actions within the environment.
Experiments
- There is bunch of really fun and expected outcomes, i suggest reading through the paper in detail to enjoy it completely. Link above.