Data | ET Mueller

From Hayek to RL

Hayek’s 1945 essay “The Use of Knowledge in Society” can be read as an early account of the problem that reinforcement learning addresses. Both explore how decisions are coordinated in environments where information is incomplete, decentralized, and constantly changing.

Hayek begins with a world that resembles a Markov Decision Process: complete information, fixed preferences, known transitions. But he stresses that such a world does not exist. Real economies are partially observable multi-agent systems where each participant adapts using local feedback.

He contrasts three forms of planning:

Central planning → single-agent model-based RL
Decentralized competition → multi-agent RL
Delegated authority → hierarchical RL

Prices become the shared reward signal that allows decentralized agents to converge on coordinated behavior without full observability. When prices are flexible, the signal updates quickly and agents can adapt to it; when prices are rigid, the signal stops carrying new information and the feedback loop breaks down.
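As a rough illustration of the point about flexible and rigid prices, here is a toy sketch in Python. It is not taken from Hayek's essay and it is not an RL algorithm: it uses a simple price-adjustment loop (in the spirit of Walrasian tâtonnement) as a stand-in. Every name and number in it (`simulate`, `flexibility`, the uniform valuations, the starting price of 10) is an assumption made for the example. Each agent sees only its own private value or cost plus the public price, and the price moves in proportion to excess demand.

```python
import random


def excess_demand(price, buyer_values, seller_costs):
    """Count willing buyers minus willing sellers at the given price.

    Each agent decides using only its own private number and the
    public price; no agent sees anyone else's information.
    """
    demand = sum(1 for value in buyer_values if value >= price)
    supply = sum(1 for cost in seller_costs if cost <= price)
    return demand - supply


def simulate(flexibility, steps=200, seed=0):
    """Run a toy market and return (final_price, final_excess_demand).

    `flexibility` is a hypothetical knob: how far the price moves per
    unit of excess demand. A value of 0.0 models a fully rigid price.
    """
    rng = random.Random(seed)
    buyer_values = [rng.uniform(0, 100) for _ in range(500)]
    seller_costs = [rng.uniform(0, 100) for _ in range(500)]
    price = 10.0  # deliberately far from the market-clearing level
    for _ in range(steps):
        gap = excess_demand(price, buyer_values, seller_costs)
        price += flexibility * gap          # the price absorbs local decisions
        price = min(max(price, 0.0), 100.0)  # keep the price in a sane range
    return price, excess_demand(price, buyer_values, seller_costs)


if __name__ == "__main__":
    for flexibility in (0.02, 0.0):
        price, gap = simulate(flexibility)
        print(f"flexibility={flexibility}: price={price:.2f}, excess demand={gap}")
```

With a small positive `flexibility`, the price should drift toward the level where buyers and sellers roughly balance, even though no one in the loop knows the full set of valuations. With `flexibility=0.0` the price never moves, so the imbalance it started with persists.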

Hayek’s arbitrageur—“who gains from local differences of commodity prices”—is the prototype of an adaptive learning agent. Each trader transmits information by exploiting local signals.

He anticipated many modern RL challenges: abstraction versus detail, generalization versus local accuracy, and the impossibility of perfect global knowledge. Economies and RL systems progress by designing incentives that encourage agents to learn the “desirable things” without central control, a problem that RL now approaches through reward design and reward shaping.
