Over the past several weeks I’ve been working on a document laying out my current thinking about likely AI takeoff scenarios. I’ve been calling it a hierarchical conceptual framework, for lack of a pithier or clearer term. In the process of getting feedback on my drafts, it’s become clear to me that it’s nonobvious what sort of thing I’m writing, and why I’d write it.
Thinking about AI risk, it’s become increasingly clear to me why I don’t feel I have a firm understanding of the discourse around it:
- People often notice they disagree when they dispute the truth of some proposition close to end outcomes, but that surface dispute often conceals very different underlying models of what’s going on.
- Fundamental disagreements about which models are most applicable are hidden at the foundation of positive arguments, rather than formalized in a disjunctive way.
- Even when people do notice that they’re disagreeing about which model to use (and not just about the parameters of a shared model), and argue about that, they don’t seem to formally segment off that part of the argument in a way that’s easy for a reader to follow. Instead, all the arguments still seem to be worded so as to point toward a specific end-outcome prediction or policy recommendation, rather than toward the proximate disagreement.
The conceptual framework I am trying to build is one that articulates the disjunctions I commonly see underlying disagreements, contextualizing a range of models as the result of differing beliefs about the world. Adjusting the parameters of this underlying model generates different local approximations, which correspond to the models different people are using to think about AI risk. I’m trying to make it as formal and explicit as I can, in order to make it easy to classify different types of disagreement. I’m trying to make it hierarchical so that we can see which beliefs screen off other disagreements, establishing chains of relevance.
In order to keep the complexity of this framework manageable, I am not dealing directly with AI safety at all. Instead, I’m starting from an amoral stance: before asking “what promotes safety vs. danger?”, I’m asking the more fundamental question, “what might happen?”