Lessons learned from modeling AI takeoffs, and a call for collaboration

I am now able to think about things like AI risk and feel like the concepts are real, not just verbal. This was the point of my “modeling the world” project. I’ve generated a few intuitions around what’s important in AI risk, including a few considerations that I think are being neglected. There are a few directions my line of research can be extended, and I’m looking for collaborators to pick this up and run with it.

I intend to write about much of this in more depth, but since EA Global is coming up I want a simple description to point to here. These are just loose sketches, so I'm trying to describe rather than persuade.

New intuitions around AI risk

  • Better cybersecurity norms would probably reduce the chance of an accidental singleton.
  • Transparency on AI projects’ level of progress reduces the pressure towards an AI arms race.
  • Safety is convergent - widespread improvements in AI value alignment improve our chances at a benevolent singleton, even in an otherwise multipolar dynamic.

Continue reading

Update 8: Prioritization, finishing, inventories, and working memory

My friend Satvik recently told me about an important project management intuition he’d acquired: it’s a very bad sign to have a lot of projects that are “90% complete”. This is bad for a few reasons, including:

  • Inventory: For any process that makes things, it’s a substantial savings to have a smaller inventory. A manufacturer buys raw inputs, does work on them, and ships them to a customer. Every moment between the purchase of inputs and the delivery of finished goods is a cost to the manufacturer, because of the time value of money. A smaller inventory is what a faster turnaround looks like. If a lot of your projects are 90% complete, you’ve invested a lot of work into them but are realizing none of the value of the finished product (a toy calculation follows this list).
  • Power law: Some projects might be much more important than others. If you’re allocating time and effort evenly among many projects, you may be wasting a lot of it.
  • Quality of time estimates: Things may be sitting at “90%” because they keep seeming “almost done” even as you put a lot of additional work into them. If you’re using faulty estimates of time to completion, this may make your cost-benefit calculations wrong.
  • Mental overhead: Even if it were somehow optimal for an ideal worker to handle a lot of projects simultaneously, in practice humans can’t perform very well like that. Conscious attention isn’t the only constraint - there are also only so many things you can productively fit on the “back burner”.
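
Satvik’s inventory point is easy to make concrete. Here’s a toy calculation (the numbers are purely illustrative, not from any real project): five identical projects, comparing a finish-one-before-starting-the-next schedule against round-robin multitasking that leaves everything “90% done” until the very end.

```python
# Toy model of the inventory cost of multitasking.
# All numbers are illustrative assumptions.
N, T = 5, 10  # five projects, ten days of work each

# Serial: finish one project before starting the next,
# so project k starts paying off after (k + 1) * T days.
serial_finish = [(k + 1) * T for k in range(N)]

# Round-robin: rotate one day of work at a time across all projects,
# so everything lingers near-done and finishes in the last N days.
parallel_finish = [N * T - (N - 1 - k) for k in range(N)]

print("serial:   avg days until a project pays off =", sum(serial_finish) / N)   # 30.0
print("parallel: avg days until a project pays off =", sum(parallel_finish) / N) # 48.0
```

The total work is identical either way; the difference is how long finished value sits locked up as work-in-progress.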

I decided to use this insight to prioritize my work. Things that are “90% done” should be frontloaded, and idea generation should be deprioritized until I’ve “closed out” more things and cleared up mental space. Continue reading

Update 7: conceptual vocabulary, writing, feedback, lit review

  1. I’m redefining my current research project: I’m limiting its current scope to creating a conceptual framework, not getting empirical results.
  2. I’m dropping my commitment to regular updates.
  3. I got some good feedback on my draft research hierarchy writeup. Core issues I’m going to emphasize are:
    • Conflict vs cooperation as drivers of AI progress during a takeoff scenario.
    • Creating disjunctive arguments that put considerations in relation to each other, making it clear which scenarios are mutually exclusive alternatives, which are elaborations on a basic dynamic, and which might overlap.
  4. I plan to review more of the existing literature on AI risk, to see whether anyone else has already made substantial progress on a conceptual framework for AI risk that I don’t know about.

Continue reading

Update 6: working on a research hierarchy

Each time I talk through my research with someone, I seem to get another "click" moment as part of the implied structure falls into place. This in turn has helped me figure out what function I want my "big picture" model to serve.

An example of one of these "click" moments happened yesterday, when, trying to explain why the considerations I was thinking about formed a natural group and others should be excluded, I realized that I was trying to model an equilibrium where none of the agents modeled each other, as a simpler case to build more complex models on top of. Continue reading

Update 5: writing, organizing my research

The last few workdays have been tricky because the working style that seemed to serve me well for doing new research (running across things that catch my eye and then leaning into my desire to understand them better) didn't work very well for writing. In addition, it seems that I have to repeatedly expend executive effort just to set aside large uninterrupted blocks of time to work, well-rested, on this project. This week I've put in place some accountability measures that should help, such as committing to examine my social engagements in advance and clarify whether I genuinely expect them to be as valuable as time spent on this project.

On the object level, while writing up my initial model of how AI takeoff might be affected by the existence of a broader economy, I realized that I was really trying to do two different things:

  1. Mathematize my intuitions about AI takeoff scenarios where each AI can either work on itself or participate in the economy (see the sketch after this list).
  2. Lay out a higher-level argument about how the above fits into the bigger picture.
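
To make point 1 concrete, here's a minimal single-agent sketch of the kind of model I mean. The functional forms and constants are placeholder assumptions, not the actual model I'm writing up:

```python
# Minimal sketch: one AI splits each period between improving itself and
# earning in the economy. Growth rate and wage are placeholder assumptions.

def run(f_self, periods=50, growth=0.05, wage=1.0):
    """f_self: fraction of effort spent on self-improvement each period."""
    capability, wealth = 1.0, 0.0
    for _ in range(periods):
        wealth += wage * capability * (1 - f_self)  # earnings scale with capability
        capability *= 1 + growth * f_self           # self-improvement compounds
    return capability, wealth

for f in (0.0, 0.5, 1.0):
    capability, wealth = run(f)
    print(f"f_self={f:.1f}  capability={capability:7.2f}  wealth={wealth:8.2f}")
```

Even this toy version shows the basic tension: going all-in on self-improvement compounds capability fastest but earns nothing along the way, and the interesting dynamics only appear once several such agents interact through the economy.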

My plan is to finish writing these two things up, and then go back and review my research priorities, with an eye towards creating a single view where I can see the overall structure of my current model, and changes I've made to it. (Then I'll go back to doing more object level research.)

Update 4: Refining the two-factor economic model of AI takeoff

Yesterday was an errand day, so I didn’t anticipate getting much done at all, but I ended up running into a few people who’d read this blog and talking about stuff on it, and making an appointment to talk with a couple of other people about AI timelines. Today I intended to put my economic model of AI takeoff into a sufficiently polished form that I could ask some friends for feedback pre-publication. Unfortunately, I was unsure what I wanted the finished thing to be, which presented some motivation challenges. I’ll be spending the rest of the workday, and some of tomorrow morning, thinking that through.

Update 2: Shortest paths, Arrogance

This was an abbreviated research day, partly because I spent the first couple of hours writing the prior two posts.

Shortest path to human-level AGI

The timelines research might spiral out to an unmanageable level of complexity real fast, so I thought about ways to make it more tractable. We don’t really care about estimating all the timelines, just the shortest ones, or at least the shortest ones likely to lead to an intelligence explosion. So you could look at the paths to AGI that researchers are most excited about. Continue reading
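
To illustrate why only the shortest timelines matter: if each candidate path has an uncertain arrival time, the quantity of interest is the distribution of the minimum across paths, not any single path's estimate. A quick Monte Carlo sketch, with placeholder path names and made-up numbers:

```python
# Monte Carlo sketch: time until the *first* of several uncertain paths
# succeeds. Path labels and (mu, sigma) log-year parameters are placeholders.
import random

random.seed(0)
paths = {
    "path A": (3.5, 0.8),  # median ~33 years, wide uncertainty
    "path B": (3.8, 0.6),  # median ~45 years
    "path C": (4.0, 0.5),  # median ~55 years
}

def first_arrival():
    return min(random.lognormvariate(mu, sigma) for mu, sigma in paths.values())

draws = sorted(first_arrival() for _ in range(100_000))
print("median years until the first path succeeds:", round(draws[len(draws) // 2], 1))
```

The median of the minimum lands well below the median of even the fastest individual path, which is why effort spent refining estimates for the slow paths adds little.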

Update 1: Research Agenda, Paths to AI

I took a bunch of days off after my first day for various reasons, during which I came up with my plan to summarize each day’s work, so this is a retrospective with a longer lag than I hope will be usual.

I began by tracing some of my uncertainty on what to do back to uncertainty about how the world works. I decided to focus on the likely timing and speed of an intelligence explosion, because if I end up with a strong answer about this, it could narrow down my plausible options a lot.

I focused mostly on the timing of human-level artificial general intelligence, leaving the question of whether a takeoff is likely to be fast or slow for later. I also decided to leave aside the question of existential risk from AI that isn’t even reliably superhuman, although I suspect that this is a substantial risk as well.

I enumerated a few plausible paths to human-level intelligence, and began looking into how long each might take. I was not able to get a final estimate for any path, but got as far as determining that the cost and availability of computing hardware is not likely to be the primary constraining factor after about ten years, so I can’t just extrapolate using Moore’s law. Predicting these timelines is going to require a model of how long the relevant theoretical or non-computing technical insights will take to generate. This will be messy. Continue reading
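
For reference, the naive extrapolation I'm saying won't suffice looks like this; the two-year doubling time is the conventional rough figure, not an estimate from this project:

```python
# Naive Moore's-law extrapolation: cost of a fixed amount of compute
# under steady doublings. The doubling time is an assumed round number.
doubling_years = 2.0

for year in range(0, 12, 2):
    rel_cost = 0.5 ** (year / doubling_years)
    print(f"year {year:2d}: relative cost of fixed compute = {rel_cost:.3f}")

# After ~10 years the cost has fallen ~32x. If hardware stops being the
# binding constraint around then, this curve no longer predicts AGI timing,
# and the rate of theoretical insight has to be modeled instead.
```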