Lessons learned from modeling AI takeoffs, and a call for collaboration

I am now able to think about things like AI risk and feel like the concepts are real, not just verbal. This was the point of my modeling-the-world project. I’ve generated a few intuitions around what’s important in AI risk, including a few considerations that I think are being neglected. There are a few directions in which my line of research can be extended, and I’m looking for collaborators to pick this up and run with it.

I intend to write about much of this in more depth, but since EA Global is coming up I want a simple description to point to here. These are just loose sketches, so I'm trying to describe rather than persuade.

New intuitions around AI risk

  • Better cybersecurity norms would probably reduce the chance of an accidental singleton.
  • Transparency on AI projects’ level of progress reduces the pressure towards an AI arms race.
  • Safety is convergent - widespread improvements in AI value alignment improve our chances at a benevolent singleton, even in an otherwise multipolar dynamic.

Modeling intelligence as a project-specific factor of production

The possibility of recursive self-improvement is often brought up as a reason to expect that an intelligence explosion is likely to result in a singleton - a single dominant agent controlling everything. Once a sufficiently general artificial intelligence can make improvements to itself, it begins to acquire a compounding advantage over rivals, because as it increases its own intelligence, it increases its ability to increase its own intelligence. If returns to intelligence are not substantially diminishing, this process could be quite rapid. It could also be difficult to detect in its early stages because it might not require a lot of exogenous inputs.

However, this argument only holds if self-improvement is not only a rapid route to AI progress, but the fastest route. If an AI participating in the broader economy could make advantageous trades to improve itself faster than a recursively self-improving AI could manage, then AI progress would be coupled to progress in the broader economy.

If algorithmic progress (and anything else that might seem more naturally a trade secret than a commodity component) is shared or openly licensed for a fee, then a cutting-edge AI can immediately be assembled whenever profitable, making a single winner unlikely. However, if leading projects keep their algorithmic progress secret, then the foremost project could at some time have a substantial intelligence advantage over its nearest rival. If an AI project attempting to maximize intelligence growth would devote most of its efforts towards such private improvements, then the underlying dynamic begins to resemble the recursive self-improvement scenario.

This post reviews a prior mathematization of the recursive self-improvement model of AI takeoff, and then generalizes it to the case where AIs can allocate their effort between direct self-improvement and trade.

A recalcitrance model of AI takeoff

In Superintelligence, Nick Bostrom describes a simple model of how fast an intelligent system can become more intelligent over time by working on itself. This exposition loosely follows the one in the book.

We can model the intelligence of the system as a scalar quantity I, and the work, or optimization power, applied to the system in order to make it more intelligent, as another quantity O. Finally, at any given point in the process, it takes some amount of work to augment the system's intelligence by one unit. Call the marginal cost of intelligence in terms of work recalcitrance, R(I), which may take different values at different points in the process. So, at the beginning of the process, the rate at which the system's intelligence increases is determined by the equation \frac{dI}{dt}=A\frac{O}{R(I)}.

We then add two refinements to this model. First, assume that intelligence is nothing but a type of optimization power, so I and O can be expressed in the same units. Second, if the intelligence of the system keeps increasing without limit, eventually the amount of work it will be able to put into things will far exceed that of the team working on it, so that I+O\approx I. R(I) is now the marginal cost of intelligence in terms of applied intelligence, so we can write \frac{dI}{dt}=A\frac{I}{R(I)}.

Constant recalcitrance

The simplest model assumes that recalcitrance is constant, R(I)=k. Then \frac{dI}{dt}=A\frac{I}{k}, or I=Be^{A\frac{t}{k}}. This implies exponential growth.

Declining recalcitrance

Superintelligence also considers a case where work put into the system yields increasing returns. Prior to takeoff, where I\ll O, this would look like a fixed team of researchers with a constant budget working on a system that always takes the same interval of time to double in capacity. In this case we can model recalcitrance as R(I)=\frac{1}{I}, so that \frac{dI}{dt}=A\frac{I}{R(I)}=A\frac{I}{\frac{1}{I}}=AI^2, so that I=\frac{1}{A(c-t)} for some constant c, which implies that intelligence, and with it the rate of progress, approaches infinity as t approaches c: a singularity.

How plausible is this scenario? In a footnote, Bostrom brings up Moore's Law as an example of increasing returns to input, although (as he mentions) in practice it seems like increasing resources are being put into microchip development and manufacturing technology, so the case for increasing returns is far from clear-cut. Moore's Law is predicted by the experience curve effect, or Wright's Law, where marginal costs decline as cumulative production increases; the experience curve effect produces exponentially declining costs under conditions of exponentially accelerating production.[1] This suggests that in fact accelerating progress is due to an increased amount of effort put into making improvements. Nagy et al. 2013 show that for a variety of industries with exponentially declining costs, it takes less time for production to double than for costs to halve.
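The link between Wright's Law and Moore's Law can be checked numerically. The following is a minimal sketch, assuming a power-law form of Wright's Law, Y = a·X^(-d), and exponentially growing cumulative production, X(t) = g·e^(ht). The function names and the constants a, d, g and h are illustrative choices, not values taken from any dataset.

```python
import math

def cumulative_production(t, g=1.0, h=0.2):
    """Assumed exponential growth of cumulative production: X(t) = g * exp(h * t)."""
    return g * math.exp(h * t)

def unit_cost(t, a=100.0, d=0.5, g=1.0, h=0.2):
    """Wright's Law in an assumed power-law form: cost falls as X(t) ** -d."""
    return a * cumulative_production(t, g, h) ** (-d)

def production_doubling_time(h=0.2):
    """Time for cumulative production to double under exponential growth."""
    return math.log(2) / h

def cost_halving_time(d=0.5, h=0.2):
    """Y(t) is proportional to exp(-d * h * t), so costs halve every ln(2) / (d * h)."""
    return math.log(2) / (d * h)

print(production_doubling_time(), cost_halving_time())
```

Under these assumptions costs fall exponentially in time, which is the shape of Moore's Law, and production doubles faster than costs halve whenever d < 1.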

Since declining costs also reflect broader technological progress outside the computing hardware industry, the case for declining recalcitrance as a function of input is ambiguous.

Increasing recalcitrance

In many cases where work is done to optimize a system, returns diminish as cumulative effort increases. We might imagine that high intelligence requires high complexity, and more intelligent systems require more intelligence to understand well enough to improve at all. If we model diminishing returns to intelligence as R(I)=I, then \frac{dI}{dt}=A\frac{I}{R(I)}=A\frac{I}{I}=A. In other words, progress is a linear function of time and there is no acceleration at all.

Generalized expression

The recalcitrance model can be restated as a more generalized self-improvement process with the functional form \frac{dI}{dt}=AI^\alpha:

Exponent | Recalcitrance | Progress
\alpha=0 | Increasing recalcitrance | Constant progress
0<\alpha<1 | Increasing recalcitrance | Polynomial progress
\alpha=1 | Constant recalcitrance | Exponential progress
\alpha>1 | Declining recalcitrance | Singularity
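The four regimes can be compared directly using the closed-form solution of \frac{dI}{dt}=AI^\alpha. This is a rough sketch rather than anything from Superintelligence: the function names, the parameter name rate (standing in for A), and the initial value i0 are assumptions made for illustration.

```python
import math

def intelligence(t, alpha, rate=1.0, i0=1.0):
    """Closed-form solution of dI/dt = rate * I**alpha with I(0) = i0 > 0.

    Returns math.inf at or after the blow-up time when alpha > 1.
    """
    if alpha == 1:
        return i0 * math.exp(rate * t)  # constant recalcitrance: exponential growth
    base = i0 ** (1 - alpha) + (1 - alpha) * rate * t
    if base <= 0:  # only reachable when alpha > 1
        return math.inf
    return base ** (1 / (1 - alpha))

def blow_up_time(alpha, rate=1.0, i0=1.0):
    """Finite time at which I diverges, or None when alpha <= 1 (no singularity)."""
    if alpha <= 1:
        return None
    return i0 ** (1 - alpha) / ((alpha - 1) * rate)

for alpha in (0, 0.5, 1, 2):
    values = [round(intelligence(t, alpha), 3) for t in (0.0, 0.5, 0.9)]
    print(alpha, values, blow_up_time(alpha))
```

Only the \alpha>1 case has a finite blow-up time; the other cases grow without bound but never diverge at a finite t.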

Deciding between trade and self-improvement

Some inputs to an AI might be more efficiently obtained if the AI project participates in the broader economy, for the same reason that humans often trade instead of making everything for themselves. This section lays out a simple two-factor model of takeoff dynamics, where an AI project chooses how much to engage in trade.

Suppose that there are only two inputs into each AI: computational hardware available for purchase, and algorithmic software that the AI can best design for itself. Each AI project is working on a single AI running on a single hardware base. The intelligence of this AI depends both on hardware progress and software progress, and holding either constant, the other has diminishing returns. (This is broadly consistent with trends described by Grace 2013.) We can model this as I=\ln{SH}.

At each moment in time, the AI can choose whether to allocate all its optimization power to making money in order to buy hardware, improving its own algorithms, or some linear combination of these. Let the share of optimization power devoted to algorithmic improvement be A. (From here on, A denotes this allocation share, between 0 and 1, not the constant A used in the recalcitrance model above.)

Assume further that hardware earned and improvements to software are both linear functions of the optimization power invested, so \frac{dS}{dt}=bIA, and \frac{dH}{dt}=cI(1-A).

What is the intelligence-maximizing allocation of resources A?

This problem can be generalized to finding A that maximizes I=f(X)=f(S^\alpha H^\beta) for any monotonically increasing function f(X). This is maximized whenever X=S^\alpha H^\beta is maximized. (Note that this is no longer limited to the case of diminishing returns.)

This generalization is identical to the Cobb-Douglas production function in economics. If \alpha+\beta=1 then this model predicts exponential growth, if \alpha+\beta>1 it predicts a singularity, and if 0<\alpha+\beta<1 then it predicts polynomial growth. The intelligence-maximizing value of A is A=\frac{\alpha}{\alpha+\beta}.[2]

In our initial toy model I=\ln{SH}, where \alpha=\beta=1, that implies that no matter what the price of hardware, as long as it remains fixed and the indifference curves are shaped the same, the AI will always spend exactly half its optimizing power working for money to buy hardware, and half improving its own algorithms.
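The allocation rule can be sanity-checked with a small simulation. The sketch below assumes I=\ln(S^\alpha H^\beta), holds the share A fixed over the whole run, uses simple Euler steps, and starts from endowments that satisfy the balance condition Sc\beta=Hb\alpha derived in footnote 2. The function names, step size, and starting values are illustrative assumptions.

```python
import math

def final_output(share, alpha, beta, s0, h0, b=1.0, c=1.0, dt=0.01, steps=1000):
    """Euler-integrate the two-factor model with a fixed software share.

    dS/dt = b * I * share, dH/dt = c * I * (1 - share), I = ln(S**alpha * H**beta).
    Returns X = S**alpha * H**beta at the end of the run.
    """
    s, h = s0, h0
    for _ in range(steps):
        intelligence = math.log(s ** alpha * h ** beta)
        s += b * intelligence * share * dt
        h += c * intelligence * (1 - share) * dt
    return s ** alpha * h ** beta

def best_fixed_share(alpha, beta, s0, h0, grid_size=20):
    """Grid-search the fixed share in (0, 1) that gives the largest final X."""
    shares = [i / grid_size for i in range(1, grid_size)]
    return max(shares, key=lambda a: final_output(a, alpha, beta, s0, h0))

# Starting endowments satisfy S*c*beta = H*b*alpha (with b = c = 1).
print(best_fixed_share(alpha=1, beta=1, s0=2.0, h0=2.0))
print(best_fixed_share(alpha=1, beta=3, s0=1.0, h0=3.0))
```

The grid search only compares fixed shares, so it illustrates the rule A=\frac{\alpha}{\alpha+\beta} rather than proving it; the derivation is in footnote 2.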

Changing economic conditions

The above model makes two simplifying assumptions: that the application of a given amount of intelligence always yields the same amount in wages, and that the price of hardware stays constant. This section relaxes these assumptions.

Increasing productivity of intelligence

We might expect the productivity of a given AI to increase as the economy expands (e.g. if it discovers a new drug, that drug is more valuable in a world with more or richer people to pay for it). We can add a term exponentially increasing over time to the amount of hardware the application of intelligence can buy: \frac{dH}{dt}=ce^{gt}I(1-A).

This does not change the intelligence-maximizing allocation of intelligence between trading for hardware and self-improving.[3]

Declining hardware costs

We might also expect the long-run trend in the cost of computing hardware to continue. This can again be modeled as an exponential process over time, C=De^{-ht}. Absorbing the constant D into c, the new expression for the growth of hardware is \frac{dH}{dt}=\frac{cI(1-A)}{e^{-ht}}=ce^{ht}I(1-A), identical in functional form to the expression representing wage growth, so again we can conclude that A=\frac{\alpha}{\alpha+\beta}.
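Both relaxations multiply the hardware term by an exponential in time, so one check covers them. The sketch below maximizes the one-step linear approximation of X over a grid of shares, starting from endowments that satisfy the balance relation H\alpha b=S\beta ce^{gt} from footnote 3. The function name, the grid search, and the default values (including treating I_0 as a fixed constant i0) are assumptions for illustration.

```python
import math

def one_step_best_share(alpha, beta, growth, t, s0=2.0, b=1.0, c=1.0,
                        i0=1.0, dt=0.5, grid_size=100):
    """Grid-search the share A that maximizes the one-step approximation of X.

    growth is the exponential rate (g for wages, or h for hardware costs).
    h0 is set from the balance relation H*alpha*b = S*beta*c*exp(growth*t).
    """
    wage = c * math.exp(growth * t)      # hardware bought per unit of intelligence
    h0 = s0 * beta * wage / (alpha * b)  # balanced starting endowment

    def x_after_step(share):
        s1 = s0 + share * b * i0 * dt
        h1 = h0 + (1 - share) * wage * i0 * dt
        return s1 ** alpha * h1 ** beta

    shares = [i / grid_size for i in range(1, grid_size)]
    return max(shares, key=x_after_step)

for growth in (0.0, 0.05, 0.3):
    print(growth, one_step_best_share(alpha=1, beta=3, growth=growth, t=5))
```

In this sketch the chosen share does not move as the growth rate or the time changes, because the exponential factor appears on both sides of the first-order condition and cancels.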

Maximizing profits rather than intelligence

AI projects might not reinvest all available resources in increasing the intelligence of the AI. They might want to return some of their revenue to investors if operated on a for-profit basis. (Or, if autonomous, they might invest in non-AI assets where the rate of return on those exceeded the rate of return on additional investments in intelligence.) On the other hand, they might borrow if additional money could be profitably invested in hardware for their AI.

If the profit-maximizing strategy involves less than 100% reinvestment, then whatever fraction of the AI's optimization power is reinvested should still follow the intelligence-maximizing allocation rule A=\frac{\alpha}{\alpha+\beta}, where A is now the share of reinvested optimization power devoted to algorithmic improvements.

If the profit-maximizing strategy involves a reinvestment rate of slightly greater than 100%, then at each moment the AI project will borrow some amount (net of interest expense on existing debts) W, so that the total optimization power available is I+\frac{W}{c}. Again, whatever fraction of the AI's optimization power is reinvested should still follow the intelligence-maximizing allocation rule A=\frac{\alpha}{\alpha+\beta}, where A is now the share of economically augmented optimization power I+\frac{W}{c} devoted to algorithmic improvements.

This strategy is no longer feasible, however, once \frac{W}{W+I}>\frac{\alpha}{\alpha+\beta}. Since by assumption hardware can be bought but algorithmic improvements cannot, at this point additional monetary investments will shift the balance of investment towards hardware, while 100% of the AI's own work is dedicated to self-improvement.

References

1. Moore's Law describes costs Y diminishing exponentially as a function of time t: Y=ae^{-bt}. Wright's Law describes log costs diminishing as a linear function of log cumulative production X(t): \ln Y=-d\ln(fX(t)). If cumulative production increases exponentially over time, X(t)=ge^{ht}, then \ln Y=-d\ln(fg)-dht, so Wright's Law simplifies to Moore's Law with a=(fg)^{-d} and b=dh.
2. X is maximized by maximizing \frac{dX}{dt}=\frac{d}{dt}(S^\alpha H^\beta)=\frac{S^\alpha H^\beta}{S}\alpha\frac{dS}{dt}+\frac{S^\alpha H^\beta}{H}\beta\frac{dH}{dt}=X(\frac{\alpha\frac{dS}{dt}}{S}+\frac{\beta\frac{dH}{dt}}{H})

We can ignore the X term here, since at any moment the value of A materially affects only the rates of change \frac{dS}{dt} and \frac{dH}{dt}, not the constants or current endowments of hardware, software, or intelligence. Therefore we merely need to find the value of A that maximizes \frac{\alpha\frac{dS}{dt}}{S}+\frac{\beta\frac{dH}{dt}}{H}:

\frac{d}{dA}(\frac{\alpha\frac{dS}{dt}}{S}+\frac{\beta\frac{dH}{dt}}{H})=\frac{d}{dA}(\frac{\alpha bIA}{S}+\frac{\beta cI(1-A)}{H})=I(\frac{\alpha b}{S}-\frac{\beta c}{H})=0

\frac{S}{\alpha b}=\frac{H}{\beta c}, or Sc\beta=Hb\alpha

In the initial case we considered where \alpha=\beta=1, this constraint just means that the marginal product of intelligence applied to hardware purchases or software improvements must be the same, \frac{S}{b}=\frac{H}{c}. There's no constraint on A yet. In the generalized case, we also have scaling factors to account for differing marginal benefit curves for hardware and software.

To find the intelligence-maximizing value of A we can consider a linear approximation to our initial set of equations:

S_l=S_0+AbI_0\Delta t

H_l=H_0+(1-A)cI_0\Delta t

X_l=S_l^\alpha H_l^\beta

\frac{dX_l}{dA}=\alpha bI_0\Delta tS_l^{\alpha-1}H_l^\beta-\beta cI_0\Delta tS_l^\alpha H_l^{\beta-1}=\frac{\alpha bI_0\Delta tX_l}{S_l}-\frac{\beta cI_0\Delta tX_l}{H_l}=X_lI_0\Delta t(\frac{\alpha b}{S_l}-\frac{\beta c}{H_l})=0

\alpha bH_l=\beta cS_l

\alpha b(H_0+(1-A)cI_0\Delta t)=\beta c(S_0+AbI_0\Delta t)

\alpha b(H_0+cI_0\Delta t)-\beta cS_0=bcI_0\Delta tA(\alpha+\beta)

Since initial endowments S_0 and H_0 are assumed to have been produced through the intelligence-maximizing process, we can substitute in the identity Sc\beta=Hb\alpha:

\alpha b(H_0+cI_0\Delta t)-\alpha bH_0=bcI_0\Delta tA(\alpha+\beta)

\alpha bcI_0\Delta t=bcI_0\Delta tA(\alpha+\beta)

A=\frac{\alpha}{\alpha+\beta}

3. We again need to find A such that \frac{d}{dA}(\frac{\alpha\frac{dS}{dt}}{S}+\frac{\beta\frac{dH}{dt}}{H})=0. Using the new equation for \frac{dH}{dt}, we get:

\frac{d}{dA}(\frac{\alpha\frac{dS}{dt}}{S}+\frac{\beta\frac{dH}{dt}}{H})=\frac{d}{dA}(\frac{\alpha bIA}{S}+\frac{\beta ce^{gt}I(1-A)}{H})=I(\frac{\alpha b}{S}-\frac{\beta ce^{gt}}{H})=0

\frac{\alpha b}{S}=\frac{\beta ce^{gt}}{H}, or H\alpha b=S\beta ce^{gt}.

Thus, we should expect that the ratio of each AI's hardware endowment to its software endowment grows in proportion to the wage returns of intelligence.

To find the intelligence-maximizing value of A we can again use a linear approximation, this time including the exponential growth of wages in our approximation of hardware growth:

S_l=S_0+AbI_0\Delta t

H_l=H_0+(1-A)ce^{gt}I_0\Delta t

X_l=S_l^\alpha H_l^\beta

\frac{dX_l}{dA}=\alpha bI_0\Delta tS_l^{\alpha-1}H_l^\beta-\beta ce^{gt}I_0\Delta tS_l^\alpha H_l^{\beta-1}=\frac{\alpha bI_0\Delta tX_l}{S_l}-\frac{\beta ce^{gt}I_0\Delta tX_l}{H_l}=0

X_lI_0\Delta t(\frac{\alpha b}{S_l}-\frac{\beta ce^{gt}}{H_l})=0

\frac{\alpha bH_l-\beta ce^{gt}S_l}{S_lH_l}=0

\alpha bH_l=\beta ce^{gt}S_l

\alpha b(H_0+(1-A)ce^{gt}I_0\Delta t)=\beta ce^{gt}(S_0+AbI_0\Delta t)

\alpha b(H_0+ce^{gt}I_0\Delta t)-\beta ce^{gt}S_0=bce^{gt}I_0\Delta tA(\alpha+\beta)

Since initial endowments S_0 and H_0 are assumed to have been produced through the intelligence-maximizing process, we can apply the relation H\alpha b=S\beta ce^{gt}:

\alpha b(H_0+ce^{gt}I_0\Delta t)-\alpha bH_0=bce^{gt}I_0\Delta tA(\alpha+\beta)

\alpha bce^{gt}I_0\Delta t=bce^{gt}I_0\Delta tA(\alpha+\beta)

A=\frac{\alpha}{\alpha+\beta}

Paths to singleton: a hierarchical conceptual framework

There are lots of arguments out there about AI risk and likely AI takeoff scenarios, and it’s often hard to compare them, because they tacitly make very different assumptions about how the world works. This is an attempt to bridge those gaps by constructing a hierarchical conceptual framework that:

  • Articulates disjunctions that commonly underlie disagreements around likely AI takeoff scenarios.
  • Contextualizes differing arguments as the result of differing world models.
  • Provides an underlying world model, for which those differing models are special cases.

What is a conceptual hierarchy?

Over the past several weeks I’ve been working on a document laying out my current thinking around likely AI takeoff scenarios. I’ve been calling it a hierarchical conceptual framework, lacking a more pithy or clear term for it. In the process of getting feedback on my drafts it’s become clear to me that it’s nonobvious what sort of thing I’m writing, and why I’d write it.

It’s become increasingly clear to me, thinking about AI risk, why I don’t feel I have a firm understanding of the discourse around it:

Update 8: Prioritization, finishing, inventories, and working memory

My friend Satvik recently told me about an important project management intuition he’d acquired: it’s a very bad sign to have a lot of projects that are “90% complete”. This is bad for a few reasons, including:

  • Inventory: For any process that makes things, it’s a substantial savings to have a smaller inventory. A manufacturer buys raw inputs, does work on them, and ships them to a customer. Every moment between the purchase of inputs and the delivery of finished goods is a cost to the manufacturer, because of the time value of money. Smaller inventories are what it looks like to have a faster turnaround. If a lot of your projects are 90% complete, that means you’re spending a lot of time having invested a lot of work into them, but realizing none of the value of the finished product.
  • Power law: Some projects might be much more important than others. If you’re allocating time and effort evenly among many projects, you may be wasting a lot of it.
  • Quality of time estimates: Things may be sitting at “90%” because they keep seeming “almost done” even as you put a lot of additional work into them. If you’re using faulty estimates of time to completion, this may make your cost-benefit calculations wrong.
  • Mental overhead: Even if it were somehow optimal for an ideal worker to handle a lot of projects simultaneously, in practice humans can’t perform very well like that. Conscious attention isn’t the only constraint - there are also only so many things you can productively fit on the “back burner”.

I decided to use this insight to prioritize my work. Things that are “90% done” should be frontloaded, and idea generation should be deprioritized until I’ve “closed out” more things and cleared up mental space.

Make more disjunctions explicit

Focus and antifocus

As part of my motivational switch from diligence to obsession, I’ve been talking with people working on AI risk or related things about their model of the problem. I’ve found that I tend to ask two classes of question:

  • What is your model of the situation?
  • What are you choosing not to work on, and why?

When I asked the first question, people tended to tell me their model of the thing they were focusing on. I found this surprising, since it seemed like the most important part of their model ought to be the part that indicates why their project is more promising to focus on than the alternatives. Because of this, I began asking people what they were ignoring.

Update 7: conceptual vocabulary, writing, feedback, lit review

  1. I’m redefining my current research project: I’m limiting its current scope to creating a conceptual framework, not getting empirical results.
  2. I’m dropping my commitment to regular updates.
  3. I got some good feedback on my draft research hierarchy writeup. Core issues I’m going to emphasize are:
    • Conflict vs cooperation as drivers of AI progress during a takeoff scenario.
    • Creating disjunctive arguments that put considerations in some relation to each other and make it clear which scenarios are mutually exclusive alternatives to one another, which are elaborations on a basic dynamic, and which might overlap.
  4. I plan to review more of the existing literature on AI risk to see whether anyone else has already made substantial progress I don’t know about on a conceptual framework for AI risk.

Update 6: working on a research hierarchy

Each time I talk through my research with someone, I seem to get another "click" moment as part of the implied structure falls into place. This in turn has helped me figure out what function I want my "big picture" model to serve.

An example of one of these "click" moments happened yesterday, when, trying to explain why the considerations I was thinking about formed a natural group and others should be excluded, I realized that I was trying to model an equilibrium where none of the agents modeled each other, as a simpler case to build more complex models on top of.

Update 5: writing, organizing my research

The last few workdays have been tricky because the working style that seemed to serve me well for doing new research (running across things that catch my eye and then leaning into my desire to understand them better) didn't work very well for writing. In addition, it seems that I have to repeatedly expend executive effort just to allocate large uninterrupted blocks of time for me to work, well-rested, on this project. This week I've put into place some accountability measures that should help, such as committing to examine my social engagements in advance and clarify whether I genuinely expect them to be as valuable as time spent on this project.

On the object level, while writing up my initial model of how AI takeoff might be affected by the existence of a broader economy, I realized that I was really trying to do two different things:

  1. Mathematize my intuitions about AI takeoff scenarios where each AI can either work on itself or participate in the economy.
  2. Lay out a higher-level argument about how the above fits into the bigger picture.

My plan is to finish writing these two things up, and then go back and review my research priorities, with an eye towards creating a single view where I can see the overall structure of my current model, and changes I've made to it. (Then I'll go back to doing more object level research.)

Update 4: Refining the two-factor economic model of AI takeoff

Yesterday was an errand day, so I didn’t anticipate getting much done at all, but I ended up running into a few people who’d read this blog and talking about stuff on it, and making an appointment to talk with a couple of other people about AI timelines. Today I intended to put my economic model of AI takeoff into a sufficiently polished form that I could ask some friends for feedback pre-publication. Unfortunately, I was unsure what I wanted the finished thing to be, which presented some motivation challenges. I’ll be spending the rest of the workday, and some of tomorrow morning, thinking that through.