Why Busy Beaver hunters fear the Antihydra

230 points by Bogdanp a day ago

emtel a day ago

The best current lower bound for BB(6) is 2↑↑2↑↑2↑↑9 (google "Knuth Up Arrow" if this makes no sense), a number so inconceivably large it gives me the willies.

In particular, this means that in going from BB(5) to BB(6), you have already crossed the line where the actual busy beaver TM can no longer be simulated step by step in the lifetime of our universe (or a googol lifetimes of our universe for that matter).

It really is mind bending how fast this function grows.

henry2023 a day ago

One of the curiosities about this function is that computing BB(748) is independent of ZFC.
https://scottaaronson.blog/?p=4916
- Sniffnoy 19 hours ago
  
  The record's been lowered since then, I should note. At least down into the 600s, I've seen claims of down in the 400s. But I haven't really kept up with this.
  - srcreigh 16 hours ago
    
    Link https://wiki.bbchallenge.org/wiki/Independence_from_ZFC
- red75prime 15 hours ago
  
  In other words, Turing machine creates mathematical reality, which is independent from our assumptions.
  - cwillu 9 hours ago
    
    No, that's a misconception. If you add BB(748)==‹any N but the correct number›, you get an inconsistent system that will either claim that a machine doesn't halt even though it does (e.g. the real BB champion), or that some machine does halt at N steps, which you can then disprove by enumerating all the turing machines of the relevant number of states for N steps, and showing that no machine halts at that step.
    Either way, the only BB axiom you can add without blowing up ZFC is the correct one.
tromp 21 hours ago

The BB function does grow mind bendingly fast. The machine running for 2↑↑2↑↑2↑↑9 steps is one of the 4^12*23836540 = 399910780272640 differently behaving 6-state machines [1].
A similarly fast growing function is the functional busy beaver [3]. Among all 77519927606 closed lambda terms of size <= 49 bits, there is one whose normal form size exceeds the vastly larger Graham's Number [3].
Several beaver fans believe that BB(7) might exceed Graham's Number as well, which struck me as unlikely enough to offer a $1k bet against it, the outcome of which will be known in under a decade.
[1] https://oeis.org/A107668
[2] https://oeis.org/A333479
[3] https://en.wikipedia.org/wiki/Graham%27s_number
- JulianChastain 5 hours ago
  
  It seems totally inconceivable to me that you could accurately predict how long until we will know whether BB(7) is greater than Graham's Number.
  - tromp 4 hours ago
    
    I do not predict that. We just need the bet to have a time limit because BB(7) will always have holdouts as long as I live. I chose 10 years because I have prior experience with that timeframe [1].
    [1] https://senseis.xmp.net/?ShodanGoBet
- Eliezer 13 hours ago
  
  It would not surprise me at all for bb7 to exceed Graham's number. Just a Kirby-Paris hydra or a Goodstein sequence gets you to epsilon zero in the fast-growing hierarchy, where Graham is around omega+2.
  - tromp 12 hours ago
    
    The 79-bit lambda term λ1(λλ2(λλ3(λ312))(1(λ1)))(λλ1)(λλ211)1 in de-Bruijn notation exhibits f_ε0 growth without all the complexities of computing Kirby-Paris hydra or Goodstein sequences. Even that is over 60% larger than the 49-bit Graham exceeder (λ11)(λ1(1(λλ12(λλ2(21))))). I think one should be quite surprised if you can climb from f_4 (2↑↑2↑↑2↑↑9) to f_{ω+1} (Graham) with just 1 additional state.
baruchel a day ago

> It really is mind bending how fast this function grows.
While the BB function is obviously a well-defined function over the integers, I find it helpful to think of it as a function over qualitatively heterogeneous items—such as stones, bread toasters, mechanical watches, and computers. The key idea is to view the underlying computing devices not as “a little more powerful” than the previous ones, but as fundamentally different kinds of entities.
throwaway81523 20 hours ago

> best current lower bound
Well is there a best current UPPER bound, or at least a "probvious" one?
- shawnz 19 hours ago
  
  I think finding an upper bound is basically just as difficult as finding the actual value itself, since both would require proving that all of the programs which run longer than that will run forever. That's why we can say BB(x) grows faster than any computable function. Being able to compute BB(x) algorithmically or any faster growing function would let you solve the halting problem
  - throwaway81523 14 hours ago
    
    Sure, but I only asked about the single case x=6.
    
    shawnz 7 hours ago
    
    The point stands: the hard part is proving that all the programs with longer runtime than your upper bound will never terminate, and once you've solved that, getting the exact value is just a little extra work
    
    thaumasiotes 9 hours ago
    
    If you want an unproven-but-almost-certainly-correct upper bound on BB(6), consider BB(12).
- cevi 20 hours ago
  
  There is no general procedure for computing upper bounds on busy beaver numbers (this can be proven). We haven't even come close to enumerating all of the interesting six-state Turing machines, so right now we don't even have a wild guess for an upper bound on BB(6).

kibwen a day ago

Despite all the reporting on BB(5) I had never seen anyone convey that equivalent high-level formulation from 1993, that's very cool!

EDIT: For fun I converted it to Rust and expected to see it spew a few million numbers into my terminal, but no, this equivalent loop actually terminates after 15 steps, which is fascinating given that the Turing machine takes 47 million steps:

    let mut x = 0;
    loop {
        x = match x % 3 {
            0 => 5 * x + 18,
            1 => 5 * x + 22,
            _ => break
        } / 3;
    }

OEIS link: https://oeis.org/A386909

EDIT 2: Of course the article mentions this in the next paragraph, which is what I get for being immediately nerd-sniped.

ameliaquining a day ago

This is because your Rust program represents the numbers in binary, while the BB(5) champion Turing machine represents them in unary. And unary arithmetic is exponentially slower than place-value arithmetic, which is why we invented the latter. (There are other inefficiencies in the Turing machine, but that's the big conceptual one.)
- tgv 11 hours ago
  
  Nitpicking: just as TMs can use binary or decimal arithmetic, so could Rust programs use unary. It's not an inefficiency in TMs per se, but I can see how it would help a TM to become a BB champion.
  - rini17 8 hours ago
    
    > could Rust programs use unary
    By not using any math functions except for increment by one?
    
    JulianChastain 7 hours ago
    
    By representing all numbers with lists (or sets). 0 = [] 1 = [True] 2 = [True, True] Etc. Then for example addition becomes appending two lists together
- achierius 14 hours ago
  
  What else comes to mind in terms of inefficiencies? I can think of quite a few but you seem to have deeper insight as to their ranking so I'm curious as to your thoughts.

russdill a day ago

TLDR; As BB(n) gets larger, they can encode more random walk style problems that have a stop condition related to the position of the random walk. Proving that such a condition is unlikely may be easy, but proving it never occurs is very difficult.

gbacon a day ago

Way worse than just very difficult:
- “Avoid the Collatz Conjecture at All Costs!” (Math Kook) https://www.youtube.com/watch?v=TxBRcwkRjmc
- “Experienced mathematicians warn up-and-comers to stay away from the Collatz conjecture. It’s a siren song, they say: Fall under its trance and you may never do meaningful work again.” https://www.quantamagazine.org/mathematician-proves-huge-res...
- "Mathematics is not yet ready for such problems [as Collatz].” (Paul Erdős)
- hyghjiyhu a day ago
  
  Iirc if you change the numerical values of the collatz problem some instances are undecidable.
  - gowld a day ago
    
    A generalized Collatz problem ((mx + b mod n) instead of 3x+1 in Z) is undecidable.

CobrastanJorji a day ago

That was a really well written article. I think even somebody who had never heard of a Turing Machine could probably have gotten a pretty reasonable quick understanding of roughly what BB(5) and BB(6) are and how the Antihydra works and its greater mathematical/historical context. That's hard to do, good job!

gbacon a day ago

Why we care about Busy Beaver numbers, from “Who Can Name the Bigger Number?” by Scott Aaronson:

Now, suppose we knew the Nth Busy Beaver number, which we’ll call BB(N). Then we could decide whether any Turing machine with N rules halts on a blank tape. We’d just have to run the machine: if it halts, fine; but if it doesn’t halt within BB(N) steps, then we know it never will halt, since BB(N) is the maximum number of steps it could make before halting. Similarly, if you knew that all mortals died before age 200, then if Sally lived to be 200, you could conclude that Sally was immortal. So no Turing machine can list the Busy Beaver numbers—for if it could, it could solve the Halting Problem, which we already know is impossible.

But here’s a curious fact. Suppose we could name a number greater than the Nth Busy Beaver number BB(N). Call this number D for dam, since like a beaver dam, it’s a roof for the Busy Beaver below. With D in hand, computing BB(N) itself becomes easy: we just need to simulate all the Turing machines with N rules. The ones that haven’t halted within D steps—the ones that bash through the dam’s roof—never will halt. So we can list exactly which machines halt, and among these, the maximum number of steps that any machine takes before it halts is BB(N).

Conclusion? The sequence of Busy Beaver numbers, BB(1), BB(2), and so on, grows faster than any computable sequence. Faster than exponentials, stacked exponentials, the Ackermann sequence, you name it. Because if a Turing machine could compute a sequence that grows faster than Busy Beaver, then it could use that sequence to obtain the D‘s—the beaver dams. And with those D’s, it could list the Busy Beaver numbers, which (sound familiar?) we already know is impossible. The Busy Beaver sequence is non-computable, solely because it grows stupendously fast—too fast for any computer to keep up with it, even in principle.

https://www.scottaaronson.com/writings/bignumbers.html

_alternator_ a day ago

Ok, I read this post quite a while ago and something about the reasoning bothered me then, and it still bothers me now.
In short, my read is that the argument does not rule out that there is a computable function that grows faster than BB(N), but rather it shows that it is impossible to prove or “decide” whether a given computable function grows faster than BB(N).
Maybe this is equivalent to the conclusion stated? Am I missing something obvious? (That sees likely; Scott Aaronson is much better at this than me.)
Edited for clarity
- jcranmer 21 hours ago
  
  It is definitely way too easy to accidentally confuse "x is true" and "x is proven to be true" on topics of undecidability, and this is one of those times where the answer is in fact obvious, but it takes you 15 minutes to realize that it's obvious.
  The halting problem is uncomputable (by diagonalization)--that is no computable function can solve the halting problem, rather than the weaker statement that we cannot assert that "this" function solves it. As a result, if somebody asserts a function that purports to solve the halting problem, then we can definitively say that either it isn't computable, or the thing it solves isn't the halting problem.
  For the Busy Beaver function, it's obvious that the resulting program (if it existed) would solve the halting problem, so clearly it's not a computable function, and analyzing what part of it isn't computable leads you to the Busy Beaver as the only option.
  For the D(N) function... well, since we assume D(N) ≥ BB(N), D(N) is still an upper bound on the number of steps a halting TM could run, so the resulting program using D(N) in lieu of BB(N) would still solve the halting problem, which forces us to conclude that it's not computable.
  A different argument that may make more sense is this:
  Consider the program "if machine M has not halted after f(sizeof M) steps, print not halt, else print halt." If f is a computable function, then the program is clearly computable. But since no computable program can solve the halting problem, we know that this program cannot either. Therefore, for every computable function f, there must exist some machine M such that M halts only after more than f(sizeof M) steps. In other words, f cannot be an upper bound on the Busy Beaver function.
  - _alternator_ 20 hours ago
    
    This indeed helps me! The key distinction that there is no computable function that solves the halting problem, and whether we can prove it is irrelevant, clarifies the issue. Thanks, I definitely learned something here, even if it’s obvious in retrospect.
- pmb a day ago
  
  Any computable function f on one variable x has a program. That function is a program of size p. The input x also has a data size d. BB(p+d) >= f(x), by definition, for all f and x. If you think you might have a (function, input) pair (and corresponding (program, data) pair) for which this is not true, see the previous sentence.
  - _alternator_ 21 hours ago
    
    This approach leaves open the possibility that f(x) = BB(p+d) right?
    
    linschn 9 hours ago
    
    No, because f is assumed to be computable from the start, which BB is not (otherwise it could be used as a subroutine in a program that solves the halting problem).
- KalMann 21 hours ago
  
  I think Scott's reasoning is correct in the end. If you suppose you had a computable function f(N) such that f(N) is always greater than BB(N). Then you could exploit the function f to solve the halting problem. Given a program of length N, run the program for f(N) steps. If it halts within that time, you know it's a halting program. If it doesn't halt within that time you know it will never halt.
  - _alternator_ 21 hours ago
    
    Yes, this I understand. I agree that it is impossible to “prove” that a computable function f(N) is always greater than BB(N).
    My concern is that the argument leaves open the possibility of a larger computable function, even if it would be impossible to demonstrate that it is in fact larger for all N.
    I’m sure that this possibility is somehow foreclosed (that is, I’m not trying to say that the claim is wrong, just that I think there is a case that isn’t covered by the argument). But I don’t quite see it.
    
    KalMann 21 hours ago
    
    No, it has been proven that there exists no turing machine that can solve the halting problem. If we assume that there exists a computable f(N) that is always greater than BB(N) then we can construct a Turing machine that solves the halting problem. Therefore no computable f(N) can exist.
    
    _alternator_ 20 hours ago
    
    This was the key missing part for me; the program can’t exist, whether we can prove it is beside the point. Thanks.
    
    ted_dunning 17 hours ago
    
    It sounds to me like you have a good line of thought.
    The key is that we can't prove that your function f grows faster than BB. That makes all the difference.
- gbacon 21 hours ago
  
  Aaronson’s argument shows by contradiction that a computable upper bound D(N) that grows more rapidly than BB(N) cannot exist because otherwise we’d be able to use it to solve the halting problem.

MarcelOlsz a day ago

If it's not loading for anyone else [0]

[0] https://web.archive.org/web/20251027173129/https://benbrubak...

entaloneralie 6 hours ago

Immediately implements the Antihydra in Fractran

13122 -> 50/18, 55/42, 539/2, 297/275, 2/55, 2/11

altruios a day ago

The Antihydra will halt if:

The sequence is (truly/fairly) random in its distribution of mods 1/2.

Even fair coins flipped infinitely would - on occasion - have arbitrary long results of heads or tails.

So the question becomes, is the anti-hydra sequence 'sufficiently' random?

LegionMammal978 21 hours ago

The peak run lengths of evens/odds 'should' go to infinity, but these runs become a smaller and smaller component of the overall average, so that it is expected to approach the long-term 50% regardless.
In other words, an unbiased random walk should almost surely return to the origin, but a biased random walk will fail to return to the origin with nonzero probability. This can be considered a biased random walk [0], since the halting condition linearly moves further and further away from the expected value of the 50/50 walk.
[0] https://wiki.bbchallenge.org/wiki/Antihydra#Trajectory
A_D_E_P_T a day ago

It is by definition not random. The Antihydra is generated by a fixed computable map, so it is compressible and would fail some effective statistical tests. You can't get true randomness via a deterministic algorithm; any computable infinite sequence fails Martin‑Löf randomness.
That said, empirically and in all current analyses, the Antihydra's parity behaves as if it were roughly fair over long spans (neither a proven odd nor even bias), and the short-range statistics look pseudo-random. Non-halting is overwhelmingly plausible... but a concrete proof seems out of reach.
wat10000 a day ago

I don't think a truly random sequence would necessarily halt under these rules. It's not enough to have arbitrarily long runs. As the sequence as a whole gets larger, the run length needed to end it also gets longer, and thus the probability gets smaller. The result should be something like a geometric sequence with a finite sum.
Consider a simpler version, where you flip a coin three times, then four times, then five times, etc., and you stop if you ever get the same side for every flip in a given turn. The probability that you'll stop is equal to 1/4 + 1/8 + 1/16 + ... which is 50%. If you do this forever then you'll eventually see a run of ten trillion heads or tails, but you probably won't see that run before your ten trillionth turn.
So I think the question is, does the anti-hydra sequence ever diverge sufficiently from randomness?
- altruios a day ago
  
  > As the sequence as a whole gets larger, the run length needed to end it also gets longer, and thus the probability gets smaller. The result should be something like a geometric sequence with a finite sum.
  This is true.
  But it would still halt. Infinity is weird like that. To be clear, I mean the sequence of coin flips where the total value of heads/tails is 2:1.
  The probability of having a 2:1 ratio of heads/tails - at some point - in an infinite sequence of fair flips is 1, is it not?
  The anti-hydra may have a bias, and only if that bias is against the halt condition do we have a case where we can conclude that the anti-hydra does not halt.
  - wat10000 a day ago
    
    No, I don't think it's 1. The weirdness of infinity can go both ways. A classic example being that a random walk on a line or a two-dimensional grid takes you back to your starting point an infinite number of times, but for a three dimensional grid you only return to the start a finite number of times, quite possibly zero.
    This problem is equivalent to a one-dimensional random walk where the terminating condition is reaching a value equal to the number of steps you've taken divided by 3. I'm not quite sure how to calculate the probability of that.
    Intuitively, I'd expect this to have a finite probability. The variance grows with sqrt(n), which gets arbitrarily far away from n/3.
    Looking at it another way, this should be very similar to the gambler's ruin problem where the gambler is playing against an infinitely rich house and their probability of winning a dollar is 2/3. If the gambler starts with $1 then the probability of ever reaching zero is 1 - (1/3)/(2/3) = 50%. Reference for that formula: https://www.columbia.edu/~ks20/FE-Notes/4700-07-Notes-GR.pdf
    
    LegionMammal978 21 hours ago
    
    You can solve it with a linear recurrence relation [0]: the halting probability from position n is ((sqrt(5)-1)/2)^(n+1), where n is twice the number of odds minus the number of evens. (In fact, this +2/-1 random walk is precisely how the machine implements its termination condition.) The expected value of n is 1/3 the number of iterations. At the end of the longest simulation that has been computed, n is greater than 2^37, so the halting probability is less than 10^(-10^10).
    [0] https://wiki.bbchallenge.org/wiki/Antihydra#Trajectory
  - 7373737373 21 hours ago
    
    Even if an event has probability 1 it is not inevitable, conversely probability 0 does not imply its impossibility.
    For example, randomly picking the number 0.5 out of the interval of real numbers [0,1] has probability 0, and yet it might happen. The probability of picking an irrational number instead was 1 (because almost all real numbers are irrational), but that didn't happen.
    Even if you consider a countably infinite number of events, as with the coinflip example, it might just happen that the coin flips to one side forever.
    Since the machines under consideration just represent one specific sequence of events, probabilistic arguments may be misleading.
    Relevant xkcd: https://xkcd.com/221/
  - gowld a day ago
    
    > But it would still halt. Infinity is weird like that
    What are you tring to say?
    > The probability of having a 2:1 ratio of heads/tails - at some point - in an infinite sequence of fair flips is 1, is it not?
    Yes, but "probability = 1" absolutely does not mean "will happen eventually" in pure mathematics. Infinity is weird like that.
    
    LegionMammal978 21 hours ago
    
    The probability is less than 1, and in fact it exponentially goes to 0, since the halting condition can be modeled as a biased random walk [0].
    [0] https://wiki.bbchallenge.org/wiki/Antihydra#Trajectory

fijiaarone a day ago

Is this like SETI@home, Bitcoin, and Ai code generation?

In the old days we used to just chop wood, and burn it to keep warm. Then sit down and watch the sportsball game on TV to waste time.

ameliaquining a day ago

No, it's not a distributed-computing thing; raw compute isn't the bottleneck. (That would only help if there were a need to check many machines that halt after a tractable-but-nontrivial number of steps; that's a narrow sweet spot, given the superexponential nature of the problem, and few machines of interest are believed to be in it.) Rather, it's a collaboration of human (mostly amateur) mathematicians chipping away at different parts of the problem.
weregiraffe 13 hours ago

>sportsball
People who use this word should be banned from the internet for life.