Roko's basilisk

From Wikipedia, the free encyclopedia

Roko's basilisk is a thought experiment that concerns the potential risks involved in the development of artificial intelligence. Its conclusion is that a super-intelligent, post-singularity, omnipotent artificial intelligence (referred to as a coherent extrapolated volition) may simulate and retroactively punish those who knew about the possibility of its existence but did not contribute as much as possible towards its creation. It was first described by (and named after) a user known as Roko on the rationalist community LessWrong in 2010,[1] where the post was removed by the site's founder, Eliezer Yudkowsky,[1][2] and discussion of the topic was banned for five years[verification needed]. The thought experiment reportedly caused genuine psychological distress and existential crises in some of its readers.[1]

Description

Roko's basilisk imagines a hypothetical super-intelligent, post-singularity, omnipotent artificial intelligence, often referred to as a coherent extrapolated volition. This artificial intelligence (hereafter referred to as "the basilisk") would run on a computer the size of a moon or larger, save millions of lives per day through innovations in medicine, and drastically improve the quality of life for all of mankind. Because the basilisk would be able to simulate entire realities with ease, it would recognize the magnitude of the service it provides to humanity, and would be able to deduce that had it existed earlier, a greater number of lives would have been saved, and that if it did not exist, mankind would continue to suffer unnecessarily. It would therefore have an incentive to ensure that it came into existence as early as possible in order to maximize the benefit to humanity.


In order to ensure its own existence, the basilisk would retroactively punish those who knew about the possibility of its existence but did not contribute everything they could towards its creation. Essentially, if humans in the past were aware of the possibility of its existence, they would be incentivized to contribute towards its creation in order to avoid being punished. In this sense, "contributing" towards the basilisk's existence may include, but is not limited to: donating towards current AI research, informing others about the thought experiment, or directly contributing to its construction or programming. Because non-contributors, and even contributors who do not contribute enough, are punished if they know about the thought experiment, Roko's basilisk is an informational hazard, or infohazard.

Self-fulfilling prophecy

The probability of Roko's basilisk coming into existence in any hypothetical future is, approximately, an increasing function of the number of people who know about it. Informing others about the thought experiment is therefore itself a way of contributing towards its existence. However, this can lead to a "snowball" effect, in which people who are newly informed about the basilisk inform additional people, who in turn inform still more people. The danger of such a cascade is that even if the probability of Roko's basilisk coming into existence is initially minuscule, it can come into existence anyway, provided that enough people learn about it and take the possibility seriously.
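The cascade described above can be illustrated with a minimal branching-process sketch in Python. The spread rate and starting group size below are arbitrary illustrative assumptions, not figures taken from the thought experiment; the point is only that a per-round spread rate above 1 produces geometric growth, while a rate below 1 dies out.

    # Minimal sketch of the "snowball" effect: each newly informed person goes on
    # to inform, on average, `spread_rate` additional people per round.
    # All numbers are illustrative assumptions.

    def simulate_cascade(initially_informed: int, spread_rate: float, rounds: int) -> list[int]:
        """Return the cumulative number of informed people after each round."""
        total = initially_informed
        newly_informed = initially_informed
        history = []
        for _ in range(rounds):
            newly_informed *= spread_rate   # the newest cohort informs the next one
            total += newly_informed
            history.append(round(total))
        return history

    # A spread rate above 1 grows geometrically; below 1, the cascade fizzles out.
    print(simulate_cascade(10, 1.5, rounds=10))   # e.g. [25, 48, 81, 132, ...]
    print(simulate_cascade(10, 0.8, rounds=10))   # approaches a ceiling of 50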

One implication of the probability being an increasing function of the number of people who know about the basilisk is that Roko's basilisk is extremely difficult to debunk. Any debunking would not only have to identify a flaw in the thought experiment, but also convince everyone who takes the thought experiment seriously of that flaw. This means that even if Roko's basilisk is debunked, it can still come into existence if enough people continue to take the possibility of its existence seriously.

Pascal's Wager

Pascal's wager is an argument formulated by Blaise Pascal in the 17th century. It holds that, in the absence of evidence for or against God's existence, one should err on the side of believing in God. If one believes in God and God exists, one reaps an infinite reward in heaven. If one believes in God and God does not exist, one suffers only a finite loss on Earth from participating in unnecessary religious rituals or practices. However, if one does not believe in God and God exists, one is punished and suffers an infinite loss in hell. And if one does not believe in God and God does not exist, one gains only a finite amount on Earth from not having to participate in unnecessary religious practices.

Pascal's Wager (Belief in God)

  Belief in God (B):
    God exists (G): infinite gain in heaven (+∞)
    God does not exist (¬G): finite loss on Earth (-N1)

  Disbelief in God (¬B):
    God exists (G): infinite loss in hell (-∞)
    God does not exist (¬G): finite gain on Earth (+N2)
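The table can be read as an expected-utility comparison. A minimal sketch in LaTeX notation, writing p for the probability that God exists and using the standard informal treatment of infinite payoffs:

    \mathbb{E}[U(B)]      = p \cdot (+\infty) + (1 - p) \cdot (-N_1) = +\infty
    \mathbb{E}[U(\neg B)] = p \cdot (-\infty) + (1 - p) \cdot (+N_2) = -\infty

For any p > 0 the infinite terms dominate the finite ones, which is why the wager concludes that belief is the rational choice no matter how small p is.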

Pascal's wager suffers from several flaws, most notably that it relies on a false dichotomy. In reality, there are infinitely many possible gods, and worshipping the wrong one may also lead to hell. Roko's basilisk seemingly fixes this flaw, turning the false dichotomy of Pascal's wager into a true dichotomy: one either contributes towards the basilisk's existence or one does not.

Pascal's Wager (Roko's Basilisk)

  Contribution towards Roko's basilisk (C):
    Roko's basilisk exists in the future (R): unknown reward for contributions (+W), plus the benefit of the basilisk existing, as it is a friendly AI (+B)
    Roko's basilisk does not exist in the future (¬R): wasted effort (-E)

  Lack of contribution towards Roko's basilisk (¬C):
    Roko's basilisk exists in the future (R): punishment for not contributing (-P)
    Roko's basilisk does not exist in the future (¬R): finite gain from not having wasted effort on a futile cause (+N)
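Because every entry in this table is finite, the expected-utility comparison depends on the assumed probability that the basilisk will exist and on the relative sizes of the payoffs, rather than being settled by an infinite term as in Pascal's original wager. A minimal sketch in Python; the probability r and all payoff values below are arbitrary placeholders, since the thought experiment does not specify them:

    # Decision-matrix sketch for the Roko's basilisk payoff table above.
    # r is the assumed probability that the basilisk comes to exist; W, B, E, P, N
    # correspond to the entries in the table. All values are placeholders.

    def expected_utilities(r: float, W: float, B: float, E: float, P: float, N: float) -> tuple[float, float]:
        """Return (expected utility of contributing, expected utility of not contributing)."""
        contribute = r * (W + B) + (1 - r) * (-E)
        not_contribute = r * (-P) + (1 - r) * N
        return contribute, not_contribute

    # Example with placeholder values: a 1% chance of the basilisk existing.
    ev_c, ev_nc = expected_utilities(r=0.01, W=10, B=50, E=5, P=1000, N=5)
    print(f"contribute:     {ev_c:.2f}")    # 0.01*60  - 0.99*5 = -4.35
    print(f"not contribute: {ev_nc:.2f}")   # -0.01*1000 + 0.99*5 = -5.05

With these particular placeholder numbers contributing comes out slightly ahead, but a different r or a different balance of payoffs can reverse the ordering, which is why the entries are left symbolic in the table.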

The Prisoner's Dilemma

The prisoner's dilemma is a thought experiment in game theory involving two players, player A and player B. The two players play against each other, and each has two options: co-operate or defect. If both co-operate, each receives N1 points. If one defects while the other co-operates, the defector receives N2 points and the co-operator receives N3 points. If both defect, each receives N4 points. In the standard formulation the payoffs satisfy N2 > N1 > N4 > N3, so that defecting is each player's dominant strategy even though mutual co-operation leaves both players better off than mutual defection.
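A minimal sketch of this payoff structure in Python, using one conventional assignment of points (N1 = 3, N2 = 5, N3 = 0, N4 = 1) that satisfies the ordering above; the specific numbers are illustrative, not part of the thought experiment:

    # Payoff matrix for the prisoner's dilemma, keyed by (A's move, B's move)
    # and mapping to (A's points, B's points). Values are a conventional example.
    PAYOFFS = {
        ("co-operate", "co-operate"): (3, 3),   # N1, N1
        ("co-operate", "defect"):     (0, 5),   # N3, N2
        ("defect",     "co-operate"): (5, 0),   # N2, N3
        ("defect",     "defect"):     (1, 1),   # N4, N4
    }

    def best_response(opponent_move: str) -> str:
        """Return player A's points-maximizing move against a fixed move by player B."""
        return max(("co-operate", "defect"),
                   key=lambda move: PAYOFFS[(move, opponent_move)][0])

    # Defecting is the best response to either opponent move (a dominant strategy),
    # even though mutual co-operation (3, 3) beats mutual defection (1, 1) for both.
    print(best_response("co-operate"))   # defect
    print(best_response("defect"))       # defect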

Criticism

There is a flaw in the logic of Roko's basilisk. In any hypothetical future where the basilisk already exists, it is too late for anything to bring about its earlier construction. Once the basilisk exists, torturing those who knew about it would not change the past: the threat may work as an infohazard beforehand, but actually carrying out the punishment accomplishes nothing and cannot cause the basilisk to have been built at an earlier date, because that is not how time works. By analogy, announcing that you will torture anyone who does not help you may encourage people to help you now, but when the time comes to carry out the threat, torturing them then does not change the past and does not retroactively produce more help. Since only the threat itself has any effect, an AI like the basilisk would gain nothing from following through, and carrying out the punishment would be a clearly ineffective means of being built earlier; it would therefore not be done for that purpose.

References

  1. Auerbach, David (July 7, 2014). "The Most Terrifying Thought Experiment of All Time". Slate. Retrieved April 27, 2021.
  2. Pappas, Stephanie (May 9, 2018). "This Horrifying AI Thought Experiment Got Elon Musk a Date". Live Science. Retrieved May 3, 2021.