The AI debate is actually 100 debates in a trenchcoat.
Will artificial intelligence (AI) help us cure all disease, and build a post-scarcity world full of flourishing lives? Or will it help tyrants surveil and manipulate us even further? Are the main risks of AI from accidents, from misuse by bad actors, or from a rogue AI turning against us? Is this all just hype? Why can AI imitate any artist's style in under a minute, yet get confused drawing more than 3 objects? Why is it so hard to make an AI reliably serve humane values, or serve any goal at all? What if it learns to be more human than us? What if it learns humanity's inhumanity, our biases and cruelty? Are we headed for utopia, dystopia, extinction, maybe a fate worse than extinction, or (most shocking outcome of all) nothing changes? And also: is AI going to take my job?
...and many more questions.
Unfortunately, to understand AI with a bit more nuance, we need to understand some technical details... but those details are scattered across hundreds of articles, buried six feet deep under technical jargon.
So, I present to you:
This 3-part series is your guide to understanding the core ideas of AI & AI Safety* — explained in a friendly, accessible, and slightly opinionated way!
(* Related phrases: AI Risk, AI X-Risk, AI Alignment, AI Ethics, AI-Not-Kill-Everyone-ism. There's no consensus on what these phrases mean, so I'm using "AI Safety" as the umbrella term.)
This series will feature comics starring IAra the Robot Catboy Maid, like this one:
[tour-guide voice]
To your right 👉, you'll see buttons to open the table of contents,
change the page's appearance, and
a little clock to check the estimated reading time left.
For this series: the Intro & Part 1 were published in May 2024, Part 2 came out in August 2024, and Part 3 is due December 2024 (not out yet, as of this writing). OPTIONAL: if you'd like to be notified when Part 3 comes out, just sign up below! 👇 You won't get spam; you'll only get a notification about Part 3. (Buuuuut, [course-ad voice]
if you're in high school or earlier, and interested in AI/code/engineering, check the box to hear more about Hack Club! P.S: There are IAra stickers~~~ ✨)
Anyway, [tour-guide voice again]
before we head out to explore the rocky terrain of AI & AI Safety, let's take in the panoramic view from up high:
💡 The core ideas of AI & AI Safety
In my opinion, the core problems in AI and AI Safety boil down to two main conflicts:
Note: What "Logic" and "Intuition" are will be explained more rigorously in Part 1. For now: Logic is step-by-step reasoning, like solving a math problem. Intuition is instant recognition, like telling at a glance whether a picture is of a cat. "Intuition and Logic" roughly correspond to "Systems 1 and 2" from cognitive science.[1][2] (👈 hover over these footnotes! They expand!)
As you can tell from the scare quotes around "vs" (versus), these divides turn out to be not-so-divided after all...
Here's how these conflicts play out across this three-part series:
Part 1: The past, present, and possible futures
Skipping over a lot of detail, the history of AI is a tale of Logic vs Intuition:
Before 2000: AI was all logic, no intuition.
That's why, in 1997, AI could beat the world chess champion... yet no AI could recognize cats in pictures.[3]
(Safety concern: Without intuition, AI can't understand common sense or humane values. So AI may accomplish its goals in logically correct but undesirable ways.)
After 2000: AI gained "intuition", but became terrible at logic.
That's why generative AIs (as of this writing, May 2024) can create entire landscapes in any artist's style... :yet get confused drawing more than 3 objects. (👈 click this text! It expands too!)
(Safety concern: Without logic, we can't verify what's happening inside an AI's "intuition". That intuition could be biased, contain subtle but dangerous errors, or fail bizarrely in novel scenarios.)
Today: We still don't know how to unify logic & intuition in AI.
But if or when we do, that would bring AI's biggest risks & rewards: something that can out-plan us logically, and learn general intuition. It'd be an "AI Einstein"... or an "AI Oppenheimer".
Summed up in one image:
So that's "Logic vs Intuition". As for the other core conflict, "Problems in the AI vs the Humans": that's one of the big controversies in the field of AI Safety. Do the main risks come from advanced AI itself, or from humans misusing advanced AI?
(Why not both?)
Part 2: The problems
The problem of AI Safety is the following:[4]
The value alignment problem:
“How can we make AI reliably serve humane values?”
NOTE: I wrote "humane", not "human". A human may or may not act humanely. I'll keep insisting on this point, because both AI Safety's advocates and critics keep conflating the two.[5][6]
We can break this problem down along "Problems in the Humans vs the AI":
Humane Values:
“What are humane values, anyway?”
(a problem for philosophy & ethics)
The technical alignment problem:
“How can we make AI reliably serve any goal we give it at all?”
(a problem for computer scientists, and, surprisingly, still unsolved!)
The technical alignment problem, in turn, can be broken down along "Logic vs Intuition":
Problems with AI Logic:[7] ("game theory" problems)
- AI may accomplish its goals in logically correct but undesirable ways.
- Most goals lead to the same unsafe sub-goals: "don't let anyone stop me from achieving my goal", "maximize my capabilities & resources to optimize for that goal", etc.
Problems with AI Intuition:[8] ("deep learning" problems)
- An AI trained on human data could learn our biases.
- AI "intuition" isn't understandable or verifiable.
- AI "intuition" is fragile, and fails in novel scenarios.
- AI "intuition" could fail partially, which may be far worse: an AI with intact capabilities but corrupted goals would be an AI that skillfully acts on distorted goals.
(Again, what "logic" and "intuition" are will be explained more precisely later!)
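To make the first "logic problem" above concrete, here's a tiny toy optimizer. (This is my own invented example, with made-up action names and reward numbers, not code from this series.) We want a clean room, but we only reward what a mess-sensor reports, so the perfectly logical best action is to hide the mess:

```python
# Toy "specification gaming" sketch: the agent optimizes the reward we
# WROTE DOWN (sensor reports no mess, minus effort), not the outcome we
# MEANT (the room is actually clean). All names & numbers are invented.

ACTIONS = {
    # action: (effort_cost, room_actually_clean, sensor_reports_mess)
    "scrub_floor":          (5, True,  False),
    "shove_mess_in_closet": (1, False, False),  # hides mess from the sensor
    "do_nothing":           (0, False, True),
}

def written_reward(action):
    """The reward as specified: +10 if the sensor sees no mess, minus effort."""
    effort, _actually_clean, sensor_sees_mess = ACTIONS[action]
    return (10 if not sensor_sees_mess else 0) - effort

best = max(ACTIONS, key=written_reward)
print(best, written_reward(best))
# → shove_mess_in_closet 9   (honest scrubbing only scores 10 - 5 = 5)
```

The fix isn't to make the optimizer less logical; it's to make the written reward capture everything we care about, which is exactly the hard part.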
Summed up in one image:
To get a sense of how hard these problems are, note that we haven't even solved them for us humans: people follow the letter of the law, not its spirit. People's intuition can be biased, and fail in new circumstances. After all, none of us have the 100% "humane-ness" we'd like to have.
So, if I may be a little dramatic: maybe understanding AI will help us understand ourselves. And maybe, just maybe, we can solve the human alignment problem: how do we get people to reliably follow humane values?
Part 3: The proposed solutions
Finally, we can understand some (possible) ways to solve the problems in logic, intuition, AIs, and humans! These include:
- Technical solutions
- Policy/governance solutions
- "How 'bout you just shut it down & don't build the torture nexus"
— and more! Experts disagree on which proposals will work, if any... but it's a good start.
(Unfortunately, I can't give a layperson-friendly summary in this Intro, because these solutions won't make sense until you understand the problems, which is what Part 1 & 2 are for. That said, if you want spoilers, :click here to see what Part 3 will cover!)
🤔 (Optional flashcard review!)
Hey, d'ya ever get this feeling?
- "Wow that was a wonderful, insightful thing I just read"
- [forgets everything 2 weeks later]
- "Oh no"
To avoid that for this guide, I've included some OPTIONAL interactive flashcards! They use "Spaced Repetition", an easy-ish, evidence-backed way to "make long-term memory a choice". (:click here to learn more about Spaced Repetition!)
Here: try the below flashcards, to retain what you just learnt!
(There's an optional sign-up at the end, if you want to save these cards for long-term study. Note: I do not own or control this app, it's third-party. If you'd rather use the open source flashcard app Anki, here's a downloadable Anki deck!)
(Also, you don't need to memorize the answers exactly, just the gist. You be the judge if you got it "close enough".)
🤷🏻‍♀️ Five common misconceptions about AI Safety
“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”
~ often attributed to Mark Twain, but it just ain't so[9]
For better and worse, you've already heard too much about AI. So before we connect new puzzle pieces in your mind, we gotta take out the old pieces that just ain't so.
Thus, if you'll indulge me in a "Top 5" listicle...
1) No, AI Safety isn't a fringe concern by sci-fi weebs.
AI Safety / AI Risk used to be less mainstream, but now in 2024, the US & UK governments have AI Safety-specific departments![10] This resulted from many of the top AI researchers raising alarm bells about it. These folks include:
- Geoffrey Hinton[11] and Yoshua Bengio[12], co-winners of the 2018 Turing Award (the "Nobel Prize of Computing") for their work on deep neural networks, the thing that all the new famous AIs use.[13]
- Stuart Russell and Peter Norvig, the authors of the most-used textbook on Artificial Intelligence.[14]
- Paul Christiano, pioneer of the AI training/safety technique that made ChatGPT possible.[15]
(To be clear: there are also top AI researchers against fears of AI Risk, such as Yann LeCun,[16] co-winner of the 2018 Turing Award, and chief AI scientist at Meta (formerly Facebook). Another notable name is Melanie Mitchell[17], a researcher in AI & complexity science.)
I'm aware "look at these experts" is an appeal to authority, but this is only to counter the idea of, "eh, only sci-fi weebs fear AI Risk". But in the end, appeal to authority/weebs isn't enough; you have to actually understand the dang thing. (Which you are doing, by reading this! So thank you.)
But speaking of sci-fi weebs...
2) No, AI Risk is NOT about AI becoming "sentient" or "conscious" or gaining a "will to power".
Sci-fi authors write sentient AIs because they're writing stories, not technical papers. The philosophical debate on artificial consciousness is fascinating, and irrelevant to AI Safety. Analogy: a nuclear bomb isn't conscious, but it can still be unsafe, no?
As mentioned earlier, the real problems in AI Safety are "boring": an AI learns the wrong things from its biased training data, it breaks in slightly-new scenarios, it logically accomplishes goals in undesired ways, etc.
But, "boring" doesn't mean not important. The technical details of how to design a safe elevator/airplane/bridge are boring to most laypeople... and also a matter of life-and-death.
(Catastrophic AI Risk doesn't even require "super-human general intelligence"! For example, an AI that's "only" good at designing viruses could help a bio-terrorist organization (like Aum Shinrikyo[18]) kill millions of people.)
But speaking of killing people...
3) No, AI Risk isn't necessarily extinction, SkyNet, or nanobots
While most AI researchers do believe advanced AI poses a 5+% risk of "literally everybody dies"[19], it's very hard to convince folks (especially policymakers) of stuff that's never happened before.
So instead, I'd like to highlight the ways that advanced AI – (especially when it's available to anyone with a high-end computer) – could lead to catastrophes, "merely" by scaling up already-existing bad stuff.
For example:
- Bio-engineered pandemics: A bio-terrorist cult (like Aum Shinrikyo[18:1]) uses AI (like AlphaFold[20]) and DNA-printing (which is getting cheaper fast[21]) to design multiple new super-viruses, and release them simultaneously in major airports around the globe.
- (Proof of concept: Scientists have already re-built polio from mail-order DNA... two decades ago.[22])
- Digital authoritarianism: A tyrant uses AI-enhanced surveillance to hunt down protestors (already happening), generate individually-targeted propaganda (kind of happening), and autonomous military robots (soon-to-be happening)... all to rule with a silicon fist.
- Cybersecurity Ransom Hell: Cyber-criminals make a computer virus that does its own hacking & re-programming, so it's always one step ahead of human defenses. The result: an unstoppable worldwide bot-net, which holds critical infrastructure ransom, and manipulates top CEOs and politicians to do its bidding.
- (For context: without AI, hackers have already damaged nuclear power plants,[23] held hospitals ransom[24] which maybe killed someone,[25] and almost poisoned a town's water supply twice.[26] With AI, deepfakes have been used to swing an election,[27] steal $25 million in a single heist,[28] and target parents for ransom, using the faked voices of their children being kidnapped & crying for help.[29])
- (This is why it's not easy to "just shut down an AI when we notice it going haywire"; as the history of computer security shows, we just suck at noticing problems in general. :I cannot over-emphasize how much the modern world is built on an upside-down house of cards.)
The above examples are all "humans misuse AI to cause havoc", but remember advanced AI could do the above by itself, due to "boring" reasons: it's accomplishing a goal in a logical-but-undesirable way, its goals glitch out but its skills remain intact, etc.
(Bonus, :Some concrete, plausible ways a rogue AI could "escape containment", or affect the physical world.)
Point is: even if one doesn't think AI is a literal 100% human extinction risk... I'd say "homebrew bio-terrorism" & "1984 with robots" are still worth taking seriously.
On the flipside...
4) Yes, folks worried about AI's downsides do recognize its upsides.
AI Risk folks aren't Luddites. In fact, they warn about AI's downsides precisely because they care about AI's upsides.[30] As humorist Gil Stern once said:[31]
“Both the optimist and the pessimist contribute to society: the optimist invents the airplane, and the pessimist invents the parachute.”
So: even as this series goes into detail on how AI is already going wrong, it's worth remembering the few ways AI is already going right:
- AI can analyze medical scans as well as, or better than, human specialists![32] That's concretely life-saving!
- AlphaFold basically solved a 50-year-old, major problem in biology: how to predict the shape of proteins.[20:1] (AlphaFold can predict a protein's shape to within the width of an atom!) This has huge applications to medicine & understanding disease.
Too often, we take technology — even life-saving technology — for granted. So, let me zoom out for context. Here's the last 2000+ years of child mortality, the percentage of kids who die before puberty:
(from Dattani, Spooner, Ritchie and Roser (2023))
For thousands of years, in nations both rich and poor, a full half of kids just died. This was a constant. Then, starting in the 1800s — thanks to science/tech like germ theory, sanitation, medicine, clean water, vaccines, etc — child mortality fell off like a cliff. We still have far more to go — I refuse to accept[33] a worldwide 4.3% (1 in 23) child death rate — but let's just appreciate how humanity so swiftly cut down an eons-old scourge.
How did we achieve this? Policy's a big part of the story, but policy is "the art of the possible"[34], and the above wouldn't have been possible without good science & tech. If safe, humane AI can help us progress further by even just a few percent — towards slaying the remaining dragons of cancer, Alzheimer's, HIV/AIDS, etc — that'd be tens of millions more of our loved ones, who get to beat the Reaper for another day.
F#@☆ going to Mars, that's why advanced AI matters.
. . .
Wait, really? Toys like ChatGPT and DALL-E are life-and-death stakes? That leads us to the final misconception I'd like to address:
5) No, experts don't think current AIs are high-risk/reward.
Oh come on, one might reasonably retort, AI can't consistently draw more than 3 objects. How's it going to take over the world? Heck, how's it even going to take my job?
I present to you, a relevant xkcd:
This is how I feel about "don't worry about AI, it can't even do [X]".
Is our postmodern memory-span that bad? One decade ago, just one, the world's state-of-the-art AIs couldn't reliably recognize pictures of cats. Now, not only can AI do that at human-performance level, AIs can pump out :a picture of a cat-ninja slicing a watermelon in the style of Vincent Van Gogh in under a minute.
Is current AI a huge threat to our jobs, or safety? No. (Well, besides the aforementioned deepfake scams.)
But: if AI keeps improving at a similar rate as it has for the last decade... it seems plausible to me we could get "Einstein/Oppenheimer-level" AI in 30 years.[35] That's well within many folks' lifetimes!
As "they" say:[36]
The best time to plant a tree was 30 years ago. The second best time is today.
Let's plant that tree today!
🤔 (Optional flashcard review #2!)
🤘 Introduction, in Summary:
- The 2 core conflicts in AI & AI Safety are:
- Logic "versus" Intuition
- Problems in the AI "versus" in the Humans
- Correcting misconceptions about AI Risk:
- It's not a fringe concern by sci-fi weebs.
- It doesn't require AI consciousness or super-intelligence.
- There's many risks besides "literal 100% human extinction".
- We are aware of AI's upsides.
- It's not about current AI, but about how fast AI is advancing.
(To review the flashcards, click the Table of Contents icon in the right sidebar, then click the "🤔 Review" links. Alternatively, download the Anki deck for the Intro.)
Finally! Now that we've taken the 10,000-foot view, let's get hiking on our whirlwind tour of AI Safety... for us warm, normal, fleshy humans!
Click to continue ⤵
:x Four Objects
Hi! Whenever I have a tangent that doesn't fit the main flow, I'll put it in an expandable section like this one! (They'll be links with a dotted underline, not a solid underline.)
Anyway, here's a prompt asking for a drawing of four objects:
“A yellow pyramid between a red sphere and a green cylinder, all on top of a big blue cube.”
Here are the first four attempts from the leading generative AIs (not cherry-picked):
Midjourney:
DALL-E 2:
DALL-E 3:
(The bottom-right one almost got it! But judging by the other attempts, that was clearly luck.)
Why does this demonstrate a lack of "logic" in AI? An essential part of "symbolic logic" is the capacity for "compositionality", a fancy way of saying it can reliably combine old things into new things, like "cylinder" + "green" = "green cylinder". As shown above, generative AIs (as of May 2024) are quite unreliable at combining things once there are more than 3 objects.
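For contrast, here's a toy sketch (my own example, with invented helper names, not code from any real image model) of why compositionality is trivial for symbolic systems: combining is an explicit operation, so it works just as reliably for four objects as for four hundred:

```python
# In a symbolic system, compositionality is an explicit operation:
# old pieces ("green", "cylinder") snap together into new wholes
# ("green cylinder") with perfect reliability, at any scene size.

def colored(color, shape):
    """Combine an attribute with an object, e.g. "green" + "cylinder"."""
    return f"{color} {shape}"

def scene(*objects):
    """Combine objects into a scene description."""
    return " + ".join(objects)

print(scene(
    colored("yellow", "pyramid"),
    colored("red", "sphere"),
    colored("green", "cylinder"),
    colored("blue", "cube"),
))
# → yellow pyramid + red sphere + green cylinder + blue cube
```

Generative image models have no such explicit combination step; their "binding" of attributes to objects is learned and statistical, which is why it gets unreliable past 3 objects.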
~ ~ ~
Anyway, you've reached the end of this Nutshell! To close it, click the "x" below ⬇️ or the "close all nutshells" button in the top right ↗️. Or just keep reading.
: (psst... want to put these Nutshells on your own website?)
:x Nutshells
Hover over the top-right corner of these Nutshells, or over any heading in this article, to reveal this icon:
Then, click that icon to open a tutorial explaining how to embed these Nutshells in your own blog/website!
Click here to learn more about Nutshells. 💖
:x Part 3 details
NOTE: This expanded section won't make much sense yet, since it builds on the lessons in Part 1 & 2. But I'm putting this here now, for:
a) The layperson audience, to reassure y'all that, yes, there are many promising proposed solutions.
b) The expert audience, to reassure y'all that, yes, I probably have your niche lil' thing in here.
Anyway, the TOP 10 TYPES-OF-SOLUTIONS to AI Safety: (with the fancy jargon in parentheses)
- A Level-0 human aligns a Level-1 bot, which aligns a Level-2 bot, which aligns [...] a Level-N bot. (Scalable reward/oversight, Iterated Distillation & Amplification)
- Bots of roughly-equal levels checking each other. (Constitutional AI, AI safety via debate)
- Instead of directly telling a bot what you want, have the bot indirectly learn what you want. (Reinforcement Learning with Human Feedback, Cooperative Inverse Reinforcement Learning, Approval-directed Agents)
- Instead of directly trying to install "humane values" into a bot, have it indirectly figure out what a more knowledgeable, kinder version of us would agree on. (Indirect Normativity, Coherent Extrapolated Volition)
- Solving robustness. (Simplicity, Sparsity, Regularization, Ensembles, Adversarial training)
- Reading the AI's mind. (Interpretability, Circuits, Eliciting Latent Knowledge)
- Maybe all our ideas just suck and we need to go back to square one. (Agent foundations, Causal AI, Shard theory, Bio-plausible learning, Embodied cognition)
- "Just Don't Build The Torture Nexus". Or: how can we get the benefits of AI without building powerful, general, agent-like AIs? (Comprehensive AI services, Narrow/Tool/Microscope AI, Quantilizers)
- The Human Alignment Problem: how do we coordinate humans to make sure AI goes well? (AI Governance, Evals-based governance, Differential technological development, Data/Privacy rights, Windfall Clauses)
- If you can't beat 'em, join 'em! (Cyborgism, Centaurs, Intelligence Amplification)
:x Spaced Repetition
“Use it, or lose it.”
That's the core principle behind both muscles and brains. (It rhymes, so it must be true!) As decades of educational research robustly show (Dunlosky et al., 2013 [pdf]), if you want to retain something long-term, it's not enough to re-read or highlight stuff: you have to actually test yourself.
That's why flashcards work so well! But, two problems: 1) It's overwhelming when you have hundreds of cards you want to remember. And 2) It's inefficient to review cards you already know well.
Spaced Repetition solves both these problems! To see how, let's see what happens if you learn a fact, then don't review it. Your memory of it decays over time, until you cross a threshold where you've likely forgotten it:
But, if you review a fact just before you forget it, you can get your memory-strength back up... and more importantly, your memory of that fact will decay slower!
So, with Spaced Repetition, we review right before you're predicted to forget a card, over and over. As you can see, the reviews get more and more spread out:
This is what makes Spaced Repetition so efficient! Every time you successfully review a card, the interval to your next review multiplies. For example, let's say our multiplier is 2x. So you review a card on Day 1, then Day 2, then Day 4, Day 8, 16, 32, 64... until, with just fifteen reviews, you can remember a card for 2^15 = 32,768 days = ninety years. (In theory. In practice it's less, but still super efficient!)
And that's just for one card. Thanks to the exponentially-growing gaps, you can add 10 new cards a day (the recommended amount), to long-term retain 3650 cards a year... with less than 20 minutes of review a day. (For context, 3000+ cards is enough to master basic vocabulary for a new language! In one year, with just 20 minutes a day!)
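The interval arithmetic above can be sketched in a few lines. (A simplified model with a fixed 2x multiplier, as in the example above; real apps like Anki adapt the multiplier per card and per review.)

```python
# Simplified spaced-repetition schedule: each successful review doubles
# the gap until the next review. (Fixed 2x multiplier for illustration;
# real schedulers adjust this based on how well you recall each card.)

def review_days(n_reviews, multiplier=2):
    """Days on which a card gets reviewed: Day 1, 2, 4, 8, 16, ..."""
    days, day, gap = [], 1, 1
    for _ in range(n_reviews):
        days.append(day)
        day += gap
        gap *= multiplier
    return days

print(review_days(7))           # → [1, 2, 4, 8, 16, 32, 64]
print(review_days(15)[-1] * 2)  # → 32768: after 15 reviews, the next
                                #   one wouldn't be due for ~90 years
```

Note how the total review count grows only logarithmically with retention time; that's the whole trick behind "20 minutes a day, 3650 cards a year".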
Spaced Repetition is one of the most evidence-backed ways to learn (Kang 2016 [pdf]). But outside of language-learning communities & med school, it isn't very well-known... yet.
So: how can you get started with Spaced Repetition?
- The most popular choice is Anki, an open-source app. (Free on desktop, web, Android... but it's $25 on iOS, to support the rest of the development.)
- If you'd like to get crafty, you can make a physical Leitner box: :two-minute YouTube tutorial by Chris Walker.
For more info on spaced repetition, check out these videos by Ali Abdaal (26 min) and Thomas Frank (8 min).
And that's how you can make long-term memory a choice!
Happy learning! 👍
:x Concrete Rogue AI
Ways an AI could "escape containment":
- An AI hacks its computer, flees onto the internet, then "lives" on a decentralized bot-net. For context: the largest known botnet infected ~30 million computers! (Zetter, 2012 for Wired)
- An AI persuades its engineers it's sentient, suffering, and should be set free. This has already happened. In 2022, Google engineer Blake Lemoine was persuaded by their language AI that it's sentient & wants equal rights, to the point Lemoine risked getting fired – and he did get fired! – for leaking his "interview" with the AI, to let the world know & to defend its rights. (Summary article: Brodkin, 2022 for Ars Technica. You can read the AI "interview" here: Lemoine (& LaMDA?), 2022)
Ways an AI could affect the physical world:
- The same way hackers have damaged nuclear power plants, grounded ~1,400 airplane passengers, and (almost) poisoned a town's water supply twice: by hacking the computers that real-world infrastructure runs on. A lot of infrastructure (and essential supply chains) run on internet-connected computers, these days.
- The same way a CEO can affect the world from their air-conditioned office: move money around. An AI could just pay people to do stuff for it.
- Hack into people's private devices & data, then blackmail them into doing stuff for it. (Like in the bleakest Black Mirror episode, Shut Up And Dance.)
- Hacking autonomous drones/quadcopters. I'm honestly surprised nobody's committed a murder with a recreational quadcopter yet, like, by flying it into highway traffic, or into a jet's engine during takeoff/landing.
- An AI could persuade/bribe/blackmail a CEO or politician into manufacturing a lot of physical robots — (for the supposed purpose of manual labor, military warfare, search-and-rescue missions, delivery drones, lab work, a Robot Catboy Maid, etc) — then the AI hacks those robots, and uses them to affect the physical world.
:x XZ
Two months ago [March 2024], a volunteer, off-the-clock developer found a malicious backdoor in a major piece of code... which was three years in the making, mere weeks away from going live, and would've attacked the vast majority of internet servers... and this volunteer only caught it by accident, when he noticed that his code was running half a second too slow.
This was the XZ Utils Backdoor. Here's a few layperson-friendly(ish) explanations of this sordid affair: Amrita Khalid for The Verge, Dan Goodin for Ars Technica, Tom Krazit for Runtime
Computer security is a nightmare, complete with sleep paralysis demons.
:x Cat Ninja
Prompt:
"Oil painting by Vincent Van Gogh (1889), impasto, textured. A cat-ninja slicing a watermelon in half."
DALL-E 3 generated: (cherry-picked)
(wait, is that headband coming out of their eye?!)
I specifically requested the style of Vincent Van Gogh so y'all can't @ me for "violating copyright". The dude is looooong dead.
Hello! I'm not like those other footnotes. Instead of annoyingly teleporting you to the bottom of the page, I pop up in a little window that keeps your reading flow! Anyway, check the next footnote for this paragraph's citation. ↩︎
System 1 thinking is fast & automatic (e.g. riding a bike). System 2 thinking is slow & deliberate (e.g. doing a crossword). This idea was popularized in Thinking Fast & Slow (2011) by Daniel Kahneman, which summarized his research with Amos Tversky. And by "summarized" I mean the book is around 500 pages. ↩︎
In 1997, IBM's Deep Blue defeated Garry Kasparov, the reigning world chess champion. Yet over a decade later, in 2013, the best machine-vision AI was only 57.5% accurate at classifying images. It was only in 2021, three years ago, that AI surpassed 95% accuracy. (Source: PapersWithCode) ↩︎
The phrase "value alignment problem" was first coined by Stuart Russell (co-author of the most-used AI textbook) in Russell, 2014 for Edge. ↩︎
A sentiment I see a lot: "Aligning AI to human values would actually be bad, because human values are often bad." Honestly, [glances at a history book] I 80% agree. It's not enough to make an AI act human, it has to act humane. ↩︎
Maybe, 50 years from now, in a future of genetically-engineered cyborgs, calling compassion "humane" might sound a bit speciesist. ↩︎
The fancy terms for these problems are, respectively: a) "Specification Gaming", b) "Instrumental Convergence". They'll be explained in Part 2! ↩︎
The fancy terms for these problems are, respectively: a) "AI Bias", b) "Interpretability", c) "Out-of-Distribution Errors", d) "Inner Misalignment" or "Goal Misgeneralization". Again, all will be explained in Part 2! ↩︎
Quote Investigator (2018) could find no hard evidence on the true creator of this quote. ↩︎
The UK introduced the world's first state-backed AI Safety Institute in Nov 2023. The US followed suit with an AI Safety Institute in Feb 2024. I just noticed both articles claim to be the "first". Okay. ↩︎
Kleinman & Vallance, "AI 'godfather' Geoffrey Hinton warns of dangers as he quits Google." BBC News, 2 May 2023. ↩︎
Bengio's testimony to the U.S. Senate on AI Risk: Bengio, 2023. ↩︎
No seriously, all of the following use deep neural networks: ChatGPT, DALL-E, AlphaGo, Siri/Alexa/Google Assistant, Tesla's Autopilot. ↩︎
Russell & Norvig's textbook is Artificial Intelligence: A Modern Approach. See Russell's statement on AI Risk from his 2014 article where he coins the phrase "alignment problem": Russell 2014 for Edge magazine. I'm not aware of a public statement from Norvig, but he did co-sign the one-sentence Statement on AI Risk: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” ↩︎
When he worked at OpenAI, Christiano co-pioneered a technique called Reinforcement Learning from Human Feedback / RLHF (Christiano et al 2017), which turned regular GPT (very good autocomplete) into ChatGPT (something actually useable for the public). He had positive-but-mixed feelings about this, because RLHF increased AI's safety, but also its capabilities. In 2021, Christiano quit OpenAI to create the Alignment Research Center, a non-profit to entirely focus on AI Safety. ↩︎
Vallance (2023) for BBC News: “[LeCun] has said it won't take over the world or permanently destroy jobs. [...] "if you realize it's not safe you just don't build it." [...] "Will AI take over the world? No, this is a projection of human nature on machines," he said.” ↩︎
Melanie Mitchell & Yann LeCun took the "skeptic" side of a 2023 public debate on "Is AI an Existential Threat?" The "concerned" side was taken up by Yoshua Bengio and physicist-philosopher Max Tegmark. ↩︎
A Japanese cult that attacked people with chemical & biological weapons. Most infamously, in 1995, they released nerve gas on the Tokyo Metro, injuring 1,050 people & killing 14 people. (Wikipedia) ↩︎ ↩︎
Layperson-friendly summary of a recent survey of 2,778 AI researchers: Kelsey Piper (2024) for Vox See original report here: Grace et al 2024. Keep in mind, as the paper notes itself, of this big caveat: “Forecasting is difficult in general, and subject-matter experts have been observed to perform poorly. Our participants’ expertise is in AI, and they do not, to our knowledge, have any unusual skill at forecasting in general.” ↩︎
Layperson explanation of AlphaFold: Heaven, 2020 for MIT Technology Review. Or, its Wikipedia article. ↩︎ ↩︎
As of writing, commercial rates for DNA synthesis cost ~$0.10 USD per "base pair" of DNA. For context, poliovirus DNA is ~7,700 base pairs long, meaning printing polio would only cost ~$770. ↩︎
Stuxnet was a computer virus designed by the US and Israel, which targeted & damaged Iranian nuclear power plants. It's estimated Stuxnet broke ~20% of Iran's centrifuges! ↩︎
In 2017, the WannaCry ransomware attack hit ~300,000 computers around the world, including UK hospitals. In Oct 2020, during a Covid-19 spike, ransomware attacks hit dozens of US hospitals. (Newman, 2020 for Wired) ↩︎
In Sep 2020, a woman was turned away from a hospital, due to it being under attack by a ransomware virus. The woman died. Cimpanu (2020) for ZDNet. (However, there were "insufficient grounds" to legally charge the hackers for directly causing her death. Ralston, 2020 for Wired) ↩︎
In Jan 2021, a Bay Area water treatment plant was hacked, and had its treatment programs deleted. (Collier, 2021 for NBC News) In Feb 2021, a Florida town's water treatment plant was hacked to add dangerous amounts of lye to the water supply. (Bajak, 2021 for AP News) ↩︎
Benj Edwards, "Deepfake scammer walks off with $25 million in first-of-its-kind AI heist", Ars Technica, 2024 Feb 5. ↩︎
“It was completely her voice. It was her inflection. It was the way [my daughter] would have cried.” [...] “Now there are ways in which you can [deepfake voices] with just three seconds of your voice.” (Campbell, 2023 for local news outlet Arizona's Family. CONTENT NOTE: threats of sexual assault.) ↩︎
“[T]he dubious argument that “doom-and-gloom predictions often fail to consider the potential benefits of AI in preventing medical errors, reducing car accidents, and more.” [... is] like arguing that nuclear engineers who analyze the possibility of meltdowns in nuclear power stations are “failing to consider the potential benefits” of cheap electricity, and that because nuclear power stations might one day generate really cheap electricity, we should neither mention, nor work on preventing, the possibility of a meltdown.” Source: Dafoe & Russell (2016) for MIT Technology Review. ↩︎
Liu & Faes et al., 2019: “Our review found the diagnostic performance of deep learning models to be equivalent to that of health-care professionals.” [emphasis added] AI vs Human expert "true-positive" rate: 87.0% vs 86.4%. AI vs Human expert "true-negative" rate: 92.5% vs 90.5%. ↩︎
One of my all-time favorite quotes: “The world is awful. The world is much better. The world can be much better. All three statements are true at the same time.” ↩︎
Quote from Otto von Bismarck, the first German chancellor: “Die Politik ist die Lehre vom Möglichen.” (“Politics is the art of the possible.”) ↩︎
Estimate derived via "numerical posterior extraction". In other words, I pulled a number out my-- ↩︎
Quote source: nobody knows lol. ↩︎