AI

"It made me feel like I was eavesdropping"

The workers with knowledge of the project said it was initially intended to improve the chatbot’s voice capabilities, and the number of sexual or vulgar requests quickly turned it into an NSFW project.

“It was supposed to be a project geared toward teaching Grok how to carry on an adult conversation,” one of the workers said. “Those conversations can be sexual, but they’re not designed to be solely sexual.”

“I listened to some pretty disturbing things. It was basically audio porn. Some of the things people asked for were things I wouldn’t even feel comfortable putting in Google,” said a former employee who worked on Project Rabbit.

“It made me feel like I was eavesdropping,” they added, “like people clearly didn’t understand that there’s people on the other end listening to these things.”

A rude technology deserves a rude response

Critics have already written thoroughly about the environmental harms, the reinforcement of bias and generation of racist output, the cognitive harms and AI supported suicides, the problems with consent and copyright, the way AI tech companies further the patterns of empire, how it’s a con that enables fraud and disinformation and harassment and surveillance, the exploitation of workers, as an excuse to fire workers and de-skill work, how they don’t actually reason and probability and association are inadequate to the goal of intelligence, how people think it makes them faster when it makes them slower, how it is inherently mediocre and fundamentally conservative, how it is at its core a fascist technology rooted in the ideology of supremacy, defined not by its technical features but by its political ones.

[…]

I am here to be rude, because this is a rude technology, and it deserves a rude response. Miyazaki said, “I strongly feel that this is an insult to life itself.” Scam Altman said we can surround the solar system with a Dyson Sphere to hold data centers. Miyazaki is right, and Altman is wrong. Miyazaki tells stories that blend the ordinary and the fantastic in ways people find deeply meaningful. Altman tells lies for money.

LLMs don't do what they're usually presented as doing

LLMs, the technology underpinning the current AI hype wave, don’t do what they’re usually presented as doing. They have no innate understanding, they do not think or reason, and they have no way of knowing if a response they provide is truthful or, indeed, harmful. They work based on statistical continuation of token streams, and everything else is a user-facing patch on top.

Tags

Please stop externalizing your costs directly into my face

If you think [LLM] crawlers respect robots.txt then you are several assumptions of good faith removed from reality. These bots crawl everything they can find, robots.txt be damned, including expensive endpoints like git blame, every page of every git log, and every commit in every repo, and they do so using random User-Agents that overlap with end-users and come from tens of thousands of IP addresses – mostly residential, in unrelated subnets, each one making no more than one HTTP request over any time period we tried to measure – actively and maliciously adapting and blending in with end-user traffic and avoiding attempts to characterize their behavior or block their traffic.

We are experiencing dozens of brief outages per week, and I have to review our mitigations several times per day to keep that number from getting any higher. When I do have time to work on something else, often I have to drop it when all of our alarms go off because our current set of mitigations stopped working. Several high-priority tasks at SourceHut have been delayed weeks or even months because we keep being interrupted to deal with these bots, and many users have been negatively affected because our mitigations can’t always reliably distinguish users from bots.

All of my sysadmin friends are dealing with the same problems. I was asking one of them for feedback on a draft of this article and our discussion was interrupted to go deal with a new wave of LLM bots on their own server.

[…]

Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage. I am begging you to stop using them, stop talking about them, stop making new ones, just stop. If blasting CO2 into the air and ruining all of our freshwater and traumatizing cheap laborers and making every sysadmin you know miserable and ripping off code and books and art at scale and ruining our fucking democracy isn’t enough for you to leave this shit alone, what is?

Language models don't "know" anything

Training data are construction materials for a language models. A language model can never be inspired. It is itself a cultural artefact derived from other cultural artefacts.

The machine learning process is loosely based on decades-old grossly simplified models of how brains work.

[…]

It’s important to remember this so that we don’t fall for marketing claims that constantly imply that these tools are fully functioning assistants.

[…]

Who was the first man on the moon?

[…]

What you’re likely to get back from that prompt would be something like:

On July 20, 1969, Neil Armstrong became the first human to step on the moon.

This is NASA’s own phrasing. Most answers on the web are likely to be variations on this, so the answer from a language model is likely to be so too.

[…]

The prompt we provided is strongly associated in the training data set with other sentences that are all variations of NASA’s phrasing of the answer. The model won’t answer with just “Neil Armstrong” because it isn’t actually answering the question, it’s responding with the text that correlates with the question. It doesn’t “know” anything.

Why using language models for programming is a bad idea

A core aspect of the theory-building model of software development is code that developers don’t understand is a liability. It means your mental model of the software is inaccurate which will lead you to create bugs as you modify it or add other components that interact with pieces you don’t understand.

Language model tools for software development are specifically designed to create large volumes of code that the programmer doesn’t understand. They are liability engines for all but the most experienced developer. You can’t solve this problem by having the “AI” understand the codebase and how its various components interact with each other because a language model isn’t a mind. It can’t have a mental model of anything. It only works through correlation.