First I wrote the wrong book, then I wrote the right book (xpost)

I’m not sure whether to say “thank you” or “HOW COULD YOU DO THIS TO ME”, but this one goes out to all the people who sent me advice on buying software last fall.

This is the second in a two-part episode. The first part ended on a ✨cliffhanger!!!✨ — so if you missed the first episode, catch up here:

Six long weeks of writer’s block

I was merrily cranking away at what I believed to be my last chapter when I asked the internet (YOU guys) for help the first time. “Are you an experienced software buyer? I could use some help,” went up on September 19th, 2025.

The response was overwhelming. I heard from software engineers, SREs, observability leads, CTOs, VPs, distinguished engineers, consultants, even the odd CISO. All these emails and responses and lengthy threads kept me busy for a while, but eventually I had to get back to writing. That’s when I discovered, to my unpleasant surprise, that I couldn’t seem to write anymore.

“Well,” I reasoned, “maybe I’ll just ask the internet for EVEN MORE advice” — and out popped Buffy-themed post number two, on October 13th.

Keep in mind, I thought I would be done by then. November was my stretch deadline, my “just in case, I better leave myself some breathing room” kind of deadline.

As November 1st came and went, my frustration began spiraling out into blind panic. What the hell is going on and why can I not finish this???

In which I finally listen to the advice I asked for

A week before Thanksgiving, I was up late tinkering with Claude. I imported all the emails and advice I had gotten from y’all, and started sorting into themes and picking out key quotes, and that is when it finally hit me: I had written the wrong thing.

No, this deserves a bigger font.

✨I wrote the wrong thing.✨

I wrote the wrong thing, for the wrong people, and none of it was going to move the needle in any meaningful way.

The chapters I had written were full of practical advice for observability engineering teams and platform engineering teams, wrestling with implementation challenges like instrumentation and cost overflows. Practical stuff.

Yes.

The internet was right (this ONE time)

My inbox, on the other hand, was overflowing with stories like these:

  • “Many times [competitive research] is faked. One person has their favorite option and then they do just enough ‘competitive analysis’ to convince the sourcing folks that due diligence was done or to nullify the CIO/CTO/whoever is accepting this on to their budget”
  • “We [the observability team] spent six months exhaustively trialing three different solutions before we made a decision. The CEO of one of the losing vendors called our CEO, and he overruled our decision without even telling us.” (Does your CEO know anything at all about engineering??) “No.”
  • “Our SRE teams have vetoed any attempt to modernize our tool stack. ($Vendor) is part of their identity, and since they would have to help roll out and support any changes, we are stuck living in 2015 apparently forever.” (What does management have to say?) “It’s been twenty years since they touched a line of code.”
  • “We’re weird in that most of the company hates technology and really hates that we have to pay for it since they don’t understand the value it brings to the company. This is intentional ignorance, we make the value props continually and well, we just haven’t succeeded yet….We’re a little obsessed with trying to get champagne quality at Boone’s prices.”
  • “When it comes to dealing with salespeople and the enterprise sales process, the best tip for engineers is to not anthropomorphize sales professionals who are driven by commission. The best ones are like robot lawn mowers dressed in furry unicorn costumes. They may seem cute and nice but they do not care about anything besides closing the next deal….All of the best SaaS companies are full of these friendly fake unicorn zombies who suck cash instead of blood.”

Nearly all of the emails I got were either describing a terminally fucked up buying process from the top down, or the long term consequences of those fucked up decisions.

In other words: I was writing tactical advice for teams who were surviving in a strategic vacuum.

So I threw the whole thing out, and started over from scratch. 😭

Even good teams are struggling right now

As Tolstoy once wrote, “Happy teams are all alike; every fucked up team is fucked up in its own precious way.”

There is an infinity of ways to screw something up. But there is one pattern I see a critical mass of engineering orgs falling into right now, even orgs that are generally quite solid. That is when there is no shared alignment or even shared vocabulary between engineering and other stakeholders (directors, VPs and SVPs, CTO, CIO, principal and distinguished engineers) on some pretty clutch questions. Such as:

  • “What is observability?”
  • “Who needs it?”
  • “What problem are we trying to solve?”

And my favorite: “Is observability still relevant in a post-AI era? Can’t agents do that stuff now?”

Even some generally excellent CTOs[1] have been heard saying things like, “yeah, observability is definitely very important, but all our top priorities are related to AI right now.”

Which gets causality exactly backwards. Because your ability to get any returns on your investments into AI will be limited by how swiftly you can validate your changes and learn from them. Another word for this is “OBSERVABILITY”.

Enough ranting. Want a peek? I’ll share the new table of contents, and a sentence or two about a couple of my own favorite chapters.

Part 6: “Observability Governance” (v2)

The new outline is organized to speak to technical decision-makers, starting at the top and loosely descending. What do CTOs need to know? What do VPs and distinguished engineers need to know? And so on. We start off abstract, and become more concrete.

Since technical terms (e.g. high cardinality, high dimensionality) have all become overloaded and undifferentiated by too much sales and marketing, we mostly avoid them. Instead, we use the language of systems and feedback loops.

Again, we are trying to help your most senior engineers and execs develop a shared understanding of “What problem are we solving?” and “What is our goal?” Technical terms can actually detract and distract from that shared understanding.

  1. An Open Letter to CTOs: Why Organizational Learning Speed is Now Your Biggest Constraint. Organizations used to be limited by the speed of delivery; now they are limited by how swiftly they can validate and understand what they delivered.
  2. Systems Thinking for Software Delivery. Observability is the signal that connects the dots to make a feedback loop; no observability, no loop. What happens to amplifying or balancing loops when that signal is lossy, laggy, or missing?
  3. The Observability Landscape Through a Systems Lens. What feedback loops do developers need, and what feedback loops does ops need? How do these map to the tools on the market?
  4. The Business Case for Observability. Is your observability a cost center or an investment? How should you quantify your RoI?
  5. Diagnosing Your Observability Investment
  6. The Organizational Shift
  7. Build vs Buy (vs Open Source)
  8. The Art and Science of Vendor Partnerships. Internal transformations run on trust and credibility; vendor partnerships run on trust and reciprocity. We’ll talk about both of these, as well as how to run a strong POC.
  9. Instrumentation for Observability Teams
  10. Where to Go From Here
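The question chapter 2 asks about laggy signals can be made concrete with a toy model. The sketch below is not from the book. It assumes a very simple balancing loop in which a team nudges a metric toward a target based on whatever it last observed; the names `simulate_balancing_loop`, `gain`, and `lag` are made up for illustration.

```python
def simulate_balancing_loop(target, gain, lag, steps):
    """Toy balancing loop: each step corrects toward `target` using a
    reading of the system that is `lag` steps old (lag=0 means fresh)."""
    history = [0.0]  # actual state of the system over time, starting at 0
    for t in range(steps):
        observed = history[max(0, t - lag)]  # stale reading when lag > 0
        correction = gain * (target - observed)
        history.append(history[-1] + correction)
    return history


# Same target and same gain; the only difference is how old the signal is.
prompt = simulate_balancing_loop(target=100.0, gain=0.5, lag=0, steps=12)
laggy = simulate_balancing_loop(target=100.0, gain=0.5, lag=3, steps=12)

print("fresh signal:", [round(x, 1) for x in prompt])
print("stale signal:", [round(x, 1) for x in laggy])
```

With a fresh signal, the toy system closes in on the target without passing it. With the same gain and a signal that is three steps old, it overshoots and starts to swing, because every correction is a response to a state that no longer exists.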

Hey, I have a lot of empathy right now for leaders and execs who feel like they’re behind on everything. I feel it too. Anyone who doesn’t is lying to themselves (or their name is Simon Willison).

But the role observability plays in complex sociotechnical systems is one of those foundational concepts you need to understand. You’re not gonna get this right by accident. You’re not going to win by doing the same thing you were doing five years ago. And if you screw up your observability, you screw up everything downstream of it too.

To those of you who do understand this, and are working hard to drive change in your organizations: I see you. It is hard, often thankless work, but it is work worth doing. If I can ever be of help: reach out.

A longer book, but a better book

The last few chapters are heading into tech review on Friday, February 20th. Finally. The last 3.5 months have been some of the most panicky and stressful of my life. I….just typed several paragraphs about how terrible this has been, and deleted them, because you do not need to listen to me whine. ☺️

Like I said, I have never felt especially proud of the first edition. I am not UN proud, it’s just…eh. I feel differently this time around. I think—I hope—it can be helpful to a lot of different people who are wrestling with adapting to our new AI-native reality, from a lot of different angles.[2]

Thanks, Christine. (Another for the folder marked “NOW YOU TELL ME”)

I am incredibly grateful to my co-authors, collaborators, and our editor, Rita Fernando, without whom I never would have made it through.

But there’s one more group that deserves some credit, and it’s…you guys. I asked for help, and help I got. So many people wrote me such long, thought-provoking emails full of stories, advice and hard-earned wisdom. The better the email, the more I peppered you with followup questions, which is a great way to punish a good deed.

Blame these people

I am a tiny bit torn on whether to say “thank you” or “fuck you”, because my life would have been much nicer if I had stuck to the plan and wrapped in October.

But the following people were especially instrumental in forcing me to rethink my approach. It made the book much stronger, so if you catch one of them in the wild, please buy them a stiff drink. (Or buy yourself one, and throw it in their face with my sincere compliments.)

  • Abraham Ingersoll, the aforementioned “odd CISO”, who would be quoted in the book had his advice not been so consistently unprintable by the standards of respectable publications
  • Benjamin Mann of Delivery Hero, who I would work for in a heartbeat, and not just for the way he wields “NOPE” as a term of art
  • Marty Lindsay, who has spent more time explaining POCs and tech evals to me than anyone should have to. (If you need an o11y consultant, Marty should be your very first stop.)
  • Sam Dwyer, whose stories seeded my original plan to write a set of chapters for observability engineering teams. (I hope the replacement plan is useful too!)

Many others sent me terrific advice, and endured multiple rounds of questions and more questions and clarifications on said questions. A few of them:

Matthew Sanabria, Chris Cooney, Glen Mailer, Austin Culbertson, John Scancella, John Doran, Bryan Finster, Hazel Weakly, Chris Ziehr, Thomas Owens, Mike Lee, Jay Gengelbach, Will Hegedus, Natasha Litt, Alonso Suarez, Jason McMunn, Evgeny Rubtsov, George Chamales, Ken Finnegan, Cliff Snyder, Robyn Hirano, Rita Canavarro, Matt Schouten, Shalini Samudri Ananda Rao (Sam).

I am definitely forgetting some names; I will try to update the list as I remember them.

But seriously: thank you, from the bottom of my heart. I loved hearing your stories, your complaints, your arguments about how the world should improve. Your DNA is in this book; I hope it does you justice.

~charity
💜💙💚💛🧡❤️💖

 

[1] It’s ironic (and makes me uncomfortably self-conscious), but some of the worst top-down decision-making processes I have ever seen have come from companies where the CEO and CTO are both former engineers. The confidence they have in their own technical acumen may not be wholly unfounded, but it is often ten or more years out of date. We gotta update those priors, my friends. Stay humble.

[2] On the other hand, as my co-founder, Christine Yen, informed me last week: “Nobody reads books anymore.”

Nobody knows how the whole system works

One of the surprising (at least to me) consequences of the fall of Twitter is the rise of LinkedIn as a social media site. I saw some interesting posts I wanted to call attention to:

First, Simon Wardley on building things without understanding how they work:

Here’s Adam Jacob in response:

And here’s Bruce Perens, whose post is very much in conversation with them, even though he’s not explicitly responding to either of them.

Finally, here’s the MIT engineering professor Louis Bucciarelli from his book Designing Engineers, written back in 1994. Here I’m just copying and pasting the quotes from my previous post on active knowledge.

A few years ago, I attended a national conference on technological literacy… One of the main speakers, a sociologist, presented data he had gathered in the form of responses to a questionnaire. After a detailed statistical analysis, he had concluded that we are a nation of technological illiterates. As an example, he noted how few of us (less than 20 percent) know how our telephone works.

This statement brought me up short. I found my mind drifting and filling with anxiety. Did I know how my telephone works?

I squirmed in my seat, doodled some, then asked myself, What does it mean to know how a telephone works? Does it mean knowing how to dial a local or long-distance number? Certainly I knew that much, but this does not seem to be the issue here.

No, I suspected the question to be understood at another level, as probing the respondent’s knowledge of what we might call the “physics of the device.” I called to mind an image of a diaphragm, excited by the pressure variations of speaking, vibrating and driving a coil back and forth within a magnetic field… If this was what the speaker meant, then he was right: Most of us don’t know how our telephone works.

Indeed, I wondered, does [the speaker] know how his telephone works? Does he know about the heuristics used to achieve optimum routing for long distance calls? Does he know about the intricacies of the algorithms used for echo and noise suppression? Does he know how a signal is transmitted to and retrieved from a satellite in orbit? Does he know how AT&T, MCI, and the local phone companies are able to use the same network simultaneously? Does he know how many operators are needed to keep this system working, or what those repair people actually do when they climb a telephone pole? Does he know about corporate financing, capital investment strategies, or the role of regulation in the functioning of this expansive and sophisticated communication system?

Does anyone know how their telephone works?

There’s a technical interview question that goes along the lines of: “What happens when you type a URL into your browser’s address bar and hit enter?” You can talk about what happens at all sorts of different levels (e.g., HTTP, DNS, TCP, IP, …). But does anybody really understand all of the levels? Do you know about the interrupts that fire inside of your operating system when you actually strike the enter key? Do you know which modulation scheme is being used by the 802.11ax Wi-Fi protocol in your laptop right now? Could you explain the difference between quadrature amplitude modulation (QAM) and quadrature phase shift keying (QPSK), and could you determine which one your laptop is currently using? Are you familiar with the relaxed memory model of the ARM processor? How garbage collection works inside of the JVM? Do you understand how the field effect transistors inside the chip implement digital logic?

I remember talking to Brendan Gregg about how he conducted technical interviews, back when we both worked at Netflix. He told me that he was interested in identifying the limits of a candidate’s knowledge, and how they reacted when they reached that limit. So, he’d keep asking deeper questions about their area of knowledge until they reached a point where they didn’t know anymore. And then he’d see whether they would actually admit “I don’t know the answer to that”, or whether they would bluff. He knew that nobody understood the system all of the way down.

In their own ways, Wardley, Jacob, Perens, and Bucciarelli are all correct.

Wardley’s right that it’s dangerous to build things where we don’t understand the underlying mechanism of how they actually work. This is precisely why magic is used as an epithet in our industry. Magic refers to frameworks that deliberately obscure the underlying mechanisms in service of making it easier to build within that framework. Ruby on Rails is the canonical example of a framework that uses magic.

Jacob is right that AI is changing the way that normal software development work gets done. It’s a new capability that has proven itself to be so useful that it clearly isn’t going away. Yes, it represents a significant shift in how we build software, and it moves us further away from how the underlying stuff actually works, but the benefits exceed the risks.

Perens is right that the scenario that Wardley fears has, in some sense, already come to pass. Modern CPU architectures and operating systems contain significant complexity, and many software developers are blissfully unaware of how these things really work. Yes, they have mental models of how the system below them works, but those mental models are incorrect in fundamental ways.

Finally, Bucciarelli is right that systems like telephony are so inherently complex, have been built on top of so many different layers in so many different places, that no one person can ever actually understand how the whole thing works. This is the fundamental nature of complex technologies: our knowledge of these systems will always be partial, at best. Yes, AI will make this situation worse. But it’s a situation that we’ve been in for a long time.



Backdoor in Notepad++

Hackers associated with the Chinese government used a Trojaned version of Notepad++ to deliver malware to selected users.

Notepad++ said that officials with the unnamed provider hosting the update infrastructure consulted with incident responders and found that it remained compromised until September 2. Even then, the attackers maintained credentials to the internal services until December 2, a capability that allowed them to continue redirecting selected update traffic to malicious servers. The threat actor “specifically targeted Notepad++ domain with the goal of exploiting insufficient update verification controls that existed in older versions of Notepad++.” Event logs indicate that the hackers tried to re-exploit one of the weaknesses after it was fixed but that the attempt failed.

Make sure you’re running at least version 8.9.1.
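If you script this check across machines, note that a plain string comparison gets dotted versions wrong: as text, "8.10" sorts before "8.9.1". Here is a minimal sketch of a numeric comparison. It assumes purely numeric dotted version strings and leaves out how you read the installed version, which depends on your setup; the helper names and the sample version strings are made up for illustration.

```python
def parse_version(text):
    """Turn a dotted version string such as '8.9.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def is_at_least(installed, minimum="8.9.1"):
    """Compare numerically, so '8.10' counts as newer than '8.9.1'."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so '8.9' compares as '8.9.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Hypothetical sample inputs, not a list of real release numbers.
for candidate in ["8.8.3", "8.9", "8.9.1", "8.10"]:
    print(candidate, "ok" if is_at_least(candidate) else "update needed")
```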

Volleyball Player Does Sliding Dogeza Apology

During an exhibition, Japanese volleyball player Yuji Nishida hit a courtside judge in the back with an errant serve. He immediately sprinted across the court and dove prostrate in apology. The gesture was a sort of sliding dogeza:

Even in a country where a sincere apology can go a long way, Nishida’s mea culpa was an extreme example. The most extravagant form in Japanese culture is the dogeza, which can also be used to express deep respect.

When used as an apology, the person in the wrong prostrates themselves and bows so that their forehead touches the floor between their hands. While the dogeza is rarely seen in public, scandal-hit politicians have used equally theatrical gestures to communicate their remorse.

Nishida followed up his slide with several more bows.

💬 Join the discussion on kottke.org


Elon Musk and other internet racists started an internet...

Elon Musk and other internet racists started an internet war over Christopher Nolan’s casting of Lupita Nyong’o as Helen of Troy, thereby immediately disproving their point.


Been thinking a lot about this Ted Chiang quote...

Been thinking a lot about this Ted Chiang quote recently: “I tend to think that most fears about A.I. are best understood as fears about capitalism. And I think that this is actually true of most fears of technology, too.”
