Part 2: The 2 AM Call That Changed Everything

When 24/7 Support Means 24/7 Chaos

Starting 2022 as the Product Engineering Manager responsible for delivering McDonald's backend marketing tools, I thought I had found my stride. The role was demanding, sure, but it felt manageable—until it wasn't.

With guidance from my direct manager and mentors, I made what seemed like a natural progression into Global Technology Infrastructure and Operations as the lone Cloud Operations Manager. I walked in with eyes wide open, knowing there was no foundation, no playbook, and limited resources. What I didn't anticipate was the perfect storm waiting on the other side of that door.

Not only was I stepping into one of the most demanding roles of my career, but I was doing it with almost no hands-on cloud experience. Under normal circumstances, this would have been a recipe for disaster. But I had something crucial: support from leaders who understood the challenge and gave me room to learn while delivering.

Still, understanding only goes so far when you're drowning in operational chaos. The pressure to reduce the constant noise—the alerts, the escalations, the firefighting—was relentless.

There Was No "Off" Switch

Here's what people don't tell you about cloud operations at scale: there's no such thing as business hours.

A call could come at 10 PM. Or 2 AM. Or during your kid's birthday party. It didn't matter. Product teams had made me the face of operations, and that included managing our Managed Service Providers (MSPs). I became the human API between frustrated internal customers and third-party vendors who operated on their own timelines.

I still remember the day we hired my new manager and a counterpart to help share the load. I thought to myself: Thank God. Help has arrived.

My manager quickly recognized what I'd been living through: we were in a fundamentally non-scalable position. The way we were delivering to product teams and global markets simply wasn't working. He and my counterpart sat down together and asked the critical question: "How do we get more accountability from our MSPs?"

On paper, we had everything we needed. Account managers. Architects. Even engineers onsite. But somehow, everything defaulted to the status quo. Nothing changed. The 2 AM calls kept coming.

The Shared Responsibility Problem

My counterpart and I felt the mounting frustration from every direction—internal customers, leadership, management—all pushing us to demand more from our MSPs.

But here's where the model broke down: our MSPs operated under a shared responsibility framework. In theory, this splits the workload. In practice? It meant accountability fell into a gap, and guess who was standing in that gap?

Us.

The Cloud Ops team became responsible for everything our MSP wouldn't touch. And here's the kicker: neither my counterpart nor I had deep hands-on development or cloud engineering experience at that level. We were learning on the fly, under fire, at 2 AM.

Now, maybe it's my military background, but I've always believed in adapting and overcoming. Eyes wide open. Do what needs to be done. My counterpart shared that mindset. But not everyone on the team did. When you're rotating overnight on-call shifts among just three people—and your name is on everything because you're "the manager"—burnout isn't a question of if, but when.

The Moment Everything Shifted

After yet another late-night crisis, the three of us looked at each other and said what we'd all been thinking: This is not scalable.

Then my counterpart, Anshul, asked me a question that would redirect my entire career trajectory:

"Have you ever used ChatGPT?"

I hadn't. It was late 2022, and ChatGPT had just launched in November. Most people were still treating it as a curiosity—a parlor trick that could write poems or answer trivia. Our security org certainly wasn't sold on it. In fact, we were explicitly told it wasn't an approved application.

At the time, that didn't matter. Anshul and I weren't trying to deploy ChatGPT into production. We were working directly with our AWS architects to build a proof of concept for leadership—something, anything, that could help us escape the cycle of reactive firefighting.

Our AWS partners understood the position we were in. They didn't just sell us services; they rolled up their sleeves and worked alongside us. They saw the problem clearly: we needed intelligent automation, not just scripts. We needed systems that could learn, adapt, and support themselves.

That collaboration became the foundation for what you'll hear about in Part 3.

What I Learned at 2 AM

Looking back, those sleepless nights taught me something fundamental about modern operations:

Human-only support doesn't scale. Not at cloud speed. Not at global scale. Not when "always on" actually means always on.

But here's what I also learned: the answer isn't replacing people. It's giving people better tools. It's building systems that can handle the repetitive, the urgent, and the predictable—so humans can focus on the strategic, the creative, and the complex.

The 2 AM calls kept coming in 2022. But they sparked something. A question that wouldn't go away: What if we could build operations differently?

Anshul and I, along with our AWS partners, started working on that proof of concept. We had a vision. We had support from leadership. We had the motivation—those 2 AM pages were all the motivation we needed.

We thought we knew what we were building.

We didn't.

Next week in Part 3: "We Tried This in 2022"—the story of what we built, what we learned, and why sometimes the best foundations come from unexpected places.

See you next week.

Author Bio

Name: Brian Alvarez
Title: Founder, AgenticFlowPro & HelixCloudOps
Bio: Building the future of AI-powered cloud operations. Former Cloud Ops Manager at McDonald's, now creating autonomous DevOps agents that support, not replace, engineering teams.

Links:

LinkedIn: https://www.linkedin.com/in/brian-alvarez-helix/
Website: https://agenticflowpro.com