The chosen direction was to move a telecom routing operation off unreliable commodity desktop hardware and onto a geographically redundant private cloud, end to end, including the procurement, logistics, and physical build. That direction was not chosen because it was glamorous; it was chosen because it fit the constraint better than a long series of patches would have. Once the problem was understood, the job was to reduce the system to the smallest version that could still produce the required outcome.
Every project in this set had at least one tradeoff that mattered more than the others. Sometimes the tradeoff was short-term cost versus long-term flexibility, and sometimes it was more operational ownership in exchange for lower recurring cost or better control.
The common mistake would have been to treat the problem as a technology decision alone. In practice, the better answer was usually to keep the architecture simple enough that the business could keep running it after the initial build was done.
The right solution was the one that removed friction without creating a new layer of dependence, because that is what makes the result sustainable.
The work still had to respect the existing business process, the real data model, and the people who would operate the system after launch. That is why the solution paragraphs in the project files keep coming back to normalization, workflow control, and explicit automation rather than vague modernization language.
A system only improves the outcome if it can survive day-to-day use. That is the bar the project had to clear before any of the build work was worth shipping.
The implementation stayed close to QEMU/libvirt, Debian/RHEL Linux, Cisco firewalls and switching, and distributed DNS because the new system still had to live inside the same operating environment as the old one. That kept the work from drifting into a clean-room exercise that would look better on paper than in production. The practical question was always whether the implementation could hold up under the real workflow and the real users. If it could not, it was not finished.
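One way to picture how geographic redundancy and distributed DNS can work together is a small routine that decides which site addresses to hand out based on site health. The sketch below is illustrative only and is not the configuration used in this project: the `Site` type, the site names, the addresses (taken from the reserved documentation ranges), the priority scheme, and the fallback rule are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical description of one site in a redundant deployment."""
    name: str       # illustrative label, not a real site name
    address: str    # address a DNS answer would point to (documentation range)
    healthy: bool   # result of an external health check, assumed to exist
    priority: int   # lower number means more preferred


def select_answers(sites: list[Site]) -> list[str]:
    """Return the addresses to serve, healthy sites first by priority.

    If every site reports unhealthy, fall back to all sites instead of
    returning nothing, so a faulty health check cannot remove every answer.
    """
    healthy = [s for s in sites if s.healthy]
    pool = healthy or sites
    return [s.address for s in sorted(pool, key=lambda s: s.priority)]


# Assumed example: the preferred site is down, the secondary is up.
sites = [
    Site("site-a", "192.0.2.10", healthy=False, priority=1),
    Site("site-b", "198.51.100.10", healthy=True, priority=2),
]
print(select_answers(sites))
```

The fallback to the full list is a deliberate design choice in this sketch: serving a possibly degraded address is usually less harmful than serving no address at all.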
The constraint behind the step was that production ran on repurposed commodity desktops bought through secondary channels. They were unreliable, inconsistent, hard to scale, and space- and cooling-hungry, and they caused recurring service-impacting outages. That is why the work had to trade one kind of cost for another instead of trying to eliminate cost altogether. In almost every case, the useful move was to spend a little more effort on clarity, validation, or control so the business would spend less effort on repeated manual work later. That is the pattern the project files keep pointing to.
The role in the work was sole infrastructure architect, systems engineer, deployment lead, and procurement owner. That meant the implementation could not stop at the code boundary, because the operating model, handoff, and support path were part of the outcome. The relevant outcome was cutting service-impacting outages from ~2–3/month to ~98. The build only earns its place if the new result is visible in the way the business works after launch.
The specific step in this article was moving a telecom routing operation off unreliable commodity desktop hardware and onto a geographically redundant private cloud, end to end, including the procurement, logistics, and physical build. That is the piece that moves the story from analysis into execution. It is also the part that shows the difference between a conceptual fix and a system people can actually use. That distinction matters more than style or novelty.