The project existed because production ran on repurposed commodity desktops bought through secondary channels: unreliable, inconsistent, hard to scale, and hungry for space and cooling, with recurring service-impacting outages. Dedicated hosting was financially and operationally inefficient at this scale. The infrastructure needed carrier-grade availability and routing resiliency for 100+ countries, achieved without enterprise spend.
In summary, the project moved a telecom routing operation off unreliable commodity desktop hardware onto a geographically redundant private cloud, end to end, including the procurement, logistics, and physical build. The driving constraint was carrier-grade availability on a startup budget: the routing carried millions of international minutes monthly, so outages were customer-facing and expensive, but there was no enterprise budget to solve it with.
The role was sole infrastructure architect, systems engineer, deployment lead, and procurement owner. That covered architecture, virtualization design, datacenter planning, vendor negotiation, equipment purchasing, international logistics and customs, network and routing, deployment, cutover, and documentation, as well as building the production services running on top. The work sat squarely inside the existing business, so the goal was never to add complexity for its own sake.
Operating flow
- Map the current system and the constraint first.
- Choose the smallest change that can hold the load.
- Build against the real workflow instead of a toy case.
- Roll it out with enough monitoring to catch the edge cases.
This series follows the build in the order it happened: discovery, the solution direction, the implementation steps, and the operational result. Each post stays on one decision or one build step so the reader can see how the system moved from the initial constraint to a working result.
The details come from the project files and the company context, not from a generic template. That keeps the story grounded in the mechanics of the work: what was built, what it replaced, and what changed when it shipped.
The implementation stayed close to QEMU/libvirt, Debian and RHEL Linux, Cisco firewalls and switching, and distributed DNS, because the new system still had to live inside the same operating environment as the old one. That kept the work from drifting into a clean-room exercise that would look better on paper than it would in production. The practical question was always whether the implementation could hold up under the real workflow and the real users. If it could not do that, it was not finished.
The constraint behind the step was the one that started the project: production ran on repurposed commodity desktops bought through secondary channels, which were unreliable, inconsistent, hard to scale, and hungry for space and cooling, with recurring service-impacting outages. That is why the work had to trade one kind of cost for another instead of trying to eliminate cost altogether. In almost every case, the useful move was to spend a little more effort on clarity, validation, or control so the business would spend less effort on repeated manual work later. That is the pattern the project files keep pointing to.
The role in the work was sole infrastructure architect, systems engineer, deployment lead, and procurement owner. That meant the implementation could not stop at the code boundary, because the operating model, handoff, and support path were part of the outcome. The relevant outcome was a cut in service-impacting outages from ~2–3/month to ~98. The build only earns its place if the new result is visible in the way the business works after launch.
The specific step in this article was moving a telecom routing operation off unreliable commodity desktop hardware onto a geographically redundant private cloud, end to end, including the procurement, logistics, and physical build. That is the piece that moves the story from analysis into execution. It is also the part that shows the difference between a conceptual fix and a system people can actually use. That distinction matters more than style or novelty.
The point is to show how the system works, not to turn the project into a slogan or a summary stub.
When the architecture changes, the real question is what the new system allows the business to do that the old one could not. That shows up here in throughput, reliability, operating cost, turnaround time, and how much manual work disappears once the workflow is redesigned.
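One way to make the reliability side of that question concrete is the standard availability arithmetic for redundant sites. The sketch below is illustrative only: the per-site availability figure is hypothetical and does not come from the project files, the function names are invented for this example, and the calculation assumes site failures are independent, which real deployments only approximate.

```python
# Illustrative sketch: availability of a service that stays up as long as
# at least one of several redundant sites is up.
# ASSUMPTIONS: site failures are independent; all figures are hypothetical.

MINUTES_PER_MONTH = 30 * 24 * 60  # a nominal 30-day month


def combined_availability(site_availabilities):
    """Return the availability of N redundant sites, assuming independence.

    The service is down only when every site is down at the same time,
    so the combined unavailability is the product of the per-site
    unavailabilities.
    """
    p_all_down = 1.0
    for a in site_availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        p_all_down *= 1.0 - a
    return 1.0 - p_all_down


def downtime_minutes_per_month(availability):
    """Convert an availability fraction into expected downtime minutes."""
    return (1.0 - availability) * MINUTES_PER_MONTH


if __name__ == "__main__":
    single_site = 0.99  # hypothetical figure, not from the project
    two_sites = combined_availability([single_site, single_site])
    print(f"one site:  {downtime_minutes_per_month(single_site):.1f} min/month")
    print(f"two sites: {downtime_minutes_per_month(two_sites):.1f} min/month")
```

The point of the sketch is the shape of the trade: under the independence assumption, adding a second modest site reduces expected downtime far more than making a single site marginally better, which is one reason geographic redundancy can fit a constrained budget.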