> Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.
If you must deploy every service because of a library change, you don't have services, you have a distributed monolith. The entire idea of a "shared library" which must be kept updated across your entire service fleet is antithetical to how you need to treat services.
wowohwow 9 hours ago [-]
I think your point, while valid, is probably a lot more nuanced. From the post it's more akin to an Amazon-style shared build and deployment system than an "every library update needs to redeploy everything" scenario.
It's likely there's a single source of truth where you pull libraries or shared resources from. When team A wants to update the pointer for library-latest to 2.0 but the current reference of library-latest is still 1.0, everyone needs to migrate off of it, otherwise things will break due to backwards compatibility or whatever.
Likewise, if there's a -need- to remove a version for a vulnerability or what have you, then everyone needs to redeploy, sure, but the benefit of centralization likely outweighs the cost and complexity of tracking the patching and deployment process for each and every service.
I would say those systems -are- microservices, and likely would be classified as such, but from a cost and ease perspective they operate within a shared services environment. I don't think it's fair to consider this style of design decision a distributed monolith.
By that level of logic, having a singular business entity vs 140 individual business entities for each service would mean it's a distributed monolith.
mjr00 9 hours ago [-]
> It's likely there's a single source of truth where you pull libraries or shared resources from. When team A wants to update the pointer for library-latest to 2.0 but the current reference of library-latest is still 1.0, everyone needs to migrate off of it, otherwise things will break due to backwards compatibility or whatever.
No, this misses one of the biggest benefits of services; you explicitly don't need everyone to upgrade library-latest to 2.0 at the same time. If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.
rbranson 5 hours ago [-]
This is explicitly called out in the blog post in the trade-offs section.
I was one of the engineers who helped make the decisions around this migration. There is no one size fits all. We believed in that thinking originally, but after observing how things played out, decided to make different trade-offs.
nine_k 3 hours ago [-]
To me it sounds like this: "We realized that we were not running a microservice architecture, but rather a distributed monolith, so it made sense to make it a regular monolith". It's a decision I would wholeheartedly agree with.
necovek 1 hour ago [-]
I don't think you read the post carefully enough: they were not running a distributed monolith, and every service was using different dependencies (versions of them).
This meant that it was costly to maintain and caused a lot of confusion, especially with internal dependencies (shared libraries): this is the trade-off they did not like and wanted to move away from.
They moved away from this in multiple steps, the first of which was making it a "distributed monolith" (as per your implied definition) by putting the services in a monorepo and then making them use the same dependency versions (before finally making them a single service too).
petersellers 21 minutes ago [-]
I think the blog post is confusing in this regard. For example, it explicitly states:
> We no longer had to deploy 140+ services for a change to one of the shared libraries.
Taken in isolation, that is a strong indicator that they were indeed running a distributed monolith.
However, the blog post earlier on said that different microservices were using different versions of the library. If that was actually true, then they would never have to deploy all 140+ of their services in response to a single change in their shared library.
tstrimple 31 minutes ago [-]
If a change requires cascading changes in almost every other service then yes, you're running a distributed monolith and have achieved zero separation of services. Doesn't matter if each "service" has a different stack if they are so tightly coupled that a change in one necessitates a change in all. This is literally the entire point of micro-services. To reduce the amount of communication and coordination needed among teams. When your team releases "micro-services" which break everything else, it's a failure and hint of a distributed monolith pretending to be micro-services.
wowohwow 4 hours ago [-]
FWIW, I think it was a great write up. It's clear to me what the rationale was and had good justification. Based on the people responding to all of my comments, it is clear people didn't actually read it and are opining without appropriate context.
mjr00 5 hours ago [-]
> There is no one size fits all.
Totally agree. For what it's worth, based on the limited information in the article, I actually do think it was the right decision to pull all of the per-destination services back into one. The shared library problem can go both ways, after all: maybe the solution is to remove the library so your microservices are fully independent, or maybe they really should have never been independent in the first place and the solution is to put them back together.
I don't think either extreme of "every line of code in the company is deployed as one service" or "every function is an independent FaaS" really works in practice, it's all about finding the right balance, which is domain-specific every time.
mekoka 6 hours ago [-]
You're both right, but talking past each other. You're right that shared dependencies create a problem, but they can be the problem without semantically redefining the services themselves as a distributed monolith. Imagine someone came to you with a similar problem and you concluded "distributed monolith", which may lead them to believe that their services should be merged into a single monolith. What if they then told you that it's going to be tough, because these were truly separate apps that merely used the same OS-wide Python install: one ran on Django/Postgres, another on Flask/SQLite, and another was on FastAPI/Mongo, but they all relied on some of the same underlying libs that are frequently updated. The more accurate finger points at bad dependency management, and you'd tell them about virtualenv or Docker.
wowohwow 8 hours ago [-]
I disagree. Both can be true at the same time. A good design should not point to library-latest in a production setting; it should point to a stable, known-good version via direct reference, e.g. library-1.0.0-stable.
However, in the world we live in, people choose to point at latest to avoid manual work, trusting that other teams did the right diligence when updating to the latest version.
You can point to a stable version in the model I described and still be distributed and a microservice, while depending on a shared service or repository.
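To make that concrete, here is a minimal sketch in pip/requirements terms (the library name is made up for illustration):

    # requirements.txt
    # Floating reference: whoever publishes to the registry decides when you change.
    # acme-shared-lib            # implicitly "latest"
    #
    # Pinned reference: the service owner decides when to take an upgrade.
    acme-shared-lib==1.0.0

Either way the dependency on the shared repository exists; the difference is who controls when it moves.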
vlovich123 8 hours ago [-]
You can do that but you keep missing that you’re no longer a true microservice as originally defined and envisioned, which is that you can deploy the service independently under local control.
Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
OP is correct that you are indeed now in a weird hybrid monolith application where it’s deployed piecemeal but can’t really be deployed that way because of tightly coupled dependencies.
Be ready for a blog post in ten years about how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have them land in production without getting reverted due to an unrelated issue.
GeneralMayhem 7 hours ago [-]
Internal and external have wildly different requirements. Google internally can't update a library unless the update is either backward-compatible for all current users or part of the same change that updates all those users, and that's enforced by the build/test harness. That was an explicit choice, and I think an excellent one, for that scenario: it's more important to be certain that you're done when you move forward, so that it's obvious when a feature no longer needs support, than it is to enable moving faster in "isolation" when you all work for the same company anyway.
But also, you're conflating code and services. There's a huge difference between libraries that are deployed as part of various binaries and those that are used as remote APIs. If you want to update a utility library that's used by importing code, then you don't need simultaneous deployment, but you would like to update everywhere to get it done with - that's only really possible with a monorepo. If you want to update a remote API without downtime, then you need a multi-phase rollout where you introduce a backward-compatibility mode... but that's true whether you store the code in one place or two.
vlovich123 6 hours ago [-]
The whole premise of microservices is loose coupling - external just makes it plainly obvious that it’s a non starter. If you’re not loosely coupling you can call it microservices but it’s not really.
Yes I understand it’s a shared library but if updating that shared library automatically updates everyone and isn’t backward compatible you’re doing it wrong - that library should be published as a v2 or dependents should pin to a specific version. But having a shared library that has backward incompatible changes that is automatically vendored into all downstream dependencies is insane. You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service and the version that was published in the registry.
GeneralMayhem 5 hours ago [-]
> if updating that shared library automatically updates everyone and isn't backward compatible you're doing it wrong - that library should be published as a v2 or dependents should pin to a specific version
...but why? You're begging the question.
If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time. If it's because you can't be certain from testing that it's actually a safe change, then fine, but note that that option is still available to you by copy/pasting to a v2/ or adding a feature flag. Going to a monorepo gives you strictly more options in how to deal with changes.
> You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service
This is true regardless of deployment pattern. The artifact that you publish needs to have pointers back to all changes that went into it/what commit it was built at. Mono vs. multi-repo doesn't materially change that, although I would argue it's slightly easier with a monorepo since you can look at the single history of the repository, rather than having to go an extra hop to find out what version 1.0.837 of your dependency included.
> the version that was published in the registry
Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history. If a binary is built at commit X, then all commits before X across all dependencies are included. That's kind of the point.
vlovich123 2 hours ago [-]
> ...but why? You're begging the question.
> If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time.
I’m not begging the question. I’m simply stating what loose coupling looks like, and the blog post is precisely about the problem of tight coupling. If you have multiple teams working on a tightly coupled system you’re asking for trouble. This is why software projects inevitably decompose along team boundaries and you ship your org chart - communication and complexity are really hard to manage as the head count grows, which is where loose coupling helps.
But this article isn’t about moving from federated codebases to a single monorepo as you propose. They used that as an intermediary step to then enable making it a single service. But the point is that making a single giant service is a well-studied problem. I had this constantly at Apple when I worked on CoreLocation, where locationd was a single service responsible for so many things (GPS, time synchronization of Apple Watches, WiFi location, motion, etc.) that there was an entire team managing the process of getting everything to work correctly within a single service, and even then people constantly stepped on each other’s toes accidentally and caused builds that were not suitable. It was a mess, and the team that should have identified it as a bottleneck in need of solving (i.e. splitting out separate loosely coupled services) instead just kept rearranging deck chairs.
> Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history
I’m not opposed to a monorepo, which I think may be where your confusion is coming from. I’m suggesting that slamming a bunch of microservices back together is a poorly thought out idea, because you’ll still end up with a launch coordination bottleneck, and rolling back one team’s work forces other teams to roll back as well. It’s great the person in charge got to write a ra-ra blog post for their promo packet. Come talk to me in 3 years with actual on-the-ground engineers saying they are having no difficulty shipping a large tightly coupled monolithic service, or that they haven’t had to build out a team to help architect a service where all the different teams can safely and correctly coexist. My point about the registry is that they took one problem - a shared library that multiple services depend on through a registry, pinned to latest, causing problems deploying - and nuked it from orbit by using a monorepo (ok - this is fine and a good solution - I can be a fan of monorepos provided your infrastructure can make it work) and by making a monolithic service (probably not a good idea, one that only sounds good when you’re looking for things to do).
necovek 1 hour ago [-]
> I’m not begging the question. I’m simply stating what loose coupling looks like, and the blog post is precisely about the problem of tight coupling.
But it is not! They were updating dependencies and deploying services separately, and this led to every one of 140 services using a different version of "shared-foo". This made it cumbersome, confusing and expensive to keep going (you want a new feature from shared-foo, you have to take all the other features, unless you fork and cherry-pick on top, which makes it not shared-foo anymore).
The point is that the true microservice approach will always lead to exactly this situation: a) you do not extract shared functions and live with duplicate implementations, b) you enforce keeping your shared dependencies always very close to latest (which you can do with different strategies; a monorepo is one that enables but does not require it), or c) you end up with a mess of versions being used by each individual service.
The most common middle ground is to insist on backwards compatibility in a shared-lib, but carrying that over 5+ years is... expensive. You can mix it with an "enforce update" approach ("no version older than 2 years can be used"), but all the problems are pretty evident and expected with any approach.
I'd always err on the side of having the capability to upgrade everything at once if needed, while keeping the ability to keep a single service on a pinned version. This is usually not too hard with any approach, though a monorepo makes the first one appear easier (you edit one file, or multiple dep files in a single repo). But unless you can guarantee all services get replaced in a deployment at exactly the same moment (which you rarely can) or can accept short-lived inconsistencies, deployment requires all services to be backwards compatible until they are all updated, with either approach.
I'd also say that this is still not a move to a monolith, but to a Service-Oriented-Architecture that is not microservices (as microservices are also SOA): as usual, the middle ground is the sweet spot.
wowohwow 7 hours ago [-]
To reference my other comment: this thread is about the nuance of whether a dependency on a shared software repository means you are a microservice or not. I'm saying it's immaterial to the definition.
A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
smaudet 6 hours ago [-]
> Be ready for a blog post in ten years about how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have them land in production without getting reverted due to an unrelated issue.
For some of their "solutions" I kind of wonder how they plan on resolving things, like the black-box "magic" queue service they subbed back in, or the fault tolerance problem.
That said, I do think if you have a monolith that just needs to scale (single service that has to send to many places), they are possibly taking the correct approach. You can design your code/architecture so that you can deploy "services" separately, in a fault tolerant manner, but out of a mono repo instead of many independent repos.
vlovich123 2 hours ago [-]
My issue isn’t with the monorepo but with slamming all the microservices into a single monolithic service (the last part of the blog post).
dmoy 7 hours ago [-]
> Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
Internal Google services: *sweating profusely*
(Mostly in jest, it's obviously a different ballgame internal to the monorepo on borg)
deaddodo 6 hours ago [-]
The dependencies they're likely referring to aren't core libraries, they're shared interfaces. If you're using protobufs, for instance, and you share the interfaces in a repo, updating Service A's interface(s) necessitates updating all services that communicate with it as well (whether you utilize those changes or not). Generally, for larger systems built by smaller/scrappier teams, a true dependency management tree for something like this is out of scope, so they just redeploy everything in a domain.
mjr00 5 hours ago [-]
> If you're using protobufs, for instance, and you share the interfaces in a repo, updating Service A's interface(s) necessitates updating all services that communicate with it as well (whether you utilize those changes or not).
This is not true! This is one of the core strengths of protobuf. Non-destructive protobuf changes, such as adding new API methods or new fields, do not require clients to update. On the server-side you do need to handle the case when clients don't send you the new data--plus deal with the annoying "was this int64 actually set to 0 or is it just using the default?" problem--but as a whole you can absolutely independently update a protobuf, implement it on the server, and existing clients can keep on calling and be totally fine.
Now, that doesn't mean you can go crazy, as doing things like deleting fields, changing field numbering or renaming APIs will break clients, but this is just the reality of building distributed systems.
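To sketch the server-side half of that in plain Python over a JSON-style payload (the field names and handler are made up for illustration, not protobuf's generated API): an old client that never sends the new field keeps working because the server fills in a default.

    # Hypothetical handler: v1 clients send {"event": ..., "user_id": ...};
    # v2 adds an optional "region" field. Old clients keep working unchanged.
    def handle_track_event(payload: dict) -> dict:
        event = payload["event"]
        user_id = payload["user_id"]
        # "key absent" vs "key present with a default value" is the same presence
        # question the int64-set-to-0 issue raises; here we just default it.
        region = payload.get("region", "us-default")
        return {"status": "ok", "routed_to": region + "/" + event, "user": user_id}

    # A v1 client payload (no "region") still succeeds:
    print(handle_track_event({"event": "signup", "user_id": "u_123"}))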
lowbloodsugar 4 hours ago [-]
Oh god no.
I mean I suppose you can make breaking changes to any API in any language, but that’s entirely on you.
lelanthran 1 hour ago [-]
Right, but there's a cost to having to support 12 different versions of a library in your system.
It's a tradeoff.
philwelch 2 hours ago [-]
> If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.
Show me a language runtime or core library that will never have a CVE. Otherwise, by your definition, microservices don’t exist and all service oriented architectures are distributed monoliths.
3rodents 8 hours ago [-]
Yes, you’re describing a distributed monolith. Microservices are independent, with nothing shared. They define a public interface and that’s it, that’s the entire exposed surface area. You will need to do major version bumps sometimes, when there are backwards incompatible changes to make, but these are rare.
The logical problem you’re running into is exactly why microservices are such a bad idea for most businesses. How many businesses can have entirely independent system components?
Almost all “microservice” systems in production are distributed monoliths. Real microservices are incredibly rare.
A mental model for true microservices is something akin to depending on the APIs of Netflix, Hulu, HBO Max and YouTube. They’ll have their own data models, their own versioning cycles and all that you consume is the public interface.
makeitdouble 2 hours ago [-]
I'm trying to understand what you see as a really independent service with nothing shared.
For instance, say company A uses the GCP logging stack, and company B does the same. GCP updates its product in a way that strongly encourages upgrading within a specific time frame (e.g. the price will drastically increase otherwise), so A and B do it mostly at the same time for the same reason.
Are A and B truly independent under your vision? Or are they a company-spanning monolith?
buttercraft 2 hours ago [-]
> mostly at the same time
Mostly? If you can update A one week and B the next week with no breakage in between, that seems pretty independent.
makeitdouble 1 hour ago [-]
This was also the case for the micro-service situation described in the article. From the FA:
> Over time, the versions of these shared libraries began to diverge across the different destination codebases.
wowohwow 8 hours ago [-]
This type of elitist mentality is such a problem and such a drain for software development. "Real micro services are incredibly rare". I'll repeat myself from my other post, by this level of logic nothing is a micro service.
Do you depend on a cloud provider? Not a microservice. Do you depend on an ISP for Internet? Not a microservice. Depend on humans to do something? Not a microservice.
Textbook definitions and reality rarely coincide, rather than taking such a fundamentalist approach that leads nowhere, recognize that for all intents and purposes, what I described is a microservice, not a distributed monolith.
ollysb 8 hours ago [-]
It's fine to have dependencies, the point is two services that need to be deployed at the same time are not independent microservices.
wowohwow 8 hours ago [-]
Yes, the user I'm replying to is suggesting that taking on a dependency of a shared software repository makes the service no longer a microservice.
That is fundamentally incorrect. As presented in my other post, you can correctly use the shared repository as a dependency and refer to a stable version rather than a dynamic version, which is where the problem comes in.
jameshart 7 hours ago [-]
The problem with having a shared library which multiple microservices depend on isn’t on the microservice side.
As long as the microservice owners are free to choose what dependencies to take and when to bump dependency versions, it’s fine - and microservice owners who take dependencies like that know that they are obliged to take security patch releases and need to plan for that. External library dependencies work like that and are absolutely fine for microservices to take.
The problem comes when you have a team in the company that owns a shared library, and where that team needs, in order to get their code into production, to prevail upon the various microservices that consume their code to bump versions and redeploy.
That is the path to a distributed monolith situation and one you want to avoid.
wowohwow 7 hours ago [-]
Yes we are in agreement. A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
3rodents 7 hours ago [-]
"by this level of logic nothing is a micro service"
Yes, exactly. The point is not elitism. Microservices are a valuable tool for a very specific problem but what most people refer to as "microservices" are not. Language is important when designing systems. Microservices are not just a bunch of separately deployable things.
The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility. The service has a public interface defined in a contract that other components depend on, and that is it, what happens within the service is irrelevant to the rest of the system and vice verse, the service does not have depend on knowledge of the rest of the system. By virtue of being micro in responsibility, it can be deployed anywhere and anyhow.
If it is not a microservice, it is just a service, and when it is just a service, it is probably a part of a distributed monolith. And that is okay, a distributed monolith can be very valuable. The reason many people bristle at the mention of microservices is that they are often seen as an alternative to a monolith but they are not, it is a radically different architecture.
We must be precise in our language because if you or I build a system made up of "microservices" that aren't microservices, we're taking on all of the costs of microservices without any of the benefits. You can choose to drive to work, or take the bus, but you cannot choose to drive because it is the cheapest mode of transport or walk because it is the fastest. The costs and benefits are not independent.
The worst systems I have ever worked on were "microservices" with shared libraries. All of the costs of microservices (every call now involves a network) and none of the benefits (every service is dependent on the others). The architect of that system had read all about how great microservices are and understood it to mean separately deployable components.
There is no hierarchy of goodness, we are just in pursuit of the right tool or the job. A monolith, distributed monolith or a microservice architecture could be the right tool for one problem and the wrong tool for another.
wowohwow 7 hours ago [-]
I am talking about using a shared software repository as a dependency, which is valid for a microservice. Taking said dependency does not turn a microservice into a monolith.
It may be a build time dependency that you do in isolation in a completely unrelated microservice for the pure purpose of building and compiling your business microservice. It is still a dependency. You cannot avoid dependencies in software or life. As Carl Sagan said, to bake an apple pie from scratch, you must first invent the universe.
>The worst systems I have ever worked on were "microservices" with shared libraries.
Ok? How is this relevant to my point? I am only referring to the manner in which your microservice is referencing said libraries. Not the pros or cons of implementing or using shared libraries (e.g mycompany-specific-utils), common libraries (e.g apache-commons), or any software component for that matter
>Yes, exactly
So you're agreeing that there is no such thing as a microservice. If that's the case, then the term is pointless other than a description of an aspirational yet unattainable state. Which is my point exactly. For the purposes of the exercise described the software is a microservice.
magicalhippo 5 hours ago [-]
> Taking said dependency does not turn a microservice into a monolith.
True. However one of the core tenets of microservices is that they should be independently deployable[1][2].
If taking on such a shared dependency does not interfere with them being independently deployable then all is good and you still have a set of microservices.
However, if that shared dependency couples the services so that when one needs a new version of the shared dependency then all do, well, suddenly those services are no longer microservices but a distributed monolith.
> The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility.
The "micro" in microservice was a marketing term to distinguish it from the bad taste of particular SOA technology implementations in the 2000s. A similar type of activity as crypto being a "year 3000 technology."
The irony is that it used to be the common state that "services" weren't part of a distributed monolith. Services which were too big were still separately deployable. When services became nothing but an HTTP interface over a database entity, that's when things became complicated via orchestration; orchestration previously done by a service... not done to a service.
AndrewKemendo 8 hours ago [-]
And if my grandmother had wheels she would be a bike
There are categories and ontologies, and they are real in the world. If you create one thing and call it something else, that doesn’t mean the definition of "something else" should change.
By your definition it is impossible to create a state based on coherent specifications because most states don’t align to the specification.
We know for a fact that’s wrong via functional programming, state machines, and formal verification
andrewmutz 9 hours ago [-]
Needing to upgrade a library everywhere isn’t necessarily a sign of inappropriate coupling.
For example, a library with a security vulnerability would need to be upgraded everywhere regardless of how well you’ve designed your system.
In that example the monolith is much easier to work with.
mjr00 8 hours ago [-]
While you're right, I can only think of twice in my career where there was a "code red all services must update now", which were log4shell and spectre/meltdown (which were a bit different anyway). I just don't think this comes up enough in practice to be worth optimizing for.
wowohwow 8 hours ago [-]
You have not been in the field very long then, I presume? There are multiple per year that require all hands on deck, depending on your tech stack. Just look at the recent NPM supply chain attacks.
mjr00 8 hours ago [-]
You presume very incorrectly to say the least.
The npm supply chain attacks were only an issue if you didn't use lock files. In fact they were a great example of why you shouldn't blindly upgrade to the latest packages when they become available.
wowohwow 8 hours ago [-]
Fair enough, which is why I called out my assumption :).
I'm referring to the all hands on deck nature of responding to security issues not the best practice. For many, the NPM issue was an all hands on deck.
stavros 7 hours ago [-]
Wait what? I've been wondering why people have been fussing over supply chain vulnerabilities, but I thought they mostly meant "we don't want to get unlucky and upgrade, merge the PR, test, and build the container before the malicious commit is pushed".
Who doesn't use lockfiles? Aren't they the default everywhere now? I really thought npm uses them by default.
Aeolun 6 hours ago [-]
We use pretty much the entire nodejs ecosystem, and only the very latest Next.js vulnerability was an all-hands-on-deck vulnerability. And that’s over the past 7 years.
zhivota 6 hours ago [-]
I mean I just participated in a Next JS incident that required it this week.
It has been rare over the years but I suspect it's getting less rare as supply chain attacks become more sophisticated (hiding their attack more carefully than at present and waiting longer to spring it).
Aeolun 6 hours ago [-]
NextJS was just bog standard “we designed an insecure API and now everyone can do RCE” though.
Everyone has been able to exploit that for ages. It only became a problem when it was discovered and publicised.
jameshart 7 hours ago [-]
A library which patches a security vulnerability should do so by bumping a patch version, maintaining backward compatibility. Taking a patch update to a library should mean no changes to your code, just rerun your tests and redeploy.
If libraries bump minor or major versions, they are imposing work on all the consuming services to accept the version, make compatibility changes, test and deploy.
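In dependency-spec terms that policy is easy to encode; a pip-flavored sketch (the library name is hypothetical):

    # requirements.txt
    # Compatible-release pin: automatically accepts 1.4.3, 1.4.4, ... (patch releases),
    # but refuses 1.5.0 or 2.0.0, which would require a deliberate bump, compatibility
    # work and a retest before redeploying.
    acme-shared-lib~=1.4.2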
VirusNewbie 2 hours ago [-]
This is pedantic, but no, it doesn't need to be updated everywhere. It should be updated as fast as possible, but there isn't a dependency chain there.
mettamage 8 hours ago [-]
Example: log4j. That was an update fiasco everywhere.
smrtinsert 8 hours ago [-]
1 line change and redeploy
jabroni_salad 7 hours ago [-]
Works great if you are the product owner. We ended up having to fire and replace about a dozen 3rd party vendors over this.
reactordev 8 hours ago [-]
I was coming here to say this. That the whole idea of a shared library couples all those services together. Sounds like someone wanted to be clever and then included their cleverness all over the platform. Dooming all services together.
Decoupling is the first part of microservices. Pass messages. Use json. I shouldn’t need your code to function. Just your API. Then you can be clever and scale out and deploy on saturdays if you want to and it doesn’t disturb the rest of us.
xienze 7 hours ago [-]
> Pass messages. Use json. I shouldn’t need your code to function. Just your API.
Yes, but there’s likely a lot of common code related to parsing those messages, interpreting them, calling out to other services etc. shared amongst all of them. That’s to be expected. The question is how that common code is structured, and whether everything has to get updated at once when the common code changes.
reactordev 6 hours ago [-]
Common code that’s part of your standard library, sure. Just parse the JSON. Do NOT introduce some shared class library that “abstracts” that away. Instead use versioning of schemas like another commenter said. Use protobuf. Use Avro. Use JSON. Use Swagger. Use something other than a POCO/POJO shared library that forces you to redeploy all your services because you added a Boolean to the newsletter object.
narnarpapadaddy 5 hours ago [-]
So, depending on someone else’s shared library, rather than my own shared library, is the difference between a microservice and not a microservice?
__abc 5 hours ago [-]
This right here. WTF do you do when you need to upgrade your underlying runtime such as Python, Ruby, whatever ¯\_(ツ)_/¯ you gotta go service by service.
reactordev 4 hours ago [-]
If need be. Or, you upgrade the mission-critical ones and leave the rest for when you pick them up again. If your culture is “leave it better than when you found it” this is a non-issue.
The best is when you use containers and build against the latest runtimes in your pipelines so as to catch these issues early and always have the most up to date patches. If a service hasn’t been updated or deployed in a long time, you can just run another build and it will pull latest of whatever.
mjr00 4 hours ago [-]
The opposite situation of needing to upgrade your entire company's codebase all at once is much more painful. With services you can upgrade runtimes on an as-needed basis. In monoliths, runtime upgrades were massive projects that required a ton of coordination between teams and months or years of work.
__abc 3 hours ago [-]
Fair point.
c-fe 6 hours ago [-]
One way is by using schemas that are backwards compatible to communicate between them. E.g. with Avro it's quite nice.
gmueckl 6 hours ago [-]
But then you're outsourcing the same shared-code problem to a third-party shared library. It fundamentally doesn't go away.
reactordev 4 hours ago [-]
That 3rd party library rarely gets updated whereas Jon’s commit adds a field and now everyone has to update or the marshaling doesn’t work.
Yes, there are scenarios where you have to deploy everything but when dealing with micro services, you should only be deploying the service you are changing. If updating a field in a domain affects everyone else, you have a distributed monolith and your architecture is questionable at best.
The whole point is I can deploy my services without relying on yours, or touching yours, because it sounds like you might not know what you’re doing. That’s the beautiful effect of a good micro service architecture.
mlhpdx 7 hours ago [-]
Agreed. It sounds like they never made it to the distributed architecture they would have benefited from. That said, if the team thrives on a monolithic one they made the right choice.
ChuckMcM 5 hours ago [-]
While I think that's a bit harsh :-) the sentiment of "if you have these problems, perhaps you don't understand systems architecture" is kind of spot on. I have heard people scoff at a bunch of "dead legacy code" in the Windows APIs (as an example) without understanding the challenge of moving millions of machines, each at different places in the evolution timeline, through to the next step in the timeline.
To use an example from the article, there was this statement: "The split to separate repos allowed us to isolate the destination test suites easily. This isolation allowed the development team to move quickly when maintaining destinations."
This is architecture bleed-through. The format produced by Twilio "should" be the canonical form, which is submitted to the adapter, which mangles it into the "destination" form. Great, that transformation is expressible semantically in a language that takes the canonical form and spits out the special form. Changes to the transformation expression should not "bleed through" to other destinations, and changes to the canonical form should be backwards compatible to prevent bleed-through of changes in the source from impacting the destination. At all times, if something worked before, it should continue to work without touching it, because the architecture boundaries are robust.
Being able to work with a team that understood this was common "in the old days" when people were working on an operating system. The operating system would evolve (new features, new devices, new capabilities) but because there was a moat between the OS and applications, people understood that they had to architect things so that the OS changes would not cause applications that currently worked to stop working.
I don't judge Twilio for not doing robust architecture; I was astonished when I went to work at Google how lazy everyone got when the entire system is under their control (like there are no third party apps running in the fleet). There was a persistent theme of some bright person "deciding" to completely change some interface and Wham! every other group at Google had to stop what they were doing and move their code to the new thing. There was a particularly poor 'mandate' around a new version of their RPC while I was there. As Twilio notes, that can make things untenable.
Aeolun 6 hours ago [-]
Once all the code for the services lived in one repo there was nothing preventing them from deploying the thing 140 times. I’m not sure why they act like that wasn’t an option.
__abc 5 hours ago [-]
So you should re-write your logging code on each and every one of your 140+ services vs. leverage a shared module?
xboxnolifes 5 hours ago [-]
You can keep using an older version for a while. You shouldn't need to redeploy everything at once. If you can't keep using the older version, you did it wrong.
And ideally, your logging library should rarely need to update. If you need unique integrations per service, use a plug-in architecture and keep the plug-ins local to each service.
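A rough sketch of what that plug-in shape can look like in Python (all names invented for illustration): the shared piece is a tiny, stable interface, and each service keeps its own adapter that it can change without anyone else redeploying.

    import json
    import sys
    from typing import Protocol

    class LogSink(Protocol):
        """The only shared contract: small, boring, and rarely changing."""
        def emit(self, level: str, message: str) -> None: ...

    # Lives inside this service only; other services never import it.
    class JsonStdoutSink:
        def emit(self, level: str, message: str) -> None:
            sys.stdout.write(json.dumps({"level": level, "msg": message}) + "\n")

    def run_service(sink: LogSink) -> None:
        sink.emit("info", "service started")

    run_service(JsonStdoutSink())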
__abc 5 hours ago [-]
I wasn't taking into account the velocity of a fleet-wide rollout; I agree you can migrate over time. However, I was focusing on the idea that any type of fleet-wide rollout for a specific change was somehow "bad."
threethirtytwo 7 hours ago [-]
Then every microservice network in existence is a distributed monolith so long as they communicate with one another.
If you communicate with one another you are serializing and deserializing a shared type. That shared type will break at the communication channels if you do not simultaneously deploy the two services. The irony is to prevent this you have to deploy simultaneously and treat it as a distributed monolith.
This is the fundamental problem of micro services. Under a monorepo it is somewhat more mitigated because now you can have type checking and integration tests across multiple repos.
Make no mistake, the world isn’t just library dependencies. There are communication dependencies that flow through communication channels. A microservice architecture by definition has all its services depend on each other through these communication channels. The logical outcome of this is virtually identical to a distributed monolith. In fact shared libraries don’t do much damage at all if the versions are off. It is only shared types in the communication channels that break.
There is no way around this unless you have a mechanism for simultaneously merging and deploying code across different repos, which breaks the definition of what it is to be a microservice. Microservices always, and I mean always, share dependencies with everything they communicate with. All the problems that come from shared libraries are intrinsic to microservices EVEN when you remove shared libraries.
People debate me on this but it’s an invariant.
ricardobeat 7 hours ago [-]
I believe that in the original Amazon service architecture, the one that grew into AWS (see the “Bezos API mandate” from 2002), backwards compatibility is expected for all service APIs. You treat internal services as if they were external.
That means consumers can keep using old API versions (and their types) with a very long deprecation window. This results in loose coupling. Most companies doing microservices do not operate like this, which leads to these lockstep issues.
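As a toy illustration of what "consumers can keep using old API versions" looks like (paths and handlers are hypothetical, not Amazon's actual setup), both versions stay mounted side by side until the old one is finally retired:

    # Minimal sketch: v1 and v2 of the same endpoint served simultaneously.
    def users_v1(req: dict) -> dict:
        # Old contract: a single "name" string.
        return {"name": req["first"] + " " + req["last"]}

    def users_v2(req: dict) -> dict:
        # New contract: structured fields; v1 keeps working during the
        # (long) deprecation window.
        return {"first_name": req["first"], "last_name": req["last"]}

    ROUTES = {"/v1/users": users_v1, "/v2/users": users_v2}

    print(ROUTES["/v1/users"]({"first": "Ada", "last": "Lovelace"}))
    print(ROUTES["/v2/users"]({"first": "Ada", "last": "Lovelace"}))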
threethirtytwo 5 hours ago [-]
Yeah, that's a bad thing, right? Maintaining backward compatibility until the end of time in the name of safety.
I'm not saying monoliths are better then microservices.
I'm saying for THIS specific issue, you will not need to even think about API compatibility with monoliths. It's a concept you can throw out the window, because type checkers and integration tests catch this FOR YOU automatically and the single deployment ensures that the compatibility will never break.
If you choose monoliths you are CHOOSING for this convenience, if you choose microservices you are CHOOSING the possibility for things to break and AWS chose this and chose to introduce a backwards compatibility restriction to deal with this problem.
I use "choose" loosely here. More likely AWS ppl just didn't think about this problem at the time. It's not obvious... or they had other requirements that necessitated microservices... The point is, this problem in essence is a logical consequence of the choice.
mjr00 6 hours ago [-]
> If you communicate with one another you are serializing and deserializing a shared type.
Yes, this is absolutely correct. The objects you send over the wire are part of an API which forms a contract the server implementing the API is expected to provide. If the API changes in a way which is not backwards compatible, this will break things.
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility which exists when your code and your database do not match. The strategy might be a simple add new column->migrate code->remove old column, including some thought on how to deal with data added in the interim. It might be to use views. It might be some insane strategy of duplicating the full stack, using change data capture to catch changes and flipping a switch.[0] It doesn't really matter, the point is that even within a monolith, you have two separate services, a database and a backend server, and you cannot deploy them truly simultaneously, so you need to have some strategy for dealing with that; or more generally, you need to be conscious of breaking API changes, in exactly the same way you would with independent services.
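For the column-rename example, here is the usual expand/contract sequence sketched in Python with embedded SQL (table and column names are made up; this is the general pattern, not a specific team's procedure):

    # Zero-downtime rename of users.name -> users.full_name.
    # `execute` stands in for whatever DB client the service already uses.
    def migrate_expand(execute):
        # Phase 1: add the new column; currently deployed code ignores it.
        execute("ALTER TABLE users ADD COLUMN full_name TEXT")
        # Phase 2 lives in application code: new deploys write both columns and
        # read full_name with a fallback to name. Then backfill the old rows:
        execute("UPDATE users SET full_name = name WHERE full_name IS NULL")

    def migrate_contract(execute):
        # Phase 3: only once no deployed version reads `name` anymore.
        execute("ALTER TABLE users DROP COLUMN name")

Either way the code and the schema are two independently deployed things, and the compatibility window has to be managed explicitly.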
> The logical outcome of this is virtually identical to a distributed monolith.
Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
[0] I have seen this done. It was as crazy as it sounds.
threethirtytwo 6 hours ago [-]
>This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
Agreed and this is a negative. Backwards compatibility is a restriction made to deal with something fundamentally broken.
Additionally, eventually in any system of services you will have to make a breaking change. Backwards compatibility is a behavioral coping mechanism to deal with a fundamental issue of microservices.
>This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility.
I believe you and am already aware. It's a limitation that exists intrinsically, so it exists because you have no choice: a database and a monolith need to exist as separate services. The thing I'm addressing here is the microservices vs. monolith debate. If you choose microservices, you are CHOOSING for this additional problem to exist. If you choose a monolith, then within that monolith you are CHOOSING for those problems to not exist.
I am saying regardless of the other issues with either architecture, this one is an invariant in the sense that for this specific thing, monolith is categorically better.
>Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
No, you're categorically wrong. If they did this in ANY of the companies you worked at, then they are living with this issue. What I'm saying here isn't an opinion. It is a theorem-based consequence that will occur IF all the axioms are satisfied: namely, two or more services that communicate with each other and ARE not deployed simultaneously. This is logic.
The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
mjr00 5 hours ago [-]
> The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I can't say that EC2 will never make a breaking change that causes RDS, lambda, auto-scaling to break, but if they do, it'll be front page news.
threethirtytwo 5 hours ago [-]
>IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
No, it's certainly possible. You can evolve Linux, macOS and Windows forever without any breaking changes and keep all APIs backward compatible for all time. Keep going forever and ever and ever. But you see there's a huge downside to this, right? This downside becomes more and more magnified as time goes on. In the early stages it's fine. And it's not like this growing problem will stop everything in its tracks. I've seen organizations hobble along forever with tech debt that keeps increasing for decades.
The downside won't kill an organization. I'm just saying there is a way that is better.
>I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I have as well. It doesn't mean it doesn't work and can't be done. For example typescript is better than javascript. But you can still build a huge organization around javascript. What I'm saying here is one is intrinsically better than the other but that doesn't mean you can't build something on technology or architectures that are inferior.
And also I want to say I'm not saying monoliths are better than microservices. I'm saying for this one aspect monoliths are definitively better. There is no tradeoff for this aspect of the debate.
>I can't say that EC2 will never make a breaking change that causes RDS, lambda, auto-scaling to break, but if they do, it'll be front page news.
Didn't a break happen recently? Barring that... there are behavioral ways to mitigate this, right? Like what you mentioned... backward compatible APIs always. But it's better to set up your system such that the problem just doesn't exist, period... rather than to set up ways to deal with the problem.
kccqzy 5 hours ago [-]
> The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate.
This is correct.
> Neither of these scenarios is practical.
This is not. When you choose appropriate tools (protobuf being an example), it is extremely easy to make a non-breaking change to the communication channel, and it is also extremely easy to prevent breaking changes from being made ever.
threethirtytwo 5 hours ago [-]
I don't agree.
Protobuf works best if you have a monorepo. If each of your services lives within its own repo, then upgrades to one repo can be merged onto the main branch in a way that potentially breaks things in other repos. Protobuf cannot check for this.
Second, the other safety check protobuf uses is backwards compatibility. But that's an arbitrary restriction, right? It's better to not even have to worry about backwards compatibility at all than it is to maintain it.
Categorically these problems don't even exist in the monolith world. I'm not taking a side in the monolith vs. microservices debate. All I'm saying is for this aspect monoliths are categorically better.
kccqzy 7 hours ago [-]
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
No. Your shared type is too brittle to be used in microservices. Tools like the venerable protobuf solved this problem decades ago. You have a foundational wire format that does not change. Then you have a schema layer that could change in backwards compatible ways. Every new addition is optional.
Here’s an analogy. Forget microservices. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with both the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
klabb3 6 hours ago [-]
> Then you have a schema layer that could change in backwards compatible ways. Every new addition is optional.
Also known as the rest of the fucking owl. I am entirely in factual agreement with you, but the number of people who are even aware they maintain an API surface with backwards compatibility as a goal, let alone can actually do it well, is tiny in practice. Especially for internal services, where nobody will even notice violations until it’s urgent, and at such a time, your definitions won’t save you from blame. Maybe they should, though. The best way to stop a bad idea is to follow it rigorously and see where it leads.
I’m very much a skeptic of microservices, because of this added responsibility. Only when the cost of that extra maintenance is outweighed by overwhelming benefits elsewhere, would I consider it. For the same reason I wouldn’t want a toilet with a seatbelt.
threethirtytwo 5 hours ago [-]
False. Protobuf solves nothing.
1. Protobuf requires a monorepo to work correctly. Shared types must be checked across all repos and services simultaneously. Without a monorepo or some crazy workaround mechanism this won't work. Think about it: these type checkers need everything at the same version to correctly check everything.
2. Even with a monorepo, deployment is a problem. Unless you do simultaneous deploys, if one team upgrades their service and another team doesn't, the shared type is incompatible, simply because you used microservices and polyrepos to allow teams to move async instead of in sync. It's a race condition in distributed systems and it's theoretically true. Not solved at all, because it can't be solved by logic and math.
Just kidding. It can be solved but you're going to have to change definitions of your axioms aka of what is currently a microservice, monolith, monorepo and polyrepo. If you allow simultaneous deploys or pushes to microservices and polyrepos these problems can be solved but then can you call those things microservices or polyrepos? They look more like monorepos or monoliths... hmmm maybe I'll call it "distributed monolith".... See we are hitting this problem already.
>Here’s an analogy. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
You are just describing the problem I provided. We call "monoliths" monoliths but technically a monolith must interact with a secondary service called a database. We have no choice in the matter. The monolith and microservice of course does not refer to that problem which SUFFERS from all the same problems as microservices.
>This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
No it's not. Not at all. It's a problem that's lived with. I have two modules in a monolith. ANY change that goes into the mainline branch or deploy is type checked and integration tested to provide maximum safety as integration tests and type checkers can check the two modules simultaneously.
Imagine those two modules as microservices. Because they can be deployed at any time asynchronously, and because they can be merged to the mainline branch at any time asynchronously, they cannot be type checked or integration tested together. Why? If I upgrade A, which requires an upgrade to B, but B is not upgraded yet, how do I type check both A and B at the same time? Axiomatically impossible. Nothing is solved. Just behavioral coping mechanisms to deal with the issue. That's the key phrase: behavioral coping mechanisms, as opposed to automated statically checked safety based off of mathematical proof. Most of the arguments from your side will consist of this: "behavioral coping mechanisms"
wowohwow 7 hours ago [-]
Bingo. Couldn't agree more. The other posters in this comment chain seem to view things from a dogmatic approach vs a pragmatic approach. It's important to do both, but individuals should call out when they are discussing something that is practiced vs preached.
threethirtytwo 7 hours ago [-]
Agreed. What I’m describing here isn’t solely pragmatic, it’s axiomatic as well. If you model this as a distributed system graph, all microservices by definition will always reach a state where the APIs are broken.
Most microservice companies either live with that fact or they have roundabout ways to deal with it, including simultaneous deploys across multiple services and simultaneous merging, CI and type checking across different repos.
necovek 1 hours ago [-]
Imagine your services were built on react-server-* components or used Log4J logging.
This is simply dependency hell exploding with microservices.
philwelch 2 hours ago [-]
If there’s any shared library across all your services, even a third party library, if that library has a security patch you now need to update that shared library across your entire service fleet. Maybe you don’t have that; maybe each service is written in a completely different programming language, uses a different database, and reimplements monitoring in a totally different way. In that case you have completely different problems.
j45 8 hours ago [-]
Monorepos that are reasonably well designed and flexible enough to grow with you can increase development speed quite a bit.
smrtinsert 8 hours ago [-]
100%. It's almost like they jumped into it not understanding what they were signing up for.
echelon 5 hours ago [-]
> If you must to deploy every service because of a library change
Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on-call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $foo after SHA before 2025-12-14 at 12:00 UTC.
Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on-call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $bar after SHA before 2025-12-14 at 12:00 UTC.
...
It's never ending. You get a half dozen of these on each on call rotation.
necovek 55 minutes ago [-]
A few thoughts: this is not really a move to a monolith. Their system is still a SOA (service-oriented architecture), just like microservices (make services as small as they can be), but with larger scope.
Having 140 services managed by what sounds like one team reinforces another point that I believe should be well known by now: you use SOAs (including microservices) to scale teams, and not services.
Eg. if a single team builds a shared library for all the 140 microservices and needs to maintain them, it's going to become very expensive quickly: you'll be using v2.3.1 in one service and v1.0.8 in another, and you won't even know yourself what API is available. Operationally, yes, you'll have to watch over 140 individual "systems" too.
There are ways to mitigate this, but they have their own trade-offs (I've posted them in another comment).
As per Conway's law, software architecture always follows the organizational structure, and this seems to have happened here: a single team is moving away from unneeded complexity to more effectively manage their work and produce better outcomes for the business.
It is not a monolith, but a properly-scoped service (scoped to the team). This is, in my experience, the sweet spot. A single team can run and operate multiple independent services, but with growth in those services, they will look to unify, so you need to restructure the team if you don't want that to happen. This is why I don't accept "system architect" roles, as those don't give you the tools to really drive the architecture the way it can be driven, and why I really got into "management" :)
_pdp_ 7 hours ago [-]
In my previous company we did everything as a micro service. In the company before that it was serverless on AWS!
In both cases we had to come up with clever solutions to simply get by, because communication between services is a problem. It is difficult (not impossible) to keep all the contracts in sync, and deployment has to be coordinated in a very specific way sometimes. The initial speed you get is soon lost further down the path due to added complexities. There was fear-driven development at play. Service ownership is a problem. Far too many meetings are spent on coordination.
In my latest company everything is part of the same monolith. Yes, the code is huge, but it is so much easier to work with. We use a lot more unit tests than integration tests. Types make sense. Refactoring is just so easy. All the troubleshooting tools, including specialised AI agents built on top of our own platform, are part of the code-base, which is kind of interesting because I can see how this is turning into a self-improving system. It is fascinating!
We are not planning to break up the monolith unless we grow so much that it is impossible to manage from a single git repository. As far as I can tell this may never happen, as it is obvious that much larger projects are perfectly well maintained in the exact same way.
The only downside is that builds take longer, but honestly we found ways around that as well in the past, and now, with further improvements in the toolchains delivered by the awesome open-source communities around the world, I expect to see at least a 10x improvement in deployment time in 2026.
Overall, in my own assessment, the decision to go for a monolith allowed us to build and scale much faster than if we had used micro services.
I hope this helps.
MrDarcy 9 hours ago [-]
Reading it with hindsight, their problems have less to do with the technical trade off of micro or monolith services and much more to do with the quality and organizational structure of their engineering department. The decisions and reasons given shine a light on the quality. The repository and test layout shine a light on the structure.
Given the quality and the structure neither approach really matters much. The root problems are elsewhere.
CharlieDigital 8 hours ago [-]
My observation is that many teams lack strong "technical discipline"; someone that says "no, don't do that", makes the case, and takes a stand. It's easy to let the complexity genie out of the bottle if the team doesn't have someone like this with enough clout/authority to actually make the team pause.
Aeolun 6 hours ago [-]
I think the problem is that this microservices vs monolith decision is a really hard one to convince people of. I made a passionate case for ECS instead of Lambda for a long time, but only after the rest of the team and leadership saw the problems the popular strategy generates did we get something approaching uptake (and the balance has already shifted to Kubernetes instead, which is at least better).
CharlieDigital 5 hours ago [-]
> I made a passionate case...
My experience is that it is less about passion and more about reason.
It has a lot going for it: 1) it's from Google, 2) it's easy to read and digest, 3) it makes a really clear case for monoliths.
Otek 7 hours ago [-]
I 100% agree with you, but the sad fact is that it’s easy to understand why people don’t want to take this role. You can make enemies easily, you need to deliver “bad news” and convince people to put in more effort or prove that the effort they did put in was not enough. Why bother when you probably won’t be the one that has to clean it up?
CharlieDigital 5 hours ago [-]
> You can make enemies easily...
Short term, definitely. In the long tail? If you are right more than you are wrong, then that manifests as respect.
AlwaysRock 1 hours ago [-]
Ha! I wish I worked at the places you have worked!
panny 7 hours ago [-]
>the quality and organizational structure of their engineering department
You're not kidding. I had to work with twilio on a project and it was awful. Any time there was an issue with the API, they'd never delve into why that issue had happened. They'd simply fix the data in their database and close the ticket. We'd have the same issue over and over and over again and they'd never make any effort to fix the cause of the problems.
monkaiju 7 hours ago [-]
Conway's Law shines again!
It's amazing how much explanatory power it has, to the point that I can predict at least some traits about a company's codebase during an interview process, without directly asking them about it.
machomaster 5 hours ago [-]
In this case, the more applicable are:
1. "Peter principle": "people in a hierarchy and organizations tend to rise to 'a level of respective incompetence' "
2. "Parkinson's law": "Work expands to fill the available time".
So people are filling all the available time and working tirelessly to reach their personal and organizational levels of incompetency; working hard without stopping to think if what they are doing should be done at all. And nobody is stopping them, nobody asks why (with a real analysis of positives, negatives, risks).
Incompetent + driven is the worst combination there can be.
rtpg 8 hours ago [-]
I am _not_ a microservices guy (like... at all) but reading this the "monorepo"/"microservices" false dichotomy stands out to me.
I think way too much tooling assumes 1:1 pairings between services and repos (_especially_ CI work). In huge orgs Git/whatever VCS you're using would have problems with everything in one repo, but I do think that there's loads of value in having everything in one spot even if it's all deployed more or less independently.
But so many settings and workflows couple repos together so it's hard to even have a frontend and backend in the same place if both teams manage those differently. So you end up having to mess around with N repos and can't send the one cross-cutting pull request very easily.
I would very much like to see improvements on this front, where one repo could still be split up on the forge side (or the CI side) in interesting ways, so review friction and local dev work friction can go down.
(shorter: github and friends should let me point to a folder and say that this is a different thing, without me having to interact with git submodules. I think this is easier than it used to be _but_)
GeneralMayhem 7 hours ago [-]
I worked on building this at $PREV_EMPLOYER. We used a single repo for many services, so that you could run tests on all affected binaries/downstream libraries when a library changed.
We used Bazel to maintain the dependency tree, and then triggered builds based on a custom Github Actions hook that would use `bazel query` to find the transitive closure of affected targets. Then, if anything in a directory was affected, we'd trigger the set of tests defined in a config file in that directory (defaulting to :...), each as its own workflow run that would block PR submission. That worked really well, with the only real limiting factor being the ultimate upper limit of a repo in Github, but of course took a fair amount (a few SWE-months) to build all the tooling.
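For readers who haven't seen it, here is a minimal sketch of the `bazel query` step described above (the target names, the //... universe, and the wrapper program are assumptions for illustration, not the actual hook):

    // Sketch: given the targets whose sources changed, ask Bazel for every
    // target that depends on them (directly or transitively), so CI can run
    // only the affected tests. Target names here are made up.
    package main

    import (
        "fmt"
        "os/exec"
        "strings"
    )

    func affectedTargets(changed []string) ([]string, error) {
        // rdeps(universe, targets): everything in the universe that
        // transitively depends on the changed targets.
        query := fmt.Sprintf("rdeps(//..., set(%s))", strings.Join(changed, " "))
        out, err := exec.Command("bazel", "query", query).Output()
        if err != nil {
            return nil, err
        }
        return strings.Fields(string(out)), nil
    }

    func main() {
        targets, err := affectedTargets([]string{"//libs/shared:shared"})
        if err != nil {
            panic(err)
        }
        for _, t := range targets {
            fmt.Println(t)
        }
    }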
physicles 4 hours ago [-]
We’re in the middle of this right now. Go makes this easier: there’s a go CLI command that you can use to list a package’s dependencies, which can be cross-referenced with recent git changes. (duplicating the dependency graph in another build tool is a non-starter for me) But there are corner cases that we’re currently working through.
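A minimal sketch of that `go list` + `git diff` approach, assuming a ./services/* layout and an origin/main base branch (neither of which is from the comment above):

    // Sketch: decide which services need a rebuild by intersecting each
    // service's transitive package dirs (via `go list -deps -json`) with the
    // files changed since main. Run from the repo root. Corner cases
    // (embedded files, build tags, go.mod changes) are deliberately ignored.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "io"
        "os/exec"
        "path/filepath"
        "strings"
    )

    func depDirs(pattern string) (map[string]bool, error) {
        out, err := exec.Command("go", "list", "-deps", "-json", pattern).Output()
        if err != nil {
            return nil, err
        }
        dirs := map[string]bool{}
        dec := json.NewDecoder(bytes.NewReader(out))
        for {
            var p struct{ Dir string }
            if err := dec.Decode(&p); err == io.EOF {
                break
            } else if err != nil {
                return nil, err
            }
            dirs[p.Dir] = true
        }
        return dirs, nil
    }

    func main() {
        out, err := exec.Command("git", "diff", "--name-only", "origin/main...HEAD").Output()
        if err != nil {
            panic(err)
        }
        changed := strings.Fields(string(out))

        for _, svc := range []string{"./services/email", "./services/webhooks"} { // assumed layout
            dirs, err := depDirs(svc)
            if err != nil {
                panic(err)
            }
            for _, f := range changed {
                abs, _ := filepath.Abs(filepath.Dir(f))
                if dirs[abs] {
                    fmt.Println("rebuild:", svc)
                    break
                }
            }
        }
    }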
This, and if you want build + deploy that’s faster than doing it manually from your dev machine, you pay $$$ for either something like Depot, or a beefy VM to host CI.
A bit more work on those dependency corner cases, along with an auto-sleeping VM, should let us achieve nirvana. But it’s not like we have a lot of spare time on our small team.
* In addition, you can make your life a lot easier by just making the whole repo a single Go module. Having done the alternate path - trying to keep go.mod and Bazel build files in sync - I would definitely recommend only one module per repo unless you have a very high pain tolerance or actually need to be able to import pieces of the repo with standard Go tooling.
> a beefy VM to host CI
Unless you really need to self-host, Github Actions or GCP Cloud Build can be set up to reference a shared Bazel cache server, which lets builds be quite snappy since it doesn't have to rebuild any leaves that haven't changed.
carlm42 7 hours ago [-]
You're pointing out exactly what bothered me with this post in the first place: "we moved from microservices to a monolith and our problems went away"...
... except the problems had little to do with the service architecture and everything to do with operational mistakes and insufficient tooling: bad CI, bad autoscaling, bad oncall.
maxdo 8 hours ago [-]
Both approaches can fail. Especially in environments like Node.js or Python, there's a clear limit to how much code an event loop can handle before performance seriously degrades.
I managed a product where a team of 6–8 people handles 200+ microservices. I've also managed other teams at the same time on another product where 80+ people managed a monolith.
What i learned? Both approaches have pros and cons.
With microservices, it's much easier to push isolated changes with just one or two people. At the same time, global changes become significantly harder.
That's the trade-off, and your mental model needs to align with your business logic. If your software solves a tightly connected business problem, microservices probably aren't the right fit.
On the other hand, if you have a multitude of integrations with different lifecycles but a stable internal protocol, microservices can be a lifesaver.
If someone tries to tell you one approach is universally better, they're being dogmatic/religious rather than rational.
Ultimately, it's not about architecture, it's about how you build abstractions and approach testing and decoupling.
rozap 8 hours ago [-]
To me this rationalization has always felt like duct tape over the real problem, which is that the runtime is poorly suited to what people are trying to do.
These problems are effectively solved on beam, the jvm, rust, go, etc.
strken 8 hours ago [-]
Can you explain a bit more about what you mean by a limit on how much code an event loop can handle? What's the limit, numerically, and which units does it use? Are you running out of CPU cache?
joker666 8 hours ago [-]
I assume he means, how much work you let the event loop do without yielding. It doesn't matter if there's 200K lines of code but no real traffic to keep the event loop busy.
BoorishBears 1 hours ago [-]
Most people don't realize their applications are running like dogwater on Node because serverless is letting them smooth it over by paying 4x what they would be paying if they moved 10 or so lines of code and a few regexes to a web worker.
(and I say that as someone who caught themselves doing the same: serverless is really good at hiding this.)
wg0 5 hours ago [-]
Thanks. It was a stupid idea for MOST shops. I think maybe it works for AWS, Google and Netflix, but everywhere else in my career, I saw that 90% of the problems were due to microservices.
Dividing a system into composable parts is a very, very difficult problem already, and it is only foolish to introduce further network boundaries between them.
Next comeback I see is away from React and SPAs as view transitions become more common.
waterproof 2 hours ago [-]
The only rationale given for the initial switch to microservices is this:
> Initially, when the destinations were divided into separate services, all of the code lived in one repo. A huge point of frustration was that a single broken test caused tests to fail across all destinations.
You kept breaking tests in main so you thought the solution was to revamp your entire codebase structure? Seems a bit backward.
wiradikusuma 2 hours ago [-]
> With everything running in a monolith, if a bug is introduced in one destination that causes the service to crash, the service will crash for all destinations
We can have a service with 100 features, but only enable the features relevant to a given "purpose". That way, we can still have "micro services" but they're running the same code: "bla.exe -foo" and "bla.exe -bar".
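A minimal sketch of that "same binary, different flags" idea (the feature names and port are made up):

    // Sketch: one executable, many roles. Each deployment enables only the
    // handlers named in -features, so "bla -features=foo" and
    // "bla -features=bar" run identical code with different responsibilities.
    package main

    import (
        "flag"
        "fmt"
        "log"
        "net/http"
        "strings"
    )

    var features = map[string]func(*http.ServeMux){
        "foo": func(mux *http.ServeMux) {
            mux.HandleFunc("/foo", func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "foo feature")
            })
        },
        "bar": func(mux *http.ServeMux) {
            mux.HandleFunc("/bar", func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "bar feature")
            })
        },
    }

    func main() {
        enabled := flag.String("features", "", "comma-separated features to enable")
        addr := flag.String("addr", ":8080", "listen address")
        flag.Parse()

        mux := http.NewServeMux()
        for _, name := range strings.Split(*enabled, ",") {
            name = strings.TrimSpace(name)
            if name == "" {
                continue
            }
            register, ok := features[name]
            if !ok {
                log.Fatalf("unknown feature %q", name)
            }
            register(mux)
        }
        log.Fatal(http.ListenAndServe(*addr, mux))
    }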
sriku 3 hours ago [-]
In a discussion I was in recently, a participant mentioned "culture eats strategy for breakfast" .. which perhaps makes sense in this context. Be bold enough to do what makes the team and the product thrive.
develatio 9 hours ago [-]
can you add [2018] to the title, please?
andrewmuia 5 hours ago [-]
No kidding, not cool to be rehashing an article that is 7 years old. In tech terms, that is antiquity.
pmbanugo 8 hours ago [-]
have they reverted to microservices?
Towaway69 8 hours ago [-]
Mono services in a micro repository. /s
tonymet 1 hours ago [-]
Your “microservice” is just a clumsy & slow symbol lookup over the network, at 1000x the cpu and 10000x the latency.
honkycat 15 minutes ago [-]
I don't care how it is done, just don't rely on your database schema for data modeling and business logic.
akoumjian 3 hours ago [-]
I humbly post this little widget to help your team decide if some functionality warrants being a separate service or not: https://mulch.dev/service-scorecard/
nyrikki 9 hours ago [-]
> Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.
This is the problem with the undefined nature of the term `microservices`. In my experience, if you can't develop in a way that allows you to deploy all services independently and without coordination between services, it may not be a good fit for your org's needs.
In the parent SOA(v2), what they described is a well known anti-pattern: [0]
Application Silos to SOA Silos
* Doing SOA right is not just about technology. It also requires optimal cross-team communications.
Web Service Sprawl
* Create services only where and when they are needed. Target areas of greatest ROI, and avoid the service sprawl headache.
If you cannot, due to technical or political reasons, retain the ability to independently deploy a service, no matter if you choose to actually independently deploy, you will not gain most of the advantages that were the original selling point of microservices, which had more to do with organizational scaling than technical concerns.
There are other reasons to consider the pattern, especially due to the tooling available, but it is simply not a silver bullet.
And yes, I get that not everyone is going to accept Chris Richardson's definitions[1], but even in more modern versions of this, people always seem to run into the most problems because they try to shove it in a place where the pattern isn't appropriate, or isn't possible.
But kudos to Twilio for doing what every team should be, reassessing if their previous decisions were still valid and moving forward with new choices when they aren't.
I would caution that microservices should be architected with technical concerns first; being able to deploy independently is a valid technical concern too.
Doing it for organizational scaling can lead to insular vision with turf defensive attitude, as teams are rewarded on the individual service’s performance and not the complete product’s performance. Also refactoring services now means organizational refactoring, so the friction to refactor is massively increased.
I agree that patterns should be used where most appropriate, instead of blindly.
What pains me is that a term like “Cloud-Native” has been usurped to mean microservices. Did Twilio just stop having a “Cloud-Native” product due to shipping a monolith? According to CNCF, yes. According to reason, no.
bob1029 5 hours ago [-]
Monolith is definitely what you want to start with.
Being able to ~instantly obtain a perfect list of all references to all symbols is an extraordinarily powerful capability. The stronger the type system, the more leverage you get. If you have only ever had experience with weak type systems or poor tooling, I could understand how the notion of putting everything into one compilation context seems pointless.
mlhpdx 7 hours ago [-]
Wow. Their experience could not be more different than mine. As I’m contemplating the first year of my startup, I’ve tallied 6000 deployments and 99.997 percent uptime and a low single digit rollback percentage (MTTR in low single digit minutes and fractional, single cell impact for them so far). While I’m sure it’s possible for a solo entrepreneur to hit numbers like that with a monolith, I have never done so, and haven’t seen others do so.
Edit: I’d love to eat the humble pie here. If you have examples of places where monoliths are updated 10-20 times a day by a small (or large) team post the link. I’ll read them all.
AlotOfReading 4 hours ago [-]
The idea of deploying to production 10-20 times per day sounds terrifying. What's the rationale for doing so?
I'll assume you're not writing enough bugs that customers are reporting 10-20 new ones per day, but that leaves me confused why you would want to expose customers to that much churn. If we assume an observable issue results in a rollback and you're only rolling back 1-2% of the time (very impressive), once a month or so customers should experience observable issues across multiple subsequent days. That would turn me off making a service integral to my workflow.
et1337 3 hours ago [-]
If something is difficult or scary, do it more often. Smaller changes are less risky. Code that is merged but not deployed is essentially “inventory” in the factory metaphor. You want to keep inventory low. If the distance between the main branch and production is kept low, then you can always feel pretty confident that the main branch is in a good state, or at least close to one. That’s invaluable when you inevitably need to ship an emergency fix. You can just commit the fix to main instead of trying to find a known good version and patching it. And when a deployment does break something, you’ll have a much smaller diff to search for the problem.
AlotOfReading 2 hours ago [-]
There's a lot of middle ground between "deploy to production 20x a day" and "deploy so infrequently that you forget how to deploy". Like, once a day? I have nothing against emergency fixes, unless you're doing them 9-19x a day. Hotfixes should be uncommon (neither rare nor standard practice).
chmod775 9 hours ago [-]
In practice most monoliths turned into "microservices" are just monoliths in disguise. They still have most of the failure modes of the original monolith, but now with all the complexity and considerable challenges of distributed computing layered on top.
Microservices as a goal is mostly touted by people who don't know what the heck they're doing - the kind of people who tend to mistakenly believe blind adherence to one philosophy or the other will help them turn their shoddy work into something passable.
Engineer something that makes sense. If, once you're done, whatever you've built fits the description of "monolith" or "microservices", that's fine.
However if you're just following some cult hoping it works out for your particular use-case, it's time to reevaluate whether you've chosen the right profession.
Nextgrid 8 hours ago [-]
Microservices were a fad during a period where complexity and solving self-inflicted problems were rewarded more than building an actual sustainable business. It was purely a career- & resume-polishing move for everyone involved.
Putting this anywhere near "engineering" is an insult to even the shoddiest, OceanGate-levels of engineering.
abernard1 6 hours ago [-]
I remember when microservices were introduced and they were solving real problems around 1) independent technological decisions with languages, data stores, and scaling, and 2) separating team development processes. They came out of Amazon, eBay, Google and a host of successful tech titans that were definitely doing "engineering." The Bezos mandate for APIs in 2002 was the beginning of that era.
It was when the "microservices considered harmful" articles started popping up that microservices had become a fad. Most of the HN early-startup energy will continue to do monoliths because of team communication reasons. And I predict that if any of those startups are successful, they will have need for separate services for engineering reasons. If anything, the historical faddishness of HN shows that hackers pick the new and novel because that's who they are, for better or worse.
I don't think this blog post reflects so well on this engineering team. Kudos to them to be so transparent about it though. "We had so many flaky tests that depended on 3rd parties that broke pipelines that we decided on micro-services" is not something I would put on my CV at least.
electromech 6 hours ago [-]
That seems unfair. There's a lot we don't know about the politics behind the scenes. I'd bet that the individuals who created the microservice architecture aren't the same people who re-consolidated them into one service. If true, the authors of the article are being generous to the original creators of the microservices, which I think reflects well on them for not badmouthing their predecessors.
ShakataGaNai 8 hours ago [-]
Too much of anything sucks. Too big of a monolith? Sucks. Too many microservices? Sucks. Getting the right balance is HARD.
Plus, it's ALWAYS easier/better to run v2 of something when you completely re-write v1 from scratch. The article could have just as easily been "Why Segment moved from 100 microservices to 5" or "Why Segment rewrote every microservice". The benefits of hindsight and real-world data shouldn't be undersold.
At the end of the day, write something, get it out there. Make decisions, accept some of them will be wrong. Be willing to correct for those mistakes or at least accept they will be a pain for a while.
In short: No matter what you do the first time around... it's wrong.
TZubiri 1 hours ago [-]
This is not the first time that an engineer working at a big company thinks they are using a monolith when in reality they are a small team in charge of a single microservice, which in turn is part of a company that definitely does not run a monolith.
Last time it was an aws engineer that worked on route 53, and they dismissed microservices in a startup claiming that in AWS they ran a monolith (as in the r53 dns).
Everything is a monolith if you zoom in enough and ignore everything else. Which I guess you can do when you work on a big company and are in charge of a very specific role.
TL;DR they have a highly partitioned job database, where a job is a delivery of a specific event to a specific destination, and each partition is acted upon by at-most-one worker at a time, so lock contention is only at the infrastructure level.
In that context, each worker can handle a similar balanced workload between destinations, with a fraction of production traffic, so a monorepo makes all the sense in the world.
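One common way to get that at-most-one-worker-per-partition behaviour (an assumption about the general pattern, not what Segment's system actually does) is to claim a partition row with SKIP LOCKED:

    // Sketch: a worker claims exactly one unclaimed partition; concurrent
    // workers skip locked rows, so no two workers ever drain the same
    // destination's queue at once. Table and column names are made up.
    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq" // Postgres driver, assumed
    )

    func claimAndWork(db *sql.DB) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback()

        var partitionID string
        err = tx.QueryRow(`
            SELECT id FROM partitions
            ORDER BY last_worked_at
            LIMIT 1
            FOR UPDATE SKIP LOCKED`).Scan(&partitionID)
        if err == sql.ErrNoRows {
            return nil // every partition is already being worked
        }
        if err != nil {
            return err
        }

        // ... drain this partition's jobs here, while the row lock is held ...
        log.Println("working partition", partitionID)

        if _, err := tx.Exec(
            `UPDATE partitions SET last_worked_at = now() WHERE id = $1`, partitionID); err != nil {
            return err
        }
        return tx.Commit()
    }

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/jobs?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        if err := claimAndWork(db); err != nil {
            log.Fatal(err)
        }
    }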
IMO it speaks to the way in which microservices can be a way to enforce good boundaries between teams... but the drawbacks are significant, and a cross-team review process for API changes and extensions can be equally effective and enable simplified architectures that sidestep many distributed-system problems at scale.
abernard1 6 hours ago [-]
They also failed as a company, which is why that's on Twilio's blog now. So there's that. Undoubtedly their microservices architecture was a bad fit because of how technically focused the product was. But their solution with a monolith didn't have the desired effect either.
btown 5 hours ago [-]
Failed? It was a $3.2B acquisition with a total of 283M raised. I don’t see any way that’s a failure.
That said I’m curious if you’re basing this on service degradation you’ve seen since the acquisition. We were thinking of starting to use them - is that a bad move?
abernard1 5 hours ago [-]
By all means use Segment. Segment was a great technology with an incredible technical vision for what they wanted to do. I was in conversations in that office on Market far beyond what they ended up doing post-acquisition.
But a company that can't stand on its own isn't a success in my opinion. Similar things can be said about companies that continue to need round after round of funding without an IPO.
My comment is of the "(2018)" variety. Old news that didn't age well like the people jumping on the "Uber: why we switched to MySQL from Postgres" post. (How many people would choose that decision today?)
People tend to divorce the actual results of a lot of these companies from the gripes of the developers of the tech blogs.
sethammons 5 hours ago [-]
I left Twilio in 2018. I spent a decade at SendGrid. I spent a small time in Segment.
The shitty arch is not a point against (micro)services. SendGrid, another Twilio property, uses (micro)services to great effect. Services there were fully independently deployable.
mikert89 8 hours ago [-]
"Microservices is the software industry’s most successful confidence scam. It convinces small teams that they are “thinking big” while systematically destroying their ability to move at all. It flatters ambition by weaponizing insecurity: if you’re not running a constellation of services, are you even a real company? Never mind that this architecture was invented to cope with organizational dysfunction at planetary scale. Now it’s being prescribed to teams that still share a Slack channel and a lunch table.
Small teams run on shared context. That is their superpower. Everyone can reason end-to-end. Everyone can change anything. Microservices vaporize that advantage on contact. They replace shared understanding with distributed ignorance. No one owns the whole anymore. Everyone owns a shard. The system becomes something that merely happens to the team, rather than something the team actively understands. This isn’t sophistication. It’s abdication.
Then comes the operational farce. Each service demands its own pipeline, secrets, alerts, metrics, dashboards, permissions, backups, and rituals of appeasement. You don’t “deploy” anymore—you synchronize a fleet. One bug now requires a multi-service autopsy. A feature release becomes a coordination exercise across artificial borders you invented for no reason. You didn’t simplify your system. You shattered it and called the debris “architecture.”
Microservices also lock incompetence in amber. You are forced to define APIs before you understand your own business. Guesses become contracts. Bad ideas become permanent dependencies. Every early mistake metastasizes through the network. In a monolith, wrong thinking is corrected with a refactor. In microservices, wrong thinking becomes infrastructure. You don’t just regret it—you host it, version it, and monitor it.
The claim that monoliths don’t scale is one of the dumbest lies in modern engineering folklore. What doesn’t scale is chaos. What doesn’t scale is process cosplay. What doesn’t scale is pretending you’re Netflix while shipping a glorified CRUD app. Monoliths scale just fine when teams have discipline, tests, and restraint. But restraint isn’t fashionable, and boring doesn’t make conference talks.
Microservices for small teams is not a technical mistake—it is a philosophical failure. It announces, loudly, that the team does not trust itself to understand its own system. It replaces accountability with protocol and momentum with middleware. You don’t get “future proofing.” You get permanent drag. And by the time you finally earn the scale that might justify this circus, your speed, your clarity, and your product instincts will already be gone."
The whole point of micro-services is to manage dependencies independently across service boundaries, using the API as the contract, not the internal libraries.
Then you can implement a service in Java, Python, Rust, C++, etc, and it doesn't matter.
Coupling your postgres db to your elasticsearch cluster via a hard library dependency is impossibly heavy. The same insight applies to your bespoke services.
brightstep 7 hours ago [-]
They have a monolith but struggle with individual subsystem failures bringing down the whole thing. Sounds like they would benefit from Elixir’s isolated, fail-fast architecture.
shoo 7 hours ago [-]
Great writeup. Much of this is more about testing, how package dependencies are expressed and many-repo/singlerepo tradeoffs than "microservices"!
Maintaining and testing a codebase containing many external integrations ("Destinations") was one of the drivers behind the earlier decision to shatter into many repos, to isolate the impact of Destination-specific test suite failures caused because some tests were actually testing integration to external 3rd party services.
One way to think about that situation is in terms of packages, their dependency structure, how those dependencies are expressed (e.g. decoupled via versioned artefact releases, directly coupled via monorepo style source checkout), their rates of change, and the quality of their automated tests suites (high quality meaning the test suite runs really fast, tests only the thing it is meant to test, has low rates of false negatives and false positives, low quality meaning the opposite).
Their initial situation was one that rapidly becomes unworkable: a shared library package undergoing a high rate of change depended on by many Destination packages, each with low quality test suites, where the dependencies were expressed in a directly-coupled way by virtue of everything existing in a single repo.
There's a general principle here: multiple packages in a single repo with directly-coupled dependencies, where those packages have test suites with wildly varying levels of quality, quickly becomes a nightmare to maintain. The packages with low quality test suites that depend upon high quality, rapidly changing shared packages generate spurious test failures that need to be triaged and slow down development. Maintainers of packages that depend upon a rapidly changing shared package but do not have high quality test suites able to detect regressions may find their package frequently gets broken without anyone realising in time.
Their initial move solved this problem by shattering the single repo and trading directly-coupled dependencies for decoupled versioned dependencies, to decouple the rate of change of the shared package from the per-Destination packages. That was an incremental improvement, but it added the complexity and overhead of maintaining multiple versions of the "shared" library and per-repo boilerplate, which grows over time as more Destinations are added or more changes are made to the shared library while deferring the work to upgrade and retest Destinations to use it.
Their later move was to reverse this, go back to directly-coupled dependencies, but instead improve the quality of their per-Destination test suites, particularly by introducing record/replay style testing of Destinations. Great move. This means that the test suite of each Destination is measuring "is the Destination package adhering to its contract in how it should integrate with the 3rd party API & integrate with the shared package?" without being conflated with testing stuff that's outside of the control of code in the repo (is the 3rd party service even up, etc).
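A minimal sketch of the record/replay idea (not Segment's actual harness; the fixture path, env var and endpoint are made up for illustration):

    // Sketch: with RECORD=1 the test hits the real third-party endpoint and
    // saves the response as a fixture; otherwise it replays the fixture from
    // a local httptest server, so a flaky third party can't fail the suite.
    package destination_test

    import (
        "io"
        "net/http"
        "net/http/httptest"
        "os"
        "testing"
    )

    const fixture = "testdata/track_response.json"

    func TestTrackCall(t *testing.T) {
        recording := os.Getenv("RECORD") == "1"
        baseURL := "https://api.example-destination.com" // hypothetical third party
        if !recording {
            srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                http.ServeFile(w, r, fixture) // replay the recorded response
            }))
            defer srv.Close()
            baseURL = srv.URL
        }

        resp, err := http.Get(baseURL + "/v1/track")
        if err != nil {
            t.Fatal(err)
        }
        defer resp.Body.Close()
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            t.Fatal(err)
        }

        if recording {
            if err := os.WriteFile(fixture, body, 0o644); err != nil {
                t.Fatal(err)
            }
        }
        if len(body) == 0 {
            t.Fatal("empty response from destination")
        }
    }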
blatherard 8 hours ago [-]
(2018)
readthenotes1 9 hours ago [-]
Some of this sounds like the journey to EJBs and back.
AndrewKemendo 9 hours ago [-]
> Microservices is a service-oriented software architecture in which server-side applications are constructed by combining many single-purpose, low-footprint network services.
Gonna stop you right there.
Microservices have nothing to do with the hosting or operating architecture.
Per Martin Fowler, who formalized the term, microservices are:
“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery”
You can have an entirely local application built on the “microservice architectural style.”
Saying they are "often HTTP and API" is beside the point.
The problem Twilio actually describes is that they messed up service granularity and distributed systems engineering processes.
Twilio's experience was not a failure of the microservice architectural style. This was a failure to correctly define service boundaries based on business capabilities.
Their struggles with serialization, network hops, and complex queueing were symptoms of building a distributed monolith, which they finally made explicit with this move. So they accidentally built a system with the overhead of distribution but the tight coupling of a single application. Now they are making the foundations of their architecture fit what they built, likely because they poorly planned it.
The true lesson is that correctly applying microservices requires insanely hard domain modeling and iteration and meticulous attention to the "Distributed Systems Premium."
Just because he says something does not mean Fowler “formalized the term”. Martin wrote about every topic under the sun, and he loved renaming and/or redefining things to fit his world view, and incidentally drive people not just to his blog but also to his consultancy, Thoughtworks.
PS: The “single application” line shows how dated Fowler's views were then and certainly are today.
abernard1 6 hours ago [-]
I've been developing under that understanding since before Fowler-said-so. His take is simply a description of a phenomenon predating the moniker of microservices. SOA with things like CORBA, WSDL, UDDI, Java services in app servers etc. was a take on service oriented architectures that had many problems.
Anyone who has ever developed in a Java codebase with "Service" and then "ServiceImpl"s everywhere can see the lineage of that model. Services were supposed to be the API, and the implementation provided in a separate process container. Microservices signalled a time where SOA without Java as a pre-requisite had been successful in large tech companies. They had reached the point of needing even more granular breakout and a reduction of reliance on Java. HTTP interfaces was an enabler of that. 2010s era microservices people never understood the basics, and many don't even know what they're criticizing.
AndrewKemendo 6 hours ago [-]
Thank you this is the point
yieldcrv 9 hours ago [-]
I feel like microservices have gotten a lot easier over the last 7 years from when Twilio experienced this, not just from my experience but from refinements in architectures
There are infinite permutations in architecture and we've collectively narrowed them down to things that are cheap to deploy, automatically scale for low costs, and easily replicable with a simple script
We should be talking about how AI knows those scripts too and can synthesize adjustments; dedicated Site Reliability Engineers and DevOps are great for maintaining convoluted legacy setups, but irrelevant for doing the same thing from scratch nowadays.
eYrKEC2 9 hours ago [-]
You know what I think is better than a push of the CPU stack pointer and a jump to a library?
A network call. Because nothing could be better for your code than putting the INTERNET into the middle of your application.
--
The "micro" of microservices has always been ridiculous.
If it can run on one machine then do it. Otherwise you have to deal with networking. Only do networking when you have to. Not as a hobby, unless your program really is a hobby.
NeutralCrane 6 hours ago [-]
Microservices have nothing to do with the underlying hosting architecture. Microservices can all run and communicate on a single machine. There will be a local network involved, but it absolutely does not require the internet or multiple machines.
yieldcrv 8 hours ago [-]
it's not really "micro" but more so "discrete", as in special purpose, one-off, to ensure consistent performance as opposed to shared performance.
yes, networking is the bottleneck between the processes, while one machine is the bottleneck to end users
Nextgrid 8 hours ago [-]
> one machine is the bottleneck to end users
You can run your monolith on multiple machines and round-robin end-user requests between them. Your state is in the DB anyway.
yieldcrv 7 hours ago [-]
I do bare metal sometimes and I like the advances in virtualization for many processes there too
Well implemented network hardware can have high bandwidth and low latency. But that doesn't get around the complexity and headaches it brings. Even with the best fiber optics, wires can be cut or tripped over. Controllers can fail. Drivers can be buggy. Networks can be misconfigured. And so on. Any request - even sent over a local network - can and will fail on you eventually. And you can't really make a microservice system keep working properly when links start failing.
Local function calls are infinitely more reliable. The main operational downside with a binary monolith is that a bug in one part of the program will crash the whole thing. Honestly, I still think Erlang got it right here with supervisor trees. Use "microservices". But let them all live on the same computer, in the same process. And add tooling to the runtime environment to allow individual "services" to fail or get replaced without taking down the rest of the system.
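Go doesn't have OTP, but a rough analogue of that "let it crash, restart it, keep the siblings alive" idea can be sketched in a few lines (purely illustrative; this is not Erlang's actual supervision machinery):

    // Sketch: each in-process "service" runs in its own goroutine behind a
    // tiny supervisor that recovers panics and restarts it, so one buggy
    // subsystem doesn't take down the rest of the process.
    package main

    import (
        "log"
        "time"
    )

    // supervise restarts fn whenever it panics or returns, with a crude backoff.
    func supervise(name string, fn func()) {
        go func() {
            for {
                func() {
                    defer func() {
                        if r := recover(); r != nil {
                            log.Printf("%s crashed: %v (restarting)", name, r)
                        }
                    }()
                    fn()
                }()
                time.Sleep(time.Second)
            }
        }()
    }

    func main() {
        supervise("billing", func() {
            time.Sleep(500 * time.Millisecond)
            panic("simulated bug in one subsystem")
        })
        supervise("search", func() {
            for {
                time.Sleep(300 * time.Millisecond)
                log.Println("search still serving")
            }
        })
        select {} // keep the process alive
    }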
moltar 9 hours ago [-]
Do you have any recommended reading on the topic of refinements in architectures? Thank you.
0xbadcafebee 7 hours ago [-]
These "we moved from X to Y" posts are like Dunning-Kruger humblebrags. Yes, we all lack information and make mistakes. But there's never an explanation in these posts of how they've determined their new decision is any less erroneous than their old decision. It's like they threw darts at a wall and said "cool, that's our new system design (and SDLC)". If you have not built it yourself before, and have not studied in depth an identical system, just assume you are doing the wrong thing. Otherwise you are running towards another Dunning-Kruger pit.
If you have a company that writes software, please ask a professional software/systems architect to review your plans before you build. The initial decisions here would be a huge red flag to any experienced architect, and the subsequent decisions are full of hidden traps, and are setting them up for more failure. If you don't already have a very skilled architect on staff (99% chance you don't) you need to find one and consult with them. Otherwise your business will suffer from being trapped in unnecessary time-consuming expensive rework, or worse, the whole thing collapsing.
They moved away from this in multiple steps, the first of those being making it a "distributed monolith" (as per your implied definition) by putting services in a monorepo and then making them use the same dependency versions (before finally making them a single service too).
> We no longer had to deploy 140+ services for a change to one of the shared libraries.
Taken in isolation, that is a strong indicator that they were indeed running a distributed monolith.
However, the blog post earlier on said that different microservices were using different versions of the library. If that was actually true, then they would never have to deploy all 140+ of their services in response to a single change in their shared library.
Totally agree. For what it's worth, based on the limited information in the article, I actually do think it was the right decision to pull all of the per-destination services back into one. The shared library problem can go both ways, after all: maybe the solution is to remove the library so your microservices are fully independent, or maybe they really should have never been independent in the first place and the solution is to put them back together.
I don't think either extreme of "every line of code in the company is deployed as one service" or "every function is an independent FaaS" really works in practice, it's all about finding the right balance, which is domain-specific every time.
However, in the world we live in, people choose to point to latest, to avoid manual work, and trust that other teams did the right diligence when updating to the latest version.
You can point to a stable version in the model I described and still be distributed and a micro service, while depending on a shared service or repository.
Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
OP is correct that you are indeed now in a weird hybrid monolith application where it’s deployed piecemeal but can’t really be deployed that way because of tightly coupled dependencies.
Be ready for a blog post in ten years about how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have it land in production without getting reverted due to an unrelated issue.
But also, you're conflating code and services. There's a huge difference between libraries that are deployed as part of various binaries and those that are used as remote APIs. If you want to update a utility library that's used by importing code, then you don't need simultaneous deployment, but you would like to update everywhere to get it done with - that's only really possible with a monorepo. If you want to update a remote API without downtime, then you need a multi-phase rollout where you introduce a backward-compatibility mode... but that's true whether you store the code in one place or two.
Yes I understand it’s a shared library but if updating that shared library automatically updates everyone and isn’t backward compatible you’re doing it wrong - that library should be published as a v2 or dependents should pin to a specific version. But having a shared library that has backward incompatible changes that is automatically vendored into all downstream dependencies is insane. You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service and the version that was published in the registry.
...but why? You're begging the question.
If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time. If it's because you can't be certain from testing that it's actually a safe change, then fine, but note that that option is still available to you by copy/pasting to a v2/ or adding a feature flag. Going to a monorepo gives you strictly more options in how to deal with changes.
> You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service
This is true regardless of deployment pattern. The artifact that you publish needs to have pointers back to all changes that went into it/what commit it was built at. Mono vs. multi-repo doesn't materially change that, although I would argue it's slightly easier with a monorepo since you can look at the single history of the repository, rather than having to go an extra hop to find out what version 1.0.837 of your dependency included.
> the version that was published in the registry
Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history. If a binary is built at commit X, then all commits before X across all dependencies are included. That's kind of the point.
I’m not begging the question. I’m simply stating what loose coupling looks like and the blog post is precisely the problem of tight coupling. If you have multiple teams working on a tightly coupled system you’re asking for trouble. This is why software projects inevitably decompose against team boundaries and you ship your org chart - communication and complexity is really hard to manage as the head count grows which is where loose coupling helps.
But this article isn’t about moving from federated codebases to a single monorepo as you propose. They used that as an intermediary step to then enable making it a single service. But the point is that making a single giant service is well studied and a problem. Had this constantly at Apple when I worked on CoreLocation where locationd was a single service that was responsible for so many things (GPS, time synchronization of Apple Watches, WiFi location, motion, etc) that there was an entire team managing the process of getting everything to work correctly within a single service and even still people constantly stepped on each other’s toes accidentally and caused builds that were not suitable. It was a mess and the team that should have identified it as a bottleneck in need of solving (ie splitting out separate loosely coupled services) instead just kept rearranging deck chairs.
> Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history
I’m not opposed to a monorepo which I think may be where your confusion is coming from. I’m suggesting slamming a bunch of microservices back together is a poorly thought out idea because you’ll still end up with a launch coordination bottleneck and rolling back 1 team’s work forces other teams to roll back as well. It’s great the person in charge got to write a ra ra blog post for their promo packet. Come talk to me in 3 years with actual on the ground engineers saying they are having no difficulty shipping a large tightly coupled monolithic service or that they haven’t had to build out a team to help architect a service where all the different teams can safely and correctly coexist. My point about the registry is that they took one problem - a shared library multiple services depend on through a registry depend on latest causing problems deploying - and nuked it from orbit using a monorepo (ok - this is fine and a good solution - I can be a fan of monorepos provided your infrastructure can make it work) and making a monolithic service (probably not a good idea that only sounds good when you’re looking for things to do).
But it is not! They were updating dependencies and deploying services separately, and this led to every one of 140 services using a different version of "shared-foo". This made it cumbersome, confusing and expensive to keep going (you want a new feature from shared-foo, you have to take all the other features unless you fork and cherry-pick on top, which makes it not a shared-foo anymore).
The point is that true microservice approach will always lead to exactly this situation: a) you either do not extract shared functions and live with duplicate implementations, b) you enforce keeping your shared dependencies always on very-close-to-latest (which you can do with different strategies; monorepo is one that enables but does not require it) or c) you end up with a mess of versions being used by each individual service.
The most common middle ground is to insist on backwards compatibility in a shared-lib, but carrying that over 5+ years is... expensive. You can mix it with an "enforce update" approach ("no version older than 2 years can be used"), but all the problems are pretty evident and expected with any approach.
I'd always err on the side of having the capability to upgrade everything at once if needed, while keeping the ability to keep a single service on a pinned version. This is usually not too hard with any approach, though a monorepo makes the first one appear easier (you edit one file, or multiple dep files in a single repo). But unless you can guarantee all services get replaced in a deployment at exactly the same moment — which you rarely can — or can accept short-lived inconsistencies, deployment requires all services to be backwards compatible until they are all updated, with either approach.
I'd also say that this is still not a move to a monolith, but to a Service-Oriented-Architecture that is not microservices (as microservices are also SOA): as usual, the middle ground is the sweet spot.
A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
For some of their "solutions" I kind of wonder how they plan on resolving things, like the black box "magic" queue service they subbed back in, or the fault tolerance problem.
That said, I do think if you have a monolith that just needs to scale (single service that has to send to many places), they are possibly taking the correct approach. You can design your code/architecture so that you can deploy "services" separately, in a fault tolerant manner, but out of a mono repo instead of many independent repos.
Internal Google services: *sweating profusely*
(Mostly in jest, it's obviously a different ballgame internal to the monorepo on borg)
This is not true! This is one of the core strengths of protobuf. Non-destructive protobuf changes, such as adding new API methods or new fields, do not require clients to update. On the server-side you do need to handle the case when clients don't send you the new data--plus deal with the annoying "was this int64 actually set to 0 or is it just using the default?" problem--but as a whole you can absolutely independently update a protobuf, implement it on the server, and existing clients can keep on calling and be totally fine.
Now, that doesn't mean you can go crazy, as doing things like deleting fields, changing field numbering or renaming APIs will break clients, but this is just the reality of building distributed systems.
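To make the "old clients keep working" point concrete, here is a rough sketch in Go using plain JSON instead of protobuf (the mechanics are analogous; the type, field names and default are made up for illustration): a field added after old clients shipped simply decodes to its zero value, and the server decides what to do about it.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // EventV2 is a hypothetical server-side message type. The Region field was
    // added after old clients shipped; because it is optional, their payloads
    // still decode cleanly: missing fields become zero values, and unknown
    // fields sent by newer clients are ignored by encoding/json.
    type EventV2 struct {
        UserID string `json:"user_id"`
        Amount int64  `json:"amount"`
        Region string `json:"region,omitempty"` // new optional field
    }

    func handle(payload []byte) {
        var ev EventV2
        if err := json.Unmarshal(payload, &ev); err != nil {
            fmt.Println("reject:", err)
            return
        }
        // Server-side handling of "the client didn't send the new field":
        // fall back to a default instead of failing the request.
        if ev.Region == "" {
            ev.Region = "default-region" // assumed default, illustration only
        }
        fmt.Printf("processed event for %s in %s\n", ev.UserID, ev.Region)
    }

    func main() {
        handle([]byte(`{"user_id":"u1","amount":42}`))              // old client, predates Region
        handle([]byte(`{"user_id":"u2","amount":7,"region":"eu"}`)) // new client
    }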
I mean I suppose you can make breaking changes to any API in any language, but that’s entirely on you.
It's a tradeoff.
Show me a language runtime or core library that will never have a CVE. Otherwise, by your definition, microservices don’t exist and all service oriented architectures are distributed monoliths.
The logical problem you’re running into is exactly why microservices are such a bad idea for most businesses. How many businesses can have entirely independent system components?
Almost all “microservice” systems in production are distributed monoliths. Real microservices are incredibly rare.
A mental model for true microservices is something akin to depending on the APIs of Netflix, Hulu, HBO Max and YouTube. They’ll have their own data models, their own versioning cycles and all that you consume is the public interface.
For instance, say company A uses the GCP logging stack, and company B does the same. GCP updates its product in a way that strongly encourages upgrading within a specific time frame (e.g. the price will drastically increase otherwise), so A and B do it at mostly the same time, for the same reason.
Are A and B truly independent under your vision? Or are they a company-spanning monolith?
Mostly? If you can update A one week and B the next week with no breakage in between, that seems pretty independent.
> Over time, the versions of these shared libraries began to diverge across the different destination codebases.
Do you depend on a cloud provider? Not a microservice. Do you depend on an ISP for Internet? Not a microservice. Depend on humans to do something? Not a microservice.
Textbook definitions and reality rarely coincide, rather than taking such a fundamentalist approach that leads nowhere, recognize that for all intents and purposes, what I described is a microservice, not a distributed monolith.
That is fundamentally incorrect. As presented in my other post, you can correctly use the shared repository as a dependency and refer to a stable version rather than a dynamic version, which is where the problem arises.
As long as the microservice owners are free to choose what dependencies to take and when to bump dependency versions, it’s fine - and microservice owners who take dependencies like that know that they are obliged to take security patch releases and need to plan for that. External library dependencies work like that and are absolutely fine for microservices to take.
The problem comes when you have a team in the company that owns a shared library, and where that team needs, in order to get their code into production, to prevail upon the various microservices that consume their code to bump versions and redeploy.
That is the path to a distributed monolith situation and one you want to avoid.
Yes, exactly. The point is not elitism. Microservices are a valuable tool for a very specific problem but what most people refer to as "microservices" are not. Language is important when designing systems. Microservices are not just a bunch of separately deployable things.
The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility. The service has a public interface defined in a contract that other components depend on, and that is it, what happens within the service is irrelevant to the rest of the system and vice verse, the service does not have depend on knowledge of the rest of the system. By virtue of being micro in responsibility, it can be deployed anywhere and anyhow.
If it is not a microservice, it is just a service, and when it is just a service, it is probably a part of a distributed monolith. And that is okay, a distributed monolith can be very valuable. The reason many people bristle at the mention of microservices is that they are often seen as an alternative to a monolith but they are not, it is a radically different architecture.
We must be precise in our language because if you or I build a system made up of "microservices" that aren't microservices, we're taking on all of the costs of microservices without any of the benefits. You can choose to drive to work, or take the bus, but you cannot choose to drive because it is the cheapest mode of transport or walk because it is the fastest. The costs and benefits are not independent.
The worst systems I have ever worked on were "microservices" with shared libraries. All of the costs of microservices (every call now involves a network) and none of the benefits (every service is dependent on the others). The architect of that system had read all about how great microservices are and understood it to mean separately deployable components.
There is no hierarchy of goodness, we are just in pursuit of the right tool for the job. A monolith, a distributed monolith or a microservice architecture could be the right tool for one problem and the wrong tool for another.
https://www.youtube.com/watch?v=y8OnoxKotPQ
I am talking about using a shared software repository as a dependency. Which is valid for a microservice. Taking said dependency does not turn a microservice into a monolith.
It may be a build time dependency that you do in isolation in a completely unrelated microservice for the pure purpose of building and compiling your business microservice. It is still a dependency. You cannot avoid dependencies in software or life. As Carl Sagan said, to bake an apple pie from scratch, you must first invent the universe.
>The worst systems I have ever worked on were "microservices" with shared libraries.
Ok? How is this relevant to my point? I am only referring to the manner in which your microservice references said libraries, not the pros or cons of implementing or using shared libraries (e.g. mycompany-specific-utils), common libraries (e.g. apache-commons), or any software component for that matter.
>Yes, exactly
So you're agreeing that there is no such thing as a microservice. If that's the case, then the term is pointless other than a description of an aspirational yet unattainable state. Which is my point exactly. For the purposes of the exercise described the software is a microservice.
True. However one of the core tenets of microservices is that they should be independently deployable[1][2].
If taking on such a shared dependency does not interfere with them being independently deployable then all is good and you still have a set of microservices.
However, if that shared dependency couples the services so that when one needs a new version of the shared dependency then all do, well, suddenly those services are no longer microservices but a distributed monolith.
[1]: https://martinfowler.com/microservices/
[2]: https://www.oreilly.com/content/a-quick-and-simple-definitio...
The "micro" in microservice was a marketing term to distinguish it from the bad taste of particular SOA technology implementations in the 2000s. A similar type of activity as crypto being a "year 3000 technology."
The irony is it was the common state that "services" weren't part of a distributed monolith. Services which were too big were still separately deployable. When services became nothing but an HTTP interface over a database entity, that's when things became complicated via orchestration; orchestration previously done by a service... not done to a service.
Categories and ontologies are real things in the world. If you create one thing and call it something else, that doesn't mean the definition of "something else" should change.
By your definition it is impossible to create a state based on coherent specifications because most states don’t align to the specification.
We know for a fact that’s wrong via functional programming, state machines, and formal verification
For example, a library with a security vulnerability would need to be upgraded everywhere regardless of how well you’ve designed your system.
In that example the monolith is much easier to work with.
The npm supply chain attacks were only an issue if you don't use lock files. In fact they were a great example of why you shouldn't blindly upgrade to the latest packages when they are available.
I'm referring to the all-hands-on-deck nature of responding to security issues, not the best practice. For many, the NPM issue was an all-hands-on-deck event.
Who doesn't use lockfiles? Aren't they the default everywhere now? I really thought npm uses them by default.
It has been rare over the years but I suspect it's getting less rare as supply chain attacks become more sophisticated (hiding their attack more carefully than at present and waiting longer to spring it).
Everyone has been able to exploit that for ages. It only became a problem when it was discovered and publicised.
If libraries bump minor or major versions, they are imposing work on all the consuming services to accept the version, make compatibility changes, test and deploy.
Decoupling is the first part of microservices. Pass messages. Use json. I shouldn’t need your code to function. Just your API. Then you can be clever and scale out and deploy on saturdays if you want to and it doesn’t disturb the rest of us.
Yes, but there’s likely a lot of common code related to parsing those messages, interpreting them, calling out to other services etc. shared amongst all of them. That’s to be expected. The question is how that common code is structured if everything has to get updated at once if the common code changes.
The best is when you use containers and build against the latest runtimes in your pipelines so as to catch these issues early and always have the most up to date patches. If a service hasn’t been updated or deployed in a long time, you can just run another build and it will pull latest of whatever.
Yes, there are scenarios where you have to deploy everything but when dealing with micro services, you should only be deploying the service you are changing. If updating a field in a domain affects everyone else, you have a distributed monolith and your architecture is questionable at best.
The whole point is I can deploy my services without relying on yours, or touching yours, because it sounds like you might not know what you’re doing. That’s the beautiful effect of a good micro service architecture.
To use an example from the article, there was this statement: "The split to separate repos allowed us to isolate the destination test suites easily. This isolation allowed the development team to move quickly when maintaining destinations."
This is architecture bleed-through. The format produced by Twilio "should" be the canonical form, which is submitted to the adapter, which mangles it into the "destination" form. Great, that transformation is expressible semantically in a language that takes the canonical form and spits out the special form. Changes to the transformation expression should not "bleed through" to other destinations, and changes to the canonical form should be backwards compatible to prevent bleed-through of changes in the source from impacting the destination. At all times, if something worked before, it should continue to work without touching it, because the architecture boundaries are robust.
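A sketch of what that boundary could look like in Go (all names are hypothetical): the canonical form is owned by the producer, and each destination only ever sees it through a small adapter contract, so a change to one adapter's transformation cannot bleed into another.

    package adapters

    import "time"

    // CanonicalEvent is the hypothetical canonical form owned by the producer.
    // New fields may be added over time, but existing fields keep their meaning
    // so that working destinations keep working.
    type CanonicalEvent struct {
        ID         string
        Timestamp  time.Time
        UserID     string
        Properties map[string]string
    }

    // Destination is the only contract a destination has to satisfy. Each
    // implementation lives behind this boundary, so the mangling logic for one
    // destination is invisible to every other destination.
    type Destination interface {
        Transform(ev CanonicalEvent) ([]byte, error) // canonical -> destination-specific payload
        Send(payload []byte) error
    }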
Being able to work with a team that understood this was common "in the old days" when people were working on an operating system. The operating system would evolve (new features, new devices, new capabilities) but because there was a moat between the OS and applications, people understood that they had to architect things so that the OS changes would not cause applications that currently worked to stop working.
I don't judge Twilio for not doing robust architecture; I was astonished, when I went to work at Google, at how lazy everyone got when the entire system is under their control (as in, there are no third-party apps running in the fleet). There was a persistent theme of some bright person "deciding" to completely change some interface and Wham! every other group at Google had to stop what they were doing and move their code to the new thing. There was a particularly poor 'mandate' on a new version of their RPC while I was there. As Twilio notes, that can make things untenable.
And ideally, your logging library should rarely need to update. If you need unique integrations per service, use a plug-in architecture and keep the plug-ins local to each service.
If you communicate with one another you are serializing and deserializing a shared type. That shared type will break at the communication channels if you do not simultaneously deploy the two services. The irony is to prevent this you have to deploy simultaneously and treat it as a distributed monolith.
This is the fundamental problem of micro services. Under a monorepo it is somewhat more mitigated because now you can have type checking and integration tests across multiple repos.
Make no mistake, the world isn't just library dependencies. There are communication dependencies that flow through communication channels. A microservice architecture by definition has all its services depend on each other through these communication channels. The logical outcome of this is virtually identical to a distributed monolith. In fact, shared libraries don't do much damage at all if the versions are off. It is only shared types in the communication channels that break.
There is no way around this unless you have a mechanism for simultaneous merging code and deploying code across different repos which breaks the definition of what it is to be a microservice. Microservices always and I mean always share dependencies with everything they communicate with. All the problems that come from shared libraries are intrinsic to microservices EVEN when you remove shared libraries.
People debate me on this but it’s an invariant.
That means consumers can keep using old API versions (and their types) with a very long deprecation window. This results in loose coupling. Most companies doing microservices do not operate like this, which leads to these lockstep issues.
I'm not saying monoliths are better than microservices.
I'm saying for THIS specific issue, you will not need to even think about API compatibility with monoliths. It's a concept you can throw out the window, because type checkers and integration tests catch this FOR YOU automatically and the single deployment ensures that compatibility will never break.
If you choose monoliths you are CHOOSING for this convenience, if you choose microservices you are CHOOSING the possibility for things to break and AWS chose this and chose to introduce a backwards compatibility restriction to deal with this problem.
I use "choose" loosely here. More likely AWS ppl just didn't think about this problem at the time. It's not obvious... or they had other requirements that necessitated microservices... The point is, this problem in essence is a logical consequence of the choice.
Yes, this is absolutely correct. The objects you send over the wire are part of an API which forms a contract the server implementing the API is expected to provide. If the API changes in a way which is not backwards compatible, this will break things.
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility which exists when your code and your database do not match. The strategy might be a simple add new column->migrate code->remove old column, including some thought on how to deal with data added in the interim. It might be to use views. It might be some insane strategy of duplicating the full stack, using change data capture to catch changes and flipping a switch.[0] It doesn't really matter, the point is that even within a monolith, you have two separate services, a database and a backend server, and you cannot deploy them truly simultaneously, so you need to have some strategy for dealing with that; or more generally, you need to be conscious of breaking API changes, in exactly the same way you would with independent services.
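For the column-rename case specifically, the usual expand/contract strategy looks roughly like this (a sketch with made-up table and column names; in practice each step ships as its own migration/deployment with application changes in between, they are grouped here only to show the order):

    package migrate

    import (
        "context"
        "database/sql"
    )

    // RenameNameToFullName lists the expand/contract steps for renaming a column
    // without downtime. Each statement is safe on its own; in a real rollout the
    // application is redeployed between steps (write both columns, then read the
    // new one) before the old column is finally dropped.
    func RenameNameToFullName(ctx context.Context, db *sql.DB) error {
        steps := []string{
            // 1. Expand: add the new column; old code keeps working untouched.
            `ALTER TABLE users ADD COLUMN full_name VARCHAR(255) NULL`,
            // 2. Backfill existing rows from the old column.
            `UPDATE users SET full_name = name WHERE full_name IS NULL`,
            // 3. (Deploy code that writes both columns and reads full_name.)
            // 4. Contract: once nothing reads the old column any more, drop it.
            `ALTER TABLE users DROP COLUMN name`,
        }
        for _, stmt := range steps {
            if _, err := db.ExecContext(ctx, stmt); err != nil {
                return err
            }
        }
        return nil
    }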
> The logical outcome of this is virtually identical to a distributed monolith.
Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
[0] I have seen this done. It was as crazy as it sounds.
Agreed and this is a negative. Backwards compatibility is a restriction made to deal with something fundamentally broken.
Additionally, eventually in any system of services you will have to make a breaking change. Backwards compatibility is a behavioral coping mechanism to deal with a fundamental issue of microservices.
>This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility.
I believe you and am already aware. It's a limitation that exists intrinsically so it exists because you have No choice. A database and a monolith needs to exist as separate services. The thing I'm addressing here is the microservices and monolith debate. If you choose microservices, you are CHOOSING for this additional problem to exist. If you choose monolith, then within that monolith you are CHOOSING for those problems to not exist.
I am saying regardless of the other issues with either architecture, this one is an invariant in the sense that for this specific thing, monolith is categorically better.
>Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
No you're categorically wrong. If they did this in ANY of the companies you worked at then they are Living with this issue. What I'm saying here isn't an opinion. It is a theorem based consequence that will occur IF all the axioms are satisfied: namely >2 services that communicate with each other and ARE not deployed simultaneously. This is logic.
The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I can't say that EC2 will never make a breaking change that causes RDS, lambda, or auto-scaling to break, but if they do, it'll be front page news.
No, it's certainly possible. You can evolve Linux, macOS and Windows forever without any breaking changes and keep all APIs backward compatible for all time. Keep going forever and ever and ever. But you see there's a huge downside to this, right? This downside becomes more and more magnified as time goes on. In the early stages it's fine. And it's not like this growing problem will stop everything in its tracks. I've seen organizations hobble along forever with tech debt that keeps increasing for decades.
The downside won't kill an organization. I'm just saying there is a way that is better.
>I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I have as well. It doesn't mean it doesn't work and can't be done. For example typescript is better than javascript. But you can still build a huge organization around javascript. What I'm saying here is one is intrinsically better than the other but that doesn't mean you can't build something on technology or architectures that are inferior.
And also I want to say I'm not saying monoliths are better than microservices. I'm saying for this one aspect monoliths are definitively better. There is no tradeoff for this aspect of the debate.
>I can't say that EC2 will never made a breaking change that causes RDS, lambda, auto-scaling to break, but if they do, it'll be front page news.
Didn't a break happen recently? Barring that... there are behavioral ways to mitigate this, right? Like what you mentioned... backward-compatible APIs always. But it's better to set up your system such that the problem just doesn't exist, period... rather than to set up ways to deal with the problem.
This is correct.
> Neither of these scenarios is practical.
This is not. When you choose appropriate tools (protobuf being an example), it is extremely easy to make a non-breaking change to the communication channel, and it is also extremely easy to prevent breaking changes from being made ever.
Protobuf works best if you have a monorepo. If each of your services lives within its own repo, then upgrades to one repo can be merged onto the main branch in a way that potentially breaks things in other repos. Protobuf cannot check for this.
Second, the other safety check protobuf relies on is backwards compatibility. But that's an arbitrary restriction, right? It's better to not even have to worry about backwards compatibility at all than it is to maintain it.
Categorically these problems don't even exist in the monolith world. I'm not taking a side in the monolith vs. microservices debate. All I'm saying is for this aspect monoliths are categorically better.
No. Your shared type is too brittle to be used in microservices. Tools like the venerable protobuf has solved this problem decades ago. You have a foundational wire format that does not change. Then you have a schema layer that could change in backwards compatible ways. Every new addition is optional.
Here’s an analogy. Forget microservices. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with both the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
Also known as the rest of the fucking owl. I am entirely in factual agreement with you, but the number of people who are even aware they maintain an API surface with backwards compatibility as a goal, let alone can actually do it well, are tiny in practice. Especially for internal services, where nobody will even notice violations until it’s urgent, and at such a time, your definitions won’t save you from blame. Maybe it should, though. The best way to stop a bad idea is to follow it rigorously and see where it leads.
I’m very much a skeptic of microservices, because of this added responsibility. Only when the cost of that extra maintenance is outweighed by overwhelming benefits elsewhere, would I consider it. For the same reason I wouldn’t want a toilet with a seatbelt.
1. Protobuf requires a monorepo to work correctly. Shared types must be checked across all repos and services simultaneously. Without a monorepo or some crazy workaround mechanism this won't work. Think about it. These type checkers need everything at the same version to correctly check everything.
2. Even with a monorepo, deployment is a problem. Unless you do simultaneous deploys, if one team upgrades their service and another team doesn't, the shared type is incompatible, simply because you used microservices and polyrepos to allow teams to move async instead of in sync. It's a race condition in distributed systems and it's theoretically true. Not solved at all, because it can't be solved by logic and math.
Just kidding. It can be solved but you're going to have to change definitions of your axioms aka of what is currently a microservice, monolith, monorepo and polyrepo. If you allow simultaneous deploys or pushes to microservices and polyrepos these problems can be solved but then can you call those things microservices or polyrepos? They look more like monorepos or monoliths... hmmm maybe I'll call it "distributed monolith".... See we are hitting this problem already.
>Here’s an analogy. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
You are just describing the problem I provided. We call "monoliths" monoliths but technically a monolith must interact with a secondary service called a database. We have no choice in the matter. The monolith and microservice of course does not refer to that problem which SUFFERS from all the same problems as microservices.
>This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
No it's not. Not at all. It's a problem that's lived with. I have two modules in a monolith. ANY change that goes into the mainline branch or deploy is type checked and integration tested to provide maximum safety as integration tests and type checkers can check the two modules simultaneously.
Imagine those two modules as microservices. Because they can be deployed at any time asynchronously, because they can be merged to the mainline branch at any time asynchronously They cannot be type checked or integration tested. Why? If I upgrade A which requires an upgrade to B but B is not upgraded yet, How do I type check both A and B at the same time? Axiomatically impossible. Nothing is solved. Just behavioral coping mechanisms to deal with the issue. That's the key phrase: behavioral coping mechanisms as opposed to automated statically checked safety based off of mathematical proof. Most of the arguments from your side will be consisting of this: "behavioral coping mechanisms"
Most microservice companies either live with the fact or they have round about ways to deal with it including simultaneous deploys across multiple services and simultaneous merging, CI and type checking across different repos.
This is simply dependency hell exploding with microservices.
Hello engineer. Jira ticket VULN-XXX had been assigned to you as your team's on call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $foo after SHA before 2025-12-14 at 12:00 UTC.
Hello engineer. Jira ticket VULN-XXX had been assigned to you as your team's on call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $bar after SHA before 2025-12-14 at 12:00 UTC.
...
It's never ending. You get a half dozen of these on each on call rotation.
Having 140 services managed by what sounds like one team reinforces another point that I believe should be well known by now: you use SOAs (including microservices) to scale teams, and not services.
Eg. if a single team builds a shared library for all the 140 microservices and needs to maintain them, it's going to become very expensive quickly: you'll be using v2.3.1 in one service and v1.0.8 in another, and you won't even know yourself what API is available. Operationally, yes, you'll have to watch over 140 individual "systems" too.
There are ways to mitigate this, but they have their own trade-offs (I've posted them in another comment).
As per Conway's law, software architecture always follows the organizational structure, and this seems to have happened here: a single team is moving away from unneeded complexity to more effectively manage their work and produce better outcomes for the business.
It is not a monolith, but a properly-scoped service (scoped to the team). This is, in my experience, the sweet spot. A single team can run and operate multiple independent services, but with growth in those services, they will look to unify, so you need to restructure the team if you don't want that to happen. This is why I don't accept "system architect" roles, as those don't give you the tools to really drive the architecture the way it can be driven, and why I really got into "management" :)
In both cases we had to come up with clever solutions simply to get by, because communication between services is a problem. It is difficult (not impossible) to keep all the contracts in sync, and deployment sometimes has to be coordinated in a very specific way. The initial speed you get is soon lost further down the path due to added complexities. There was fear-driven development at play. Service ownership is a problem. Far too many meetings are spent on coordination.
In my latest company everything is part of the same monolith. Yes, the code is huge, but it is so much easier to work with. We use a lot more unit tests than integration tests. Types make sense. Refactoring is just so easy. All the troubleshooting tools, including specialised AI agents built on top of our own platform, are part of the code-base, which is kind of interesting because I can see how this is turning into a self-improving system. It is fascinating!
We are not planning to break up the monolith unless we grow so much that is impossible to manage from a single git repository. As far as I can tell this may never happen as it is obvious that much larger projects are perfectly well maintained in the exact same way.
The only downside is that build takes longer but honestly we found ways around that as well in the past and now with further improvements in the toolchains delivered by the awesome open-source communities around the world, I expect to see at least 10x improvement in deployment time in 2026.
Overall, in my own assessment, the decision to go for a monolith allowed us to build and scale much faster than if we had used micro services.
I hope this helps.
Given the quality and the structure neither approach really matters much. The root problems are elsewhere.
There's a lot of good research and writing on this topic. This paper, in particular has been really helpful for my cause: https://dl.acm.org/doi/pdf/10.1145/3593856.3595909
It has a lot going for it: 1) it's from Google, 2) it's easy to read and digest, 3) it makes a really clear case for monoliths.
You're not kidding. I had to work with twilio on a project and it was awful. Any time there was an issue with the API, they'd never delve into why that issue had happened. They'd simply fix the data in their database and close the ticket. We'd have the same issue over and over and over again and they'd never make any effort to fix the cause of the problems.
It's amazing how much explanatory power it has, to the point that I can predict at least some traits about a company's codebase during an interview process, without directly asking them about it.
1. "Peter principle": "people in a hierarchy and organizations tend to rise to 'a level of respective incompetence' "
2. "Parkinson's law": "Work expands to fill the available time".
So people are filling all the available time and working tirelessly to reach their personal and organizational levels of incompetence; working hard without stopping to think whether what they are doing should be done at all. And nobody is stopping them, nobody asks why (with a real analysis of positives, negatives, risks).
Incompetent + driven is the worst combination there can be.
I think way too much tooling assumes 1:1 pairings between services and repos (_especially_ CI work). In huge orgs Git/whatever VCS you're using would have problems with everything in one repo, but I do think that there's loads of value in having everything in one spot even if it's all deployed more or less independently.
But so many settings and workflows couple repos together so it's hard to even have a frontend and backend in the same place if both teams manage those differently. So you end up having to mess around with N repos and can't send the one cross-cutting pull request very easily.
I would very much like to see improvements on this front, where one repo could still be split up on the forge side (or the CI side) in interesting ways, so review friction and local dev work friction can go down.
(shorter: github and friends should let me point to a folder and say that this is a different thing, without me having to interact with git submodules. I think this is easier than it used to be _but_)
We used Bazel to maintain the dependency tree, and then triggered builds based on a custom Github Actions hook that would use `bazel query` to find the transitive closure of affected targets. Then, if anything in a directory was affected, we'd trigger the set of tests defined in a config file in that directory (defaulting to :...), each as its own workflow run that would block PR submission. That worked really well, with the only real limiting factor being the ultimate upper limit of a repo in Github, but of course took a fair amount (a few SWE-months) to build all the tooling.
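For anyone curious, the "affected targets" step can be approximated with something like this (a simplified Go sketch of the idea, not the actual hook; labels and flags will vary by setup):

    // Given a list of changed files expressed as Bazel labels, ask bazel query
    // for everything that transitively depends on them and print the affected
    // packages, which the CI hook would then map to test workflows.
    package main

    import (
        "fmt"
        "os"
        "os/exec"
        "strings"
    )

    func main() {
        changed := os.Args[1:] // e.g. //libs/shared:client.go //destinations/foo:handler.go
        if len(changed) == 0 {
            fmt.Fprintln(os.Stderr, "usage: affected <changed file labels...>")
            os.Exit(2)
        }
        // rdeps(//..., set(...)) walks from the changed sources up to every
        // target that transitively depends on them.
        expr := fmt.Sprintf("rdeps(//..., set(%s))", strings.Join(changed, " "))
        out, err := exec.Command("bazel", "query", expr, "--output=package").Output()
        if err != nil {
            fmt.Fprintln(os.Stderr, "bazel query failed:", err)
            os.Exit(1)
        }
        for _, pkg := range strings.Split(strings.TrimSpace(string(out)), "\n") {
            fmt.Println("affected:", pkg) // trigger the tests defined for each of these
        }
    }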
This, and if you want build + deploy that’s faster than doing it manually from your dev machine, you pay $$$ for either something like Depot, or a beefy VM to host CI.
A bit more work on those dependency corner cases, along with an auto-sleeping VM, should let us achieve nirvana. But it’s not like we have a lot of spare time on our small team.
* You can use gazelle to auto-generate Bazel rules across many modules - I think the most up to date usage guide is https://github.com/bazel-contrib/rules_go/blob/master/docs/g....
* In addition, you can make your life a lot easier by just making the whole repo a single Go module. Having done the alternate path - trying to keep go.mod and Bazel build files in sync - I would definitely recommend only one module per repo unless you have a very high pain tolerance or actually need to be able to import pieces of the repo with standard Go tooling.
> a beefy VM to host CI
Unless you really need to self-host, Github Actions or GCP Cloud Build can be set up to reference a shared Bazel cache server, which lets builds be quite snappy since it doesn't have to rebuild any leaves that haven't changed.
I managed a product where a team of 6–8 people handled 200+ microservices. I've also managed other teams at the same time on another product where 80+ people managed a monolith.
What I learned? Both approaches have pros and cons.
With microservices, it's much easier to push isolated changes with just one or two people. At the same time, global changes become significantly harder.
That's the trade-off, and your mental model needs to align with your business logic. If your software solves a tightly connected business problem, microservices probably aren't the right fit.
On the other hand, if you have a multitude of integrations with different lifecycles but a stable internal protocol, microservices can be a lifesaver.
If someone tries to tell you one approach is universally better, they're being dogmatic/religious rather than rational.
Ultimately, it's not about architecture, it's about how you build abstractions and approach testing and decoupling.
These problems are effectively solved on BEAM, the JVM, Rust, Go, etc.
(And I say that as someone who caught themselves doing the same: serverless is really good at hiding this.)
Dividing a system into composable parts is already a very, very difficult problem, and it is foolish to introduce further network boundaries between them on top of that.
Next comeback I see is away from React and SPAs as view transitions become more common.
> Initially, when the destinations were divided into separate services, all of the code lived in one repo. A huge point of frustration was that a single broken test caused tests to fail across all destinations.
You kept breaking tests in main so you thought the solution was to revamp your entire codebase structure? Seems a bit backward.
We can have a service with 100 features, but only enable the features relevant to a given "purpose". That way, we can still have "micro services" but they're running the same code: "bla.exe -foo" and "bla.exe -bar".
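In Go terms, something like this toy sketch (flag and route names are invented): one binary with every feature compiled in, where each deployment turns on only the features it is responsible for.

    package main

    import (
        "flag"
        "fmt"
        "log"
        "net/http"
    )

    func main() {
        foo := flag.Bool("foo", false, "enable the foo feature")
        bar := flag.Bool("bar", false, "enable the bar feature")
        addr := flag.String("addr", ":8080", "listen address")
        flag.Parse()

        mux := http.NewServeMux()
        if *foo {
            mux.HandleFunc("/foo", func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "foo: enabled in this deployment")
            })
        }
        if *bar {
            mux.HandleFunc("/bar", func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintln(w, "bar: enabled in this deployment")
            })
        }
        // Same code everywhere; "./bla -foo" and "./bla -bar" become separate
        // "micro services" purely by configuration.
        log.Fatal(http.ListenAndServe(*addr, mux))
    }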
This is the problem with the undefined nature of the term `microservices`. In my experience, if you can't develop in a way that allows you to deploy all services independently and without coordination between services, it may not be a good fit for your org's needs.
In the parent SOA(v2), what they described is a well known anti-pattern: [0]
If you cannot, due to technical or political reasons, retain the ability to independently deploy a service (no matter whether you choose to actually deploy independently), you will not gain most of the advantages that were the original selling point of microservices, which had more to do with organizational scaling than technical concerns. There are other reasons to consider the pattern, especially given the tooling available, but it is simply not a silver bullet.
And yes, I get that not everyone is going to accept Chris Richardson's definitions[1], but even in more modern versions of this, people always seem to run into the most problems because they try to shove it in a place where the pattern isn't appropriate, or isn't possible.
But kudos to Twilio for doing what every team should be, reassessing if their previous decisions were still valid and moving forward with new choices when they aren't.
[0] https://www.oracle.com/technetwork/topics/entarch/oea-soa-an... [1] https://microservices.io/post/architecture/2022/05/04/micros...
Doing it for organizational scaling can lead to insular vision with turf defensive attitude, as teams are rewarded on the individual service’s performance and not the complete product’s performance. Also refactoring services now means organizational refactoring, so the friction to refactor is massively increased.
I agree that patterns should be used where most appropriate, instead of blindly.
What pains me is that a language like “Cloud-Native” has been usurped to mean microservices. Did Twilio just stop having a “Cloud-Native” product due to shipping a monolith? According to CNCF, yes. According to reason, no.
Being able to ~instantly obtain a perfect list of all references to all symbols is an extraordinarily powerful capability. The stronger the type system, the more leverage you get. If you have only ever had experience with weak type systems or poor tooling, I could understand how the notion of putting everything into one compilation context seems pointless.
Edit: I’d love to eat the humble pie here. If you have examples of places where monoliths are updated 10-20 times a day by a small (or large) team post the link. I’ll read them all.
I'll assume you're not writing enough bugs that customers are reporting 10-20 new ones per day, but that leaves me confused why you would want to expose customers to that much churn. If we assume an observable issue results in a rollback and you're only rolling back 1-2% of the time (very impressive), once a month or so customers should experience observable issues across multiple subsequent days. That would turn me off making a service integral to my workflow.
Microservices as a goal is mostly touted by people who don't know what the heck they're doing - the kind of people who tend to mistakenly believe blind adherence to one philosophy or the other will help them turn their shoddy work into something passable.
Engineer something that makes sense. If, once you're done, whatever you've built fits the description of "monolith" or "microservices", that's fine.
However if you're just following some cult hoping it works out for your particular use-case, it's time to reevaluate whether you've chosen the right profession.
Putting this anywhere near "engineering" is an insult to even the shoddiest, OceanGate-levels of engineering.
It was when the "microservices considered harmful" articles started popping up that microservices had become a fad. Most of the HN early-startup energy will continue to do monoliths because of team communication reasons. And I predict that if any of those startups are successful, they will have need for separate services for engineering reasons. If anything, the historical faddishness of HN shows that hackers pick the new and novel because that's who they are, for better or worse.
You Want Microservices, But Do You Really Need Them? https://www.docker.com/blog/do-you-really-need-microservices...
Plus, it's ALWAYS easier/better to run v2 of something when you completely re-write v1 from scratch. The article could have just as easily been "Why Segment moved from 100 microservices to 5" or "Why Segment rewrote every microservice". The benefits of hindsight and real-world data shouldn't be undersold.
At the end of the day, write something, get it out there. Make decisions, accept some of them will be wrong. Be willing to correct for those mistakes or at least accept they will be a pain for a while.
In short: No matter what you do the first time around... it's wrong.
Last time it was an aws engineer that worked on route 53, and they dismissed microservices in a startup claiming that in AWS they ran a monolith (as in the r53 dns).
Everything is a monolith if you zoom in enough and ignore everything else. Which I guess you can do when you work on a big company and are in charge of a very specific role.
TL;DR they have a highly partitioned job database, where a job is a delivery of a specific event to a specific destination, and each partition is acted upon by at-most-one worker at a time, so lock contention is only at the infrastructure level.
In that context, each worker can handle a similar balanced workload between destinations, with a fraction of production traffic, so a monorepo makes all the sense in the world.
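For a feel of the "at most one worker per partition" property, here is one generic way to get it with a relational store using SELECT ... FOR UPDATE SKIP LOCKED (a sketch with an invented schema, not a description of how the system in the article actually does it):

    package worker

    import (
        "context"
        "database/sql"
    )

    // ClaimPartition locks one partition row inside a transaction; concurrent
    // workers skip rows that are already locked, so each partition is worked on
    // by at most one worker at a time and contention stays at the database.
    func ClaimPartition(ctx context.Context, db *sql.DB, work func(partitionID int64) error) error {
        tx, err := db.BeginTx(ctx, nil)
        if err != nil {
            return err
        }
        defer tx.Rollback() // no-op after a successful commit

        var id int64
        err = tx.QueryRowContext(ctx,
            `SELECT id FROM job_partitions
             WHERE pending_jobs > 0
             ORDER BY id
             LIMIT 1
             FOR UPDATE SKIP LOCKED`).Scan(&id)
        if err == sql.ErrNoRows {
            return nil // every busy partition is already claimed
        }
        if err != nil {
            return err
        }
        // The row lock is held until commit, so no other worker can claim
        // partition id while work() runs.
        if err := work(id); err != nil {
            return err
        }
        return tx.Commit()
    }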
IMO it speaks to the way in which microservices can be a way to enforce good boundaries between teams... but the drawbacks are significant, and a cross-team review process for API changes and extensions can be equally effective and enable simplified architectures that sidestep many distributed-system problems at scale.
That said I’m curious if you’re basing this on service degradation you’ve seen since the acquisition. We were thinking of starting to use them - is that a bad move?
But a company that can't stand on its own isn't a success in my opinion. Similar things can be said about companies that continue to need round after round of funding without an IPO.
My comment is of the "(2018)" variety. Old news that didn't age well like the people jumping on the "Uber: why we switched to MySQL from Postgres" post. (How many people would choose that decision today?)
People tend to divorce the actual results of a lot of these companies from the gripes of the developers of the tech blogs.
The shitty arch is not a point against (micro)services. SendGrid, another Twilio property, uses (micro)services to great effect. Services there were fully independently deployable.
Small teams run on shared context. That is their superpower. Everyone can reason end-to-end. Everyone can change anything. Microservices vaporize that advantage on contact. They replace shared understanding with distributed ignorance. No one owns the whole anymore. Everyone owns a shard. The system becomes something that merely happens to the team, rather than something the team actively understands. This isn’t sophistication. It’s abdication.
Then comes the operational farce. Each service demands its own pipeline, secrets, alerts, metrics, dashboards, permissions, backups, and rituals of appeasement. You don’t “deploy” anymore—you synchronize a fleet. One bug now requires a multi-service autopsy. A feature release becomes a coordination exercise across artificial borders you invented for no reason. You didn’t simplify your system. You shattered it and called the debris “architecture.”
Microservices also lock incompetence in amber. You are forced to define APIs before you understand your own business. Guesses become contracts. Bad ideas become permanent dependencies. Every early mistake metastasizes through the network. In a monolith, wrong thinking is corrected with a refactor. In microservices, wrong thinking becomes infrastructure. You don’t just regret it—you host it, version it, and monitor it.
The claim that monoliths don’t scale is one of the dumbest lies in modern engineering folklore. What doesn’t scale is chaos. What doesn’t scale is process cosplay. What doesn’t scale is pretending you’re Netflix while shipping a glorified CRUD app. Monoliths scale just fine when teams have discipline, tests, and restraint. But restraint isn’t fashionable, and boring doesn’t make conference talks.
Microservices for small teams is not a technical mistake—it is a philosophical failure. It announces, loudly, that the team does not trust itself to understand its own system. It replaces accountability with protocol and momentum with middleware. You don’t get “future proofing.” You get permanent drag. And by the time you finally earn the scale that might justify this circus, your speed, your clarity, and your product instincts will already be gone."
-DHH
Then you can implement a service in Java, Python, Rust, C++, etc, and it doesn't matter.
Coupling your postgres db to your elasticsearch cluster via a hard library dependency is impossibly heavy. The same insight applies to your bespoke services.
Maintaining and testing a codebase containing many external integrations ("Destinations") was one of the drivers behind the earlier decision to shatter into many repos: to isolate the impact of Destination-specific test suite failures caused by tests that were actually exercising integrations with external 3rd-party services.
One way to think about that situation is in terms of packages, their dependency structure, how those dependencies are expressed (e.g. decoupled via versioned artefact releases, directly coupled via monorepo style source checkout), their rates of change, and the quality of their automated tests suites (high quality meaning the test suite runs really fast, tests only the thing it is meant to test, has low rates of false negatives and false positives, low quality meaning the opposite).
Their initial situation was one that rapidly becomes unworkable: a shared library package undergoing a high rate of change depended on by many Destination packages, each with low quality test suites, where the dependencies were expressed in a directly-coupled way by virtue of everything existing in a single repo.
There's a general principle here: multiple packages in a single repo with directly-coupled dependencies, where those packages have test suites with wildly varying levels of quality, quickly becomes a nightmare to maintain. The packages with low quality test suites that depend upon high quality rapidly changing shared packages generate spurious test failures that need to be triaged and slow down development. Maintainers of packages that depend upon rapidly changing shared package but do not have high quality test suites able to detect regressions may find their package frequently gets broken without anyone realising in time.
Their initial move solved this problem by shattering the single repo and trading directly-coupled dependencies for decoupled, versioned dependencies, to decouple the rate of change of the shared package from the per-Destination packages. That was an incremental improvement, but it added the complexity and overhead of maintaining multiple versions of the "shared" library and per-repo boilerplate, which grows over time as more Destinations are added or more changes are made to the shared library while the work to upgrade and retest Destinations against it is deferred.
Their later move was to reverse this, go back to directly-coupled dependencies, but instead improve the quality of their per-Destination test suites, particularly by introducing record/replay style testing of Destinations. Great move. This means that the test suite of each Destination is measuring "is the Destination package adhering to its contract in how it should integrate with the 3rd party API & integrate with the shared package?" without being conflated with testing stuff that's outside of the control of code in the repo (is the 3rd party service even up, etc).
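A minimal Go sketch of what record/replay style destination tests can look like (the fixture, route and assertions are invented, and the bare http.Post stands in for the real adapter call): the previously recorded 3rd-party response is served from a local stub, so the test exercises the Destination's contract without depending on the external service being up.

    package destinations_test

    import (
        "net/http"
        "net/http/httptest"
        "testing"
    )

    func TestFooDestinationReplay(t *testing.T) {
        // Response captured once from the real API and checked into the repo.
        recorded := `{"status":"accepted","id":"evt_123"}`

        var gotPath string
        stub := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            gotPath = r.URL.Path
            w.Header().Set("Content-Type", "application/json")
            w.Write([]byte(recorded))
        }))
        defer stub.Close()

        // Point the (hypothetical) destination adapter at the stub instead of the
        // real endpoint; http.Post is a stand-in for exercising the adapter.
        resp, err := http.Post(stub.URL+"/v1/track", "application/json", nil)
        if err != nil {
            t.Fatalf("send failed: %v", err)
        }
        defer resp.Body.Close()

        if gotPath != "/v1/track" {
            t.Errorf("adapter hit %q, want /v1/track", gotPath)
        }
        if resp.StatusCode != http.StatusOK {
            t.Errorf("unexpected status %d", resp.StatusCode)
        }
    }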
Gonna stop you right there.
Microservices have nothing to do with the hosting or operating architecture.
Per Martin Fowler, who formalized the term, microservices are:
“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery”
You can have an entirely local application built on the “microservice architectural style.”
Saying they are "often an HTTP resource API" is beside the point.
The problem Twilio actually describes is that they messed up service granularity and distributed-systems engineering processes.
Twilio's experience was not a failure of the microservice architectural style. This was a failure to correctly define service boundaries based on business capabilities.
Their struggles with serialization, network hops, and complex queueing were symptoms of building a distributed monolith, which they finally made explicit with this move. So they accidentally built a system with the overhead of distribution but the tight coupling of a single application. Now they are making their architectural foundations fit what they built, likely because they planned it poorly.
The true lesson is that correctly applying microservices requires insanely hard domain modeling and iteration and meticulous attention to the "Distributed Systems Premium."
https://martinfowler.com/microservices/
Just because he says something does not mean Fowler “formalized the term”. Martin wrote about every topic under the sun, and he loved renaming and or redefining things to fit his world view, and incidentally drive people not just to his blog but also to his consultancy, Thoughtworks.
PS The “single application” line shows how dated Fowlers view were then and certainly are today.
Anyone who has ever developed in a Java codebase with "Service" and then "ServiceImpl"s everywhere can see the lineage of that model. Services were supposed to be the API, and the implementation provided in a separate process container. Microservices signalled a time where SOA without Java as a pre-requisite had been successful in large tech companies. They had reached the point of needing even more granular breakout and a reduction of reliance on Java. HTTP interfaces was an enabler of that. 2010s era microservices people never understood the basics, and many don't even know what they're criticizing.
There are infinite permutations in architecture, and we've collectively narrowed them down to things that are cheap to deploy, scale automatically at low cost, and are easily replicable with a simple script.
We should be talking about how AI knows those scripts too and can synthesize adjustments; dedicated Site Reliability Engineers and DevOps are great for maintaining convoluted legacy setups, but irrelevant for doing the same thing from scratch nowadays.
A network call. Because nothing could be better for your code than putting the INTERNET into the middle of your application.
--
The "micro" of microservices has always been ridiculous.
If it can run on one machine then do it. Otherwise you have to deal with networking. Only do networking when you have to. Not as a hobby, unless your program really is a hobby.
yes, networking is the bottleneck between the processes, while one machine is the bottleneck to end users
You can run your monolith on multiple machines and round-robin end-user requests between them. Your state is in the DB anyway.
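A toy Go sketch of that setup (backend addresses are placeholders): identical monolith replicas behind a trivial round-robin reverse proxy, with all state living in the database.

    package main

    import (
        "log"
        "net/http"
        "net/http/httputil"
        "net/url"
        "sync/atomic"
    )

    func mustParse(raw string) *url.URL {
        u, err := url.Parse(raw)
        if err != nil {
            panic(err)
        }
        return u
    }

    func main() {
        backends := []*url.URL{
            mustParse("http://10.0.0.1:8080"),
            mustParse("http://10.0.0.2:8080"),
            mustParse("http://10.0.0.3:8080"),
        }
        var next uint64
        proxy := &httputil.ReverseProxy{
            Director: func(r *http.Request) {
                // Round-robin: each request goes to the next replica.
                b := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
                r.URL.Scheme = b.Scheme
                r.URL.Host = b.Host
            },
        }
        log.Fatal(http.ListenAndServe(":80", proxy))
    }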
https://github.com/sirupsen/napkin-math
Local function calls are infinitely more reliable. The main operational downside with a binary monolith is that a bug in one part of the program will crash the whole thing. Honestly, I still think Erlang got it right here with supervisor trees. Use "microservices". But let them all live on the same computer, in the same process. And add tooling to the runtime environment to allow individual "services" to fail or get replaced without taking down the rest of the system.
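A rough Go approximation of that idea (a toy, not a substitute for OTP supervision, and it obviously cannot isolate things like memory corruption the way separate processes can): each "service" runs as a goroutine under a supervisor that logs a crash and restarts it while the rest keep running.

    package main

    import (
        "log"
        "time"
    )

    // supervise restarts a "service" whenever it panics or returns, so a bug in
    // one part of the process does not take down the others.
    func supervise(name string, service func()) {
        go func() {
            for {
                func() {
                    defer func() {
                        if r := recover(); r != nil {
                            log.Printf("%s crashed: %v; restarting", name, r)
                        }
                    }()
                    service()
                }()
                time.Sleep(time.Second) // crude backoff before restarting
            }
        }()
    }

    func main() {
        supervise("billing", func() {
            time.Sleep(2 * time.Second)
            panic("simulated bug") // only this "service" dies; it gets restarted
        })
        supervise("email", func() {
            for {
                log.Println("email service still running")
                time.Sleep(3 * time.Second)
            }
        })
        select {} // keep the process alive
    }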
If you have a company that writes software, please ask a professional software/systems architect to review your plans before you build. The initial decisions here would be a huge red flag to any experienced architect, and the subsequent decisions are full of hidden traps, and are setting them up for more failure. If you don't already have a very skilled architect on staff (99% chance you don't) you need to find one and consult with them. Otherwise your business will suffer from being trapped in unnecessary time-consuming expensive rework, or worse, the whole thing collapsing.