I’ve done some work with “actually” distributed systems (as in gossip protocols and self-organizing networks and logical clocks and blah), so I was fairly skeptical of the promises of serverless functions right from the start, because many of these pitfalls are – like the article notes – pretty classic stuff in distributed systems.
Honestly it strikes me as hilarious that someone would think “hey, you know what would make this system easier to operate? Making it distributed!”. We’ve seen the same thing with microservices; endless boondoggles with network interaction graphs that look like they could summon Cthulhu, brittle codebases where you can break services you didn’t even know existed by changing something in your API you thought was trivial, junior (and sometimes even more senior) coders writing stuff that doesn’t take into account eventual consistency (or database write consistency levels, or what happens if the service dies in the middle of operation X, or or or or or…) and that breaks seemingly at random, etc. etc.
Not that I’m saying it’s impossible to get these things right. Definitely not, and I’ve seen that happen too. It just takes much more work and skill, a well planned infrastructure, good tracing, debugging and instrumentation capabilities, painless local dev envs etc. etc., and on top of all the “plumbing” you need several people with a solid understanding of distributed systems and their behavior, so they can keep people from making costly mistakes without even realizing it until production.
Oddly enough I’d say this exact same thing about monoliths. Except for one thing: you’re right that applications are easier to implement and operate as monoliths, but they are easier to manage and maintain as microservices. So in a way, the question is one of perspective. If you want shit done today, write a monolith. If you want that shit to continually operate and grow for a decade or more, write microservices.
Oh I have nothing against microservices as a concept and they can be very maintainable and easy to operate, but that’s rarely the outcome when the people building the systems don’t quite know what they’re doing or have bad infra.
Monoliths are perfectly fine for many use cases, but once you go over some thresholds they get hard to scale unless they’re very well designed. Lock free systems scale fantastically because they don’t have to wait around for eg. state writes, but those designs often mean ditching regular databases for the critical paths and having to use deeper magics like commutative data structures and gossiping, so your nodes can be as independent as possible and not have any bottlenecks. But even a mediocre monolith can get you pretty far if you’re not doing anything very fancy.
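To make the "commutative data structures" bit concrete, here’s a minimal sketch of a grow-only counter (a G-Counter, the classic intro CRDT): each node increments only its own slot, and merging is an element-wise max, which is commutative, associative and idempotent, so gossiping nodes converge no matter what order updates arrive in. Names here are illustrative, not from any particular library.

```python
class GCounter:
    """Grow-only counter CRDT: per-node counts, merged by element-wise max."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}  # node_id -> count contributed by that node

    def increment(self, n=1):
        # A node only ever bumps its own slot.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas can exchange state in any order and still converge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

# Two replicas diverge, then gossip with each other and converge.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
assert a.value() == b.value() == 5
```

No locks, no central database on the hot path; the price is that you only get this for operations you can express commutatively.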
Microservices give you more granular control “out of the box”, but that doesn’t come without a price: you need much better tooling and a more experienced team. It’s still a bit of a minefield because reasoning about distributed systems is hard. They have huge benefits, but you really need to be on your game or you’ll end up in a world of pain 😀 I was the person who unfucked distributed systems at one company I worked in, and I was continuously surprised by how little thought many coders paid to eg. making sure the service state stays in a “legal” state. Database atomicity guarantees were often either misused or not used at all. So if a service had to do multiple writes to complete some “transaction” (loosely) and it died during the write, or a database node died and only part of the writes went through to the master, suddenly you might be looking at some sort of spreading Byzantine horror where nothing makes sense anymore, because that partially completed group of writes has affected other systems. Extreme example, sure, but Byzantine faults where a corrupted state spreads and fucks your consensus are something you only see in a distributed context.
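The "multiple writes that must land together" failure is exactly what a database transaction prevents, when people actually use it. Here’s a toy sketch using sqlite3 from the standard library (purely for illustration; the same principle applies to any transactional store): both writes commit atomically or not at all, so a process dying mid-operation never leaves a half-applied state behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    # `with conn:` opens a transaction, commits on success and rolls
    # back on any exception, so the two UPDATEs are all-or-nothing.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )

transfer(conn, "alice", "bob", 40)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
assert balances == {"alice": 60, "bob": 40}
```

Within one database this is a solved problem; the pain starts when the "transaction" spans multiple services' databases, which single-node atomicity can’t cover, and which is where the sagas/orchestration discussion below comes from.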
Yeah so that’s one place I sort of sacrifice microservice purity. If you have a transaction that needs to update multiple domains then you need a non-microservice to handle that IMO. All of these rules are really good rules of thumb, but there will always be complexity that doesn’t fit into our perfect little boxes.
The important thing is to document the hell out of the exceptions and do everything you can to keep them on the periphery of business logic. Fight like hell to minimize dependencies on them.
But that’s true of any architecture.
If it’s not clear I pretty much agree with everything you say here.