Five Questions to Ask When Choosing the Way Your Microservices Communicate
Microservices development is a growing field, and microservice communication is still maturing. There are many ways for microservices to communicate, ranging from various messaging technologies to service meshes. The goal of this discussion is to help you ask the right questions and make educated decisions.
- What kind of microservice communication works for me?
There are two broad ways for microservices to communicate. One is pub-sub (publisher-subscriber) messaging, and the other is Remote Procedure Call (RPC).
In pub-sub communication, some microservices publish messages and other microservices subscribe to them. Messages are stored in message queues, and subscribers can consume them regardless of when and how the publishers generated them. This kind of communication is called asynchronous because publishing a message and consuming it are decoupled: the publisher does not wait for any subscriber.
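To make the decoupling concrete, here is a minimal in-process sketch. It assumes Python's standard `queue` module as a stand-in for a real broker such as Kafka or RabbitMQ. The class `InMemoryBroker` and the topic name `orders` are hypothetical and only illustrate the idea: each subscriber gets its own queue, so the publisher never waits for consumers.

```python
import queue
from collections import defaultdict


class InMemoryBroker:
    """Toy, hypothetical broker: one queue per subscriber per topic, in-process only."""

    def __init__(self):
        # topic name -> list of subscriber queues
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        # Each subscriber owns a queue, so it can consume at its own pace.
        q = queue.Queue()
        self._subscribers[topic].append(q)
        return q

    def publish(self, topic, message):
        # The publisher does not wait for, or know about, any particular consumer.
        for q in self._subscribers[topic]:
            q.put(message)


broker = InMemoryBroker()
orders_for_billing = broker.subscribe("orders")
orders_for_shipping = broker.subscribe("orders")

# Publisher side: fire and forget.
broker.publish("orders", {"order_id": 1})
broker.publish("orders", {"order_id": 2})

# Subscriber side: consume later, independently of when messages were published.
billing_seen = [orders_for_billing.get_nowait()["order_id"] for _ in range(2)]
shipping_seen = [orders_for_shipping.get_nowait()["order_id"] for _ in range(2)]
print(billing_seen, shipping_seen)  # [1, 2] [1, 2]
```

A real broker adds persistence, delivery guarantees, and network transport, but the contract is the same: publishers and subscribers only share a topic name.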
RPC is a direct interaction between microservices through remote procedure calls. One microservice initiates a procedure call on another microservice. Once the procedure has executed, a reply is sent back to the originating microservice. This request-reply interaction is referred to as synchronous communication between microservices, because the caller waits for the reply.
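The request-reply flow can be sketched with Python's standard-library XML-RPC modules, used here only as a convenient RPC transport. The function `get_price` and the "inventory" service it represents are hypothetical. The caller blocks until the reply arrives, which is what makes the interaction synchronous.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def get_price(item_id):
    # Hypothetical remote procedure exposed by an "inventory" microservice.
    prices = {"apple": 3, "pear": 5}
    return prices.get(item_id, 0)


# Callee side: bind to an ephemeral localhost port and serve in a background thread.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_price, "get_price")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Caller side: the call blocks until the reply comes back (request-reply).
with ServerProxy(f"http://127.0.0.1:{port}/") as inventory:
    price = inventory.get_price("pear")

server.shutdown()
server.server_close()
print(price)  # 5
```

Production systems more often use gRPC or HTTP/JSON, but the shape is identical: one request, one reply, and a caller that waits.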
The right question to ask is which of these two ways of communication better fits your problem. Do you need a request-reply interaction between your microservices, or is it more about microservices publishing and subscribing to messages? In some cases, you may need both RPC and pub-sub to solve the problem, as described later in this discussion.
A widely deployed solution for microservice communication that mostly fits the RPC model is a service mesh. From a developer's perspective, a service mesh is a network abstraction over microservices. For example, instead of thinking about how IP addresses are assigned to microservices, a developer can refer to services by name and focus on the communication between them.
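The following sketch shows the kind of name-to-address indirection a mesh provides, so that application code never hard-codes IP addresses. `ServiceRegistry`, the service name `payments`, and the addresses are all hypothetical; in a real mesh this lookup happens in the control plane and proxies, not in application code.

```python
class ServiceRegistry:
    """Hypothetical registry mapping logical service names to instance addresses."""

    def __init__(self):
        self._instances = {}

    def register(self, name, address):
        self._instances.setdefault(name, []).append(address)

    def deregister(self, name, address):
        # Instances come and go (rescheduling, scaling); callers never hard-code them.
        if address in self._instances.get(name, []):
            self._instances[name].remove(address)

    def resolve(self, name):
        addresses = self._instances.get(name, [])
        if not addresses:
            raise LookupError(f"no instances registered for {name!r}")
        return list(addresses)


registry = ServiceRegistry()
registry.register("payments", "10.0.3.17:8080")
registry.register("payments", "10.0.5.2:8080")
registry.deregister("payments", "10.0.3.17:8080")

# Application code only knows the name "payments"; the address can change underneath it.
print(registry.resolve("payments"))  # ['10.0.5.2:8080']
```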
Some of the most popular service mesh implementations are Envoy/Istio, Linkerd, and AWS App Mesh. Among pub-sub messaging technologies, the most popular are Kafka, RabbitMQ, and AWS SQS. There are many more messaging and service mesh implementations to explore. Comparing service mesh implementations, or pub-sub implementations, is an elaborate topic and out of the scope of this discussion.
- When to consider a service mesh?
The main use cases for a service mesh address fundamental pain points of maintaining applications, namely observability, traffic management, and security:
• Observability gives insight into how microservices communicate, based on metrics, logs, application traces, events, and other diagnostic attributes
• Traffic management controls how requests flow between microservices (routing, traffic splitting, retries), which shapes properties such as service performance and latency
• Security provides encryption between microservices, as well as authentication and authorization
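As an illustration of traffic management, here is a minimal sketch of a weighted traffic split, such as a 90/10 canary. In a service mesh this is usually declared as routing configuration rather than written in application code; the function `pick_version` and the version names are hypothetical and only show the logic a proxy applies. Hashing the request ID keeps the choice stable for a given ID.

```python
import hashlib


def pick_version(request_id, weights):
    """Route a request to a service version according to percentage weights.

    weights: list of (version, percent) pairs that must sum to 100.
    Hashing the request id makes the choice sticky for the same id.
    """
    if sum(percent for _, percent in weights) != 100:
        raise ValueError("weights must sum to 100")
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for version, percent in weights:
        cumulative += percent
        if bucket < cumulative:
            return version
    raise AssertionError("unreachable when weights sum to 100")


split = [("v1", 90), ("v2-canary", 10)]
counts = {"v1": 0, "v2-canary": 0}
for i in range(10_000):
    counts[pick_version(f"req-{i}", split)] += 1
print(counts)  # expect roughly 9,000 for v1 and 1,000 for v2-canary
```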
- A service mesh removes a lot of complexity from your application. It implements service discovery, load balancing, traffic management, routing decisions, telemetry, policy enforcement, canary roll-outs, and security features such as mTLS between services. You don’t need to implement these yourself; the service mesh does it for you. That is a massive advantage over traditional solutions, where the application itself has to implement many of these features.
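To show what "the application itself needs to implement" means in practice, here is a sketch of one such feature, retries with exponential backoff, written as application code. The helper `call_with_retries` and the simulated `flaky_call` are hypothetical; with a service mesh, an equivalent retry policy is typically configured in the proxy instead, and every service in every language gets it without code like this.

```python
import time


def call_with_retries(call, attempts=3, base_delay=0.05, retry_on=(ConnectionError, TimeoutError)):
    """Retry a flaky remote call with exponential backoff (hypothetical helper)."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * (2 ** attempt))  # 0.05 s, then 0.1 s, ...


# Simulated downstream service that fails twice, then succeeds.
responses = iter([ConnectionError("reset"), ConnectionError("reset"), "ok"])


def flaky_call():
    outcome = next(responses)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_retries(flaky_call)
print(result)  # ok
```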
- Another reason to adopt a service mesh is to support a polyglot development environment. Using multiple programming languages gives development teams greater flexibility when implementing their features, and because the mesh features live in the proxy rather than in language-specific libraries, no team has to reimplement them for its language. Even startups use a few languages for microservice development, while large companies can use up to 10.
- Moving service mesh code out of the application has an additional benefit. At larger companies, platform teams can maintain the service mesh-related code, while application development teams remain responsible for the business logic. For some companies, that can increase reliability, because people who are very familiar with the platform architecture take care of the infrastructure.
- The data plane of a service mesh consists of proxies sitting next to each microservice. When deciding whether a service mesh is for you, keep in mind that each proxy adds approximately 1 ms of latency. Sophisticated implementations, such as intelligent routing and load balancing, can bring this below 1 ms. For most applications, this additional latency is acceptable given the benefits of a service mesh.
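Because the overhead is per proxy, it adds up along a call chain. The sketch below does that arithmetic under two stated assumptions: each service-to-service hop passes through both the caller's outbound sidecar and the callee's inbound sidecar, and each proxy adds the roughly 1 ms quoted above. The function name and the example chain are hypothetical.

```python
def added_mesh_latency_ms(hops, per_proxy_ms=1.0, proxies_per_hop=2):
    """Estimate the latency a sidecar mesh adds to one sequential request path.

    Assumes each hop traverses the caller's and the callee's sidecar
    (proxies_per_hop=2) and that each proxy adds about per_proxy_ms.
    """
    return hops * proxies_per_hop * per_proxy_ms


# A sequential call chain orders -> inventory -> pricing -> tax has 3 hops.
print(added_mesh_latency_ms(3))                    # 6.0
print(added_mesh_latency_ms(3, per_proxy_ms=0.5))  # 3.0
```

For a short chain the total stays in the low milliseconds, which is why most teams find it acceptable; deep, chatty call chains are where it is worth measuring.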
- How should a small number of microservices communicate?
Let’s first discuss the case where you are thinking of using service mesh for a limited set of microservices. Service mesh benefits are significant, including the capability to scale well, but all of that comes with a cost. Startups with few people may not be able to afford a separate platform team to take care of their service mesh. For them, one option could be a hosted service mesh such as AWS App Mesh.
Hosted service mesh offerings can reduce the maintenance burden, but they may have fewer features than more mature service mesh implementations. If a limited number of your microservices are business-critical and you need deep insight into what they are doing, a full-featured service mesh may be irreplaceable despite the effort of maintaining it.
Alternatively, if a service mesh is too much of a maintenance burden, developers of applications with a small number of microservices should consider custom implementations built on direct RPC between microservices.
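A custom implementation can be quite small. The sketch below assumes JSON over HTTP using only Python's standard library; the "quotes" service, its pricing rule, the `rpc` helper, and the static `SERVICES` map are hypothetical. Note what it does not give you compared with a mesh: no discovery beyond a hard-coded map, no retries, no mTLS, and no telemetry.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


class QuoteHandler(BaseHTTPRequestHandler):
    """Hypothetical 'quotes' service: JSON request in, JSON reply out."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        reply = json.dumps({"total": body["qty"] * 4}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo quiet


def rpc(base_url, method, params, timeout=2.0):
    """Minimal JSON-over-HTTP call: one request, one reply, bounded by a timeout."""
    req = Request(f"{base_url}/{method}", data=json.dumps(params).encode(),
                  headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


server = HTTPServer(("127.0.0.1", 0), QuoteHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# With only a few services, a static map of service URLs can stand in for discovery.
SERVICES = {"quotes": f"http://127.0.0.1:{server.server_address[1]}"}
result = rpc(SERVICES["quotes"], "quote", {"qty": 3})

server.shutdown()
server.server_close()
print(result)  # {'total': 12}
```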
Similarly, if the limited set of services you plan to develop fits the pub-sub model of communication, you will want a messaging technology. Choose a messaging implementation that takes the least of your team's time to maintain, and again make sure it has all of the features you need.
Do not forget to consider the future of your small set of microservices. If they may grow in number and complexity, you may not want to start with simpler communication solutions.
- Can I implement pub-sub microservice communication with service mesh?
Pub-sub communication and service mesh serve different use cases. Their implementations are fundamentally different and are not meant to solve the same problem.
On the other hand, some applications need to combine these two approaches. A number of projects are starting to address a dual architecture that includes both pub-sub capabilities and a service mesh.
One example is the project adding Kafka support to Envoy. Among the ideas outlined for the Kafka/Envoy integration are L7 protocol parsing for observability and rate limiting at both the connection level and the L7 message level.
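To illustrate the difference between connection-level and message-level rate limiting, here is a sketch of a token bucket that is charged once per message instead of once per connection. The token bucket is a common, generic rate-limiting technique chosen here for illustration; this is not a description of how the Envoy Kafka filter is implemented. `TokenBucket` is hypothetical, and the clock is passed in explicitly so the example is deterministic.

```python
class TokenBucket:
    """Token-bucket limiter: capacity is the burst size, rate is tokens refilled per second."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def allow(self, now):
        # Refill based on elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Message-level limiting: every produced record costs a token, even on one connection.
bucket = TokenBucket(rate=2, capacity=3, now=0.0)
decisions = [bucket.allow(now=0.0) for _ in range(5)]  # burst of 5 messages at t=0
later = bucket.allow(now=1.0)                          # 1 s later, 2 tokens have refilled
print(decisions, later)  # [True, True, True, False, False] True
```

A connection-level limiter would instead charge the bucket when a connection is opened, so a single long-lived producer connection could still send an unlimited number of messages; parsing the protocol at L7 is what makes per-message accounting possible.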
Another example of combining different types of microservice communication is running Kafka over Istio, for which detailed performance results have been published.
Among projects exploring the dual architecture of pub-sub communication and service mesh, you can also find Gloo-NATS/Envoy.
In the future, expect implementations that bridge RPC to pub-sub and pub-sub to RPC. The Istio project has also committed to bringing pub-sub capabilities into the service mesh.
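An RPC-to-pub-sub bridge can build on a well-known pattern: request-reply over messaging using correlation IDs. Here is a sketch of that pattern, with two in-process queues standing in for request and reply topics on a broker. It is not how Istio or any named project implements such a bridge. The worker, its pricing rule, and `rpc_over_pubsub` are hypothetical.

```python
import queue
import threading
import uuid

requests_topic = queue.Queue()  # stands in for a request topic on a message broker
replies_topic = queue.Queue()   # stands in for a reply topic


def pricing_worker():
    """Subscriber: consumes request messages and publishes replies tagged with the same id."""
    while True:
        msg = requests_topic.get()
        if msg is None:
            break  # shutdown signal for the demo
        replies_topic.put({"correlation_id": msg["correlation_id"],
                           "result": msg["payload"]["qty"] * 4})


def rpc_over_pubsub(payload, timeout=2.0):
    """RPC-style facade: publish a request, then block until the matching reply arrives.

    Single-caller demo: replies with other ids are dropped; a real bridge would
    route them to their own waiting callers. The timeout applies to each wait.
    """
    correlation_id = str(uuid.uuid4())
    requests_topic.put({"correlation_id": correlation_id, "payload": payload})
    while True:
        reply = replies_topic.get(timeout=timeout)  # raises queue.Empty on timeout
        if reply["correlation_id"] == correlation_id:
            return reply["result"]


worker = threading.Thread(target=pricing_worker, daemon=True)
worker.start()
answer = rpc_over_pubsub({"qty": 5})
requests_topic.put(None)
worker.join()
print(answer)  # 20
```

The caller sees a synchronous call, while the underlying transport stays asynchronous, which is exactly the kind of translation an RPC-to-pub-sub bridge has to perform.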
- Should I use API Gateway for microservices communication?
Reverse proxies or, more generally, API gateways are the traditional edge through which web traffic reaches an application's web tier. With the introduction of microservices, application architecture changed, and some microservices became both internal and web-facing. This approach distributes the flow of requests from the web and allows better scaling. It matters most when a single large-scale edge proxy struggles to handle millions of requests per second: spreading those requests across a number of microservices makes a high request rate more manageable. That is why a service mesh should be considered as an alternative to centralized proxies and API gateways.
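At its core, a centralized gateway holds one routing table and matches every incoming request against it. That single choke point is what struggles at very high request rates. Here is a minimal sketch of longest-prefix path routing. The paths and service names in `ROUTES` are hypothetical, and real gateways add host matching, authentication, rate limiting, and more.

```python
ROUTES = {
    "/api/orders": "orders",
    "/api/orders/returns": "returns",
    "/api/users": "users",
}


def route(path, routes=ROUTES):
    """Pick the backend service whose prefix is the longest match for the path."""
    matches = [prefix for prefix in routes
               if path == prefix or path.startswith(prefix + "/")]
    if not matches:
        return None  # no backend: a gateway would typically answer 404
    return routes[max(matches, key=len)]


print(route("/api/orders/42"))         # orders
print(route("/api/orders/returns/7"))  # returns
print(route("/static/logo.png"))       # None
```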
When talking about API gateways and service mesh, it is worth mentioning Kong, which implements both an API gateway and a service mesh.
Let’s summarize what you should consider when choosing how your microservices communicate. First, check whether pub-sub or RPC fits your problem and choose accordingly. If RPC is what you need but you have only a limited number of microservices, you may not want a service mesh, because the cost of maintaining it is not always acceptable in that case. For those who want to take advantage of both pub-sub and service mesh, emerging and future solutions that combine the two may be worth exploring.