Distributed Tracing in Microservices

Hi Folks,

Today I’d like to cover one of the main aspects we should focus on when we design a software system. Under non-functional requirements (NFRs), we address criteria such as maintainability, adaptability, and resilience during the design phase, because we need to make sure we are doing the right things and also doing things right.

Software systems are fragile, and building a bug-free system is still out of reach for software engineering. Worse, mistakes are costly: in some parts of the industry, even five minutes of downtime can cost millions of dollars. But observing the system’s behavior and reacting to failures within a few minutes makes our lives as engineers more comfortable, and it can save our company from filing for bankruptcy.

Today’s topic is observability, which we can define as follows:

The measure of how well the internal state of a system can be inferred from knowledge of its external outputs.

Signals emitted by the system, such as CPU utilization, request-response time, and success/failure counts, act as messengers that can save us from a catastrophic system failure. Logging, metrics, and tracing play a major role in observability, and together they are known as the three pillars of observability. Each has distinct capabilities; some products ship all three together, while others address only one of them. Fig. 01 depicts the three pillars and the capabilities of each.

Everything we have covered so far applies to both monolithic and microservices architectures. The rest of this post applies only to microservices; in other words, let’s focus on how we can achieve the observability NFR in the distributed computing world.

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.

— Leslie Lamport (2013 Turing Award winner for explaining and formulating the behavior of distributed computing systems )

Lamport’s funny (and at the same time scary) definition explains very well how badly things can go wrong in a distributed context. For all the good things about distributed systems, we have to agree that observability is challenging in the distributed world, because fulfilling a single requirement means that different applications hosted on independent hardware must work together. Loosely coupled systems make our lives easier, but they come at a cost: the complexity of tracing.

Distributed tracing using Spring Cloud Sleuth

The Spring Boot microservices ecosystem offers a simple, straightforward way to track each request using two IDs, called the trace ID and the span ID. Developers can enable this in their applications with just a few lines. Let’s see how to add Sleuth to a Spring Boot app.
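To make the two IDs a bit more concrete, here is a minimal sketch in plain Java. It is not Sleuth's implementation; the class `TraceContext` and its methods are hypothetical names I made up for illustration. The only idea it shows is that every span of one request shares the trace ID, while each unit of work gets its own span ID and remembers its parent.

```java
import java.security.SecureRandom;

/** Hypothetical, simplified model of the two IDs; not Sleuth's actual classes. */
final class TraceContext {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId;      // shared by every span that belongs to one request
    final String spanId;       // unique for each unit of work inside the request
    final String parentSpanId; // null for the root span

    private TraceContext(String traceId, String spanId, String parentSpanId) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentSpanId = parentSpanId;
    }

    /** Starts a new trace. In this sketch the root span reuses the trace ID as its span ID. */
    static TraceContext newTrace() {
        String id = randomId();
        return new TraceContext(id, id, null);
    }

    /** Creates a child span: same trace ID, fresh span ID, parent set to the current span. */
    TraceContext childSpan() {
        return new TraceContext(traceId, randomId(), spanId);
    }

    /** Returns a random 64-bit ID formatted as 16 lowercase hex characters. */
    private static String randomId() {
        return String.format("%016x", RANDOM.nextLong());
    }

    public static void main(String[] args) {
        TraceContext root = TraceContext.newTrace();
        TraceContext child = root.childSpan();
        System.out.println("root : trace=" + root.traceId + " span=" + root.spanId);
        System.out.println("child: trace=" + child.traceId + " span=" + child.spanId
                + " parent=" + child.parentSpanId);
    }
}
```

Running `main` prints two lines with the same trace ID and different span IDs, which is the same pattern you will see later in the Sleuth logs.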

About the sample project

We implemented two services for a hypothetical scenario in the reservation domain. booking-service works as a backend-for-frontend service, and user requests hit it directly. When a booking request arrives, booking-service makes two inter-service calls to host-management-service: one to verify the availability of the host and one to make the booking.

git clone https://github.com/Denuwanhh/distributed-tracing-sample.git

1. Add Maven dependencies to the POM file

2. Add properties to the application.properties file
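For reference, here is a minimal sketch of what these two steps typically look like. I am assuming Spring Cloud Sleuth 3.x with the version managed by the Spring Cloud BOM, and the application name below is my assumption based on the service name, so check the sample repository for the exact entries.

```xml
<!-- pom.xml: assumes the Spring Cloud BOM manages the version -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
```

```properties
# application.properties: the value is an assumed name for this service
spring.application.name=booking-service
```

The application name matters because it is the first of the three values Sleuth adds to each log line.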

Once you have added the dependencies and properties, run your Spring Boot application and you will see a few additional pieces of information in the logs. Along with each log entry, it prints [application name, trace ID, span ID]. Don’t worry, I know it doesn’t make much sense at the moment.

2022-02-20 02:06:41.949 INFO [<application name>,<trace ID>,<span ID>] 9792 --- [nio-8080-exec-8] com.booking.service.BookingService : ...
2022-02-20 02:06:47.245 INFO [<application name>,<trace ID>,<span ID>] 9792 --- [nio-8080-exec-8] com.booking.service.BookingService : ...

But let’s spin up the two services, which communicate with each other to fulfill one user request. In my sample project, booking-service communicates with host-management-service to fulfill booking requests, which you can send with the cURL command below.

curl -X POST \
http://localhost:8080/booking \
-H 'content-type: application/json' \
-d '{
"bookingID" : 1,
"bookingDate" : "2022-02-19T00:00:00.000Z",
"host" : {
"hostID" : 10
}
}'
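When booking-service handles this request and calls host-management-service, the trace context has to travel with the outgoing HTTP call. As far as I know, Sleuth does this by default with B3 headers such as X-B3-TraceId and X-B3-SpanId. The sketch below only illustrates the idea with a plain map of headers; the class `B3HeaderSketch` and its methods are hypothetical, and real instrumentation does this for you inside the HTTP client and server.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of carrying trace IDs in HTTP headers; not Sleuth's code. */
final class B3HeaderSketch {
    // Header names from the B3 propagation format.
    static final String TRACE_ID_HEADER = "X-B3-TraceId";
    static final String SPAN_ID_HEADER = "X-B3-SpanId";

    /** Caller side: copy the current IDs into the headers of the outgoing request. */
    static Map<String, String> inject(String traceId, String spanId) {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_ID_HEADER, traceId);
        headers.put(SPAN_ID_HEADER, spanId);
        return headers;
    }

    /**
     * Callee side: continue the caller's trace when a trace ID header is present,
     * otherwise fall back to a freshly generated trace ID.
     * Note: this sketch does an exact-case lookup, while real HTTP headers are case-insensitive.
     */
    static String traceIdFor(Map<String, String> incomingHeaders, String freshTraceId) {
        String incoming = incomingHeaders.get(TRACE_ID_HEADER);
        return incoming != null ? incoming : freshTraceId;
    }

    public static void main(String[] args) {
        // The ID values here are made up for the demo.
        Map<String, String> outgoing = inject("463ac35c9f6413ad", "a2fb4a1d1a96d312");
        System.out.println("callee continues trace " + traceIdFor(outgoing, "0000000000000001"));
    }
}
```

Because the callee reuses the incoming trace ID instead of generating its own, the logs of both services end up sharing one trace ID per request.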

When you carefully inspect the logs of both services, you can see entries in the [application name, trace ID, span ID] format, and the trace ID is the same for all the logs belonging to a single request. That is the basis of Sleuth’s implementation. I know it’s still not the wow factor you are looking for.
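Since the trace ID is shared, you can already correlate the log lines of both services by hand. Here is a small sketch of that idea in plain Java. The class `TraceLogGrouper` is a hypothetical helper, and the pattern assumes the [application name,trace ID,span ID] block with lowercase hex IDs; adjust it if your log format differs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that groups log lines from several services by trace ID. */
final class TraceLogGrouper {
    // Matches "[application name,trace ID,span ID]"; thread names like "[nio-8080-exec-8]" do not match.
    private static final Pattern IDS =
            Pattern.compile("\\[([^,\\]]+),([0-9a-f]+),([0-9a-f]+)\\]");

    /** Returns trace ID -> log lines, keeping the order in which the lines were given. */
    static Map<String, List<String>> groupByTraceId(List<String> lines) {
        Map<String, List<String>> byTrace = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = IDS.matcher(line);
            if (m.find()) {
                byTrace.computeIfAbsent(m.group(2), k -> new ArrayList<>()).add(line);
            }
        }
        return byTrace;
    }

    public static void main(String[] args) {
        // Made-up sample lines, only to demonstrate the grouping.
        List<String> lines = Arrays.asList(
                "INFO [booking-service,aaaa1111,bbbb2222] 1 --- [exec-1] Booking : request received",
                "INFO [host-management-service,aaaa1111,cccc3333] 2 --- [exec-4] Host : checking host",
                "INFO [booking-service,dddd4444,eeee5555] 1 --- [exec-2] Booking : request received");
        groupByTraceId(lines).forEach((traceId, group) ->
                System.out.println(traceId + " -> " + group.size() + " line(s)"));
    }
}
```

This works for two services and a handful of requests, but it gets painful quickly as the system grows.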

Visualize Spring Cloud Sleuth data using Zipkin

Still, the data we have doesn’t mean much to us in this form. Zipkin handles the data visualization part of tracing. Setting it up is also straightforward: you just need to add the Zipkin base URL property to your properties file and a Maven dependency, and Spring Boot handles the rest for you.

1. Building and starting an instance of Zipkin

You can spin up a Zipkin instance in three ways: using Java, using Docker, or running from source. In this demo, I fetch the latest release as a self-contained executable jar.

curl -sSL https://zipkin.io/quickstart.sh | bash -s
java -jar zipkin.jar

2. Add Maven dependencies to the POM file

3. Add properties to the application.properties file
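As a rough guide, steps 2 and 3 usually come down to something like the sketch below. I am assuming Spring Cloud Sleuth 3.x (where the Zipkin reporter artifact is spring-cloud-sleuth-zipkin) and a Zipkin instance running locally on its default port 9411; the sample repository may use different values.

```xml
<!-- pom.xml: assumes the Spring Cloud BOM manages the version -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-sleuth-zipkin</artifactId>
</dependency>
```

```properties
# application.properties: assumes Zipkin is running locally on its default port
spring.zipkin.base-url=http://localhost:9411
# Optional assumption for experimenting: sample every request
spring.sleuth.sampler.probability=1.0
```

Without the sampler setting, only a fraction of requests may be reported, which can be confusing when you send a single test request.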

Once the prerequisites are in place, spin up the Zipkin instance, execute the same cURL request again, and see which visualizations Zipkin offers for that request.
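To get a feel for what such a timeline is built from, here is a small sketch of the underlying arithmetic. It is not Zipkin's code; `SpanRecord` and `TraceTimeline` are hypothetical classes, and I am only borrowing the idea from Zipkin's span model that each span carries a start timestamp and a duration in microseconds.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified span record with a start time and a duration. */
final class SpanRecord {
    final String name;
    final long startMicros;    // start time in microseconds
    final long durationMicros; // elapsed time in microseconds

    SpanRecord(String name, long startMicros, long durationMicros) {
        this.name = name;
        this.startMicros = startMicros;
        this.durationMicros = durationMicros;
    }
}

final class TraceTimeline {
    /** End-to-end duration of one trace: latest span end minus earliest span start. */
    static long totalDurationMicros(List<SpanRecord> spans) {
        if (spans.isEmpty()) {
            throw new IllegalArgumentException("a trace needs at least one span");
        }
        long earliestStart = Long.MAX_VALUE;
        long latestEnd = Long.MIN_VALUE;
        for (SpanRecord span : spans) {
            earliestStart = Math.min(earliestStart, span.startMicros);
            latestEnd = Math.max(latestEnd, span.startMicros + span.durationMicros);
        }
        return latestEnd - earliestStart;
    }

    public static void main(String[] args) {
        // Made-up spans for one booking request: the root span plus two downstream calls.
        List<SpanRecord> spans = Arrays.asList(
                new SpanRecord("post /booking", 0, 500),
                new SpanRecord("verify host availability", 100, 150),
                new SpanRecord("make booking", 300, 150));
        System.out.println("total = " + totalDurationMicros(spans) + " microseconds");
    }
}
```

A visual timeline makes the same information readable at a glance: which call ran when, and which one took the most time.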

Polyglot architecture is a capability we proudly highlight when we talk about microservices. Sometimes, to fulfill one user request, we need to communicate with different systems built in different programming languages. In this post we used Spring Cloud Sleuth, which is developed under the Spring Cloud project; there are other clients too, such as Jaeger, which was developed by Uber. Zipkin supports OpenTracing, which means Zipkin can collect data from different clients. So if you are interested, you can try Zipkin with both Sleuth and Jaeger as clients.

Happy Coding…


