Background

With the rise of microservices, many companies have split their systems into services along business boundaries. Because a single request now flows through many microservice nodes, it becomes very difficult during troubleshooting to locate the logs of a particular request, together with the logs of its upstream and downstream calls.

Distributed tracing frameworks such as SkyWalking and Pinpoint, built on the OpenTracing specification, already solve this problem. However, building such a system, becoming familiar with it, and promoting it across the whole company takes considerable time and learning cost, and it raises the question of how to store the span data for every link: full collection or sampling? With full collection, taking SkyWalking's storage as an example, an ES cluster of at least 5 nodes is required, which increases server costs. Moreover, with many microservice nodes it is quite normal to generate tens or even hundreds of gigabytes of data per day, and retaining it for longer means paying for still more disk.

This article solves the log tracing problem in the simplest possible way. It collects no logs and needs no extra storage; it simply labels your logs automatically, generating a TraceId that runs through the entire microservice call chain, without intruding on existing code. This quickly solves distributed log tracing and makes day-to-day locating and troubleshooting of business logs much easier.

Implementation ideas and difficulties

  • Each request uses a unique identifier to track the whole call chain and shows it in the logs, without modifying the original logging calls (no code intrusion)
  • Using Logback's MDC mechanism, add a traceId field to the log pattern; its value is read with %X{trace_id}
  • Since MDC is backed by ThreadLocal, the value is only visible in the current thread and is lost in child threads and in the MDC of downstream services; the main difficulty is therefore passing the value along, which breaks down into the following parts (a minimal MDC sketch follows this list):
    • How to pass MDC data from Spring Cloud Gateway to downstream services
    • How a microservice passes the value on when calling other remote services, and how the callee receives it
    • How to pass it to child threads in asynchronous scenarios (thread pools)
    • How message queue producers and consumers (e.g. Kafka) send and receive it
    • How to label the logs of each run of a scheduled task
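To make the MDC mechanism concrete, here is a minimal sketch (class name and message are arbitrary); the %X{trace_id} conversion word in the Logback pattern below prints whatever was put under that key:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class MdcDemo {
    private static final Logger log = LoggerFactory.getLogger(MdcDemo.class);

    public static void main(String[] args) {
        MDC.put("trace_id", "demo-trace-id"); // value later rendered by %X{trace_id}
        log.info("hello");                    // this log line now carries the trace id
        MDC.remove("trace_id");               // always clean up: MDC is ThreadLocal-based
    }
}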

The implementation below is based on the technology stack currently in use at the company:

1. Configure Logback

<appender name="SYSTEM_LOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <FileNamePattern>${SYS_LOG_HOME}/${spring.application.name}_log_%d{yyyy-MM-dd}_%i.log</FileNamePattern>
        <MaxHistory>10</MaxHistory>
        <maxFileSize>1GB</maxFileSize>
    </rollingPolicy>
    <append>true</append>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{50} [TRACEID:%X{trace_id}] - %msg%n</pattern>
    </encoder>
</appender>
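With this pattern, every log line carries the trace id currently stored in the MDC. While trace_id is set, a line would look roughly like this (timestamp, thread, logger, and message are illustrative):

2024-01-15 10:23:45.678 [http-nio-8080-exec-1] INFO  c.e.demo.OrderService [TRACEID:9F8E7D6C5B4A39281706F5E4D3C2B1A0] - create order success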

2. Add a global filter to Spring Cloud Gateway that passes on the client's unique request ID, or generates a UUID and sets it on the request header

@Component
@Slf4j
public class AuthGlobalFilter implements GlobalFilter, Ordered {
    private static final String TRACE_ID = "trace_id";

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Reuse the client's unique id if it sent one, otherwise generate a fresh uuid
        String traceId = exchange.getRequest().getHeaders().getFirst(TRACE_ID);
        if (StringUtils.isBlank(traceId)) {
            traceId = UUID.randomUUID().toString().replaceAll("-", "").toUpperCase();
        }
        log.info("Add header traceId {}", traceId);
        // Request headers are read-only in WebFlux, so mutate the request to attach the id
        ServerHttpRequest mutated = exchange.getRequest().mutate().header(TRACE_ID, traceId).build();
        return chain.filter(exchange.mutate().request(mutated).build());
    }

    @Override
    public int getOrder() {
        // Run early so every downstream filter and service sees the header
        return Ordered.HIGHEST_PRECEDENCE;
    }
}

3. Add an interceptor to each business application

@Slf4j
public class TraceIdInterceptor implements HandlerInterceptor {
    private static final String TRACE_ID = "trace_id";

    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler) throws Exception {
        // If there is an upstream caller, reuse its trace id
        String traceId = request.getHeader(TRACE_ID);
        if (StringUtils.isBlank(traceId)) {
            traceId = UUID.randomUUID().toString().replaceAll("-", "").toUpperCase();
        }
        log.info("TraceIdInterceptor add trace_id {}", traceId);
        MDC.put(TRACE_ID, traceId);
        log.info("TraceIdInterceptor add trace_id end {}", traceId);
        return true;
    }

    @Override
    public void postHandle(HttpServletRequest request, HttpServletResponse response, Object handler, ModelAndView modelAndView)
            throws Exception {
    }

    @Override
    public void afterCompletion(HttpServletRequest request, HttpServletResponse response, Object handler, Exception ex)
            throws Exception {
        // Remove after the request completes to avoid leaking into pooled threads
        log.info("TraceIdInterceptor remove trace_id");
        MDC.remove(TRACE_ID);
        log.info("TraceIdInterceptor remove trace_id end");
    }

}

Register the interceptor with Spring MVC:

@Configuration
public class WebConfig implements WebMvcConfigurer {
    @Override
    public void addInterceptors(InterceptorRegistry registry) {
        TraceIdInterceptor traceIdInterceptor = new TraceIdInterceptor();
        registry.addInterceptor(traceIdInterceptor);
    }
}

4. Extend a custom thread pool to pass the value to child threads

@Bean("commonExecutor")
public Executor commonExecutor() {
    // cpuNum is derived from the machine here; keepAliveSeconds and queueCapacity come from configuration
    int cpuNum = Runtime.getRuntime().availableProcessors();
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(cpuNum);
    executor.setMaxPoolSize(cpuNum * 2);
    executor.setKeepAliveSeconds(keepAliveSeconds);
    executor.setThreadNamePrefix("common-business-pool");
    // The decorator copies the caller's MDC into every submitted task
    executor.setTaskDecorator(new MdcTaskDecorator());
    executor.setQueueCapacity(queueCapacity);
    executor.setWaitForTasksToCompleteOnShutdown(true);
    executor.setAwaitTerminationSeconds(keepAliveSeconds);
    executor.initialize();
    return executor;
}

The decorator that copies the MDC into each task:

public class MdcTaskDecorator implements TaskDecorator {
    @Override
    public Runnable decorate(Runnable runnable) {
        // Capture the MDC of the thread that submits the task
        Map<String, String> map = MDC.getCopyOfContextMap();
        return () -> {
            try {
                // Restore it in the pooled thread (may be null if nothing was set)
                if (map != null) {
                    MDC.setContextMap(map);
                }
                runnable.run();
            } finally {
                MDC.clear();
            }
        };
    }
}
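With the decorator in place, tasks submitted to commonExecutor inherit the submitting thread's MDC, so logs from the child thread carry the same trace id. A hypothetical usage sketch (service and method names are illustrative):

@Slf4j
@Service
public class AsyncDemoService {

    @Resource(name = "commonExecutor")
    private Executor commonExecutor;

    public void doAsyncWork() {
        // trace_id was put into the MDC by TraceIdInterceptor on the request thread;
        // MdcTaskDecorator copies it into the pooled thread before the task runs
        commonExecutor.execute(() -> log.info("async step"));
    }
}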

5. Implement an OpenFeign interceptor to pass the traceId on

@Slf4j
@Configuration
public class FeignTraceIdInterceptor implements RequestInterceptor {
    private static final String TRACE_ID = "trace_id";

    @Override
    public void apply(RequestTemplate template) {
        String traceId = MDC.get(TRACE_ID);
        if (StringUtils.isNotBlank(traceId)) {
            log.info("feign Interceptor add header trace id {}", traceId);
            template.header(TRACE_ID, traceId);
        }
    }
}
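Because the interceptor is a Spring bean, Spring Cloud OpenFeign applies it to every Feign client automatically, so each remote call carries the trace_id header without further changes. A hypothetical client (service name, path, and OrderDTO are illustrative):

@FeignClient(name = "order-service")
public interface OrderClient {

    // The outgoing request carries the trace_id header added by FeignTraceIdInterceptor;
    // the callee's TraceIdInterceptor (step 3) reads it back into its own MDC
    @GetMapping("/orders/{id}")
    OrderDTO getOrder(@PathVariable("id") Long id);
}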

6. Use AOP to propagate the traceId in XXL-JOB tasks

@Slf4j
@Aspect
@Component
public class TraceIdXxlJobAspect {
    private static final String TRACE_ID = "trace_id";

    @Pointcut("@annotation(com.xxl.job.core.handler.annotation.XxlJob)")
    public void cut() {
    }

    @Around("cut()")
    public Object around(ProceedingJoinPoint jp) throws Throwable {
        MethodSignature signature = (MethodSignature) jp.getSignature();
        Method method = signature.getMethod();
        String jobParam = XxlJobHelper.getJobParam();
        try {
            XxlJobHelper.log("job {} param {} start ...", method, jobParam);
            String traceId = MDC.get(TRACE_ID);
            if (StringUtils.isBlank(traceId)) {
                traceId = UUID.randomUUID().toString().replaceAll("-", "").toLowerCase();
            }
            MDC.put(TRACE_ID, traceId);
            log.info("TraceIdXxlJobAspect add trace id {}", traceId);
            return jp.proceed();
        } finally {
            MDC.remove(TRACE_ID);
            XxlJobHelper.log("job {} param {} end ...", method, jobParam);
        }
    }
}
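The aspect matches any method annotated with @XxlJob, so ordinary job handlers need no changes; every execution gets its own trace id in the logs. A hypothetical handler (the handler name is illustrative):

@Slf4j
@Component
public class DemoJobHandler {

    @XxlJob("demoJobHandler")
    public void execute() {
        // Printed with the [TRACEID:...] that TraceIdXxlJobAspect put into the MDC
        log.info("scheduled job running");
    }
}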

7. Add interceptors to support Kafka traceId delivery

@Slf4j
public class TraceIdKafkaProducerInterceptor implements ProducerInterceptor<String, String> {
    private static final String TRACE_ID = "trace_id";

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        String traceId = MDC.get(TRACE_ID);
        if (StringUtils.isBlank(traceId)) {
            traceId = UUID.randomUUID().toString().replaceAll("-", "").toLowerCase();
        }
        // Attach the traceId as a record header so the consumer can restore it
        Headers headers = record.headers();
        headers.add(TRACE_ID, traceId.getBytes());
        log.info("kafka add traceId {} ...", traceId);
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> configs) {
    }
}

The consumer side restores the traceId from the record headers:

@Slf4j
public class TraceIdKafkaConsumerInterceptor implements ConsumerInterceptor<String, String> {
    private static final String TRACE_ID = "trace_id";

    @Override
    public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
        if (records == null || records.isEmpty()) {
            return records;
        }
        records.forEach(x -> {
            Header[] headers = x.headers().toArray();
            Arrays.asList(headers).forEach(s -> {
                if (s.key().equals(TRACE_ID)) {
                    String traceId = new String(s.value());
                    MDC.put(s.key(), traceId);
                    log.info("kafka consumer add traceId {}", traceId);
                }
            });
        });
        for (ConsumerRecord<String, String> record : records) {
            log.info("Receive kafka message key:{},value:{},partition:{},topic:{}", record.key(), record.value(), record.partition(), record.topic());
        }
        return records;
    }

    @Override
    public void onCommit(Map<TopicPartition, OffsetAndMetadata> map) {
        log.info("remove traceId suc...");
        MDC.remove(TRACE_ID);
    }

    @Override
    public void close() {
    }

    @Override
    public void configure(Map<String, ?> map) {
    }
}
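The interceptors only take effect once they are registered with the Kafka clients. A minimal sketch using plain client properties (the broker address is illustrative; with Spring Kafka the same interceptor.classes keys can be set under spring.kafka.producer.properties and spring.kafka.consumer.properties):

// Register the interceptors when building the Kafka clients
Properties producerProps = new Properties();
producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
producerProps.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG,
        TraceIdKafkaProducerInterceptor.class.getName());

Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProps.put(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG,
        TraceIdKafkaConsumerInterceptor.class.getName());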

Currently supported features are as follows:

  • Completes lightweight microservice log tracing by labeling logs
  • Non-intrusive to business code; simple to use and quick to adopt
  • Supports the three common logging frameworks: Log4j, Log4j2, and Logback
  • Supports the Spring Cloud stack, OpenFeign, etc.
  • Adapts HTTP call tag delivery for HttpClient and OkHttp
  • Supports common task frameworks: the JDK's TimerTask, Quartz, and XXL-JOB
  • Supports asynchronous thread tracing, including thread pools and multi-level asynchronous threads
  • Almost no performance loss; fast and stable, with a measured overhead of about 0.01% under load testing