Skip to content
This repository was archived by the owner on Dec 10, 2024. It is now read-only.

Blitz4j at a glance

karthik-vn edited this page Nov 19, 2012 · 24 revisions

What is Blitz4j?

Blitz4j is a logging framework built on top of log4j to reduce multithreaded contention and enable highly scalable logging without affecting application performance characteristics.

At Netflix, Blitz4j is used to log billions of events for monitoring, business intelligence reporting, debugging and other purposes. Blitz4j overcomes traditional log4j bottlenecks and comes built with a highly scalable and customizable asynchronous framework. Blitz4j comes with the ability to convert the existing log4j appenders to use the asynchronous model without changing the existing log4j configurations.

Blitz4j uses archaius (http://www.github.com/Netflix/archaius) that allows log4j dynamic configurations for a server or a group of servers. Blitz4j also tries to mitigate data loss and provides a way to summarize the log information during log storms

History of Blitz4j

At Netflix, log4j has been used as a logging framework for a few years. It had worked fine for us, until the point where there was a real need to log lots of data. When our traffic increased and when the need for per-instance logging went up, log4j's frailties started to get exposed.

Problems with Log4j

  • Contended Synchronization with Root Logger

Log4j follows a hierarchical model and that makes it easy to turn off/on logging based on a package or a class level (if your logger definition follows that model). In this model, root logger is at the top of the hierarchy. In most cases, all loggers have to get access the root logger to log to the appenders configured there. One of the biggest problems here is that locking is needed on the root logger to write to the appenders. For a high traffic application that logs lot of data this is a big contention point as all application threads have to synchronize on the root logger.

public
  void callAppenders(LoggingEvent event) {
    int writes = 0;
   for(Category c = this; c != null; c=c.parent) {
     // Protected against simultaneous call to addAppender, removeAppender,...
      synchronized(c) {
	if(c.aai != null) {
	  writes += c.aai.appendLoopOnAppenders(event);
	}
	if(!c.additive) {
	  break;
	}
      }
    }

    if(writes == 0) {
      repository.emitNoAppenderWarning(this);
    }
  }

This severely limits the application scalability. Even if the critical section is executed quickly, this is a huge bottleneck in high volume logging applications.

The reason for the lock seems to be for protection against potential change in the list of appenders which should be a rare event. For us, the thread dump has exposed this contention numerous times.

Asynchronous appender to the rescue?

The longer the time spent in logging to the appender, more threads wait on the logging to complete. Any buffering here helps the scalability of the application tremendously. File Appender comes with buffering helps this problem quite a bit. The built in log4j asynchronous appender alleviates this problem quite a bit, but it still does not remove this synchronization. Thread dumps revealed another point of contention when logging to the appender.

It was quite clear that the built-in asynchronous appender is less concurrent because of this synchronization.

public
  synchronized 
  void doAppend(LoggingEvent event) {
    if(closed) {
      LogLog.error("Attempted to append to closed appender named ["+name+"].");
      return;
    }
    
    if(!isAsSevereAsThreshold(event.getLevel())) {
      return;
    }

    Filter f = this.headFilter;
    
    FILTER_LOOP:
    while(f != null) {
      switch(f.decide(event)) {
      case Filter.DENY: return;
      case Filter.ACCEPT: break FILTER_LOOP;
      case Filter.NEUTRAL: f = f.getNext();
      }
    }
    
    this.append(event);    
  }

Deadlock Vulnerability

This double locking (rootlogger and appender) also makes the application vulnerable to deadlocks if your appender by any chance tries to take a lock on a resource and if that resource tries to log to the appender at the same time.

Locking on Logger Cache

In log4j, loggers are cached in a Hashtable and that needs to be locked for any retrieval of a cached logger. When you want to change the log4j settings dynamically, there are 2 steps in the process.

  1. Reset and empty out all current log4j configurations
  2. Load all configurations including new configurations

During the reset process, locks have to be held on both the cache and the individual loggers. If any of your appenders try to look up the logger from the cache at the same time, you have the classic case of locks trying to be held in opposite directions and chance of a deadlock.

public
  void shutdown() {
    Logger root = getRootLogger();

    // begin by closing nested appenders
    root.closeNestedAppenders();

    synchronized(ht) {
      Enumeration cats = this.getCurrentLoggers();
      while(cats.hasMoreElements()) {
	Logger c = (Logger) cats.nextElement();
	c.closeNestedAppenders();
      }

Blitz4j to the rescue

Central to all the contention and the deadlock vulnerabilities is the locking model in log4j. If the log4j used any of the concurrent data structures with JDK 1.5 and above, most of the problems would be solved. That is exactly what blitz4j does.

Blitz4j overrides key parts of the log4j architecture to remove the locks and replace them with concurrent data structures.

Why not use LogBack?

Configuring Blitz4j

Blitz4j performance

Clone this wiki locally