Structured logs in Test Informant

Test Informant is a test monitoring service for Mozilla. It provides a high level view of what’s going on with automated tests.

It currently uses pulse to listen for the completion of build jobs and then downloads the associated file (that can reach 200MB), finds test manifests and parses them. After that it saves the information it found in a Mongo database.

The problem with that approach is that only a subset of tests is compatible with manifestparser, and Test Informant only supports those.

mozlog.structured provides structured logs as JSON files with information about tests that is easily machine-readable, but can also be interpreted by humans. Here is an example. Most of the suites compatible with manifestparser are also compatible with mozlog.structured, and so by using structured logs we get a lot of new suites. Therefore, the goal was to make Test Informant use structured logs instead of parsing manifests.

Listening for test jobs

Structured logs are attached to test jobs, not build jobs. So the first step to make Test-Informant use structured logs would be to make it listen for test jobs instead of build jobs on pulse listener.

The hardest part was making sure my code was working as expected. I had to wait for tests to show up in pulse, and sometimes that took a long time. One thing that helped was listening to ‘mozilla-inbound’ instead of ‘mozilla-central’, because it has much more activity. That way I ended up receiving several “Your pulse queue is full” emails.

Reading structured logs

After step 1 was ready, it was time to actually consume structured logs. Turns out that it’s pretty easy. All I had to do was adapt the example code from ahal’s blog post and that part was done.

Dealing with database issues was harder. I had to worry about race conditions and also deal with chunking (some test suites are split into several chunks that must be added to the same database entry).

Adding more tests

Test-Informant only deals with tests in its configuration file. There were a bunch of new tests that had structured logs but weren’t in I wrote the following script to identify pairs of platforms and suites that were compatible with structured logs:

def print_name(self, data):
     structured_logs = [(fn, url) for fn, url in data['blobber_files'].iteritems() 
                        if fn.endswith('_raw.log')]
     if structured_logs and not self.get_suite_name(data['test'], data['platform']):
         print data['platform'], data['test']

I left my script running for a couple hours, and after that I added the new platforms/suites I found to

What happened

This is a sample report using structured logs. It has a bunch of new tests, and that is actually a problem. The report links to skipped tests, but now not every test lives on, so our previous way of figuring out the URL does not work anymore. My work resulted in four PRs that were merged on the structured_log branch, and we are waiting for follow-up work from other contributors to merge structured_logs into master.

It was a really rewarding project and I’m excited about seeing it being used!

tags : Mozilla