Continuous Integration: Test Quantity versus Build Duration

Continuous Integration comes with many guidelines, but they are often described vaguely. As a result, CI practitioners run into problems such as massive test suites that slow development down, or test suites that test the wrong parts of the software. This blog post examines the relation between test quantity and build duration and closes with some suggested guidelines.

Software quality is becoming increasingly important in the world of rapid application development, as applications grow more complex and development teams grow larger. It is therefore becoming more difficult to manually integrate all individual pieces of software into one potentially releasable product. For this reason, Continuous Integration (CI) is becoming a major area of interest within software development. Fowler [1] formalized CI in the past decade as “a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily – leading to multiple integrations per day”. With every integration the source code is checked for build errors and the compiled software is tested automatically. Once all tests succeed, the integrated parts form a complete package that can be shipped to the customer. This automated approach enables faster and more stable integration of new software updates than traditional integration methods [1].

Since Fowler’s formalization, and together with the rise of Agile Software Development, CI has become popular among software developers [3]. Programmers accustomed to the traditional way of developing, and eventually whole companies, started to experience the advantages of this automated framework. Unfortunately, the CI guidelines can be interpreted in different ways, which introduces problems, especially if the development team members are not experienced CI practitioners. For example, Fowler [1] states that a build should be made self-testing. He also suggests performing other tests besides unit testing, for example end-to-end testing. Although this sounds promising for the quality of the product, Fowler [1] notes that these tests will never produce 100% bug-free software.

Nevertheless, having several tests increases quality compared to having none, although every added test is also a possible threat to the “keep the build fast” guideline [1]. Security, performance, or user interface testing can consume a vast amount of time. This blog post focuses on several statements and best practices, for instance the lessons learned from case studies of CI implementations. Its main goal is to find a preferable balance between test quantity and build duration for use in CI practice. The proposal only covers the technical perspective; the organizational perspective is out of scope.

Stolberg [3] lists “define and execute ‘just enough’ acceptance tests” among his best practices, where the cautious words “just enough” signal that the right amount depends on the project. He therefore suggests discussing the vulnerable parts of new features with the product manager during development planning. Stolberg [3] also acknowledges that scaling automated functional tests can be a difficult task. Before adding a test, it is thus a good idea to determine its level of complexity and its impact on the rest of the test suite. Next, the usefulness of the test should be established, for example by measuring which features are commonly used and which parts of the product are sensitive. Software analytics and runtime intelligence deliver valuable input for the decision whether or not to implement the new test.

“define and execute ‘just enough’ acceptance tests”
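One way to make “just enough” concrete is to treat the build as having a fixed time budget and to rank candidate tests by how much they are worth per second of run time. The sketch below is only an illustration of that idea, not a method taken from Stolberg [3]: the `CandidateTest` fields, the scoring formula (usage share times risk divided by duration), and all test names and numbers are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTest:
    """Hypothetical description of a test the team considers adding."""
    name: str
    usage_share: float  # fraction of sessions touching the feature (from analytics)
    risk: float         # 0..1, how vulnerable the team judges the feature to be
    duration_s: float   # expected run time in seconds

    @property
    def value_per_second(self) -> float:
        # Assumed scoring rule: widely used, risky features are worth more,
        # slow tests are worth less.
        return self.usage_share * self.risk / self.duration_s


def select_tests(candidates, budget_s):
    """Greedily pick the best-scoring tests that still fit in the time budget."""
    chosen, used = [], 0.0
    ranked = sorted(candidates, key=lambda t: t.value_per_second, reverse=True)
    for test in ranked:
        if used + test.duration_s <= budget_s:
            chosen.append(test)
            used += test.duration_s
    return chosen, used


# Made-up candidates; in practice the numbers would come from analytics
# and from the planning discussion with the product manager.
candidates = [
    CandidateTest("checkout_flow", usage_share=0.60, risk=0.9, duration_s=120),
    CandidateTest("profile_avatar_upload", usage_share=0.05, risk=0.3, duration_s=90),
    CandidateTest("login", usage_share=0.95, risk=0.8, duration_s=30),
    CandidateTest("export_to_pdf", usage_share=0.10, risk=0.5, duration_s=200),
]

selected, used_s = select_tests(candidates, budget_s=180)
print([t.name for t in selected], used_s)
```

With a budget of 180 seconds, the two tests covering heavily used, risky features are selected and the two slow tests for rarely used features are left out. A greedy ranking like this is not guaranteed to be optimal, but it keeps the trade-off between test value and build duration visible.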

Su et al. [4] agree that creating a comprehensive testing environment can be a difficult and time-consuming activity. Indeed, the testing environment can be more complex than the software itself. Nevertheless, the tests increase development efficiency and supposedly increase the quality of the software. Regarding the “what to test?” question, Su et al. [4] point to the various types of tests, such as unit, system, security and performance tests. These test types vary in execution time. The suggestion is to test from low level to high level, e.g. from testing individual methods in the code to testing the user interface. As a result, functional testing may turn out not to be required because the system tests already provide high coverage. Another aspect of reducing the duration of each build is the test infrastructure. The execution time of the test suite depends strongly on the hardware the tests run on. A well-configured test architecture running on high-performance components can therefore reduce the total test time and is probably a worthwhile investment. Additionally, Su et al. [4] propose running several tests in parallel on different machines.

“test from low level to high level”
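The low-to-high ordering and the parallel execution can be combined in a small staged runner: cheap stages run first, the tests within a stage run concurrently, and the build stops at the first failing stage so that slow high-level tests are not started for nothing. This is a minimal sketch under stated assumptions: the stage names, the `run_stage` and `run_pipeline` helpers, and the example tests are hypothetical, and a thread pool stands in for the separate machines that Su et al. [4] have in mind.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stage names, ordered from low level to high level.
STAGES = ["unit", "system", "security", "performance", "ui"]


def run_stage(tests, max_workers=4):
    """Run all tests of one stage concurrently; return the names that failed.

    `tests` maps a test name to a zero-argument callable returning True on success.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda item: (item[0], item[1]()), tests.items()))
    return [name for name, passed in results if not passed]


def run_pipeline(suite):
    """Run stages in order and stop at the first stage with a failure."""
    executed = []
    for stage in STAGES:
        tests = suite.get(stage, {})
        if not tests:
            continue
        executed.append(stage)
        failures = run_stage(tests)
        if failures:
            return executed, failures  # fail fast: skip the higher-level stages
    return executed, []


# Made-up suite: one system test fails, so the UI stage is never started.
suite = {
    "unit": {"parse_config": lambda: True, "format_date": lambda: True},
    "system": {"api_roundtrip": lambda: False, "db_migration": lambda: True},
    "ui": {"login_page": lambda: True},
}

executed, failures = run_pipeline(suite)
print(executed, failures)
```

In this example the unit stage passes, the system stage reports `api_roundtrip` as failing, and the pipeline ends there. Failing fast on low-level tests is what keeps the average build short: most broken commits are caught before the expensive stages begin.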

Ståhl and Bosch [2] have proposed a descriptive model based on a large collection of articles about CI. An important factor in this model is the separation of developer builds and nightly builds. For instance, when a programmer commits changes to a version control system, a developer build is triggered automatically. This build runs only low level tests and finishes in a few minutes, after which the change is moved to the stable build branch. Later at night, a scheduled nightly build is triggered, which runs the high level tests against all stable components. Finally, a potentially releasable package is available the next morning, unless the nightly build failed.

“separate developer builds and nightly builds”
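The separation can be expressed as a small build plan that depends on what triggered the build. The sketch below only illustrates the workflow described above; it is not the model of Ståhl and Bosch [2] itself. The `Trigger` enum, the stage names, the branch names, and the `plan_build` function are all assumptions introduced for this example.

```python
from enum import Enum


class Trigger(Enum):
    COMMIT = "commit"    # a programmer pushed a change
    NIGHTLY = "nightly"  # the scheduler fired at night


# Hypothetical split of test stages into fast and slow groups.
LOW_LEVEL_STAGES = ("unit",)
HIGH_LEVEL_STAGES = ("system", "security", "performance", "ui")


def plan_build(trigger):
    """Return which code to build, which stages to run, and what success means."""
    if trigger is Trigger.COMMIT:
        # Developer build: fast feedback, then promote the change.
        return {
            "source": "commit",
            "stages": LOW_LEVEL_STAGES,
            "on_success": "move change to stable branch",
        }
    # Nightly build: slow, thorough tests on everything already promoted.
    return {
        "source": "stable branch",
        "stages": HIGH_LEVEL_STAGES,
        "on_success": "publish potentially releasable package",
    }


developer_plan = plan_build(Trigger.COMMIT)
nightly_plan = plan_build(Trigger.NIGHTLY)
print(developer_plan["stages"], nightly_plan["stages"])
```

The point of the split is that the programmer only ever waits for the developer plan, while the time-consuming stages run when nobody is waiting for them.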

This blog post has compared three articles containing statements and best practices on implementing continuous integration, focusing on the number of tests versus the build duration. The aim was to discover a possible balance between test quantity and build duration. Taken together, the results suggest taking multiple elements into account. First, define “just enough” tests and create awareness of what to test and what not to test. Second, work from low level to high level across the different types of tests. Third, review your test architecture and, if possible, run tests in parallel. Finally, separate developer builds from nightly builds, so that programmers can continue working without waiting for long-running tests. Although the findings point towards different components of the problem, the articles had one statement in common: implementing test suites is difficult and time-consuming. This highlights the importance of research on automated test suites within CI. Hopefully, this blog post helps programmers avoid common testing mistakes while implementing CI, and thereby makes the development pipeline easier to maintain. A possible future study would be a case study on how to keep tests at as low a level as possible.

Bibliography

[1] Fowler, M. 2000. Continuous Integration [online]. Available at: http://martinfowler.com/articles/continuousIntegration.html [Accessed: 25 May 2014].

[2] Ståhl, D., & Bosch, J. 2014. Modeling continuous integration practice differences in industry software development. Journal of Systems and Software 87(1), pp. 48–59.

[3] Stolberg, S. 2009. Enabling Agile Testing through Continuous Integration. 2009 Agile Conference, pp. 369–374.

[4] Su, T., Lyle, J., Atzeni, A., Faily, S., & Virji, H. 2013. Continuous Integration for Web-Based Software Infrastructures: Lessons Learned on the webinos Project. In: Bertacco, V. & Legay, A. eds. Hardware and Software: Verification and Testing. Springer International Publishing, pp. 145–150.