Extra #includes

Introduction

Consider the following program transformation.

Take an #include directive found in a source file, comment it out, and attempt to rebuild the project. If the rebuild fails, uncomment the line again; otherwise, leave it commented out.
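To make this single step concrete, here is a minimal Python sketch that works on a file held in memory as a list of lines. The `build` callback is a hypothetical stand-in for "save the file and rebuild the project" and is assumed to return True on success. Commenting out with `//` assumes C99 or C++; a C89 code base would need `/* ... */` instead.

```python
import re
from typing import Callable, List

# Matches an #include directive, allowing whitespace around the '#'.
INCLUDE_RE = re.compile(r'^\s*#\s*include\b')


def try_comment_out(lines: List[str], index: int,
                    build: Callable[[List[str]], bool]) -> bool:
    """Comment out the #include at lines[index] and keep the change only if
    build(lines) still succeeds. Returns True if the directive stays out.

    `build` is a hypothetical callback standing in for "save the file and
    rebuild the project"; it must return True when the build succeeds."""
    original = lines[index]
    if not INCLUDE_RE.match(original):
        return False                  # not an #include: nothing to try
    lines[index] = '// ' + original   # C99/C++ comment; C89 needs /* ... */
    if build(lines):
        return True                   # still builds: keep it commented out
    lines[index] = original           # build failed: put the line back
    return False


# Usage with a toy "build" that only needs <stdio.h>:
src = ['#include <stdio.h>', '#include <stdlib.h>', 'int main(void) { puts("hi"); }']
for i in range(len(src)):
    try_comment_out(src, i, lambda ls: '#include <stdio.h>' in ls)
# src[1] is now '// #include <stdlib.h>'; the <stdio.h> line was restored.
```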

Given the complexity of the C and C++ languages and their preprocessors (which are arguably Turing-complete for practical purposes), a successful compilation with a header excluded does not necessarily mean that the program's behavior is unchanged. For example, a header may define a macro that other code only tests with #ifdef; removing it can silently select a different code path while everything still compiles. However, in practice, and provided the project's test suite still passes, it is reasonable to consider the excluded header superfluous.

Why bother

In theory, too many includes can slow compilation down (in practice this is not the case, especially for pure C). More importantly, they make the code harder for humans to reason about. An #include is a statement of the external dependencies of the current translation unit; misstating those dependencies makes the code harder to change.

Application to a software repository

I experimented with a script based on ideas from this project [1]. It iteratively applies the transformation described above to all C/C++/H files in the project. I adapted the original script to work with a make-based project. Sorry, no code this time. Each #include directive in each file was commented out and the project was rebuilt; if the build succeeded, the line stayed commented out, otherwise it was reverted to its uncommented state.
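The author's script is not published, so the following is only a sketch of how such a driver might look, not a reconstruction of it. It walks a source tree, tries each #include in turn directly on disk, and calls a hypothetical `build()` function, assumed to return True when the project still compiles. The set of file suffixes and the `// ` comment prefix are assumptions; `root` should point at the project's own sources so that external dependencies are left alone.

```python
import os
import re
from pathlib import Path
from typing import Callable, Iterator

# Assumed set of C/C++ source and header suffixes; adjust for your project.
SOURCE_SUFFIXES = {'.c', '.cc', '.cpp', '.cxx', '.h', '.hh', '.hpp'}
INCLUDE_RE = re.compile(r'^\s*#\s*include\b')
# surrogateescape lets non-UTF-8 bytes survive the read/write cycle.
ENC = {'encoding': 'utf-8', 'errors': 'surrogateescape'}


def source_files(root: Path) -> Iterator[Path]:
    """Yield C/C++ sources and headers under root in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort controls os.walk's descent order
        for name in sorted(filenames):
            if Path(name).suffix in SOURCE_SUFFIXES:
                yield Path(dirpath) / name


def prune_includes(root: Path, build: Callable[[], bool]) -> int:
    """Comment out each #include under root in turn; keep the change if
    build() still succeeds, otherwise restore the line. Returns the number
    of directives left commented out."""
    removed = 0
    for path in source_files(root):
        lines = path.read_text(**ENC).splitlines(keepends=True)
        for i, line in enumerate(lines):
            if not INCLUDE_RE.match(line):
                continue
            lines[i] = '// ' + line              # try without this directive
            path.write_text(''.join(lines), **ENC)
            if build():
                removed += 1                     # still builds: keep it out
            else:
                lines[i] = line                  # broke the build: revert
                path.write_text(''.join(lines), **ENC)
    return removed
```

Because every trial edits the real file before building, the order in which files and directives are visited affects the outcome; sorting the traversal at least makes a run reproducible.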

This is a brute-force approach, and it is not fast. Nor is it very parallelizable, except for the build phase itself, because every trial depends on whether the previous one kept or reverted its change. It took about two days of continuous rebuilding to process the code base. On the other hand, the approach is simple and makes few assumptions about the structure of the code base. There are more sophisticated static analysers that can detect superfluous #includes in a matter of minutes; however, they carry a higher risk of false positives and require more configuration.
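The build phase is where parallelism helps. A possible build step for a make-based project is sketched below; the function name `make_build` and the default `make -jN` command are assumptions. Note that make decides what to rebuild from timestamps: if the Makefile does not track header dependencies (for example via the compiler's `-MMD` option), commenting out an #include in a header may not trigger a rebuild of the files that use it, and a broken change would be reported as a success. Running `make clean` before each trial avoids that, at the cost of even longer runs.

```python
import os
import subprocess
from typing import Optional, Sequence


def make_build(project_dir: str, cmd: Optional[Sequence[str]] = None) -> bool:
    """Rebuild the project in project_dir and report whether it succeeded.

    By default runs `make -jN` with N = number of CPUs, since the build is
    the only part of the process that parallelizes. `cmd` overrides it."""
    if cmd is None:
        cmd = ['make', f'-j{os.cpu_count() or 1}']
    result = subprocess.run(list(cmd), cwd=project_dir,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0  # make exits non-zero on a failed build
```

As a rough sense of scale, assuming the two days were spent almost entirely on rebuilds, 1200 trials in about 172 800 s (2 × 86 400 s) come to roughly 144 s, or about 2.4 minutes, per rebuild on average; since there were somewhat more than 1200 directives, the real average was a bit lower.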

Results

More than 250 C/C++/H files of the project (excluding external dependencies) were processed. They contained over 1200 #include directives, of which 651 (roughly half) turned out to be unnecessary and were commented out.

However, the build time did not change in a statistically significant way.

Discussion and Conclusions

There was no noticeable change in build times.

No attempt was made to reorder header inclusions, so the reduction process is not guaranteed to produce the most “optimal” result when headers depend on each other: which directives survive depends on the order in which they are tried.

Most notably, in many cases the “god includes” (header files that include many other headers just for the sake of it) survived the process, while many of the smaller headers that followed them were removed instead.
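This order effect can be reproduced in a toy model. The Python sketch below uses hypothetical headers `god.h`, `x.h` and `y.h` and ignores declaration order and macros: each header simply provides a set of symbols (including everything it includes transitively), and a file builds if its remaining headers provide every symbol it uses. Directives are tried top to bottom in a single pass, as in the process described above.

```python
from typing import Dict, List, Set

# Hypothetical model: header name -> symbols it provides (transitively).
Provides = Dict[str, Set[str]]


def builds(includes: List[str], needs: Set[str], provides: Provides) -> bool:
    """A file 'builds' if its included headers cover every symbol it uses."""
    available: Set[str] = set()
    for header in includes:
        available |= provides[header]
    return needs <= available


def greedy_prune(includes: List[str], needs: Set[str],
                 provides: Provides) -> List[str]:
    """Try dropping each directive top to bottom, keeping a removal only if
    the file still builds: one pass, no reordering."""
    kept = list(includes)
    for header in includes:
        trial = [h for h in kept if h != header]
        if builds(trial, needs, provides):
            kept = trial
    return kept


PROVIDES: Provides = {
    'god.h': {'a', 'b', 'g'},  # umbrella header: re-exports x.h and y.h, plus g
    'x.h': {'a'},
    'y.h': {'b'},
}

# The file uses a, b and g; g is reachable only through god.h.
print(greedy_prune(['god.h', 'x.h', 'y.h'], {'a', 'b', 'g'}, PROVIDES))  # ['god.h']
# The file uses only a; the result depends on which directive comes first.
print(greedy_prune(['god.h', 'x.h'], {'a'}, PROVIDES))  # ['x.h']
print(greedy_prune(['x.h', 'god.h'], {'a'}, PROVIDES))  # ['god.h']
```

In the first case `god.h` cannot go because only it provides `g`, so the smaller headers that follow it are the ones removed. The last two calls differ only in the order of the directives, yet keep different headers, which is why a single pass without reordering gives no guarantee of an “optimal” result.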

References

  1. https://github.com/cognitivewaves/misc/tree/master/check-header-includes
  2. https://stackoverflow.com/questions/614794/detecting-superfluous-includes-in-c-c
  3. https://github.com/myint/cppclean
  4. https://stackoverflow.com/questions/74326/how-should-i-detect-unnecessary-include-files-in-a-large-c-project
  5. https://gitlab.com/esr/deheader/-/blob/master/deheader

Written by Grigory Rechistov in Uncategorized on 02.11.2020. Tags: preprocessor, include.


Copyright © 2020 Grigory Rechistov