Extra #includes
Introduction
Consider the following program transformation.
For an #include directive found in a source code file, comment it out and
attempt to rebuild the project. If the rebuild fails, restore the line;
otherwise keep it commented out.
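As a rough illustration, a single step of this transformation could be sketched in Python as follows. The function name try_drop_include, the use of a // comment (which assumes a C99 or C++ compiler), and treating a zero exit status of the build command as "the rebuild succeeded" are all assumptions made for this sketch.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def try_drop_include(path: Path, index: int, build_cmd: Sequence[str]) -> bool:
    """Comment out line `index` (0-based) of `path`; keep the change only if the build passes.

    Hypothetical helper: the name and signature are illustrative only.
    """
    original = path.read_bytes()
    # Work on bytes so that line endings and encoding are preserved exactly.
    lines = original.splitlines(keepends=True)
    lines[index] = b"// " + lines[index]  # assumes the compiler accepts // comments
    path.write_bytes(b"".join(lines))

    # Assumption: exit status 0 means the project rebuilt successfully.
    result = subprocess.run(
        list(build_cmd),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    built = result.returncode == 0
    if not built:
        path.write_bytes(original)  # the build broke, so restore the directive
    return built
```

A call might look like try_drop_include(Path("src/foo.c"), 3, ["make"]), where both the path and the build command are placeholders. Running the test suite as part of the build command is one way to tighten the check.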
Given the complexity of C and C++ and of their preprocessor (which is arguably Turing-complete for practical purposes), a successful compilation with a header excluded does not necessarily mean that the program’s behavior is unchanged. However, for practical purposes, and provided the project’s test suite still passes, it is reasonable to consider the excluded header superfluous.
Why bother
Too many includes can, in theory, slow compilation down (in practice they do not, especially for pure C).
More importantly, they make the code harder for humans to reason about.
An #include is a statement of an external dependency of the current translation unit;
not telling the truth about dependencies makes the code harder to change.
Application to a software repository
I played with a script based on ideas found in this project [1]. It iteratively
applied the transformation described earlier to all C/C++/H files in
the project. I adapted the original script to work with a make-based project.
Sorry, no code this time. Each #include directive in each file was commented out and
the project was recompiled. If it built, the line was left commented out; otherwise
it was reverted to its uncommented state.
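The script that was actually used is not shown, so what follows is only a minimal sketch of how the candidate directives might be enumerated before each one goes through the comment-out and rebuild step. The function name include_candidates, the set of file extensions, and the regular expression are assumptions. In particular, the pattern does not recognise macro-expanded includes such as #include SOME_MACRO, and it will also match directives that sit inside block comments.

```python
import re
from pathlib import Path

# Matches an #include directive with a quoted or angle-bracket header name,
# allowing whitespace before and after the '#'.
INCLUDE_RE = re.compile(rb'^\s*#\s*include\s*[<"]')

# Assumed set of extensions; adjust to the project at hand.
SOURCE_SUFFIXES = {".c", ".cc", ".cpp", ".cxx", ".h", ".hpp"}


def include_candidates(root):
    """Yield (path, 0-based line index) for every #include directive under `root`."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in SOURCE_SUFFIXES:
            continue
        # Read bytes so that files in unknown encodings do not raise errors.
        for index, line in enumerate(path.read_bytes().splitlines()):
            if INCLUDE_RE.match(line):
                yield path, index
```

Because commenting a line out does not change the number of lines in a file, the indices stay valid while the directives of that file are processed one after another.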
This approach is brute force and it is not fast. It is not
very parallelizable either, except for the build phase.
Continuously rebuilding the code base took about two days. However, the approach
is simple and makes few assumptions about the structure of the code base.
There are other, more sophisticated static analysers that can detect
superfluous #include directives in a matter of minutes. However, they carry a
higher risk of false positives and require more configuration.
Results
Over 250 C/C++/H files of the project (external dependencies excluded) were processed.
In them, 1200+ #include directives were found. Of these, 651 were commented out as not actually required.
However, the build time did not change in a statistically significant way.
Discussion and Conclusions
There was no noticeable change in build times.
No attempt was made to reorder header inclusions, so the reduction process is not guaranteed to produce the most “optimal” result when there are dependencies between headers.
Most notably, in many cases the “god includes” (header files that include many other headers just for the sake of it) survived the operation. Instead, many smaller headers that followed them were removed.
References
- https://github.com/cognitivewaves/misc/tree/master/check-header-includes
- https://stackoverflow.com/questions/614794/detecting-superfluous-includes-in-c-c
- https://github.com/myint/cppclean
- https://stackoverflow.com/questions/74326/how-should-i-detect-unnecessary-include-files-in-a-large-c-project
- https://gitlab.com/esr/deheader/-/blob/master/deheader