Krystian’s Q3 Update

Oct 31, 2023

My primary focus this quarter was getting MrDocs into a state where it can serve as a drop-in replacement for Doxygen/Docca in Boost.URL. Before diving into that, there are a few smaller things I addressed in other projects:

Boost.StaticString

Added support for platforms lacking wchar_t/wsnprintf

Docca

Added backward compatibility for operator names. Doxygen 1.8.15 and older generate operator names containing a space between operator and the subsequent tokens. This behavior changed in newer versions, meaning that the new names must be converted to the old format to avoid breaking existing references to these functions.
Suppressed generation of private friends. This was necessary because such declarations would “hide” the primary declaration and result in broken links.
Stripped auto-generated links within code blocks due to incorrect rendering.
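As a rough illustration of the operator name conversion described above, the following C++ sketch inserts a space after the operator keyword when one is missing. It is not Docca's implementation: the function name to_legacy_operator_name is hypothetical, and the exact spelling produced by each Doxygen version is assumed from the description above rather than verified.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of Docca): rewrites a name such as
// "operator==" into the older "operator ==" spelling.
std::string to_legacy_operator_name(const std::string& name)
{
    static const std::string keyword = "operator";

    // Names that do not start with "operator", or that are exactly
    // "operator", are returned unchanged.
    if (name.size() <= keyword.size() ||
        name.compare(0, keyword.size(), keyword) != 0)
        return name;

    const unsigned char next =
        static_cast<unsigned char>(name[keyword.size()]);

    // "operators" or "operator_x" are ordinary identifiers, and
    // "operator new" already contains the space.
    if (std::isalnum(next) || next == '_' || next == ' ')
        return name;

    return keyword + " " + name.substr(keyword.size());
}
```

Under these assumptions, names that already contain the space (such as operator new) pass through untouched, so the conversion can be applied to every name regardless of which Doxygen version produced it.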

MrDocs

Switching focus to MrDocs, I implemented many major features:

Dependency extraction

When symbols are referenced by a declaration, dependency extraction controls whether the referenced symbol will be extracted, irrespective of whether it was declared within the project directory. My initial naive implementation would extract such symbols unconditionally, but I later added a more refined mode where dependency extraction only occurs for:

Local classes which are deduced as the return type of an extracted function, and
Base classes of an extracted class.

These cases are the only ones in which a referenced symbol affects the “interface” of another, hence the term “dependency.” A final mode that disables dependency extraction completely was also added.
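To make the two cases concrete, here is a small hypothetical example. The names external::base, widget, token, and make_token are invented for illustration and do not come from MrDocs or Boost.URL.

```cpp
// Hypothetical code; none of these names come from MrDocs or Boost.URL.

namespace external {

// Imagine this class is declared in a header outside the project directory.
struct base
{
    int id() const { return 7; }
};

} // namespace external

// An extracted class: its base class contributes members to its interface,
// so external::base would be extracted as a dependency.
struct widget : external::base
{
};

// An extracted function: the return type is deduced as a local class,
// so token would be extracted as a dependency.
auto make_token()
{
    struct token
    {
        int value;
    };
    return token{42};
}
```

In both cases, a reader of the generated documentation could not use widget or make_token fully without also seeing the referenced class.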

Safe names

The “safe name” of a symbol is a prettier but unique name for a symbol that can be used as an alternative to the base16/base64 representation of a SymbolID. These names also have the property of being path/URL safe, as their intended purpose is for use as filenames when generating the output. Broadly, safe names are generated by collecting all symbols with the same name in a given scope, and then appending digits from the base16 representation of the SymbolID until all names are unique. For example, the safe name for void A::f(); will be A-f in the absence of other overloads. If there exists an overload void A::f(int);, then a possible set of safe names could be A-f-0a and A-f-04.
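The disambiguation step can be sketched roughly as follows. This is a simplified illustration and not the MrDocs implementation: the Symbol struct, the make_safe_names function, and the choice to use the shortest common-length prefix of the base16 ID are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an extracted symbol.
struct Symbol
{
    std::string name;   // scope-qualified, path-safe name, e.g. "A-f"
    std::string id_hex; // base16 representation of the SymbolID
};

// Returns one safe name per input symbol, in the same order.
std::vector<std::string> make_safe_names(const std::vector<Symbol>& symbols)
{
    // Collect the indices of all symbols that share a name.
    std::map<std::string, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < symbols.size(); ++i)
        groups[symbols[i].name].push_back(i);

    std::vector<std::string> result(symbols.size());
    for (const auto& group : groups)
    {
        const std::vector<std::size_t>& members = group.second;
        if (members.size() == 1)
        {
            // No collision: the name is already unique.
            result[members[0]] = group.first;
            continue;
        }

        std::size_t max_len = 0;
        for (std::size_t i : members)
            max_len = std::max(max_len, symbols[i].id_hex.size());

        // Find the smallest number of ID digits that makes every name unique.
        std::size_t digits = 1;
        for (; digits <= max_len; ++digits)
        {
            std::set<std::string> prefixes;
            for (std::size_t i : members)
                prefixes.insert(symbols[i].id_hex.substr(0, digits));
            if (prefixes.size() == members.size())
                break;
        }
        assert(digits <= max_len && "SymbolIDs must be distinct");

        for (std::size_t i : members)
            result[i] = group.first + "-" + symbols[i].id_hex.substr(0, digits);
    }
    return result;
}
```

With two overloads named A-f whose IDs begin with 0a and 04, one digit is not enough to tell them apart, so this sketch settles on two digits and yields A-f-0a and A-f-04.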

Symbol filtering

Symbol filtering permits the exclusion of symbols matching a pattern from being extracted. Filters are specified as C++ id-expressions, except that wildcards (*) may be used to match zero or more occurrences of any character. The primary purpose of filters is to exclude symbols from detail namespaces (e.g., using the pattern *::detail). In addition to excluded patterns, it is also possible to specify included patterns to override matches; these patterns are meaningless unless they match a subset of symbols matched by an excluded pattern. For example, the excluded pattern A::B combined with the included pattern A::B::f* means only the symbols in A::B beginning with f are to be extracted. Internally, filters are converted into a tree that is traversed alongside the AST; this avoids the need to check every pattern each time a new symbol is extracted.
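The matching semantics can be sketched with plain string matching. This is deliberately the naive approach that checks every pattern for every symbol, not the tree-based traversal MrDocs uses internally. The function names and the rule that an excluded pattern also covers everything nested inside a matching scope are assumptions for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Matches text against a pattern in which '*' stands for zero or more
// occurrences of any character.
bool wildcard_match(const std::string& pattern, const std::string& text)
{
    std::size_t p = 0, t = 0;
    std::size_t star = std::string::npos; // position of the last '*' seen
    std::size_t mark = 0;                 // text position when it was seen
    while (t < text.size())
    {
        if (p < pattern.size() && pattern[p] == '*')
        {
            star = p++;
            mark = t;
        }
        else if (p < pattern.size() && pattern[p] == text[t])
        {
            ++p;
            ++t;
        }
        else if (star != std::string::npos)
        {
            // Backtrack: let the last '*' absorb one more character.
            p = star + 1;
            t = ++mark;
        }
        else
        {
            return false;
        }
    }
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}

// True if the pattern matches the symbol itself or any enclosing scope.
bool matches_self_or_scope(const std::string& pattern, const std::string& name)
{
    if (wildcard_match(pattern, name))
        return true;
    for (std::size_t pos = name.find("::"); pos != std::string::npos;
         pos = name.find("::", pos + 2))
    {
        if (wildcard_match(pattern, name.substr(0, pos)))
            return true;
    }
    return false;
}

// Hypothetical decision function: included patterns override excluded ones.
bool is_extracted(const std::string& name,
                  const std::vector<std::string>& excluded,
                  const std::vector<std::string>& included)
{
    for (const std::string& pattern : included)
        if (wildcard_match(pattern, name))
            return true;
    for (const std::string& pattern : excluded)
        if (matches_self_or_scope(pattern, name))
            return false;
    return true;
}
```

With the excluded pattern A::B and the included pattern A::B::f*, this sketch keeps A::B::foo and drops A::B::g, mirroring the example in the text.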

Symbol lookup

Symbol lookup is the mechanism by which the @ref and @copydoc commands are implemented; it performs a simplified version of C++ name lookup for the given id-expression within the set of all extracted symbols. The current implementation is far from complete (e.g., no ambiguity resolution is performed, and the semantics of constructs like inline namespaces, using declarations, using directives, and injected-class-names are not implemented), but it’s sufficient for Boost.URL’s documentation. Lookup is deferred until all symbols have been extracted to support cross-TU references without forward declarations.
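One way to picture such a simplified lookup is a search that starts in the scope containing the documentation comment and walks outward until a match is found. The sketch below is an assumption about the general shape of the idea, not the MrDocs algorithm: the lookup function and the representation of symbols as a set of qualified names are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical lookup: resolves id_expr relative to scope by trying each
// enclosing scope from innermost to outermost. Returns the qualified name
// of the first match, or an empty string if nothing is found.
std::string lookup(const std::set<std::string>& symbols,
                   std::string scope,
                   const std::string& id_expr)
{
    for (;;)
    {
        const std::string candidate =
            scope.empty() ? id_expr : scope + "::" + id_expr;
        if (symbols.count(candidate) != 0)
            return candidate;

        if (scope.empty())
            return std::string(); // reached the global scope without a match

        // Drop the innermost scope component and try again.
        const std::size_t pos = scope.rfind("::");
        scope = (pos == std::string::npos) ? std::string()
                                           : scope.substr(0, pos);
    }
}
```

Because the set of symbols is only consulted once extraction has finished, a reference to a symbol declared in another translation unit resolves the same way as a local one. As in the real implementation, this sketch ignores ambiguity, using declarations, and inline namespaces.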

Clang

The backbone of MrDocs is the Clang compiler, which, given the nature of software, is not without its bugs. Working around them is only feasible to a certain extent, meaning that at some point it becomes necessary to fix them instead of waiting for others to do so. To address this, I have spent considerable time this quarter getting comfortable with hacking on Clang and familiarizing myself with the process of merging patches into LLVM. Thus far, I have had one PR merged, which eliminates the ClassScopeFunctionSpecializationDecl AST node in favor of using DependentFunctionTemplateSpecializationInfo to represent dependent class scope explicit specializations of function templates. The primary motivation for this patch was to simplify ASTVisitor::traverse in MrDocs by using the same overload to handle all function declaration nodes. However, this patch also improves diagnostics for the following example, insofar as the lack of a primary template will be diagnosed prior to instantiation:

template<typename>
struct A
{
    template<>
    void f(int);
};

I have also been working on patches for other bugs related to function template specializations, e.g., diagnosing friend function template specializations which are definitions, ensuring that lookup for friend function template specializations considers inline namespaces, diagnosing unexpanded packs in class scope function template specializations, etc.

Another related aspect of explicit function template specializations I have been working on is template argument deduction. The current implementation of template argument deduction for function templates implicitly instantiates a specialization for the deduced arguments, which is undesirable (and non-conforming) when the deduction is done for the purposes of matching an explicit specialization to its primary template. I wrote a proof-of-concept implementation in which this implicit instantiation is eliminated, but I do not plan to pursue these changes until I have more time available to propose them.

Finally, I have been working on some AST memory optimizations, namely for data common to all redeclarations of an entity. This is done by replacing Redeclarable::First (which stores a pointer to the first declaration in a redeclaration chain) with a pointer to a common base Common:

struct Common
{
    decl_type* First;
};

The Common object is allocated by calling decl_type::newCommon, which permits decl_type to allocate a Redeclarable::Common derived object to store additional common data. This can, for example, be used by CXXRecordDecl to store a single DefinitionData pointer for all redeclarations, as opposed to storing it in each CXXRecordDecl and propagating it upon allocation. This also eliminates the need for RedeclarableTemplate’s common pointer, as it can be merged into Redeclarable::Common.
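A stripped-down model of this design might look like the following. It is a standalone sketch and not Clang code: RecordDecl, RecordCommon, and the accessor names are hypothetical stand-ins, and std::shared_ptr is used only to keep the example self-contained, whereas the description above speaks of a plain pointer.

```cpp
#include <cassert>
#include <memory>

template<typename decl_type>
class Redeclarable
{
public:
    // Data shared by every declaration in a redeclaration chain.
    struct Common
    {
        decl_type* First = nullptr;
    };

    decl_type* getFirstDecl() const { return common_->First; }

protected:
    // First declaration: let decl_type allocate a (possibly derived) Common.
    Redeclarable()
        : common_(decl_type::newCommon())
    {
        assert(common_ && "newCommon must return an object");
        common_->First = static_cast<decl_type*>(this);
    }

    // Redeclaration: share the Common of an earlier declaration.
    explicit Redeclarable(decl_type* previous)
        : common_(static_cast<Redeclarable*>(previous)->common_)
    {
    }

    std::shared_ptr<Common> common_;
};

// Hypothetical stand-in for the data describing a class definition.
struct DefinitionData
{
    bool is_polymorphic = false;
};

// Hypothetical stand-in for CXXRecordDecl.
class RecordDecl : public Redeclarable<RecordDecl>
{
public:
    // Extends the shared block with data common to all redeclarations.
    struct RecordCommon : Common
    {
        DefinitionData* Definition = nullptr;
    };

    static std::shared_ptr<Common> newCommon()
    {
        return std::make_shared<RecordCommon>();
    }

    RecordDecl() = default;
    explicit RecordDecl(RecordDecl* previous) : Redeclarable(previous) {}

    DefinitionData* getDefinition() const
    {
        return static_cast<RecordCommon*>(common_.get())->Definition;
    }

    void setDefinition(DefinitionData* data)
    {
        static_cast<RecordCommon*>(common_.get())->Definition = data;
    }
};
```

In this model, setting the definition through any declaration in the chain makes it visible through all of them, without copying a pointer into each declaration.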