Krystian's Q3 Update
Oct 31, 2023

My primary focus this quarter was getting MrDocs into a state where it can serve as a drop-in replacement for Doxygen/Docca in Boost.URL. Before diving into that, here are a few smaller things I addressed in other projects:
Boost.StaticString
- Added support for platforms lacking wchar_t/wsnprintf.
Docca
- Added backward compatibility for operator names. Doxygen 1.8.15 and older generate operator names containing a space between operator and the subsequent tokens. Newer versions no longer insert this space, so the new names must be converted to the old format to avoid breaking existing references to these functions.
- Suppressed generation of private friends. This was necessary because such declarations would “hide” the primary declaration and result in broken links.
- Stripped auto-generated links within code blocks, since they were rendered incorrectly.
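As a rough illustration of the operator-name conversion, here is a minimal C++ sketch; it is not Docca's actual code. It assumes the only difference between the two formats is a single space after the operator keyword (e.g. operator== versus operator ==), and the function name to_legacy_operator_name is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: rewrites a newer-style operator name such as
// "url::operator==" into the older "url::operator ==" form.
std::string to_legacy_operator_name(std::string_view name) {
    constexpr std::string_view keyword = "operator";

    // Only the unqualified part of the name (after the last "::") is examined.
    std::size_t start = name.rfind("::");
    start = (start == std::string_view::npos) ? 0 : start + 2;
    std::string_view unqualified = name.substr(start);

    // Not an operator, or nothing follows the keyword: leave unchanged.
    if (unqualified.size() <= keyword.size() ||
        unqualified.substr(0, keyword.size()) != keyword)
        return std::string(name);

    // Skip identifiers like "operators" and names that already contain the
    // space (including conversion operators such as "operator bool").
    const unsigned char next =
        static_cast<unsigned char>(unqualified[keyword.size()]);
    if (std::isalnum(next) || next == '_' || next == ' ')
        return std::string(name);

    std::string result(name);
    result.insert(start + keyword.size(), 1, ' ');
    return result;
}
```

Under this assumption, "urls::url::operator=" becomes "urls::url::operator =", while "operator bool" is left untouched.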
MrDocs
Switching focus to MrDocs, I implemented many major features:
Dependency extraction
When symbols are referenced by a declaration, dependency extraction controls whether the referenced symbol will be extracted, irrespective of whether it was declared within the project directory. My initial naive implementation would extract such symbols unconditionally, but I later added a more refined mode where dependency extraction only occurs for:
- Local classes that are deduced as the return type of an extracted function, and
- Base classes of an extracted class.

These cases are the only ones in which a referenced symbol affects the “interface” of another, hence the term “dependency.” A final mode that disables dependency extraction completely was also added.
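To make the two cases concrete, consider the following hypothetical project code, where third_party::base is assumed to be declared outside the project directory. Under the refined mode described above, both third_party::base (a base class of derived) and the local class point (the deduced return type of make_point) would be extracted as dependencies. All names are made up for illustration.

```cpp
// Assume this namespace lives in a dependency's header, outside the
// project directory, so it would not normally be extracted.
namespace third_party {
struct base {
    int value = 7;
};
} // namespace third_party

// Case 2: the public members of third_party::base are part of derived's
// interface, so documenting derived requires knowing about base.
struct derived : third_party::base {};

// Case 1: the local class point is only reachable through make_point's
// deduced return type, so it has to be extracted along with make_point.
inline auto make_point() {
    struct point {
        int x;
        int y;
    };
    return point{1, 2};
}
```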
Safe names
The “safe name” of a symbol is a prettier, but still unique, name that can be used as an alternative to the base16/base64 representation of its SymbolID. These names are also path/URL safe, as their intended purpose is for use as filenames when generating the output.
Broadly, safe names are generated by collecting all symbols with the same name in a given scope and then appending digits from the base16 representation of each SymbolID until all names are unique. For example, the safe name for void A::f() will be A-f in the absence of other overloads. If there also exists an overload void A::f(int), then a possible set of safe names could be A-f-0a and A-f-04.
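A minimal sketch of the digit-appending step follows, assuming each symbol already has a path-safe qualified name (scopes joined with '-') and a base16 SymbolID string. The SymbolEntry type and make_safe_names function are hypothetical, and the real MrDocs implementation may group and disambiguate differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical input: a symbol's path-safe qualified name and the base16
// form of its SymbolID.
struct SymbolEntry {
    std::string name;    // e.g. "A-f"
    std::string id_hex;  // e.g. "0a3f..."
};

// Returns one safe name per entry, in input order.
std::vector<std::string> make_safe_names(const std::vector<SymbolEntry>& symbols) {
    // Group symbols that share the same name.
    std::map<std::string, std::vector<std::size_t>> groups;
    for (std::size_t i = 0; i < symbols.size(); ++i)
        groups[symbols[i].name].push_back(i);

    std::vector<std::string> result(symbols.size());
    for (const auto& [name, members] : groups) {
        if (members.size() == 1) {
            result[members.front()] = name;  // no collision: use the plain name
            continue;
        }
        // Find the shortest SymbolID prefix that tells all members apart.
        std::size_t max_len = 0;
        for (std::size_t i : members)
            max_len = std::max(max_len, symbols[i].id_hex.size());
        std::size_t len = 1;
        for (; len < max_len; ++len) {
            std::set<std::string> seen;
            bool unique = true;
            for (std::size_t i : members) {
                if (!seen.insert(symbols[i].id_hex.substr(0, len)).second) {
                    unique = false;
                    break;
                }
            }
            if (unique)
                break;
        }
        for (std::size_t i : members)
            result[i] = name + "-" + symbols[i].id_hex.substr(0, len);
    }
    return result;
}
```

With two overloads of A::f whose IDs start with 0a and 04, one digit is not enough (both start with 0), so two digits are appended, yielding A-f-0a and A-f-04.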
Symbol filtering
Symbol filtering permits the exclusion of symbols matching a pattern from being extracted. Filters are specified as C++ id-expressions, except that wildcards (*) may be used to match zero or more occurrences of any character. The primary purpose of filters is to exclude symbols from detail namespaces (e.g., using the pattern *::detail). In addition to excluded patterns, it is also possible to specify included patterns to override matches; these patterns are meaningless unless they match a subset of the symbols matched by an excluded pattern. For example, the excluded pattern A::B combined with the included pattern A::B::f* means that, of the symbols in A::B, only those beginning with f are extracted. Internally, filters are converted into a tree that is traversed alongside the AST; this avoids the need to check every pattern each time a new symbol is extracted.
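The matching semantics can be sketched as follows. This deliberately uses the naive approach the filter tree avoids: every pattern is checked against the symbol and each of its enclosing scopes. It assumes that '*' may also match "::" and that matching a scope excludes everything inside it; glob_match, scope_prefixes, first_match, and is_extracted are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Matches str against pat, where '*' matches zero or more characters
// (including "::"); every other character must match exactly.
bool glob_match(std::string_view pat, std::string_view str) {
    std::size_t p = 0, s = 0;
    std::size_t star = std::string_view::npos, mark = 0;
    while (s < str.size()) {
        if (p < pat.size() && pat[p] == '*') {
            star = p++;  // remember the star, first try matching zero chars
            mark = s;
        } else if (p < pat.size() && pat[p] == str[s]) {
            ++p;
            ++s;
        } else if (star != std::string_view::npos) {
            p = star + 1;  // backtrack: let the star absorb one more char
            s = ++mark;
        } else {
            return false;
        }
    }
    while (p < pat.size() && pat[p] == '*')
        ++p;
    return p == pat.size();
}

// "A::B::f" -> {"A", "A::B", "A::B::f"}: enclosing scopes, then the name.
std::vector<std::string_view> scope_prefixes(std::string_view qualified) {
    std::vector<std::string_view> prefixes;
    std::size_t pos = 0;
    while ((pos = qualified.find("::", pos)) != std::string_view::npos) {
        prefixes.push_back(qualified.substr(0, pos));
        pos += 2;
    }
    prefixes.push_back(qualified);
    return prefixes;
}

// Index of the first prefix matched by any pattern, or npos if none match.
std::size_t first_match(const std::vector<std::string_view>& prefixes,
                        const std::vector<std::string>& patterns) {
    for (std::size_t i = 0; i < prefixes.size(); ++i)
        for (const auto& pat : patterns)
            if (glob_match(pat, prefixes[i]))
                return i;
    return std::string_view::npos;
}

// A symbol is dropped if it, or an enclosing scope, matches an excluded
// pattern, unless an included pattern matches at the same depth or deeper.
bool is_extracted(std::string_view qualified,
                  const std::vector<std::string>& excluded,
                  const std::vector<std::string>& included) {
    const auto prefixes = scope_prefixes(qualified);
    const std::size_t ex = first_match(prefixes, excluded);
    if (ex == std::string_view::npos)
        return true;
    for (std::size_t i = ex; i < prefixes.size(); ++i)
        for (const auto& pat : included)
            if (glob_match(pat, prefixes[i]))
                return true;
    return false;
}
```

With excluded = {"A::B"} and included = {"A::B::f*"}, A::B::foo and A::C are extracted while A::B::g is not. A tree built from the pattern components lets the traversal carry this state down the AST instead of re-running every pattern per symbol.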
Symbol lookup
Symbol lookup is the mechanism by which the @ref and @copydoc commands are implemented; it performs a simplified version of C++ name lookup for the given id-expression within the set of all extracted symbols. The current implementation is far from complete (e.g., no ambiguity resolution is performed, and the semantics of constructs like inline namespaces, using-declarations, using-directives, and injected-class-names are not implemented), but it is sufficient for Boost.URL’s documentation. Lookup is deferred until all symbols have been extracted, which allows references across translation units without requiring forward declarations.
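A very small sketch of the idea, assuming the extracted symbols are available as a set of fully qualified names once extraction has finished: the name is tried in the scope where the reference appears, then in each enclosing scope out to the global scope. The lookup function is hypothetical, and it is even cruder than real C++ lookup (for B::g it simply tries whole qualified strings rather than first finding B), with none of the ambiguity handling or inline-namespace semantics mentioned above.

```cpp
#include <optional>
#include <set>
#include <string>

// Simplified lookup of an id-expression `id` written inside scope `context`
// (e.g. "A::B"), against the set of all extracted qualified names.
std::optional<std::string> lookup(const std::set<std::string>& symbols,
                                  std::string context, const std::string& id) {
    // A leading "::" means the name is looked up in the global scope only.
    if (id.rfind("::", 0) == 0) {
        std::string global = id.substr(2);
        if (symbols.count(global) != 0)
            return global;
        return std::nullopt;
    }
    // Otherwise try the innermost scope first, then each enclosing scope.
    while (true) {
        std::string candidate = context.empty() ? id : context + "::" + id;
        if (symbols.count(candidate) != 0)
            return candidate;
        if (context.empty())
            return std::nullopt;
        std::size_t pos = context.rfind("::");
        context = (pos == std::string::npos) ? std::string() : context.substr(0, pos);
    }
}
```

Because the set is only queried after every translation unit has been processed, a reference in one file can resolve to a symbol declared only in another.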
Clang
The backbone of MrDocs is the clang compiler, which, given the nature of software, is not without its bugs. Working around them is only feasible to a certain extent, so at some point it becomes necessary to fix them instead of waiting for others to do so. To that end, I have spent considerable time this quarter getting comfortable hacking on clang and familiarizing myself with the process of merging patches into LLVM. So far, I have had one PR merged: it eliminates the ClassScopeFunctionSpecializationDecl AST node in favor of using DependentFunctionTemplateSpecializationInfo to represent dependent class scope explicit specializations of function templates. The primary motivation for this patch was to simplify ASTVisitor::traverse in MrDocs by using the same overload to handle all function declaration nodes. However, the patch also improves diagnostics for the following example, insofar as the lack of a primary template will be diagnosed prior to instantiation:
template<typename>
struct A
{
template<>
void f(int);
};
I have also been working on patches for other bugs related to function template specializations, such as diagnosing friend function template specializations which are definitions, ensuring that lookup for friend function template specializations considers inline namespaces, and diagnosing unexpanded packs in class scope function template specializations.
Another related aspect of explicit function template specializations I have been working on is template argument deduction. The current implementation of template argument deduction for function templates implicitly instantiates a specialization for the deduced arguments, which is undesirable (and non-conforming) when the deduction is done for the purpose of matching an explicit specialization to its primary template. I wrote a proof-of-concept implementation in which this implicit instantiation is eliminated, but I do not plan to propose these changes until a later date, when I have more time available.
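For context, the kind of declaration in question is an ordinary explicit specialization like the one below: to match it to its primary template, the compiler deduces T = int from the parameter type. Per the description above, clang's deduction step currently also implicitly instantiates a specialization for the deduced argument while doing this. The function name which is purely illustrative, and this is not the proof-of-concept code.

```cpp
// Primary template.
template <typename T>
int which(T) { return 0; }

// Explicit specialization: matching this declaration to the primary
// template requires deducing T = int from the parameter type.
template <>
int which(int) { return 1; }
```

Calls with an int argument select the explicit specialization; any other argument type uses the primary template.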
Finally, I have been working on some AST memory optimizations, namely for data common to all redeclarations of an entity. This is done by replacing Redeclarable::First (which stores a pointer to the first declaration in a redeclaration chain) with a pointer to a common base, Common:
struct Common
{
decl_type* First;
};
The Common object is allocated by calling decl_type::newCommon, which permits decl_type to allocate an object derived from Redeclarable::Common to store additional common data. This can, for example, be used by CXXRecordDecl to store a single DefinitionData pointer for all redeclarations, as opposed to storing it in each CXXRecordDecl and propagating it upon allocation. This also eliminates the need for RedeclarableTemplate’s common pointer, as it can be merged into Redeclarable::Common.
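The shape of this design can be sketched outside of clang as follows. This is a simplified, hypothetical model: clang allocates AST data from the ASTContext rather than using std::shared_ptr, and its real signatures differ. RecordDecl, RecordCommon, DefinitionData, initAsFirst, and initAsRedecl here are stand-ins chosen for illustration; only the overall pattern (a shared Common reached from every redeclaration, allocated through decl_type::newCommon) follows the description above.

```cpp
#include <memory>

template <typename decl_type>
class Redeclarable {
public:
    // Data shared by every declaration in a redeclaration chain.
    struct Common {
        decl_type* First = nullptr;
        virtual ~Common() = default;
    };

    decl_type* getFirstDecl() const { return common_->First; }

protected:
    // First declaration: decl_type::newCommon() decides which
    // Common-derived type to allocate.
    void initAsFirst(decl_type* self) {
        common_ = decl_type::newCommon();
        common_->First = self;
    }
    // Redeclaration: share the previous declaration's Common object.
    void initAsRedecl(const Redeclarable& prev) { common_ = prev.common_; }

    std::shared_ptr<Common> common_;
};

// Stand-in for per-definition data such as CXXRecordDecl's DefinitionData.
struct DefinitionData {
    int numFields = 0;
};

class RecordDecl : public Redeclarable<RecordDecl> {
public:
    // Extends Common with a single definition pointer for the whole chain.
    struct RecordCommon : Common {
        DefinitionData* Definition = nullptr;
    };

    static std::shared_ptr<Common> newCommon() {
        return std::make_shared<RecordCommon>();
    }

    // prev == nullptr creates the first declaration of the entity.
    explicit RecordDecl(RecordDecl* prev) {
        if (prev)
            initAsRedecl(*prev);
        else
            initAsFirst(this);
    }
    RecordDecl(const RecordDecl&) = delete;
    RecordDecl& operator=(const RecordDecl&) = delete;

    DefinitionData* getDefinitionData() const { return recordCommon().Definition; }
    void setDefinitionData(DefinitionData* data) { recordCommon().Definition = data; }

private:
    RecordCommon& recordCommon() const {
        return static_cast<RecordCommon&>(*common_);
    }
};
```

Setting the definition data through any redeclaration makes it visible through all of them, without copying the pointer into each declaration.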