Crafting a versatile generic function using concepts and unexpected pitfalls of implicit ranges
Context
During the course of my work, I was brought to develop a feature where the library user should be able upload a file or folder to a foreign storage. This post goes over my train of thought from a barebone function to a richer interface, covering some common use-cases. For the purpose of this post, upload will be reworded as print, where the argument is output to the console. A minor change, but relatively similar interface. The default prototype looks like this:
|
|
The type is passed as a const reference because it is a heavy stateful object (until path_view gets in) and I just want my function to bind the argument to either a L-value or R-value reference, without having to duplicate the function, or use template for the default case. Moreover, this choice enables to pass any object convertible to a path, including string.
|
|
Later, I wanted to enrich my API by enabling users to upload a sequence of paths, for example
|
|
Although,
- Not just
std::vector
, but any container - Not just
std::filesystem::path
, but anything that can be converted to it
One could go with the fully unconstrained version:
|
|
But it is strongly discouraged. This form is easy to misuse with invalid inputs, for which compilers would give terrible error messages. One bakes in some assumptions which are not explicitly stated in the function declaration, and likely requires extensive additional documentation to specify its terms of use.
Furthermore, it is always better first to constrain as much as possible, and release as per need, to avoid user reliance on something that would need to be limited afterwards.
The constraint…
Every container in the standard library (and followed by everyone in general) provides a dependent type name indicating the type of its elements: value_type. It is a vital piece for the development of generic algorithms. We want our new function to be usable by any container (which follows the standard recommendations). Thus, here is a possible constraint on our template parameter:
|
|
However, T
can be deduced to a reference and the like, which do not allow for evaluating dependent name.
❌ error: 'std::vector<std::filesystem::__cxx11::path>&' is not a class, struct, or union type
That’s why it is often necessary to peel the template parameters with the help of some type traits, e.g. std::decay_t
or std::remove_cvref_t
|
|
…using SFINAE
Substitution failure is not an error (SFINAE) is was one of the generic technique commonly used to constrain an overload set. Tag dispatch is another main alternative, often more explicit but less concise (and adds clutter); it will be skipped for the sake of brevity. SFINAE is already widely described online so I will not linger on it12.
|
|
The position of the constraint can vary, but I usually prefer to keep it within the template argument list. Note that alternative versions:
|
|
are all valid. Although SFINAE.B has the subtle disadvantage, but a major one, to prevent several definition containing enable_if
because default type template parameters do not get mangled into the final function signature generated by the compiler. The compiler would then generate an error for the following examples:
|
|
1
❌ error: redefinition of 'template<class T, class> void print(T&&)'
…using constant expression
SFINAE (and tag dispatch) were the main (and only?) way to handle such scenario until the introduction of if constexpr
in C++17 (and concepts in C++20). This new option allows the compiler to select a conditional branch at compile-time.
|
|
It is definitely more readable than SFINAE since it reads like the good ol’ code (if/else) we are used to read. The issue with this technique for me, although my opinion might change with further experience with it, is:
- The signature is unhelpful to the reader, because catch-all template, unless specified in the comments or going through the definition
- The definition becomes similar to a big
switch
whose size can become unwieldy and I prefer the clarity of multiple declarations - Wrong branch selection might be harder to detect or too easy to occur
…using concepts
A new big thing in C++20 is the concepts, which is a clean and efficient way to impose constraints or describe interfaces when diving into generic programming. The constraints can be imposed either by defining concepts or by introducing ad-hoc constraints with the keyword requires
. Note that the type trait is_same
/is_same_v
is symmetric to the concept same_as
.
|
|
The satisfaction of a job done
Putting everything together gives a brief, yet powerful, interface.
|
|
Output
Path "foo/bar" String "foo/bar" Vector "foo/bar" "bar/foo" Set "bar/foo" "foo/bar"
Yey ! Job done, let’s go home… or?
All is ephemeral
It came up quickly — ahem, during the first reality check — that several major use-cases were missing.
- Using the function with container of string-like objects
1 2
std::vector<std::string> paths; print(paths);
- Using the function with
std::ranges
, for example, by prepending a path1 2 3
std::vector<std::filesystem::path> paths; auto prepend_with = [](auto&& x) {return [&x](auto&&p){return x/p;};}; print(paths | std::ranges::views::transform(prepend_with("blop")));
- Using the function with brace-enclosed initializer lists of
std::filesystem::path
or string-like values1 2
print({path1, path2}); print({"foo/bar","bar/foo"});
Expanding the interface
Resolving the first issue is relatively simple. In essence, we need to accept any object that can be converted to a std::filesystem::path
. Fortunately, the C++ standard provides several core concepts, which includes one with the exact name corresponding to our interpretation std::convertible_to
.
|
|
📝 Note that the concept
std::constructible_from
is also applicable, and which might be more favorable in some (corner?) cases. More on this topic: Is constructible? and std::convertible_to failing to recognize explicitly convertible types
Dealing with ranges elegantly resolves the case for containers too, because a range generalize iteration of containers, they just need to provide an iterator pair begin
and end
.
|
|
However, it becomes again under-constrained because the type of an element of the range is unspecified. The container-specific dependent type value_type
cannot be reused as such because the std::ranges
library defines its own set of “dependent” types.
|
|
Initializer lists, more commonly faced as anonymous brace-enclosed initializer lists, require a separate codepath. Due to their anonymous nature, they do not behave similarly as a concrete container and cannot be deduced directly through a templated function.
|
|
Brace-enclosed initializer lists are abstract entities that try to conform to the target type, which does not exist when using template arguments because the target type can be anything. Hence the need for a specific overload of the function.
|
|
I will not go into the topic of initializer_list
nor concept subsumption or overload resolution in this post for the sake of keeping this piece as a blog post and not turning it into a book 😄
Let’s roll !
Running the following example on Compiler Explorer
|
|
Output
. foo bar
Huh?! Where is the nice printed one-liner ./foo/bar
😠
Tell me why 🎵
Instead of the desired output
./foo/bar
We got our path sliced like a salami
. foo bar
Sometimes programming is like medicine, let us proceed with the diagnosis.
Symptom: template overload priority
Consider the following function call:
|
|
It involves a temporary std::filesystem::path
, which is a pure rvalue. When the compiler sees the call print(fs::path{"./foo/bar"})
, it chooses the most specialized template that can match the argument, or emit an “ambiguous” compiler error if several candidates match equivalently.
The print(fs::path const& path)
specialization is not an exact match for either a mutable lvalue, nor an rvalue argument. Even if it is a function overload rather than a template specialization, there is no priority for such within the selection process.
The template version, even if constrained, can accept both lvalues and rvalues due to its parameter parameter being a forwarding reference (R&&
). In our example, since our input is an rvalue, the compiler considers this version to be a better match for the rvalue argument and pick it instead of the concrete overload.
To resolve this issue and ensure that the right version is called, one can provide an overload specifically for rvalues:
|
|
With this overload, the compiler will now prefer the print(fs::path&& path)
version when an rvalue is passed.
Note that a similar symptom would have been triggered if we were to use a forwarding reference within the range-for loop of the constrained template function.
|
|
Hereby requiring us to slap the final overload for mutable lvalue print(fs::path& path)
.
Unfortunately, it introduces quite some boilerplate, obfuscating the root cause of the problem we face.
Cause: path
satisfies the concept range
itself.
Concepts in C++20 are structural. Any concept created outside the developer’s control can apply to any type as long as the apparent constraints are fulfilled, whether or not they want it or mean it semantically.
📝 Even though C++ Core Guidelines recommend to have concepts modeling semantic notions rather than syntactic one, it is effectively hard to avoid.
The intent of concepts is to model semantic categories (Number, Range, RegularFunction) rather than syntactic restrictions (HasPlus, Array). According to ISO C++ core guideline T.20, “The ability to specify meaningful semantics is a defining characteristic of a true concept, as opposed to a syntactic constraint.” — https://en.cppreference.com/w/cpp/language/constraints
An example of something that looks like a range but is not:
|
|
Compiler
error: static assertion failed 43 | static_assert(std::ranges::range<this_is_not_a_range_and_yet> == false); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ note: the comparison reduces to '(1 == 0)'
😐😐😐
When looking closer at the C++ standard, the range concept is defined in a simple way:
1 2 3 4 5
template< class T > concept range = requires( T& t ) { ranges::begin(t); // equality-preserving for forward iterators ranges::end(t); };
A typical range class only needs to provide two functions:
- A member function begin() whose return type models input_or_output_iterator.
- A member function end() whose return type models sentinel_for
, where It is the return type of begin(). Alternatively, they can be non-member functions, to be found by argument-dependent lookup.
Every type that provides the couple begin()
and end()
member functions with some compatible return type (for example a T*
) will be considered as a range. std::filesystem::path
does provide those functions, which enables to traverse each node of the path, i.e. pieces located between path separator pairs. Therefore std::filesystem::path
represents a range
.
The better treatment, compared to the rvalue overload mentioned above, is to change the constraint itself to restrict further the space of acceptable types, by skipping the overload when the type for range R is convertible to a path:
|
|
Final snippet
With all of this in mind or resolved, we can get to our final snippet 🎉
|
|
Conclusion
All this thought process for producing a few line of code 😔 But see the power in those lines ! Though, I would have wished that std::filesystem::path
was not implicitly a range, as well as std::string
, for example by asking us to opt-in with a as_range()
member or free function.