gignico 2 days ago [-]
> Under the hood, a virtual table (vtable) is created for each class, and a pointer (vptr) to the vtable is added to each instance.
Coming from C++ I assumed this was the only way, but Rust has an interesting approach where individual objects pay no cost because virtual dispatch is handled by fat pointers. So you carry around the `vptr` in fat pointers (`&dyn MyTrait`) only when needed, not in every instance.
cataphract 2 days ago [-]
There have been type-erasure libraries in c++ for a longish time that allow choosing inline vtables and inline storage. It's definitely been a widely talked about technique for at least 10 years (I see talks about Dyno from 2017).
menaerus 1 days ago [-]
> only when needed
Do you know how exactly this is deduced?
dminik 1 days ago [-]
It's not. The user has to decide.
A specific type/reference to a type will always use static dispatch.
fn foo(bar: &Baz) {
bar.thing();
}
A dyn trait reference will always use dynamic dispatch and carry around the vtable pointer.
fn foo(bar: &dyn BazTrait) {
bar.thing();
}
menaerus 1 days ago [-]
Ah, I see. Do I understand correctly that this means that, for a given instance of a polymorphic object, I can switch between static polymorphism and dynamic dispatch, and use both simultaneously? How is this useful in practical terms, i.e. why would I want to do it?
dminik 23 hours ago [-]
Sort of. Given an instance (which can even be a primitive), you can obtain a dyn reference to a trait it implements simply by casting it.
let a: i32 = 12;
let b = &a as &dyn std::string::ToString; // i32 implements the ToString trait
let c = a.to_string(); // Static dispatch
let d = b.to_string(); // Dynamic dispatch through dyn reference
Note that there aren't really any polymorphic objects in Rust. All polymorphism in this case goes through the dyn reference, which contains a pointer to a vtable for a specific trait.
Additionally, going from a dyn reference to a type-specific reference is not easy. Also, certain methods and traits are not dyn-compatible, mostly due to generic parameters.
The main use comes in with various libraries. Doing dynamic dispatch on a specific type is not very useful, but your library might expose a trait which you then call some methods on. If you accept a generic parameter (eg. impl Trait) each such invocation will cause monomorphization (the function body is compiled separately for each generic type combination). This can obviously bloat compile times.
Using a dyn reference in your API will result in only a single version being compiled. The downside is the inability to inline or optimize based on the type.
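To make the trade-off concrete, here is a hypothetical sketch (the function names are invented) contrasting the two signatures: the generic version is monomorphized per concrete type, while the dyn version compiles to a single body that dispatches through the fat pointer's vtable.

```rust
use std::fmt::Display;

// Generic version: the compiler emits a separate copy (monomorphization)
// for every concrete type this is called with.
fn describe_generic(value: &impl Display) -> String {
    format!("value = {}", value)
}

// Dyn version: one compiled body; calls go through the vtable carried
// in the fat pointer, so no per-type copies are generated.
fn describe_dyn(value: &dyn Display) -> String {
    format!("value = {}", value)
}

fn main() {
    // Two instantiations of describe_generic (for i32 and &str)...
    println!("{}", describe_generic(&42));
    println!("{}", describe_generic(&"hello"));
    // ...but only one body for describe_dyn, reused for both.
    println!("{}", describe_dyn(&42));
    println!("{}", describe_dyn(&"hello"));
}
```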
One additional use I found is that you can sometimes get around the divergent expression type in match expressions. Say you need to print out some values of different types:
let value: &dyn Display = match foo {
A(numeric_id) => &numeric_id,
B(string_name) => &string_name,
C => &"static str",
};
This would not work without dyn as each value has a different type.
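The snippet above is abbreviated; here is a hypothetical self-contained version (the `Id` enum and its variants are invented for illustration). Matching on a reference lets each arm coerce to `&dyn Display` even though the payload types differ:

```rust
use std::fmt::Display;

enum Id {
    A(u32),
    B(String),
    C,
}

fn label(foo: &Id) -> String {
    // Each arm yields a reference to a different concrete type,
    // but all of them coerce to the common type &dyn Display.
    let value: &dyn Display = match foo {
        Id::A(numeric_id) => numeric_id,
        Id::B(string_name) => string_name,
        Id::C => &"static str",
    };
    value.to_string()
}

fn main() {
    println!("{}", label(&Id::A(7)));
    println!("{}", label(&Id::B(String::from("abc"))));
    println!("{}", label(&Id::C));
}
```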
menaerus 3 hours ago [-]
Ah, I see. Thanks for the example. I think I understand now. In C++ the problem of monomorphization, or potential bloat due to excessive template instantiations, is normally handled at the linker level, but it can also be controlled at the code level, either by rewriting the code with some other type-erasure technique or by extracting the not-quite-generic bits into smaller non-generic entities (usually a non-templated base class).
Does this mean that the Rust frontend emits intermediate representation in a way that doesn't allow de-duplication at the linking phase? I see that Rust has its own linker as of Sep '25, but it still works with the normal linkers used in C and C++ too - GNU ld, lld, mold, ...
AnimalMuppet 1 days ago [-]
I think the question is, do you know at compile time what the concrete type is? In situations where you do, use static. (I'm not sure I'd call that "polymorphism". If you know the static type it's just a function on a type, and who cares that other types have functions with the same name?) But if you don't know the concrete type at compile time, then you must use dynamic dispatch.
And you can use each approach with the same type at different points in the code - even for the same function. It just depends on your local knowledge of the concrete type.
menaerus 3 hours ago [-]
That's polymorphism 101, and not quite what I was asking. From my understanding, what Rust has is something different from what C++ offers. In C++ you opt in to either static or dynamic dispatch. In Rust it seems you can mix both for the same object and convert between the two at runtime. That appears to be true going by dminik's example in the comment above, but the actual purpose is still not quite evident to me. It seems to address the problem of excessive template instantiations, called monomorphizations in Rust, as I can see. In C++ this is normally and mostly handled through linker optimizations, which may suggest that Rust doesn't have those implemented yet, or that there are more useful cases for it.
dalvrosa 19 hours ago [-]
+1
dalvrosa 2 days ago [-]
Good point, thanks for sharing!
anon291 2 days ago [-]
This is the standard type class approach. Haskell does the same thing.
drmikeando 1 days ago [-]
While this is a great article, I feel it buries the lede.
For me, the key insight was from the last paragraph of the article:
C++23 introduces "deducing this", which is a way to avoid the performance cost of dynamic dispatch without needing tricks like CRTP, by writing:
class Base {
public:
auto foo(this auto&& self) -> int { return 77 + self.bar(); }
};
class Derived : public Base {
public:
auto bar() -> int { return 88; }
};
I wish the article had gone into more details on how this works and when you can use it, and what its limitations are.
dalvrosa 1 days ago [-]
Thanks for the feedback, I'll consider expanding in a separate post
hinkley 2 days ago [-]
I wonder if I still have the link.
One of the papers I had bookmarked when toying with my own language design was by someone who had worked out how to make interfaces as fast as or faster than vtables, by using perfect hashing and using the vtable as a hash table instead of a list.
You can also, when inlining a polymorphic call, put a conditional block in that bounces back to full dispatch if the call occasionally doesn’t match the common case. The problem with polymorphic inlining though is that it quickly resembles the exact sort of code we delete and replace with polymorphic dispatch:
if (typeof arg1 === "string") {
} else if (typeof arg1 === …) {
} else if (…) {
} else if (…) {
} else {
}
1718627440 1 days ago [-]
As someone whose favorite language is C, I don't see what's wrong with that code. Sure, you need to extend it for each new subtype, but you'd also need to implement every virtual function anyway. And if you use a switch instead of an if-else chain, the compiler will complain when you're missing a subtype.
AnimalMuppet 1 days ago [-]
What's wrong with it is, when I extend with a new subtype, I have to fix up the locations that use the type. Potentially all of the locations that use it - I at least have to look at all of them.
With the polymorphic approach, I just have to create the new subtype, and all the users can do the right thing (if they were written with polymorphism in mind, anyway - if they use virtual functions on the base class).
1718627440 1 days ago [-]
Why would I change the users at all instead of just modifying the dispatch method in the super type?
AnimalMuppet 22 hours ago [-]
I see. Yes, you can do it that way.
Still... doing it the C++ way, I can just declare the sub type as deriving from the super type, and I don't have to fix up the super type.
hinkley 20 hours ago [-]
That’s the OO way. Of which C++ is an instance.
dalvrosa 2 days ago [-]
Nice one, TIL
One caveat with "hash vtables" is that you only really see a performance win when the interface has a lot of specializations.
hinkley 2 days ago [-]
As I just mentioned in another reply, the problem they were trying to solve was hierarchies where it makes sense for a group of types to be constructed by the combination of two or three narrowly scoped interfaces.
For instance, if you treat some collections as read only, you can define comprehensions across them with a single implementation. But that means the mutators have to be contained in another type, which a subset will implement, and may have covariant inputs.
anitil 2 days ago [-]
I've been thinking through what features I'd want in a language if I were designing one myself, and one of my desires is to have exhaustive matches on enums (which could be made of any primitive type) and sum types. The ability to generate perfect hashes at compile time was one of the things that falls out nicely from that
akoboldfrying 2 days ago [-]
> using the vtable as a hash table instead of a list.
Could you explain this a bit more? The word "list" makes me think you might be thinking that virtual method lookup iterates over each element of the vtable, doing comparisons until it finds a match -- but I'm certain that this is not how virtual method invocation works in C++. The vtable is constructed at compile time and is already the simplest possible "perfect hashtable": a short, dense array with each virtual method mapping to a function pointer at a statically known index.
hinkley 2 days ago [-]
The problem they were trying to solve was multiple inheritance, by nominal type and not by code reuse. So interfaces, basically.
So these guys essentially assigned a hash code to every function of every interface, and then instead of dispatching via obj.vtable[12] you would do modular arithmetic, x = signature.hash % len(obj.vtable), and call that slot.
I believe this was sometime around 2005-2008 and they found that it was fast enough on hardware of that era to be usable.
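A toy sketch of the scheme as I understand it (all names and "hash" values here are invented; a real implementation would derive the hashes from the method signatures and pick a per-class table size that makes them collision-free):

```rust
// Each object carries a small hash table of function pointers instead
// of a dense positional vtable; call sites index it with
// signature_hash % table_len.

type Method = fn(&Obj) -> i32;

struct Obj {
    data: i32,
    table: Vec<Option<Method>>, // hash table in place of a vtable
}

// Pretend these are compiler-assigned hashes of two interface-method
// signatures (invented numbers).
const HASH_AREA: usize = 12;
const HASH_NAME_LEN: usize = 7;

fn area(o: &Obj) -> i32 {
    o.data * o.data
}

fn name_len(_o: &Obj) -> i32 {
    4
}

fn dispatch(obj: &Obj, signature_hash: usize) -> i32 {
    // Modular lookup, as described above. A perfect hash guarantees
    // the slot is filled and unambiguous for this class.
    let slot = signature_hash % obj.table.len();
    (obj.table[slot].expect("perfect hash: slot must be filled"))(obj)
}

fn make_obj() -> Obj {
    // Table length 4 happens to be "perfect" for this pair:
    // 12 % 4 == 0 and 7 % 4 == 3, so the methods land in distinct slots.
    let mut table = vec![None; 4];
    table[HASH_AREA % 4] = Some(area as Method);
    table[HASH_NAME_LEN % 4] = Some(name_len as Method);
    Obj { data: 3, table }
}

fn main() {
    let obj = make_obj();
    println!("{}", dispatch(&obj, HASH_AREA)); // 9
    println!("{}", dispatch(&obj, HASH_NAME_LEN)); // 4
}
```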
akoboldfrying 1 days ago [-]
Thanks, I think I get it now. The hash value would be a pure function of the method's signature (argument types and return type) and its name, so that two interfaces with a same-name, same-signature method would hash to the same value and thus invoke the same underlying method; the constraints would be that, after modulo, different methods must map to different indices; and the objective function to minimise would be the vtable size (which I think would be common across all classes).
But maybe I don't get it, since this would require knowledge of all interfaces, and as soon as you require that, it's straightforward to build a minimal-size mapping from method name+signature to integer index: e.g., just form the union of all method declarations appearing in any interface, sort them lexicographically, and use a method's position in this sorted list as its index. Lookups in this map are only ever done at compile time so there's no runtime inefficiency to worry about.
repstosb 17 hours ago [-]
The problem is that C++ stores the vtable pointer inside the object, and the objects over which you're iterating often weren't allocated contiguously. Even when they are, if each object contains lots of other data, the vtable pointers won't necessarily be close to each other. That means that invoking virtual functions inside a loop causes a lot of cache misses, and since the data you're fetching is a branch target, it's often hard to find other useful work to do during the memory-delay cycles. However, in a language where you can store a relatively tight array of object IDs (or even use tag bits in the this pointer), you get a much higher cache hit rate on the indexes into your equally tight dispatch table, which will also have a high hit rate.
It's a fair amount of extra work, but in a hot loop it's sometimes worth it. "You can often solve correctness problems (tricky corner cases) by adding an extra layer of indirection. You can solve any performance problem by removing a layer of indirection."
hinkley 20 hours ago [-]
At the time that article was published in SIGPLAN, the dominant language was Java, which is officially statically, strongly typed.
Although really it isn't, because the JVM is strongly typed but evaluates some things at load time or first invocation, which allows some languages that run on the JVM to be a bit tricky. The first generics implementation on the JVM, called Pizza, leveraged this load-time concretization to do its thing.
But if you have a language that can resolve the type system at link time, then you can do this trick. Alternatively you could switch to cuckoo hashing, and if your next module load starts causing collisions, then so be it.
corysama 1 days ago [-]
"list" here does not refer to a "linked list". In more academic circles, a "list" refers to any linear container, such as a Python list. In practice, C++ vtables are effectively structs containing function pointers.
anon291 2 days ago [-]
That's because that type of code is actually better performing than the dynamic dispatch.
There's absolutely nothing wrong with this code. It's just that it's not as extensible.
It's a 'closed world' representation where the code assumes it knows about every possibility. This makes extension more difficult.
The code itself is extraordinarily good and performant.
pjmlp 2 days ago [-]
Nice overview, though it misses other kinds of dispatch.
With concepts, templates, and compile-time execution there is no need for CRTP, and in addition they can provide better error messages about which methods are dispatched to.
dalvrosa 2 days ago [-]
Fair. New C++ standards are providing great tools for compile-time everything
But CRTP is still widely used in low-latency environments :)
Panzerschrek 1 days ago [-]
Since std::variant was introduced I use inheritance and virtual calls much less than before. It's faster, since variant dispatch (via std::visit) is basically a switch statement with all execution paths visible to the compiler and thus inlining is possible. Inheritance and virtual calls are nowadays only necessary in places where it's not possible to statically list all alternatives (where the set of derived classes is open).
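In Rust terms (continuing the thread's comparison), this closed-set approach corresponds to an enum with an exhaustive match; a minimal sketch, with invented types:

```rust
// Analogue of the std::variant / std::visit pattern described above:
// a closed set of alternatives matched exhaustively, with every
// execution path visible to the compiler, so inlining is possible.

enum Shape {
    Circle { r: f64 },
    Square { side: f64 },
}

fn area(s: &Shape) -> f64 {
    // Exhaustive match: adding a new variant is a compile error here
    // until this "visitor" is updated.
    match s {
        Shape::Circle { r } => std::f64::consts::PI * r * r,
        Shape::Square { side } => side * side,
    }
}

fn main() {
    println!("{}", area(&Shape::Square { side: 3.0 })); // 9
}
```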
dalvrosa 19 hours ago [-]
Yeah for C++17 or above, it's a nicer and more performant alternative in most cases
Archit3ch 23 hours ago [-]
An expressive combination is Static Polymorphism + Multiple Dispatch, which Julia resorts to when it can.
yunnpp 1 days ago [-]
Crazy web design, by the way. Diggin' it very much.
dalvrosa 1 days ago [-]
:)
TimorousBestie 2 days ago [-]
Good article, rare to see simple explanations of intricate C++ ideas.