SuperElectric

VST3, COM, and Zig

Mon, 08 Jan 2024 1:02:32 GMT

In my previous post, I mentioned that I'd be doing a writeup of some exploratory stuff I've been doing with VST3[1] and Zig. Before talking about VST3, I want to discuss a different, newer plugin format: CLAP, which is being developed by the people at Bitwig and u-he. There are a few reasons CLAP is exciting, but the main ones are that it's a completely open format, and that it has a very simple[2] ABI that the developer needs to conform to. The simple C-style ABI means that to create a plugin, one must simply fill out a few structs and expose them in a shared library. For example, the main clap_plugin struct is defined as follows[3]:

pub const clap_plugin = extern struct {
    desc: *const clap_plugin_descriptor,
    plugin_data: ?*anyopaque,
    init: *const fn (self: *const clap_plugin) callconv(.C) bool,
    destroy: *const fn (self: *const clap_plugin) callconv(.C) void,
    activate: *const fn (self: *const clap_plugin, sample_rate: f64, min_frame_count: u32, max_frame_count: u32) callconv(.C) bool,
    deactivate: *const fn (self: *const clap_plugin) callconv(.C) void,
    start_processing: *const fn (self: *const clap_plugin) callconv(.C) bool,
    stop_processing: *const fn (self: *const clap_plugin) callconv(.C) void,
    reset: *const fn (self: *const clap_plugin) callconv(.C) void,
    process: *const fn (self: *const clap_plugin, process_data: *const clap_process_t) callconv(.C) clap_process_status,
    get_extension: *const fn (self: *const clap_plugin, id: [*:0]const u8) callconv(.C) ?*const anyopaque,
    on_main_thread: *const fn (self: *const clap_plugin) callconv(.C) void,
};

And the clap_plugin_descriptor type is as follows:

pub const clap_plugin_descriptor = extern struct {
    clap_version: clap_version_t,
    id: [*:0]const u8,
    name: [*:0]const u8,
    vendor: [*:0]const u8,
    url: [*:0]const u8,
    manual_url: [*:0]const u8,
    support_url: [*:0]const u8,
    version: [*:0]const u8,
    description: [*:0]const u8,
    features: [*:null]const ?[*:0]const u8,
};

Of course, as with any standard, there is some subtlety here, but on the whole it should be immediately obvious to anyone with audio programming experience what most of these fields do. One nice feature of CLAP is that all of the usual features of an audio plugin[4] are supported via the standardized "extensions" interface. The key point is that implementing the standard is straightforward in basically any language, because most languages can export a C-ABI-compatible shared library. Additionally, the documentation lives in the C header files, making it easy to figure out what each function and struct field is meant to do[5]. All of these niceties should foreshadow the horror that is the main subject of this post.
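To make the extensions idea concrete, here is a minimal sketch of a get_extension implementation. It assumes a stand-in ParamsExtension struct with a single hypothetical count function (the real clap_plugin_params extension has more entries), and it takes the plugin pointer as an untyped ?*const anyopaque so the snippet stays self-contained. The "clap.params" string is the CLAP_EXT_PARAMS identifier from the CLAP headers; every other name here is illustrative.

```zig
const std = @import("std");

// Hypothetical stand-in for an extension vtable. The real
// clap_plugin_params struct has several more function pointers.
const ParamsExtension = extern struct {
    count: *const fn () callconv(.C) u32,
};

fn paramsCount() callconv(.C) u32 {
    return 2; // hypothetical: this plugin exposes two parameters
}

const params_ext = ParamsExtension{ .count = paramsCount };

// CLAP_EXT_PARAMS in the CLAP headers.
const ext_params_id = "clap.params";

/// Returns a pointer to the requested extension, or null if unsupported.
/// `plugin` is untyped here to keep the sketch self-contained.
fn getExtension(plugin: ?*const anyopaque, id: [*:0]const u8) callconv(.C) ?*const anyopaque {
    _ = plugin;
    // Extension IDs are plain null-terminated strings, so compare them as such.
    if (std.mem.eql(u8, std.mem.span(id), ext_params_id)) {
        return &params_ext;
    }
    return null;
}
```

The host calls get_extension once per extension ID it cares about; returning null simply tells it the plugin doesn't support that feature.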

Enter VST3

The VST3 standard was released by Steinberg in 2008, replacing its predecessor, VST2.4. In practice, the transition took a while, as plugin developers had little incentive to make the switch. However, VST2.4 has now mostly been phased out, with Steinberg's own Cubase and Nuendo DAWs dropping support, and Steinberg no longer issuing licenses for the SDK. VST3 is based on the VST Module Architecture (VST-MA) standard, which itself is basically just... Microsoft COM[6]. Steinberg expect developers to implement VST3 plugins entirely in C++, saying "Unlike COM there is no support for C or other languages yet - simply because there has been no need for this so far." In a world where newer systems languages like Rust, Zig, and Nim are gaining popularity, and C/C++ is no longer the only game in town for GC-free programming, I disagree with this statement, but here we are. Seemingly in response to newer formats like LV2 and CLAP, Steinberg do now distribute a C header, but it lacks any actual documentation.

Instead, I relied heavily on Jeff Glatt's phenomenal 2006 series of articles, COM in plain C. Fundamentally, COM is an object-oriented binary interface: its APIs consist of "interfaces" which, under the hood, are structs holding a pointer to a virtual table (vtable) of function pointers to be filled in. Each COM object needs two things: 1) a GUID, and 2) a vtable.

We will begin with the GUID, which is just a 16-byte array filled with a unique set of bytes. There are plenty of tools to generate these bytes, and while there's technically no way to guarantee they're globally unique, the chance of the same GUID being generated twice is so low that uniqueness is a safe assumption to make. To make handling GUIDs a little easier, I wrote a function that takes the standard string format of a GUID[7] and spits out the appropriate 16 bytes[8].

const Guid = [16]u8;

/// Converts "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" into its 16 bytes, in the
/// order the hex digits appear. Non-hex characters (the hyphens) are skipped.
pub fn parseGuid(str: []const u8) Guid {
    var last_nibble: ?u8 = null;
    var ret = [_]u8{0} ** 16;
    var idx: usize = 0;
    for (str) |char| {
        const nibble: u8 = switch (char) {
            'A'...'F' => char - 'A' + 10,
            'a'...'f' => char - 'a' + 10,
            '0'...'9' => char - '0',
            else => continue, // skip the hyphens
        };

        if (last_nibble) |high| {
            // Second digit of a pair: combine it with the first into one byte.
            ret[idx] = high * 16 + nibble;
            idx += 1;
            last_nibble = null;
        } else {
            last_nibble = nibble;
        }
    }

    return ret;
}
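Because the GUID strings are literals, the parse can run entirely at compile time. One wrinkle worth flagging, based on my reading of the SDK's funknown.h rather than anything shown in this post: on Windows the SDK is built with COM_COMPATIBLE, and its INLINE_UID macro then stores the first three groups of a GUID little-endian (matching Microsoft's GUID struct), while the last eight bytes stay in written order. The sketch below uses hypothetical names (parseGuidStr, toComCompatible) and a stricter parser that rejects malformed input; treat it as an illustration under that assumption, not a drop-in replacement.

```zig
const std = @import("std");

const Guid = [16]u8;

/// Parses "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX" into 16 bytes in written
/// order. Hyphens are skipped; anything else that isn't hex is an error.
fn parseGuidStr(str: []const u8) !Guid {
    var out: Guid = undefined;
    var idx: usize = 0;
    var i: usize = 0;
    while (i < str.len) {
        if (str[i] == '-') {
            i += 1;
            continue;
        }
        if (idx == out.len or i + 2 > str.len) return error.InvalidGuid;
        out[idx] = try std.fmt.parseInt(u8, str[i .. i + 2], 16);
        idx += 1;
        i += 2;
    }
    if (idx != out.len) return error.InvalidGuid;
    return out;
}

/// Reorders a GUID into Microsoft's in-memory layout: the 4-byte group and
/// the two 2-byte groups become little-endian; the last 8 bytes are unchanged.
fn toComCompatible(g: Guid) Guid {
    var out = g;
    std.mem.reverse(u8, out[0..4]);
    std.mem.reverse(u8, out[4..6]);
    std.mem.reverse(u8, out[6..8]);
    return out;
}

// Top-level initializers run at compile time, so a malformed literal here
// becomes a compile error rather than a runtime surprise.
const example_iid = toComCompatible(parseGuidStr("00112233-4455-6677-8899-AABBCCDDEEFF") catch unreachable);
```

As far as I can tell, macOS and Linux builds use the plain written order, so only Windows builds would need the extra reordering.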

The vtable contains pointers to all of the methods of a given COM interface. All COM interfaces inherit from the base IUnknown interface[9], which has the following methods:

fn queryInterface(self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32;
fn addRef(self: *anyopaque) callconv(.C) u32;
fn release(self: *anyopaque) callconv(.C) u32;

We will start with addRef and release, as they are pretty straightforward. Together, they implement reference-counted memory management: addRef increments the reference count, and release decrements it. When the count reaches 0, the object should be freed[10]. queryInterface is the core of COM/VST-MA, and is the mechanism by which a single object can implement multiple interfaces. The method takes an interface ID, iid, and checks whether the object implements that interface. If it does, it returns a pointer to the implementation of that interface via obj (adding a reference on the caller's behalf). Otherwise, obj is set to null, and the function returns a status code reflecting this. Let's look at an example.
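Before getting to that example, here is a minimal sketch of what non-stubbed addRef and release might look like. It assumes a single-interface object allocated from a hypothetical module-level allocator, uses atomic read-modify-write operations because a host may add and drop references from several threads, and leaves queryInterface as a stub. It is written against Zig 0.12 or later (older releases spell the ordering .SeqCst).

```zig
const std = @import("std");

// Hypothetical allocator for the sketch; a real plugin would choose its own.
var allocator: std.mem.Allocator = std.heap.page_allocator;

/// A single-interface COM-style object. The vtable pointer comes first,
/// as COM requires, followed by the reference count.
const Object = extern struct {
    lpVtbl: *const Vtbl,
    ref_count: u32,

    const Vtbl = extern struct {
        queryInterface: *const fn (self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32,
        addRef: *const fn (self: *anyopaque) callconv(.C) u32,
        release: *const fn (self: *anyopaque) callconv(.C) u32,
    };

    const vtbl = Vtbl{
        .queryInterface = queryInterface,
        .addRef = addRef,
        .release = release,
    };

    /// The creator holds the first reference.
    fn create() !*Object {
        const self = try allocator.create(Object);
        self.* = .{ .lpVtbl = &vtbl, .ref_count = 1 };
        return self;
    }

    fn queryInterface(self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32 {
        // Stubbed in this sketch: no interfaces are reported.
        _ = self;
        _ = iid;
        obj.* = null;
        return 1; // placeholder failure code
    }

    fn addRef(self: *anyopaque) callconv(.C) u32 {
        const object: *Object = @ptrCast(@alignCast(self));
        // @atomicRmw returns the previous value, so add 1 for the new count.
        return @atomicRmw(u32, &object.ref_count, .Add, 1, .seq_cst) + 1;
    }

    fn release(self: *anyopaque) callconv(.C) u32 {
        const object: *Object = @ptrCast(@alignCast(self));
        const remaining = @atomicRmw(u32, &object.ref_count, .Sub, 1, .seq_cst) - 1;
        // Whoever drops the last reference frees the object.
        if (remaining == 0) allocator.destroy(object);
        return remaining;
    }
};
```

Both functions return the new count, following COM's convention, though callers should treat that value as a debugging aid only.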

Say we have some Object, and we want it to implement two interfaces, IAdd and ISub. We will say IAdd has a GUID of 11111111-1111-1111-1111-111111111111, and ISub has a GUID of 22222222-2222-2222-2222-222222222222. IAdd will have the following vtable:

const IAdd_Vtbl = extern struct {
    queryInterface: *const fn (self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32,
    addRef: *const fn (self: *anyopaque) callconv(.C) u32,
    release: *const fn (self: *anyopaque) callconv(.C) u32,
    add: *const fn (self: *anyopaque, a: u32, b: u32) callconv(.C) u32,
};

And ISub's vtable will be:

const ISub_Vtbl = extern struct {
    queryInterface: *const fn (self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32,
    addRef: *const fn (self: *anyopaque) callconv(.C) u32,
    release: *const fn (self: *anyopaque) callconv(.C) u32,
    sub: *const fn (self: *anyopaque, a: u32, b: u32) callconv(.C) u32,
};

Remember, since every COM interface inherits from IUnknown, every vtable must start with queryInterface, addRef, and release, in that order. The IAdd and ISub interface types themselves just hold a pointer to their vtable, with their GUID as a compile-time constant:

const IAdd = extern struct {
    lpVtbl: *const IAdd_Vtbl,

    const iid = parseGuid("11111111-1111-1111-1111-111111111111");
};

const ISub = extern struct {
    lpVtbl: *const ISub_Vtbl,

    const iid = parseGuid("22222222-2222-2222-2222-222222222222");
};

They could also have member variables, but I won't include any in this example. Finally, we can implement Object:

const std = @import("std");

// `extern struct` fixes the field order, so i_add sits at offset 0 and a
// pointer to i_add is also a pointer to the whole Object.
const Object = extern struct {
    i_add: IAdd,
    i_sub: ISub,

    // `self` points at the i_add field.
    fn IAdd_queryInterface(self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32 {
        if (std.mem.eql(u8, iid[0..16], &IAdd.iid)) {
            obj.* = self;
            return 0;
        }
        if (std.mem.eql(u8, iid[0..16], &ISub.iid)) {
            // Step forward from i_add to i_sub.
            obj.* = @as(*anyopaque, @ptrFromInt(@intFromPtr(self) + @offsetOf(Object, "i_sub")));
            return 0;
        }

        obj.* = null;
        return 1; // non-zero: interface not supported
    }

    fn IAdd_add(self: *anyopaque, a: u32, b: u32) callconv(.C) u32 {
        _ = self;
        return a + b;
    }

    // `self` points at the i_sub field.
    fn ISub_queryInterface(self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32 {
        if (std.mem.eql(u8, iid[0..16], &ISub.iid)) {
            obj.* = self;
            return 0;
        }
        if (std.mem.eql(u8, iid[0..16], &IAdd.iid)) {
            // Step back from i_sub to i_add (the start of the Object).
            obj.* = @as(*anyopaque, @ptrFromInt(@intFromPtr(self) - @offsetOf(Object, "i_sub")));
            return 0;
        }

        obj.* = null;
        return 1;
    }

    fn ISub_sub(self: *anyopaque, a: u32, b: u32) callconv(.C) u32 {
        _ = self;
        return a - b;
    }

    fn addRef(self: *anyopaque) callconv(.C) u32 {
        _ = self;
        return 0; // STUB!
    }

    fn release(self: *anyopaque) callconv(.C) u32 {
        _ = self;
        return 0; // STUB!
    }

    const IAdd_vtbl = IAdd_Vtbl{
        .queryInterface = IAdd_queryInterface,
        .addRef = addRef,
        .release = release,
        .add = IAdd_add,
    };

    const ISub_vtbl = ISub_Vtbl{
        .queryInterface = ISub_queryInterface,
        .addRef = addRef,
        .release = release,
        .sub = ISub_sub,
    };

    fn create() Object {
        return Object{
            .i_add = IAdd{ .lpVtbl = &IAdd_vtbl },
            .i_sub = ISub{ .lpVtbl = &ISub_vtbl },
        };
    }
};

Whew. That's quite a bit of code just to get some toy interfaces implemented. The central idea is that each interface must be able to reach every other interface the object implements, and getting there takes a bit of pointer arithmetic. In C++, all of the interface inheritance machinery and vtable creation is hidden from the user, but in any other language it's a real pain. To ease that pain, my codebase leans on Zig's very cool comptime metaprogramming. For example, I provide this implementation of FUnknown:

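// `Interface` pairs an interface's GUID (`cid`) with the byte offset of its
// vtable pointer inside the object (`ptr_offset`).
fn FUnknown(comptime self_offset: usize, comptime interfaces: []const Interface) type {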
    return struct {
        pub const vtbl = c.Steinberg_FUnknownVtbl{
            .queryInterface = queryInterface,
            .addRef = addRef,
            .release = release,
        };

        fn queryInterface(self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) c.Steinberg_tresult {
            for (interfaces) |interface| {
                if (std.mem.eql(u8, iid[0..16], &interface.cid)) {
                    const interface_ptr: *c.Steinberg_FUnknown =
                        @ptrFromInt(@intFromPtr(self) - self_offset + interface.ptr_offset);
                    _ = interface_ptr.lpVtbl.addRef(@ptrCast(interface_ptr));
                    obj.* = @ptrCast(interface_ptr);
                    return c.Steinberg_kResultOk;
                }
            }

            obj.* = null;
            return c.Steinberg_kNoInterface;
        }

        // TODO(oliver): Handle ref counting properly
        fn addRef(self: *anyopaque) callconv(.C) c.Steinberg_uint32 {
            _ = self;
            return 1;
        }

        // TODO(oliver): Handle ref counting properly
        fn release(self: *anyopaque) callconv(.C) c.Steinberg_uint32 {
            _ = self;
            return 1;
        }
    };
}

So, for any object to implement the FUnknown interface, it only has to pass a list of interfaces (each containing its GUID along with its offset from the base of the object), plus the offset of the interface whose vtable is being built. This makes implementing new objects quite a bit easier, and one can see how the same approach extends to other interfaces. Hopefully you now have a better understanding of VST-MA, and of how you might go about implementing it in a language other than C++.
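Since the Interface type isn't shown above, here is a self-contained sketch of the same idea under my own assumptions: Interface holds an iid and a ptr_offset, a QueryInterface generator (a hypothetical name) is instantiated once per interface with that interface's own offset, and a toy Object exposes two bare interfaces, A and B, whose IDs are made-up filler bytes. Reference counting is stubbed out, as in the post, and the "no interface" return code is a placeholder.

```zig
const std = @import("std");

const Guid = [16]u8;

/// One interface an object exposes: its IID, plus the byte offset of that
/// interface's vtable-pointer field from the start of the object.
const Interface = struct {
    iid: Guid,
    ptr_offset: usize,
};

/// Every COM vtable starts with the three IUnknown methods.
const UnknownVtbl = extern struct {
    queryInterface: *const fn (*anyopaque, [*]const u8, *?*anyopaque) callconv(.C) i32,
    addRef: *const fn (*anyopaque) callconv(.C) u32,
    release: *const fn (*anyopaque) callconv(.C) u32,
};

/// Every COM interface starts with a vtable pointer, so any interface
/// pointer can be viewed through this header.
const Unknown = extern struct {
    lpVtbl: *const UnknownVtbl,
};

/// Generates queryInterface for the interface stored at `self_offset`.
fn QueryInterface(comptime self_offset: usize, comptime interfaces: []const Interface) type {
    return struct {
        pub fn queryInterface(self: *anyopaque, iid: [*]const u8, obj: *?*anyopaque) callconv(.C) i32 {
            // Step back to the start of the object...
            const base = @intFromPtr(self) - self_offset;
            for (interfaces) |interface| {
                if (std.mem.eql(u8, iid[0..16], &interface.iid)) {
                    // ...then forward to the requested interface.
                    const target: *anyopaque = @ptrFromInt(base + interface.ptr_offset);
                    // COM rule: the pointer handed out carries a new reference.
                    const unk: *Unknown = @ptrCast(@alignCast(target));
                    _ = unk.lpVtbl.addRef(target);
                    obj.* = target;
                    return 0;
                }
            }
            obj.* = null;
            return -1; // placeholder "no such interface" code
        }
    };
}

// Made-up interface IDs for the example.
const iid_a = [_]u8{0x11} ** 16;
const iid_b = [_]u8{0x22} ** 16;

/// A toy object exposing two bare interfaces, A and B.
const Object = extern struct {
    a: Unknown,
    b: Unknown,

    const interfaces = [_]Interface{
        .{ .iid = iid_a, .ptr_offset = @offsetOf(Object, "a") },
        .{ .iid = iid_b, .ptr_offset = @offsetOf(Object, "b") },
    };

    const a_vtbl = UnknownVtbl{
        .queryInterface = QueryInterface(@offsetOf(Object, "a"), &interfaces).queryInterface,
        .addRef = noopRef,
        .release = noopRef,
    };

    const b_vtbl = UnknownVtbl{
        .queryInterface = QueryInterface(@offsetOf(Object, "b"), &interfaces).queryInterface,
        .addRef = noopRef,
        .release = noopRef,
    };

    // Reference counting stubbed out, as in the post.
    fn noopRef(self: *anyopaque) callconv(.C) u32 {
        _ = self;
        return 1;
    }

    fn init() Object {
        return .{ .a = .{ .lpVtbl = &a_vtbl }, .b = .{ .lpVtbl = &b_vtbl } };
    }
};
```

Because every offset comes from @offsetOf at compile time, reordering Object's fields can't silently break the pointer arithmetic the way hand-written offsets could.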

VST3 Architecture

Now that we've got the drudgery of COM/VST-MA out of the way[11], we can talk a bit about the actual structure of a VST3 plugin. I won't go too in depth here, because Steinberg's documentation is less thin on this front. The entry point for a VST3 plugin is the exported GetPluginFactory function, which returns a COM object implementing the IPluginFactory interface:

const Steinberg_IPluginFactoryVtbl = extern struct {
    queryInterface: *const fn (*anyopaque, [*]const u8, *?*anyopaque) callconv(.C) Steinberg_tresult,
    addRef: *const fn (*anyopaque) callconv(.C) Steinberg_uint32,
    release: *const fn (*anyopaque) callconv(.C) Steinberg_uint32,
    getFactoryInfo: *const fn (*Steinberg_IPluginFactory, *struct_Steinberg_PFactoryInfo) callconv(.C) Steinberg_tresult,
    countClasses: *const fn (*Steinberg_IPluginFactory) callconv(.C) Steinberg_int32,
    getClassInfo: *const fn (*Steinberg_IPluginFactory, Steinberg_int32, *struct_Steinberg_PClassInfo) callconv(.C) Steinberg_tresult,
    createInstance: *const fn (*Steinberg_IPluginFactory, Steinberg_FIDString, Steinberg_FIDString, *?*anyopaque) callconv(.C) Steinberg_tresult,
};

The countClasses and getClassInfo methods allow the host to enumerate the classes the plugin provides, and createInstance allows the host to create instances of them. The two central objects in a VST3 plugin are the "Edit Controller" and the "Processor"[12]. The Edit Controller, which implements the IEditController interface, manages the state of all of the plugin's parameters, as well as creating the plugin window. The Processor, which implements IComponent and IAudioProcessor, does the actual DSP for the plugin; its process function is where the main audio processing work takes place. The specifics of these objects are well documented, so I'm going to end the whistle-stop tour of VST3 here.
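One last illustration before leaving VST3 behind: at its heart, a factory is a class table that countClasses, getClassInfo, and createInstance all index into. The sketch below is hypothetical throughout. ClassEntry is a trimmed-down stand-in for PClassInfo, Processor and Controller are empty placeholders, the class IDs are filler bytes, and it uses Zig errors and an allocator parameter instead of the real COM signature, which also takes an interface ID and hands the object back through an out-pointer.

```zig
const std = @import("std");

const Guid = [16]u8;

// Hypothetical stand-ins for the real processor and controller objects.
const Processor = struct { sample_rate: f64 = 44100.0 };
const Controller = struct { param_count: u32 = 0 };

/// One row of the factory's class table (a trimmed-down PClassInfo).
const ClassEntry = struct {
    cid: Guid,
    name: []const u8,
    create: *const fn (std.mem.Allocator) anyerror!*anyopaque,
};

fn createProcessor(allocator: std.mem.Allocator) anyerror!*anyopaque {
    const processor = try allocator.create(Processor);
    processor.* = .{};
    return processor;
}

fn createController(allocator: std.mem.Allocator) anyerror!*anyopaque {
    const controller = try allocator.create(Controller);
    controller.* = .{};
    return controller;
}

const classes = [_]ClassEntry{
    .{ .cid = [_]u8{0xAA} ** 16, .name = "Example Processor", .create = createProcessor },
    .{ .cid = [_]u8{0xBB} ** 16, .name = "Example Controller", .create = createController },
};

/// Mirrors countClasses.
fn countClasses() i32 {
    return classes.len;
}

/// Mirrors the lookup part of getClassInfo: null for an out-of-range index.
fn getClassName(index: i32) ?[]const u8 {
    if (index < 0 or index >= classes.len) return null;
    return classes[@intCast(index)].name;
}

/// Mirrors the dispatch in createInstance: find the class by ID and build it.
fn createInstance(allocator: std.mem.Allocator, cid: [*]const u8) anyerror!*anyopaque {
    for (classes) |entry| {
        if (std.mem.eql(u8, cid[0..16], &entry.cid)) return entry.create(allocator);
    }
    return error.ClassNotFound;
}
```

In a real factory, createInstance would then call queryInterface on the new object with the requested interface ID and return the resulting pointer to the host.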

Zig

So, now that I've gone through all this hassle, it's worth explaining why I thought it was a good idea to do this in Zig, as opposed to using a framework like JUCE, or just writing C++ against the VST3 SDK. For one, as an exercise, I find it significantly more satisfying to use a language like C or Zig, as it strips away the layers of abstraction that complex C++ templates, macros, and the like pile up, and reveals what's actually going on. Secondly, Zig is just much more fun to write than any other language I've used, and after using it for a few months, I rarely find myself pondering what the "right" way to do something in the language is. This is in large part due to the way metaprogramming is handled. In C/C++, it can take quite a while to get a deep understanding of what a given codebase's macros are doing, and this goes doubly for Rust's macros. In Zig, metaprogramming follows the same rules as the rest of the language, so there's no need to learn an arcane template/trait/macro system.

Finally, Zig places a lot of importance on C interoperability. Being able to easily create structs and functions with a C ABI, while still getting to use more modern features, makes programming much more fluid. Rust is great for safety-critical applications, but not having to wrap everything that touches C in an unsafe block is a really nice luxury. Of course, Zig is not without its downsides. The documentation, especially for the standard library, can be pretty sparse. There are also times when programs will just segfault, even when compiled in Debug mode. However, given how good my experience with it has been so far, I'm hopeful that Zig will continue to be a viable language for me to write plugins in. 'Til next time!

- OJF

Notes

1. An audio plugin interface developed by Steinberg that, by an accident of history, is very popular and widely supported.

2. You will learn that this is not the norm when it comes to plugin formats.

3. I am assuming some familiarity with the Zig language here, but if you're familiar with any C-style language you're about 80% of the way there already. The main thing to point out is that Zig is a bit more expressive with its pointer types. A plain * indicates a pointer to a single item, [*] is a pointer to multiple items, and [*:0] is a multi-item pointer with a sentinel value of 0[a]. Nullability is expressed via the ? prefix. Of course, in the C ABI these are all just bog-standard C pointers with no safety checks, but Zig gives us the best of both worlds. Additionally, extern and callconv(.C) ensure that the structs have the C ABI layout and that the functions use the C calling convention, respectively.

a. That is, the array is terminated with a null or 0 value. This generalizes C-style null terminated strings really nicely.

4. Things like parameters, I/O ports, MIDI, etc.

5. That being said, Zig's translate-c does not currently carry the header's comments over into the generated Zig code, which is a bit frustrating.

6. Toccata and Fugue in D minor should sound in your head every time you hear these words.

7. XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

8. Conveniently, this conversion can be done at compile time in Zig, since the string is hardcoded in.

9. Referred to as FUnknown in VST-MA, but it's the same thing.

10. During development, I've just left these as stubs, and all of my objects live for the entire lifetime of the program. In production, however, this needs to be managed properly.

11. For the sadists among you there's plenty more to learn :)

12. These can be the same object, but we will treat them as distinct.