ForgeStream


Trade-offs in designing Domain Specific Languages

A Domain-Specific Language (DSL) is a programming abstraction tailored to a specific application domain, enabling developers to express ideas at a higher level than general-purpose programming languages (GPLs). Unlike GPLs, DSLs often focus on specific tasks and may not represent complete programs or even computational logic.

It is very common to use DSLs when programming in Rust, as there are a lot of high-quality DSLs in the ecosystem:

It’s worth noting that in this blog post we’ve chosen to take a very broad view of what constitutes a DSL. Depending on what you think is needed to constitute a “language”, not all of these crates are DSLs.

Some observations about the list above:

One reason for the popularity of DSL crates is that Rust has some very strong tools for building DSLs, such as proc-macros and a rich trait system. In this blog post we will discuss the design space of DSLs in Rust: why you might want to create a DSL, which design questions to ponder before creating your own, and finally some common archetypes of answers to those questions.

Why create your own DSLs?

Here are some real-world examples of DSLs used at IDVerse, showcasing the range of problems they can address:

The design space of DSLs in Rust

Having established the motivations, we’ll now outline the major questions to ask when creating a DSL.

1. Users: Who will use the DSL?

Understanding who will work with the DSL can inform many design choices, especially regarding implementation effort. Sanding down the rough edges can often end up taking considerable amounts of time.

2. Error messages: How critical is good feedback?

Creating good error messages isn’t particularly difficult, but it can be time-consuming.

We recommend using syn::Error::to_compile_error or std::compile_error!() when writing Rust macros.
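As a minimal sketch of the std::compile_error!() half of that recommendation, a macro_rules!() macro can end with a catch-all arm that reports a targeted message. The color!() macro and its inputs are hypothetical and exist only for illustration; a proc-macro would typically return syn::Error::to_compile_error() from its entry point instead.

```rust
/// Hypothetical helper macro: maps a color name to an RGB triple.
macro_rules! color {
    (red) => { (255u8, 0u8, 0u8) };
    (green) => { (0u8, 255u8, 0u8) };
    (blue) => { (0u8, 0u8, 255u8) };
    // Catch-all arm: any other input produces a targeted message instead of
    // the generic "no rules expected this token" error.
    ($($other:tt)*) => {
        compile_error!(concat!(
            "color!: expected `red`, `green` or `blue`, found `",
            stringify!($($other)*),
            "`"
        ))
    };
}

/// Valid input expands to a plain tuple expression.
fn usage_example() -> (u8, u8, u8) {
    color!(green)
    // color!(purple) would stop compilation with the custom message above.
}
```

The catch-all arm has to come last, since macro_rules!() tries arms in order and the final pattern accepts any token sequence.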

When creating DSLs that take external input, we recommend the codespan crate to produce error messages with precise file and line information. This crate provides a Span datatype that you can use to annotate objects created by your parser. These spans must be propagated all the way to your error printing routine, where you can use the codespan-reporting crate to output pretty error messages.
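To illustrate the idea of propagating spans without depending on any crate, here is a hand-rolled sketch using only the standard library. It does not use the codespan API: the Span, Spanned, parse_number and render_error names are assumptions made for this example, and it assumes spans are valid byte ranges that fall on character boundaries within a single line.

```rust
/// Byte range into the source text (hypothetical stand-in for a span type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// A parsed value annotated with the span it came from.
#[derive(Debug)]
struct Spanned<T> {
    value: T,
    span: Span,
}

/// Parse the text covered by `span` as a number, keeping the span on both
/// the success and the error path so later stages can still point at it.
fn parse_number(source: &str, span: Span) -> Result<Spanned<u32>, (Span, String)> {
    let text = &source[span.start..span.end];
    text.parse::<u32>()
        .map(|value| Spanned { value, span })
        .map_err(|e| (span, format!("expected a number, found `{text}`: {e}")))
}

/// Render a diagnostic with file, line, column and a caret underline.
fn render_error(file: &str, source: &str, span: Span, message: &str) -> String {
    // Locate the line that contains the start of the span.
    let line_start = source[..span.start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = source[span.start..]
        .find('\n')
        .map_or(source.len(), |i| span.start + i);
    // Lines and columns are 1-based, as in most compiler output.
    let line_no = source[..span.start].matches('\n').count() + 1;
    let col = source[line_start..span.start].chars().count() + 1;
    let width = source[span.start..span.end.min(line_end)]
        .chars()
        .count()
        .max(1);
    format!(
        "error: {message}\n --> {file}:{line_no}:{col}\n  | {}\n  | {}{}",
        &source[line_start..line_end],
        " ".repeat(col - 1),
        "^".repeat(width),
    )
}
```

The point of the sketch is the data flow: the span travels with the value from the parser to the reporting routine, which is the only place that needs the original source text.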

3. Output: What does the DSL produce?

The kind of output or outputs generated by the DSL can constrain the kinds of DSLs that are suitable.

4. Evaluation time: When should the DSL be evaluated?

When the DSL is evaluated constrains which kinds of DSLs are possible.

5. Source of truth: How will changes be managed?

Suppose you have multiple representations of the data encoded in your DSL stored in different formats. These other formats might for example be documentation, language bindings, external schemas or AWS configurations. How do changes in one representation get applied in other representations?

6. Storage: How is the DSL input stored?

The method of storing DSL input will affect how it’s accessed and how the system behaves.

7. Parsing: How is the DSL parsed?

The choice of parsing mechanism can impact the flexibility, performance, and complexity of the DSL.

8. Normalization: Any pre-evaluation validation?

Before evaluating the DSL, you may need to normalize or validate the input to ensure correctness, simplify code generation, or enable faster execution.
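A normalization pass could look roughly like the sketch below. The RawField, FieldType and Normalized types are hypothetical and not taken from any of the DSLs discussed here; the sketch assumes field definitions arrive as name and type strings, and it collects every problem rather than stopping at the first one.

```rust
use std::collections::BTreeMap;

/// Field types the hypothetical DSL understands.
#[derive(Debug, Clone, PartialEq)]
enum FieldType {
    Number,
    Text,
}

/// A field exactly as written in the DSL input.
struct RawField {
    name: String,
    ty: String,
}

/// Validated form: known types only, unique names, deterministic order.
#[derive(Debug, PartialEq)]
struct Normalized {
    fields: BTreeMap<String, FieldType>,
}

/// Validate and normalize the raw input, reporting all errors at once.
fn normalize(raw: Vec<RawField>) -> Result<Normalized, Vec<String>> {
    let mut errors = Vec::new();
    let mut fields = BTreeMap::new();
    for field in raw {
        // Resolve the type name up front so later stages never see strings.
        let ty = match field.ty.as_str() {
            "Number" => FieldType::Number,
            "String" => FieldType::Text,
            other => {
                errors.push(format!("field `{}`: unknown type `{}`", field.name, other));
                continue;
            }
        };
        // Reject duplicate names instead of silently overwriting.
        if fields.insert(field.name.clone(), ty).is_some() {
            errors.push(format!("field `{}` is defined more than once", field.name));
        }
    }
    if errors.is_empty() {
        Ok(Normalized { fields })
    } else {
        Err(errors)
    }
}
```

Using a BTreeMap keeps the fields sorted by name, so any code generated from the normalized form is stable across runs.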

9. Project scope: How much effort will you invest?

Finally, consider how much time and resources you’re willing to allocate for the DSL’s design and maintenance.

Archetypical small DSLs in Rust

To illustrate the trade-offs outlined earlier, we analyze three common types of DSLs from the IDVerse codebase. We chose these three because we believe they are somewhat archetypical of how smaller DSLs might look in other codebases.

macro_rules!() helpers

These lightweight macros provide concise solutions to repetitive tasks. For example, our ddb_item!() macro simplifies creating HashMaps for DynamoDB AttributeValues. It’s used like this:

let item = ddb_item!(
    "id" => S("123"),
    "name" => S("John Doe"),
    "age" => N(24),
);

A simplified implementation can be found at the end of the blog post. If we evaluate it along the parameters above, we get something like this:

Derive proc-macros

Custom derive macros can make code a lot easier to read and, perhaps just as importantly, serve as a push to standardize how you implement similar operations at your company. Our example here is a #[derive(AppEnv)] macro. Using it looks something like this:

#[derive(Debug, AppEnv)]
pub struct AppEnv {
    #[env("MESSAGING_TABLE_NAME")]
    pub messaging_table_name: String,

    #[env("MESSAGING_TABLE_GSI1_INDEX_NAME")]
    pub messaging_table_gsi1: String,

    #[env("SQS_QUEUE_URL")]
    pub sqs_queue_url: String,

    #[env("LAMBDA_FUNCTION_ARN")]
    pub lambda_function_arn: String,

    #[env("JWT_SECRET")]
    #[convert(Call?(decode_jwt))]
    #[mock(Secret::new(b"jwt_secret".to_vec()))]
    pub jwt_secret: Secret<Vec<u8>>,

    #[env("EXPIRATION_TIME_DAYS")]
    #[convert(Parse, Call(chrono::Days::new))]
    pub expiration_time: chrono::Days,
}

// Call it like this:
let app_env: AppEnv = AppEnv::load_from_env()?;
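To give a feeling for the boilerplate such a derive removes, here is a hand-written sketch of what an equivalent loader might look like for two of the fields. This is not the actual expansion of #[derive(AppEnv)]: the SmallEnv and EnvError types and the load_from helper are assumptions made for this example, and a plain u64 stands in for chrono::Days to keep the sketch free of dependencies.

```rust
use std::env;

/// Hypothetical two-field subset of the configuration struct.
#[derive(Debug)]
pub struct SmallEnv {
    pub sqs_queue_url: String,
    pub expiration_time_days: u64,
}

/// Hypothetical error type; the real macro may report errors differently.
#[derive(Debug)]
pub enum EnvError {
    Missing(&'static str),
    Invalid { name: &'static str, reason: String },
}

/// Look up a variable that must be present.
fn required(
    lookup: &impl Fn(&str) -> Option<String>,
    name: &'static str,
) -> Result<String, EnvError> {
    lookup(name).ok_or(EnvError::Missing(name))
}

impl SmallEnv {
    /// Load from an arbitrary lookup function, which keeps tests independent
    /// of the real process environment.
    pub fn load_from(lookup: impl Fn(&str) -> Option<String>) -> Result<Self, EnvError> {
        let sqs_queue_url = required(&lookup, "SQS_QUEUE_URL")?;
        // Corresponds loosely to a `Parse` conversion step.
        let expiration_time_days = required(&lookup, "EXPIRATION_TIME_DAYS")?
            .parse::<u64>()
            .map_err(|e| EnvError::Invalid {
                name: "EXPIRATION_TIME_DAYS",
                reason: e.to_string(),
            })?;
        Ok(Self { sqs_queue_url, expiration_time_days })
    }

    /// Load from the process environment.
    pub fn load_from_env() -> Result<Self, EnvError> {
        Self::load_from(|name| env::var(name).ok())
    }
}
```

Every additional field repeats the same lookup, conversion and error-mapping pattern, which is the kind of repetition a derive macro can generate from the field attributes.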

Again, we can evaluate the macro along the parameters from above:

External-input code generators

Code generators are typically invoked either directly as binaries or through a build.rs script. Some archetypical examples from the Rust ecosystem are protobuf and planus, though internal tools can be made with minimal effort as well. For instance, we have a small tool that outputs serde definitions and JSON schemas based on RON files that look something like this:

Event(
    name: "MyEvent",
    version: "1.7",
    description: "Description",
    data: Struct(
        description: "Payload data for MyEvent",
        fields: {
            "first_field": Field(
                type: "Number",
                description: "First data field",
            ),
        }
    ),
    metadata: Struct(
        description: "Payload metadata for MyEvent",
        fields: {
            "first_field": Field(
                type: "Number",
                description: "First metadata field",
            ),
        }
    ),
)

If you do end up writing a code generator yourself, we recommend completely ignoring formatting while generating your code and relying on rustfmt instead. If you end up generating very nested code, you might even want to run rustfmt over it twice.
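A minimal sketch of that workflow, assuming a rustfmt binary is available on the PATH: the generator emits code with no attention to layout, and a second function pipes it through rustfmt via standard input. The generate_struct and format_with_rustfmt names are hypothetical and not part of the tool described above.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Emit a struct definition on a single line, ignoring formatting entirely.
fn generate_struct(name: &str, fields: &[(&str, &str)]) -> String {
    let mut out = format!("#[derive(Debug, Clone)] pub struct {name} {{");
    for (field, ty) in fields {
        out.push_str(&format!("pub {field}: {ty},"));
    }
    out.push('}');
    out
}

/// Pipe generated code through `rustfmt`, which reads from stdin and writes
/// the formatted result to stdout when no file arguments are given.
fn format_with_rustfmt(code: &str) -> io::Result<String> {
    let mut child = Command::new("rustfmt")
        .args(["--edition", "2021"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // The stdin handle is dropped at the end of this statement, which closes
    // the pipe so rustfmt sees end-of-input. For very large inputs, writing
    // from a separate thread would avoid filling the pipe buffers.
    child
        .stdin
        .take()
        .expect("stdin was requested as piped")
        .write_all(code.as_bytes())?;
    let output = child.wait_with_output()?;
    if !output.status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "rustfmt failed"));
    }
    String::from_utf8(output.stdout)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Calling format_with_rustfmt on its own output is one way to apply the formatter a second time.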

Conclusion

DSLs are a powerful tool to simplify code, improve correctness and create company-wide conventions in your code base. Rust provides excellent mechanisms to implement them at various levels of complexity, and which mechanism to use depends on your use case. When deciding on the right approach, consider the trade-offs in complexity, maintainability, and user experience.

In the future we hope to open source some of the DSLs mentioned in this article, assuming we are able to disentangle them from our company-specific logic. We also hope to write more blog posts with hands-on guides on how to create these kinds of DSLs yourself.

Appendix: ddb_item!() macro

This is a simplified version of the ddb_item!() macro we use at IDVerse:

#[macro_export]
macro_rules! attr_value {
    // Strings use to_string() instead of into().
    ($attrs:ident, $key:literal, S($value:expr)) => {
        $attrs.insert(
            String::from($key),
            aws_sdk_dynamodb::types::AttributeValue::S($value.to_string()),
        );
    };
    // Numbers use to_string() instead of into().
    ($attrs:ident, $key:literal, N($value:expr)) => {
        $attrs.insert(
            String::from($key),
            aws_sdk_dynamodb::types::AttributeValue::N($value.to_string()),
        );
    };
    // Value can directly be converted to an AttributeValue
    ($attrs:ident, $key:literal, _($value:expr)) => {
        $attrs.insert(String::from($key), $value.into());
    };
    ($attrs:ident, $key:literal, $t:ident($value:expr)) => {
        $attrs.insert(
            String::from($key),
            aws_sdk_dynamodb::types::AttributeValue::$t($value.into()),
        );
    };
    ($attrs:ident, $key:literal, $t:tt?($value:expr)) => {
        if let Some(v) = $value {
            attr_value!($attrs, $key, $t(v));
        }
    };
}

#[macro_export]
macro_rules! ddb_item_helper {
    ($attrs:ident, $(,)?) => {};
    ($attrs:ident, $key:literal => $t:tt($value:expr), $($rest:tt)*) => {
        attr_value!($attrs, $key, $t($value));
        ddb_item_helper!($attrs, $($rest)*);
    };
    ($attrs:ident, $key:literal => $t:tt?($value:expr), $($rest:tt)*) => {
        attr_value!($attrs, $key, $t?($value));
        ddb_item_helper!($attrs, $($rest)*);
    };
}

/// Macro to help build a HashMap<String, AttributeValue>.
#[macro_export]
macro_rules! ddb_item {
    ($($items:tt)*) => {{
        let mut attrs = std::collections::HashMap::new();
        ddb_item_helper!(attrs, $($items)*);
        attrs
    }}
}

#[cfg(test)]
mod tests {
    use std::collections::HashMap;

    use aws_sdk_dynamodb::types::AttributeValue;

    #[test]
    fn test_ddb_item() {
        let passthrough = AttributeValue::S("passthrough".to_string());
        let some_passthrough = Some(passthrough.clone());

        let item = ddb_item! [
            "string" => S("hello"),
            "num" => N(12),
            "some string" => S?(Some("world")),
            "none string" => S?(None::<String>),
            "passthrough" => _(passthrough.clone()),
            "some passthrough" => _?(some_passthrough),
            "none passthrough" => _?(None::<AttributeValue>),
        ];

        let expected = HashMap::from([
            ("string".to_string(), AttributeValue::S("hello".to_string())),
            ("num".to_string(), AttributeValue::N(12.to_string())),
            (
                "some string".to_string(),
                AttributeValue::S("world".to_string()),
            ),
            ("passthrough".to_string(), passthrough.clone()),
            ("some passthrough".to_string(), passthrough),
        ]);

        assert_eq!(item, expected);
    }
}

Footnotes

  1. Interestingly this language uses serde recursively: the parser relies on serde to parse definitions written in RON, while the generated output itself includes serde annotations for serialization and deserialization.