
Trade-offs in designing Domain-Specific Languages
A Domain-Specific Language (DSL) is a programming abstraction tailored to a specific application domain, enabling developers to express ideas at a higher level than general-purpose programming languages (GPLs). Unlike GPLs, DSLs often focus on specific tasks and may not represent complete programs or even computational logic.
It is very common to use DSLs when programming in Rust, as there are a lot of high-quality DSLs in the ecosystem:
- Serde: Serialization/deserialization types
- Clap: Concise definitions of CLI interfaces
- Logos: Concise lexer definitions
- LALRPOP: Concise parser definitions
- Diesel: An SQL query builder
- Protobuf: Schemas for reading and writing files in the Protocol Buffers format
- Rhai: A scripting language designed to be embedded in Rust binaries
- Bevy: Although primarily a game engine, its highly specialized APIs, such as Bevy systems, can be analyzed in the context of DSL design choices.
It’s worth noting that in this blog post we’ve chosen to take a very broad view of what constitutes a DSL. Depending on what you think is needed to constitute a “language”, not all of these crates are DSLs.
Some observations about the list above:
- The DSLs are implemented using many different technologies: macros, the Rust trait system, or more traditional parsing pipelines built from lexers and parser generators.
- Most of the DSLs are evaluated at or before compilation, but typically generate code that is evaluated at runtime. This is fairly typical for DSLs in Rust.
- Some of these crates (like serde and diesel) have their “DSL input” embedded directly inside Rust source files, while others (like protobuf and rhai) take input that would normally be stored separately from the Rust source code.
One reason for the popularity of DSL crates is that Rust has some very strong tools for building DSLs, such as proc-macros and a rich trait system. In this blog post we will discuss the design space of DSLs in Rust: why you might want to create a DSL, which design questions to ponder before creating your own, and finally some common archetypes of answers to those questions.
Why create your own DSLs?
- Code Improvements: You might want to reduce code duplication or improve correctness or maintainability in existing code.
- Problem-specific solutions: Solving company-specific problems or filling gaps where no DSL exists.
- Learning and experimentation: Experimenting with Rust features and exploring design challenges.
Here are some real-world examples of DSLs used at IDVerse, showcasing the range of problems they can address:
- We have a proc-macro for specifying how to extract and parse values from environment variables and collect them into a struct.
- We have a parser/generator for SMPP protocol messages. While creating this parser, we developed a small DSL to specify the fields of different structs from the specification. This included details on how these structs were parsed.
- We have a small, opinionated language for defining the types of EventBridge events, plus a compiler that turns these definitions into serde structs, JSON schema files and documentation¹.
- We have a framework for specifying how DynamoDB models are serialized, deserialized and queried. This framework can definitely be viewed as a DSL. We are hoping to open source it at some point, but so far haven’t found time to detangle it from company-specific code.
The design space of DSLs in Rust
Having established the motivations, we’ll now outline the major questions to ask when creating a DSL.
1. Users: Who will use the DSL?
Understanding who will work with the DSL can inform many design choices, especially regarding implementation effort. Sanding down the rough edges can often end up taking considerable amounts of time.
- Language experts: These are the creators of the DSL, typically more able to tolerate unintuitive syntax or obtuse error messages.
- Developers: Users familiar with the system, but not necessarily the DSL itself.
- Domain experts: Non-programmers, such as business analysts or other specialists, who need to generate or tweak the DSL input.
- Machine-generated: The DSL input is automatically produced by a tool or system, with no human involvement.
- End users: These could be users directly interacting with the DSL (e.g., writing search queries) or indirectly (e.g., using a UI that generates DSL input).
2. Error messages: How critical is good feedback?
Creating good error messages isn’t particularly difficult, but it can be time-consuming.
- Minimal: Only vague error messages, with little help to directly point to the issue.
- Custom: More specific error messages that relate directly to the issue.
- Advanced: Custom error messages that include code snippets or syntax highlighting for easier debugging.
We recommend using syn::Error::to_compile_error or std::compile_error!() when writing Rust macros.
When creating DSLs that take external input, we recommend the codespan crate to produce error messages with precise file and line information. This crate provides a Span datatype that you can use to annotate objects created by your parser. These spans must be propagated all the way to your error printing routine, where you can use the codespan-reporting crate to output pretty error messages.
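To make the span idea concrete, here is a minimal, standard-library-only sketch of span tracking and caret-style error rendering. The names (`Span`, `report_error`) are illustrative, not codespan's actual API; codespan and codespan-reporting provide a production-quality version of this idea, including multi-file support and terminal colors.

```rust
// Minimal sketch of span-based error reporting for a DSL, stdlib only.
// A parser would attach a Span to each object it produces; the span is
// propagated to the error printing routine shown below.

#[derive(Debug, Clone, Copy)]
struct Span {
    start: usize, // byte offset into the source
    end: usize,
}

/// Render an error with the offending source line and a caret underline.
fn report_error(source: &str, span: Span, message: &str) -> String {
    // Find the boundaries of the line containing the span start.
    let line_start = source[..span.start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = source[span.start..]
        .find('\n')
        .map_or(source.len(), |i| span.start + i);
    let line_no = source[..span.start].matches('\n').count() + 1;
    let col = span.start - line_start;
    let underline = " ".repeat(col) + &"^".repeat((span.end - span.start).max(1));
    format!(
        "error: {message}\n --> line {line_no}\n  | {}\n  | {underline}",
        &source[line_start..line_end]
    )
}

fn main() {
    let source = "name = \"MyEvent\"\nversion = treu\n";
    // Suppose the parser attached this span to the bad token `treu`.
    let span = Span { start: 27, end: 31 };
    println!("{}", report_error(source, span, "expected a boolean"));
}
```

The key design point is that spans are cheap to store (two integers) and can travel through every intermediate representation until an error is finally reported.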
3. Output: What does the DSL produce?
The kind of output the DSL generates constrains which kinds of DSL designs are suitable.
- Code generation: The DSL outputs code. It might even output code targeting multiple programming languages, or support tweaks for specific platforms.
- Documentation: The DSL outputs structured documentation.
- Other formats: The DSL outputs data in various other machine-readable formats, such as JSON schemas or configuration files.
- Runtime results: The DSL executes actions at runtime, such as querying a database or updating the system state.
4. Evaluation time: When should the DSL be evaluated?
The timing of when the DSL is evaluated constrains which kinds of DSLs are possible.
- Pre-compilation: Evaluate the DSL ahead of time and output code. Sometimes it might be beneficial to store this output in version control.
- Compile time: Evaluate the DSL during compilation, e.g. by using a proc-macro or a build.rs file.
- Runtime: Evaluate the DSL during execution, which can offer flexibility but may impact performance.
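To illustrate the runtime option, here is a deliberately tiny runtime-evaluated DSL: a sum expression parsed and executed while the program runs. The function name and grammar are made up for illustration; a real embedded language like Rhai would parse into an AST and support far more than addition.

```rust
// Minimal sketch of a runtime-evaluated DSL: a sum expression such as
// "1 + 2 + 40" is parsed and evaluated during program execution. The
// input could come from a config file, a CLI flag, or an end user.

fn eval_sum(input: &str) -> Result<i64, String> {
    input
        .split('+')
        .map(|term| {
            term.trim()
                .parse::<i64>()
                .map_err(|e| format!("bad term {:?}: {e}", term.trim()))
        })
        .sum() // Result implements Sum, so the first error short-circuits
}

fn main() {
    assert_eq!(eval_sum("1 + 2 + 40"), Ok(43));
    assert!(eval_sum("1 + oops").is_err());
    println!("1 + 2 + 40 = {}", eval_sum("1 + 2 + 40").unwrap());
}
```

The trade-off is visible even at this scale: the input can change without recompiling, but every error is discovered only when the expression is actually evaluated.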
5. Source of truth: How will changes be managed?
Suppose you have multiple representations of the data encoded in your DSL stored in different formats. These other formats might for example be documentation, language bindings, external schemas or AWS configurations. How do changes in one representation get applied in other representations?
- Manual updates: Update each version of the data manually, potentially using validators to ensure consistency.
- Single source: Choose one of the required output formats as the canonical source of truth, then use converters to generate the other formats.
- Dedicated source: Sometimes it can be challenging to convert between the required output formats. In those cases it might make sense to create a new authoritative format, and use converters to derive any other formats from that authoritative format.
6. Storage: How is the DSL input stored?
The method of storing DSL input will affect how it’s accessed and how the system behaves.
- Inline in code: Embedded directly in the source code.
- Compile-time loading: Loaded from disk or network resources during the compilation process.
- Runtime loading: Loaded dynamically during execution.
- Rust object: The DSL does not have a parsing stage. Instead Rust objects are created by other parts of the system and passed directly to the DSL.
7. Parsing: How is the DSL parsed?
The choice of parsing mechanism can impact the flexibility, performance, and complexity of the DSL.
- macro_rules!(): Use Rust macros to transform DSL-like syntax into code.
- Proc-macros: Procedural macros can be used for more complex transformations during compilation.
- Serde: You can encode your DSL in formats like YAML or RON and use Serde to deserialize the input into Rust structs.
- Traits: With enough abstraction, using the trait system can feel like interacting with a DSL. This way any parsing is handled by the Rust compiler.
- Custom Parser: If needed, you can write your own parser using tools like Logos for lexical analysis and LALRPOP for parsing.
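The "Traits" option above can be sketched with a small builder-style query type, where the Rust compiler does all the "parsing". Every name here (`Query`, `filter`, `to_sql`) is invented for illustration; crates like Diesel take this approach much further with trait-level type checking of columns and expressions.

```rust
// Sketch of a DSL embedded in ordinary Rust via method chaining: invalid
// "programs" simply fail to type-check, so no separate parser is needed.

#[derive(Debug, Default)]
struct Query {
    table: String,
    filters: Vec<String>,
    limit: Option<usize>,
}

impl Query {
    fn from(table: &str) -> Self {
        Query { table: table.to_string(), ..Default::default() }
    }
    fn filter(mut self, expr: &str) -> Self {
        self.filters.push(expr.to_string());
        self
    }
    fn limit(mut self, n: usize) -> Self {
        self.limit = Some(n);
        self
    }
    fn to_sql(&self) -> String {
        let mut sql = format!("SELECT * FROM {}", self.table);
        if !self.filters.is_empty() {
            sql += &format!(" WHERE {}", self.filters.join(" AND "));
        }
        if let Some(n) = self.limit {
            sql += &format!(" LIMIT {n}");
        }
        sql
    }
}

fn main() {
    // Reads like a DSL, but it is plain Rust code.
    let sql = Query::from("users").filter("age > 21").limit(10).to_sql();
    assert_eq!(sql, "SELECT * FROM users WHERE age > 21 LIMIT 10");
    println!("{sql}");
}
```

The design choice here is that "parsing errors" become ordinary type errors, which gives decent diagnostics for free at the cost of a less flexible surface syntax.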
8. Normalization: Any pre-evaluation validation?
Before evaluating the DSL, you may need to normalize or validate the input to ensure correctness, simplify code generation, or enable faster execution.
- Type checking: Ensuring that data types are valid and consistent.
- Normalization: A normalization step can be useful to ensure that whatever requirements you have are checked and errors reported on bad input. This might mean converting strings to enums, or converting lists of (key, value) pairs into maps after checking that no key is duplicated.
- Intermediate representation: When outputting code, sometimes it can be beneficial to transform the DSL input into an intermediate form that more closely follows the generated code.
- Executable form: When executing a DSL at runtime, sometimes it makes sense to compile the input into a form more suitable for execution, such as bytecode.
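The (key, value) example from the normalization bullet can be sketched in a few lines. The function name and error format are hypothetical; the point is that later stages can then assume a well-formed map instead of re-checking the input.

```rust
use std::collections::HashMap;

// Sketch of a normalization step: turn parsed (key, value) pairs into a
// map, rejecting duplicate keys so bad input is reported early.

fn normalize(pairs: Vec<(String, String)>) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    for (key, value) in pairs {
        // HashMap::insert returns the previous value if the key existed.
        if map.insert(key.clone(), value).is_some() {
            return Err(format!("duplicate key: {key:?}"));
        }
    }
    Ok(map)
}

fn main() {
    let ok = normalize(vec![
        ("name".into(), "MyEvent".into()),
        ("version".into(), "1.7".into()),
    ]);
    assert!(ok.is_ok());

    let dup = normalize(vec![
        ("name".into(), "A".into()),
        ("name".into(), "B".into()),
    ]);
    assert!(dup.is_err());
    println!("normalization checks passed");
}
```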
9. Project scope: How much effort will you invest?
Finally, consider how much time and resources you’re willing to allocate for the DSL’s design and maintenance.
- Quick hack: A small-scale solution, such as automating a repetitive task in your codebase.
- Large-scale: A more extensive project that requires careful planning and long-term maintenance, involving multiple developers and users.
Archetypical small DSLs in Rust
To illustrate the trade-offs outlined earlier, we analyze three common types of DSLs from the IDVerse codebase. These three were chosen because we believe they are archetypical of how smaller DSLs might look in other codebases.
macro_rules!() helpers
These lightweight macros provide concise solutions to repetitive tasks. For example, our ddb_item!() macro simplifies creating HashMaps for DynamoDB AttributeValues. It’s used like this:
let item = ddb_item!(
"id" => S("123"),
"name" => S("John Doe"),
"age" => N(24),
);
A simplified implementation can be found at the end of the blog post. If we evaluate it along the parameters above, we get something like this:
- Users: Language experts and developers.
- Error messages: Minimal feedback; misuse often results in generic compiler errors.
- Output: Code generation. Produces Rust code for inclusion in the same project.
- Evaluation time: Compile-time.
- Source of truth: Not applicable; no additional formats are relevant here.
- Storage: Inline in code.
- Parsing: macro_rules!() provides built-in parsing for pattern matching.
- Normalization: Nothing meaningful.
- Project scope: Quick and effective for small-scale use cases.
Derive proc-macros
Custom derive macros can make code a lot easier to read and, perhaps just as importantly, serve as a push to standardize how you implement similar operations at your company. Our example here is a #[derive(AppEnv)] macro. Using it looks something like this:
#[derive(Debug, AppEnv)]
pub struct AppEnv {
#[env("MESSAGING_TABLE_NAME")]
pub messaging_table_name: String,
#[env("MESSAGING_TABLE_GSI1_INDEX_NAME")]
pub messaging_table_gsi1: String,
#[env("SQS_QUEUE_URL")]
pub sqs_queue_url: String,
#[env("LAMBDA_FUNCTION_ARN")]
pub lambda_function_arn: String,
#[env("JWT_SECRET")]
#[convert(Call?(decode_jwt))]
#[mock(Secret::new(b"jwt_secret".to_vec()))]
pub jwt_secret: Secret<Vec<u8>>,
#[env("EXPIRATION_TIME_DAYS")]
#[convert(Parse, Call(chrono::Days::new))]
pub expiration_time: chrono::Days,
}
// Call it like this:
let app_env: AppEnv = AppEnv::load_from_env()?;
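To give a feel for what such a derive macro does, here is a hand-written sketch of roughly what the generated load_from_env might expand to for two of the fields. The real macro and its attributes are internal to IDVerse; the struct shape, field names, and error format below are illustrative only.

```rust
use std::env;

// Hand-written sketch of code a derive like `#[derive(AppEnv)]` might
// generate: read each environment variable, convert it, and collect the
// results into a struct, reporting which variable was missing or invalid.

#[derive(Debug)]
struct AppEnv {
    sqs_queue_url: String,
    expiration_time_days: u32,
}

impl AppEnv {
    fn load_from_env() -> Result<Self, String> {
        let get = |name: &str| {
            env::var(name).map_err(|_| format!("missing environment variable {name}"))
        };
        Ok(AppEnv {
            // Plain String field: read as-is.
            sqs_queue_url: get("SQS_QUEUE_URL")?,
            // Converted field: read, then parse (an attribute such as
            // `#[convert(Parse, ...)]` would drive this in generated code).
            expiration_time_days: get("EXPIRATION_TIME_DAYS")?
                .parse()
                .map_err(|e| format!("EXPIRATION_TIME_DAYS: {e}"))?,
        })
    }
}

fn main() {
    env::set_var("SQS_QUEUE_URL", "https://sqs.example.com/queue");
    env::set_var("EXPIRATION_TIME_DAYS", "30");
    let app_env = AppEnv::load_from_env().unwrap();
    assert_eq!(app_env.expiration_time_days, 30);
    println!("{app_env:?}");
}
```

Writing this by hand once makes the value of the derive clear: the generated version removes the repetitive plumbing while keeping the per-variable error messages.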
Again, we can evaluate the macro along the parameters from above:
- Users: Developers.
- Error messages: Custom compile-time errors using syn::Error::to_compile_error.
- Output: Code generation. The macro generates Rust code for parsing environment variables and assigning them to fields in a struct. At some point we will expand the macro to also output documentation.
- Evaluation time: Compile-time.
- Source of truth: Currently not applicable, since we only have a single format. Eventually we will generate documentation from the Rust code.
- Storage: Inline in code.
- Parsing: Proc-macros.
- Normalization: Type checking and basic normalization are handled during compilation.
- Project scope: Still fairly small. The entire implementation is ~500 lines of Rust code.
External-input code generators
Code generators are typically invoked either directly as binaries or through a build.rs script. Archetypical examples from the Rust ecosystem are protobuf and planus, though internal tools can be built with minimal effort as well. For instance, we have a small tool that outputs serde definitions and JSON schemas based on RON files that look something like this:
Event(
name: "MyEvent",
version: "1.7",
description: "Description",
data: Struct(
description: "Payload data for MyEvent",
fields: {
"first_field": Field(
type: "Number",
description: "First data field",
),
}
),
metadata: Struct(
description: "Payload metadata for MyEvent",
fields: {
"first_field": Field(
type: "Number",
description: "First metadata field",
),
}
),
)
- Users: Developers.
- Error messages: Custom. The generator mostly provides user-friendly error messages, though we are limited by what is possible when parsing with Serde.
- Output: Code generation, documentation and JSON schemas.
- Evaluation time: Pre-compilation. The code generator is a binary which can either be invoked manually or through CI. It outputs the generated files into the src/ directory of the project.
- Source of truth: Single source. The RON files serve as the canonical source of truth, from which both Rust code and JSON schemas are derived.
- Storage: Compile-time loading. The RON files are read and parsed during the build process to produce the output files.
- Parsing: Serde.
- Normalization: No major transformations or validations are performed after Serde parsing.
- Project scope: Fairly small, the implementation is ~800 lines of Rust code.
If you do end up writing a code generator yourself, we can recommend completely ignoring formatting while generating your code and instead relying on rustfmt. If you end up generating very nested code, you might even want to let rustfmt handle it twice.
Conclusion
DSLs are a powerful tool to simplify code, improve correctness and create company-wide conventions in your code base. Rust provides excellent mechanisms to implement them at various levels of complexity, and which mechanism to use depends on your use case. When deciding on the right approach, consider the trade-offs in complexity, maintainability, and user experience.
In the future we hope to open source some of the DSLs mentioned in this article, assuming we are able to detangle them from our company-specific logic. We also hope to do more blog posts with some more hands-on guides on how to create these kinds of DSLs yourself.
Appendix: ddb_item!() macro
This is a simplified version of the ddb_item!() macro we use at IDVerse:
#[macro_export]
macro_rules! attr_value {
// Strings use to_string() instead of into().
($attrs:ident, $key:literal, S($value:expr)) => {
$attrs.insert(
String::from($key),
aws_sdk_dynamodb::types::AttributeValue::S($value.to_string()),
);
};
// Numbers use to_string() instead of into().
($attrs:ident, $key:literal, N($value:expr)) => {
$attrs.insert(
String::from($key),
aws_sdk_dynamodb::types::AttributeValue::N($value.to_string()),
);
};
// Value can directly be converted to an AttributeValue
($attrs:ident, $key:literal, _($value:expr)) => {
$attrs.insert(String::from($key), $value.into());
};
($attrs:ident, $key:literal, $t:ident($value:expr)) => {
$attrs.insert(
String::from($key),
aws_sdk_dynamodb::types::AttributeValue::$t($value.into()),
);
};
($attrs:ident, $key:literal, $t:tt?($value:expr)) => {
if let Some(v) = $value {
attr_value!($attrs, $key, $t(v));
}
};
}
#[macro_export]
macro_rules! ddb_item_helper {
($attrs:ident, $(,)?) => {};
($attrs:ident, $key:literal => $t:tt($value:expr), $($rest:tt)*) => {
attr_value!($attrs, $key, $t($value));
ddb_item_helper!($attrs, $($rest)*);
};
($attrs:ident, $key:literal => $t:tt?($value:expr), $($rest:tt)*) => {
attr_value!($attrs, $key, $t?($value));
ddb_item_helper!($attrs, $($rest)*);
};
}
/// Macro to help build a HashMap<String, AttributeValue>.
#[macro_export]
macro_rules! ddb_item {
($($items:tt)*) => {{
let mut attrs = std::collections::HashMap::new();
ddb_item_helper!(attrs, $($items)*);
attrs
}}
}
#[cfg(test)]
mod tests {
use std::collections::HashMap;
use aws_sdk_dynamodb::types::AttributeValue;
#[test]
fn test_ddb_item() {
let passthrough = AttributeValue::S("passthrough".to_string());
let some_passthrough = Some(passthrough.clone());
let item = ddb_item! [
"string" => S("hello"),
"num" => N(12),
"some string" => S?(Some("world")),
"none string" => S?(None::<String>),
"passthrough" => _(passthrough.clone()),
"some passthrough" => _?(some_passthrough),
"some passthrough" => _?(None::<AttributeValue>),
];
let expected = HashMap::from([
("string".to_string(), AttributeValue::S("hello".to_string())),
("num".to_string(), AttributeValue::N(12.to_string())),
(
"some string".to_string(),
AttributeValue::S("world".to_string()),
),
("passthrough".to_string(), passthrough.clone()),
("some passthrough".to_string(), passthrough),
]);
assert_eq!(item, expected);
}
}
Footnotes
1. Interestingly, this language uses serde recursively: the parser relies on serde to parse definitions written in RON, while the generated output itself includes serde annotations for serialization and deserialization. ↩