Clusters are the heart of Materialize, the engines that make operational workloads go. Before you can ingest data from a source, maintain complex queries in real time, or sink out your changes, you have to size and create a cluster: an isolated pool of compute resources dedicated to your workloads.

Today we’re excited to announce a few improvements to Materialize’s cluster sizings, including new names, new sizes, and more oomph.

From T-shirts to engines

First, we’ve created a clearer naming system for cluster sizes. Until today, we’ve used “T-shirt” sizes, qualitative names like 2xsmall or medium that map to some amount of credit cost and behind-the-scenes compute resources.

After working hands-on with customers and prospects over the last year, we found a few wrinkles in this naming system. First, the T-shirt names gave no intuitive sense of how compute resources scale between sizes: we knew a medium was bigger than a 2xsmall, but by how much? If a workload was using 35% of a large, what size could it safely downsize to?

We also wanted to make it easier for users to understand the relationship between a cluster’s cost and its compute resources. A cluster’s credit cost has always been tied directly to its compute resources, but the T-shirt size names provided no insight into this relationship.

As a result, we’re deprecating the T-shirt sizes and introducing new cluster size names based on credit cost (specifically in “centicredits”, or cc).

Converting to the new names is easy: a 3xsmall cluster used to cost 0.25 credits/hour, and voilà, it’s now a 25cc cluster. An xlarge used to cost 16 credits/hour, so it’s now a 1600cc cluster. [1]

These names give a more intuitive mapping to relative size: how much more compute does an 800cc have than a 200cc? 4x! How much larger is a 1600cc than a 100cc? 16x! [2]

[1] For those unfamiliar with scooters, motorcycles, or Mario Kart, a 25cc engine is pretty teeny whereas a 1600cc is quite large.

[2] Note that these ratios aren’t always exact between cluster sizes for many deep technical reasons, but they’re a close approximation for how to think about relative scale. When we aren’t able to get an exact linear relationship, we always round up in favor of the customer and offer more compute per credit.
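The conversion above is just arithmetic: the new name is the old credits-per-hour rate multiplied by 100, suffixed with “cc”. A minimal sketch (legacy names and credit rates taken from the size table at the end of this post):

```python
# Credits per hour for each legacy T-shirt size, per the table below.
LEGACY_CREDITS_PER_HOUR = {
    "3xsmall": 0.25, "2xsmall": 0.5, "xsmall": 1, "small": 2,
    "medium": 4, "large": 8, "xlarge": 16, "2xlarge": 32,
    "3xlarge": 64, "4xlarge": 128, "5xlarge": 256, "6xlarge": 512,
}

def new_name(legacy_size: str) -> str:
    """Return the centicredit-based name: credits/hour * 100, plus 'cc'."""
    credits = LEGACY_CREDITS_PER_HOUR[legacy_size]
    return f"{int(credits * 100)}cc"

print(new_name("3xsmall"))  # 25cc
print(new_name("xlarge"))   # 1600cc
```

As footnote [2] notes, the cc numbers are a close (rounded-in-your-favor) approximation of relative compute, so treating 1600cc as 16x a 100cc is the right mental model even if the underlying hardware ratios aren’t exactly linear.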

Disk-enabled clusters

Second, in this new cluster sizing scheme, customers get disk-enabled clusters with spill-to-disk capacity. Yup, that’s right: Materialize processes workloads in memory, but as needs arise, it automatically offloads processing to disk, seamlessly handling key spaces that are larger than memory. This lets customers process larger datasets than memory alone would permit, with graceful degradation and reliable operation instead of running into memory constraints.

Intermediate sizes

Last up: new cluster size options! When we first drafted our cluster sizes, each T-shirt size offered roughly 2x the compute resources of the size below it. This gave customers great flexibility to right-size clusters for smaller workloads, but as a workload scaled, especially beyond the capacity of a medium, jumping to the next size up could mean a large jump in cost.

Therefore, our last improvement to the cluster sizing system is the addition of two new cluster sizes: 600cc and 1200cc. These fit between the former medium/large and large/xlarge sizes, respectively. This smooths out the sizing curve, giving customers more opportunities to right-size clusters running workloads of all sizes.

Conclusion

In total, here are our revamped cluster size offerings. We’re excited to offer this new and improved set of names, sizes, and spill-to-disk capabilities to power your Materialize workloads.

Size     Former Size  Credits (per hour)
25cc     3xsmall      0.25
50cc     2xsmall      0.5
100cc    xsmall       1
200cc    small        2
400cc    medium       4
600cc    (new)        6
800cc    large        8
1200cc   (new)        12
1600cc   xlarge       16
3200cc   2xlarge      32
6400cc   3xlarge      64
12800cc  4xlarge      128
25600cc  5xlarge      256
51200cc  6xlarge      512
