Rudderstack, Snowplow and Open-Source CDP Alternatives to Segment

Mark Rittman
Rittman Analytics Blog
9 min read · Feb 20, 2022

Segment is clearly the number one choice for organizations looking to add data collection, a customer data platform (CDP) and routing of that data to downstream marketing destinations. It has the largest number of sources and destinations of any one vendor, along with options for schema governance, identity resolution, customer profiling and journey orchestration.

Rittman Analytics is a Segment Select Partner and we’ve implemented Segment Connections, Segment Personas and Segment Protocols for a number of clients, setting up tracking plans and building customer data platforms that blend real-time behavioural data with data warehouse insights to build rich, contextual and persuasive customer experiences.

And yet … Segment’s monthly user-based pricing model can become expensive if you’re running a fast-growing B2C website with rapidly increasing numbers of visitors who tend to browse once and rarely convert, but quickly burn through your Monthly Tracked Users (MTU) quota.

And whilst Personas offers user-friendly audience building and journey orchestration tools, its API-based approach to data ingestion and limited user data model can be a poor match for the way your organization actually stores and organises its customer data.

So what about one of the open-source data collection and CDP alternatives to Segment, such as Rudderstack or Snowplow? Are they as good, how do they work and why might you still want to stay with Segment, despite the cost?

Rudderstack is probably the simplest Segment alternative to explain: it’s an open-source re-implementation of Segment Connections, the “classic” part of Segment that collects user event data from websites, mobile and other apps and then routes it to all of your downstream marketing and customer applications.

Rudderstack’s Javascript SDK is API-compatible with Segment’s, making migration just a case of finding all of the analytics.track and analytics.identify Javascript event calls on your website and prefixing each of them with rudder, so that analytics.track becomes rudderanalytics.track (or you can use the web snippet in our Github gist to hijack the Segment SDK event calls and route them to Rudderstack as well).
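As a rough illustration of what that API compatibility makes possible, the sketch below mirrors every track and identify call to a second client. It is not the gist mentioned above: EventClient, mirrorCalls and recordingClient are hypothetical names, and the recording clients are stand-ins for the real Segment and Rudderstack SDK objects so that the example runs on its own.

```typescript
// Minimal shape shared by the two SDK calls discussed above.
type EventProps = Record<string, unknown>;

interface EventClient {
  track(event: string, properties?: EventProps): void;
  identify(userId: string, traits?: EventProps): void;
}

// Hypothetical helper: wraps a primary client so that every call
// is also forwarded, unchanged, to a secondary client.
function mirrorCalls(primary: EventClient, secondary: EventClient): EventClient {
  return {
    track(event, properties) {
      primary.track(event, properties);
      secondary.track(event, properties);
    },
    identify(userId, traits) {
      primary.identify(userId, traits);
      secondary.identify(userId, traits);
    },
  };
}

// Stand-in client that records calls instead of sending them anywhere.
function recordingClient(log: string[], label: string): EventClient {
  return {
    track: (event) => { log.push(label + ".track:" + event); },
    identify: (userId) => { log.push(label + ".identify:" + userId); },
  };
}

const calls: string[] = [];
const analytics = mirrorCalls(
  recordingClient(calls, "segment"),
  recordingClient(calls, "rudder"),
);

// Existing call sites stay exactly as they were written for Segment.
analytics.identify("user-123", { plan: "pro" });
analytics.track("Order Completed", { revenue: 42 });
```

On a real page the two arguments would be the Segment and Rudderstack SDK objects rather than the recording stand-ins, which is why identical method signatures matter: the wrapper never has to translate a call.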

Anyone who’s familiar with Segment’s UI will be at home in Rudderstack’s, which is a bit less polished but improving fast and features a similar workflow to Segment’s for creating connections, viewing the live event stream and piping those events to your downstream destinations and cloud warehouse.

What gets your attention initially with Rudderstack is its pricing model; what keeps that interest is its “warehouse-first” approach to storing, connecting and activating your first-party customer data.

Segment’s Monthly Tracked Users (MTU) pricing approach means that a B2C eCommerce site with visitors that rarely convert, browse once and never return pays the same as a B2B site with high-value, logged-in frequently returning users that are far more likely to respond to personalized user journeys and precisely-targeted offers.

Rudderstack’s hosted cloud option charges by the event instead of by the user, significantly reducing the penalty for sites with high bounce rates and low conversion rates. They do this by not holding any of your event data in their own cloud storage, relying instead on you, the customer, to stream those events into your own cloud data warehouse.
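To make the difference between the two pricing models concrete, here is a small worked comparison. The rates are invented purely for illustration and are not Segment’s or Rudderstack’s actual prices; the TrafficProfile type, the two cost functions and both example sites are likewise assumptions.

```typescript
interface TrafficProfile {
  monthlyVisitors: number;  // distinct tracked users in the month
  eventsPerVisitor: number; // average events each visitor generates
}

// Illustrative rates only, NOT real vendor pricing.
const PRICE_PER_1000_TRACKED_USERS = 10;
const PRICE_PER_1000_EVENTS = 0.5;

// User-based pricing: cost scales with how many people visit.
function userBasedCost(t: TrafficProfile): number {
  return (t.monthlyVisitors / 1000) * PRICE_PER_1000_TRACKED_USERS;
}

// Event-based pricing: cost scales with how much those people do.
function eventBasedCost(t: TrafficProfile): number {
  const totalEvents = t.monthlyVisitors * t.eventsPerVisitor;
  return (totalEvents / 1000) * PRICE_PER_1000_EVENTS;
}

// A B2C site with many one-off visitors, and a B2B site with
// far fewer but far more active logged-in users.
const bounceHeavyB2C: TrafficProfile = { monthlyVisitors: 1000000, eventsPerVisitor: 3 };
const engagedB2B: TrafficProfile = { monthlyVisitors: 20000, eventsPerVisitor: 400 };

const b2cUserCost = userBasedCost(bounceHeavyB2C);
const b2cEventCost = eventBasedCost(bounceHeavyB2C);
const b2bUserCost = userBasedCost(engagedB2B);
const b2bEventCost = eventBasedCost(engagedB2B);
```

With these made-up rates the bounce-heavy B2C site pays 10,000 under user-based pricing but 1,500 under event-based pricing, while the engaged B2B site pays 200 and 4,000 respectively. The absolute numbers mean nothing; the point is that charging by the event moves cost towards the visitors who actually generate activity.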

Which is great, in fact, if like most businesses you’re looking to own your first-party data and store it in the same location as your legal jurisdiction. What’s not so great is finding out that your backup has failed and then realising that Rudderstack doesn’t have a copy either. Customers on Rudderstack’s top-end Enterprise tier therefore now have an event replay facility, essential for selling to organizations with deep pockets and a deep aversion to losing their customer data.

You can only compete on price for so long before another, even cheaper service comes along and things become a race to the bottom. What makes Rudderstack interesting to us as platform architects is therefore their warehouse-centric approach to building your customer data platform, an architectural pattern we wrote about last year in another blog post titled “Why (and How) Customer Data Warehouses are the New Customer Data Platform”.

That blog was written in the context of Hightouch, a “reverse ETL” tool that enables organizations to take the rich, joined-up centralized records of their customers, their interests and insights from their behavioral patterns and sync them to their CRM, marketing and ad platform services.

Rudderstack comes with its own reverse ETL tool, Warehouse Actions, which, whilst not quite as sophisticated as Hightouch, does the job and integrates with the rest of the Rudderstack suite of products.

Rittman Analytics are delivery partners for both Rudderstack and Hightouch and each has its own place depending on the needs of our clients, and we use Warehouse Actions ourselves to keep our Hubspot CRM system up-to-date with insights we generate from visits to our website and form submissions.

Of course if we’re talking about open-source alternatives to Segment, the original grand-daddy of them all is Snowplow Analytics, which like Rudderstack is also an open-source project with commercial sponsors and actually predates Segment by a few months.

Snowplow was created as a solution to a different problem than the one Segment originally addressed: how to break free from the restrictions that contemporary web analytics tools placed on accessing visitor behavioral data at the detailed event level, as opposed to Segment’s mission to replace all of the individual web trackers on your site with a single one that acted as a router to all of your downstream destinations.

But in practice, most organizations we see starting out with Segment, Rudderstack and Snowplow use them for one purpose first and foremost: capturing website, mobile app and web app user activity at the event level and storing it in a cloud data warehouse, where it can be combined with other customer data and analyzed using tools such as dbt and Looker.

What makes Snowplow special, though, is the emphasis it places on well-structured schemas for recording the details of your user events, and on the lifecycle of those schemas as your users and their behaviours evolve. While Segment has Protocols to enforce schema governance after the event, at heart it’s a service that favours flexibility over formality, allowing you to collect any event schema that conforms to its track and other API specs.
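The schema-first idea can be sketched in a few lines. This is a deliberately simplified stand-in, not Snowplow’s own validator: the registry, the validate function and the com.example add_to_basket schema are all hypothetical, and the flat field-type check takes the place of full JSON Schema validation. The one detail borrowed from Snowplow is the shape of the event, a payload that names the versioned schema it claims to conform to.

```typescript
type FieldType = "string" | "number" | "boolean";

interface EventSchema {
  required: Record<string, FieldType>;
}

// An event that carries a reference to the schema describing it.
interface SelfDescribingEvent {
  schema: string;
  data: Record<string, unknown>;
}

// Hypothetical schema registry; the vendor and event names are made up.
const registry: Record<string, EventSchema> = {
  "iglu:com.example/add_to_basket/jsonschema/1-0-0": {
    required: { sku: "string", quantity: "number" },
  },
};

// Returns a list of problems; an empty list means the event is accepted.
function validate(event: SelfDescribingEvent): string[] {
  const schema = registry[event.schema];
  if (schema === undefined) {
    return ["unknown schema: " + event.schema];
  }
  const errors: string[] = [];
  for (const field of Object.keys(schema.required)) {
    const expected = schema.required[field];
    const actual = typeof event.data[field];
    if (actual !== expected) {
      errors.push(field + ": expected " + expected + ", got " + actual);
    }
  }
  return errors;
}

const accepted = validate({
  schema: "iglu:com.example/add_to_basket/jsonschema/1-0-0",
  data: { sku: "ABC-1", quantity: 2 },
});

// quantity arrives as a string, so this event is rejected at collection time.
const rejected = validate({
  schema: "iglu:com.example/add_to_basket/jsonschema/1-0-0",
  data: { sku: "ABC-1", quantity: "2" },
});
```

A flexible track call would accept both of these events and leave the mismatched quantity to be discovered later in the warehouse; a schema-first pipeline surfaces it before the data lands.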

Snowplow’s approach makes it more likely that you’ll end up with user data that you can actually make sense of and derive meaning from, but at the cost of more work up-front to define those schemas and, typically, a need for an IT department to host Snowplow and all its collectors, enrichers and other components for you and then manage, scale and monitor those components.

Or you could do as we’ve done and consume Snowplow as a managed service through another one of our partners, SnowcatCloud, who take care of all the hosting, configuration, scaling and monitoring for you. And in an age where third-party cookies are disappearing and services like Google Analytics use machine learning to fill in the gaps in user data they no longer connect, SnowcatCloud’s new Snowplow-based Iceberg identity graph service offers the promise of still being able to build a complete, 360-degree view of all customer interactions across all of their devices.

So is that it for Segment? Have their customers all moved to cheaper or more specialised open-source products such as Rudderstack or Snowplow, or even to Google Analytics 4? The answer, according to IDC, is no. With Segment’s acquisition by Twilio and the launch of Twilio Engage last year, their strategy increasingly looks like a move into the growth automation market: marketing automation platforms powered by data pipelines and APIs.

In the end it comes down to what your needs are, and what stage of data and marketing maturity your organization is currently at. This report by the Customer Data Institute, I think, sums up the market in which Segment, Rudderstack, Snowplow and all of the other customer data platform and customer data infrastructure vendors operate:

“This report groups CDP vendors into four categories based on the functions provided by their systems. Each category includes functions provided by the previous categories … Categories are:

Data CDP. These systems gather customer data from source systems, link data to customer identities, assemble unified customer profiles, and store the results in a database available to external systems … In practice, these systems also can extract audience segments and send them to external systems. Systems in this category often employ specialized technologies for data management and access. Some began as tag management or Web analytics systems and retain considerable legacy business in those areas.

Analytics CDP. These systems provide the features of a data CDP plus analytical applications. The applications always include customer segmentation and sometimes extend to machine learning, predictive modeling, revenue attribution, and journey mapping. These systems often automate the distribution of data to other systems.

Campaign CDP. These systems provide data assembly, analytics, and customer treatments. What distinguishes treatments from segmentation is that treatments can be different for different individuals within a segment. Treatments may be personalized messages, outbound marketing campaigns, real time interactions, or product or content recommendations. These systems often include features to orchestrate customer treatments across channels.

Delivery CDP. These systems provide data assembly, analytics, customer treatments, and message delivery. Delivery may be through email, Web site, mobile apps, CRM, advertising, or several of these. Products in this category often started as delivery systems and added CDP functions to support advanced analytics, personalization, or multi-channel campaigns.”

Clearly, under this categorization, Rudderstack and Snowplow are Data CDPs that, if combined with consulting services from specialist analytics partners such as Rittman Analytics, can become Analytics CDPs providing a platform for segmentation, audience building and insights on your customers’ behaviours and needs. This in fact describes the needs of most of our clients, and we’ve implemented Rudderstack for a number of them, using savings made from their new license deal to pay for their CDP implementation.

But for clients further up the customer data platform maturity curve, Segment’s move into the Delivery CDP space means it continues to meet their needs as those needs increase in sophistication, scale and complexity; we still have many clients extremely happy with their Segment platform and others that are adopting it for the first time, and it’s still hands-down the best engineered, most fully-featured and user-friendly customer data infrastructure and platform that we work with today.

Every client has their own particular needs, and Segment, Rudderstack and Snowplow through SnowcatCloud offer different services that meet the needs of customers at different stages of their data maturity. We’re proud to partner with all three vendors and can advise on which best suits your needs, then work with your data team to build out the customer data platform that’s just right for you.

Interested? Find out More

Rittman Analytics is a boutique analytics consultancy specializing in the modern data stack who can help you get started with Segment, Rudderstack, Hightouch or Snowplow, centralise your data sources and enable your end-users and data team with best practices and a modern analytics workflow.

If you’re looking for some help and assistance building out your analytics capabilities on a modern, flexible and modular data stack, contact us now to organize a 100%-free, no-obligation call. We’d love to hear from you!

CEO of Rittman Analytics, host of the Drill to Detail Podcast, ex-product manager and twice company founder.