What does e-commerce for data really mean?


The way that data is sold today is extremely opaque. Unlike Software-as-a-Service companies (Slack, Stripe, Shopify, etc.), Data-as-a-Service companies almost never put their pricing on their website (SafeGraph is a notable exception). In fact, most don’t even put much about their product on their website. Instead, DaaS companies, or companies that sell data “on the side,” usually have a pretty generic website that says something like “We’ve got really interesting data in the category of X! Call us if you’re interested.”

On that phone call, a salesperson assesses the buyer’s interest in the data, their use case, their time horizon, and their willingness to pay. They pick a number based on that conversation: an educated guess, but one unlikely to reflect the true market value of the data. This is partly because the provider may not know how interchangeable their data is, or what competitive datasets are going for. They may also not know what data the buyer already has, and the buyer is unlikely to tell them. Data becomes exponentially more valuable when it is combined with other datasets, so to expose the most potent combination would be to expose the buyer’s own competitive advantage.

This price opacity leads, as price opacity usually does, to an extremely inefficient market. Sellers and buyers have a hard time finding each other. When they do, the gap between what buyers are willing to pay and what sellers are willing to take is extremely wide. Buyers may want to buy just a single row of the provider’s data: the missing link in their chain of data that will bring untold fortune to their business. Sellers want to sell entire datasets, on recurring licenses, in order to boost their own revenues and please investors. This extreme misalignment of incentives creates an opportunity for middlemen like data brokers and data marketplaces to charge extremely high premiums for introducing buyers to sellers. But these middlemen don’t do much for price transparency.

Our goal is to bring price transparency to data sales by encouraging sellers to post pricing for their data publicly, and by giving them market feedback. Sellers who have particularly “exclusive” datasets, that is, data they don’t intend to sell to many buyers, can choose to hide pricing on their storefront and build orders on behalf of customers at any price. But we hope that as more data sellers enter the market, price transparency becomes the norm. We also hope to offer guidance to customers who are just getting started in the data business. The size of a dataset does not always correlate with price: a dataset with just 100 rows may be as valuable as, or more valuable than, a dataset with one billion rows. Pricing may be dynamic: data matching certain filters, say a particular geographic area or company demographic, may be more valuable than data matching others. And lastly, price may depend a lot on the target buyer: corporate data buyers have different needs than financial ones.

“Productizing” Data

Datasets are hard to describe. They are a list of fields. They are a set of values at a point in time. You can’t take a picture of them or really capture what they are, because the data is always changing. And so when someone is interested in purchasing a dataset, they are left with approximations: a small sample, usually in the form of a spreadsheet; a data dictionary, which lists the fields in the dataset and their definitions; a marketing PowerPoint.

None of these artifacts are compelling in the way that shopping online is compelling. E-commerce broadly lets the customer experience the product without actually possessing it. It packages the product in an appealing way while also seamlessly delivering the product to customers as soon as they are ready to buy. It mimics in many ways the brick and mortar retail experience by taking advantage of our human psychology: the allure of branding, the satisfaction of seeking out and finding, and the fear of scarcity. Find what you want and buy it in just a few clicks. No conversation necessary.

For salespeople at data providers, the artifacts they have at their disposal are almost always limiting, and endlessly out of date. A sample set needs to be regenerated on the fly for every new potential customer. A data dictionary needs to be reviewed in case an engineer changed the name of a field. A PDF needs to be revised to reflect the fact that the data is now updated weekly instead of monthly. E-commerce lets providers break free of these artifacts. The data shop is never out of sync with the underlying data. The customer can customize their own sample. The storefront is the collateral.


Products without distribution are in no man’s land, sitting on the proverbial shelf in someone’s garage. Data is no different. While several data marketplaces have popped up in the past 18 months or so to streamline data distribution (Amazon and Snowflake each have their own, and there are a few startup competitors like Narrative and Datarade), the value these companies provide to data companies is in the infrastructure, not in the marketing. By providing the pipes to data buyers who already buy on Amazon, or who already have their data in a Snowflake data warehouse, these companies pitch to data providers that buyers will flock to their product simply because it exists.

What these companies don’t do is help the providers in the marketplace actually stand out. They don’t make the products seem valuable. There may be a short description. There may be a free sample of the dataset – who knows when it’s from. But there is no easy way to find data in the marketplace, nor a compelling reason to buy.

Much like the internet at large, if data providers want to build a brand, they need to control their own message. And so, if their products are only listed in a crowded marketplace, it’s very unlikely that those products will stand out. It’s very unlikely that buyers will know what their data means, or why it’s valuable. They will still in all likelihood end up back on the company’s website, clicking “Book a Demo” and talking to the salesperson to figure out what’s really going on.

Much like Shopify offers brick and mortar retailers the tools to easily build an online store, we offer DaaS companies the tools to easily build their online data shop. Customers can search and filter within the actual dataset to see if the data has what they need (there is even a “match” feature for stock tickers, town names, brands, etc. to see if the dataset contains what you’re really interested in). They can get an instant price quote based on their search criteria, and they can check out and get the data immediately as a zip file. What’s more, their search is saved, so they can come back to get a refreshed copy of the data when they need it, or subscribe to automatic updates.

Sellers get analytics on what data buyers are searching for within their data storefront. They get lead capture from buyers who aren’t yet ready to check out but who are interested and want to know more. They get order management, with customer names, the data they bought, and the price they paid all in one place. And most importantly, they get control over the messaging around their product – what makes this data special? Why should you buy it? What is it worth?

For questions, drop us a line at sales@getsyndetic.com.


Applying design thinking to data

Design thinking has been around since the 1960s, but it really started to gain influence in the early ’90s, when IDEO developed its design process grounded in problem solving and empathic thinking. IDEO is generally credited with bringing design thinking out of academia at the Stanford Design School (also known as the d.school) and into the mainstream. Soon Apple, Google, GE, and Samsung popularized the rapidly iterative approach to design, TED Talks were filmed, and suddenly design thinking was everywhere. But what is it?

According to the Interaction Design Foundation, design thinking is an iterative process in which we seek to understand the user, challenge assumptions, and redefine problems in an attempt to identify alternative strategies and solutions that might not be instantly apparent with our initial level of understanding. It involves empathizing with our intended user to deeply understand how they will use our product, what problems they might encounter in doing so, and how we might eliminate those problems through constant iteration and refinement.

The data space is sorely lacking in design thinking.

Much of the focus on data in recent years has been on managing data as an asset rather than as a product. By thinking of data first as something to extract value from, rather than as something to be used, with value following from use, the industry is missing a huge opportunity to make data more consumable, more usable, and more delightful. One company buys or acquires data from several other sources, its engineers normalize and transform it into a clean, consistent standard format, and then it sells that dataset on to other companies. Pretty straightforward.

But what if we applied a design thinking rubric to the process instead?

The principles of design thinking were first described by Nobel Prize laureate Herbert Simon in The Sciences of the Artificial in 1969. There are generally thought to be five stages of design thinking:

  1. Empathize
  2. Define
  3. Ideate
  4. Prototype
  5. Test

Empathy is the most important principle underpinning design thinking, because it requires you to put yourself in the mind of your user. As a data provider, what does this mean? Think about who you are selling your dataset to. Start with the company. Is it a corporate user? A hedge fund user? A venture capital user? Next, consider the actual person at that company. Are they a business person or a data scientist? Maybe both are involved in the decision to buy? Or is your dataset intended to be consumed directly by a system, such as a BI tool? Your data product may need to be quite different in order to meet the needs of these various consumers and their use cases.

For example, let’s say you have a data product you are targeting to hedge funds. What types of funds are you targeting? Some funds are quantitative, meaning that they make their trading decisions based on mathematical and statistical analysis. Some funds are fundamental, meaning that they make their trading decisions based on research, by studying trends in a sector or looking at company-specific events. The types of data that these funds buy, and from whom, may vary widely. Quant funds want data that they can feed into their statistical models; they prefer data that is machine-readable and easy to manipulate, so they typically want it in as raw a form as possible. Fundamental funds, on the other hand, may want insights and analytics pulled together into a report that can be readily consumed by an investment professional.

The definition phase involves defining your users’ needs and their problems. What are your user’s problems? Maybe your user is a retail brand that doesn’t have good data on the effectiveness of its marketing campaigns. Your data is going to be used by marketing professionals so that they can create more targeted and effective campaigns going forward. What are those users’ biggest problems with the data they already have? Is it not particularly usable because it is siloed in too many different systems, with no common way of linking the data together? Is it stale, not updated in a while? Is it incomplete in some way?

There is a concept in design thinking called wicked problems, which are seemingly intractable or especially tricky to solve. A wicked problem may involve incomplete, contradictory, changing or overlapping requirements. It may involve interdependencies that illuminate that one problem is actually just a symptom of a larger problem. What are the wicked problems in data?

We would propose that the fluid nature of data (that is, the fact that datasets change over time) is one of the wicked problems that both data buyers and sellers must reckon with in order to extract the most value out of the data they exchange. From the very outset of a sales discussion, the data buyer must get to the heart of not only what this data is right now, but what it will be tomorrow. One month from now? One year from now? Will it become more valuable over time, or less? How often does it change? Why does it change?

Design thinking asks us to reframe wicked problems and to challenge assumptions, which leads us to the next phase: ideation. One of the reasons we started Syndetic was to challenge the assumption that data sales must involve an exchange of static documents meant to somehow “represent” the dataset. To us, there is no way to represent a dynamic product with a static document. That’s why we came up with the concept of a “live” data dictionary that is always hooked into the underlying dataset. But we went further than that. Why not eliminate the exchange entirely, and let the buyer start exploring the dataset itself right away? That way they can see how it is today, and how it will be tomorrow. They can buy just a point-in-time sample, or they can subscribe to regular updates.

At this point you can start the prototype phase, which is when your dataset becomes a product rather than just some data lying around in a database somewhere. You’ve identified who your target user is, how they are going to use the data, and what problems they have. But now you have to get it in the hands of users and see if they actually want to buy it. This is where a data storefront really shines, because in the old days of [phone rings…] “Hey, I saw that you guys sell ESG data, and I run an ESG fund, so what’s your data about?” it can take weeks or months to even get your prototype into the hands of your user. They might ask for a sample of your data product, but how do you know what kind of sample to provide? Do they want 100 random rows of data? Do they want the entire dataset? Do they really just need the header names? Maybe they are only interested in a specific slice of your data. For example, if you sell foot traffic data, a buyer may only be interested in foot traffic to retail stores, and not care about restaurants. An online data storefront helps you get right to the point, because the buyer can filter your dataset down to “retail stores” from the very beginning and pull a sample directly from the site.

As you drive more traffic to your storefront, you can start the final phase of the design thinking process: testing. You’ll learn how your customers want to slice and dice your dataset. You’ll learn what might be missing or incomplete. Perhaps most importantly, you’ll learn how to more accurately price your data, because you’ll have a better sense of what your data is worth. All through an iterative process where you systematically improve your product, offer it to more customers, and improve it again. Running this kind of testing and data collection through a disjointed, phone- or email-based sales motion is almost impossible. With each lead that comes to your site, you will gain more information about your user, their wicked problems, and how your data might help them. And your dataset will become ever more elegantly designed.

Getting Started

1.  Getting data into Syndetic

The first step in building your data shop is to get the data to us. By connecting to your data, we automatically generate your shop and keep it up to date so your customers never get stale data. You have a few options:

  • S3
    • We will create an S3 bucket for you and send you credentials to push files to us. If you are creating multiple data products, you can create a subfolder for each product. As data updates, you should push revisions to us. It is important to send us complete updated revisions of any files rather than diff files.
  • Upload
    • If you prefer a more manual approach or don’t have that many files, you can simply upload files to Syndetic as you build out your shop.

2. Creating datasets

A dataset is equivalent to a data product. This can be a table in a database, or a file that you wish to sell. Your customers will be able to drill down into the data in two different ways: by selecting different packages, which slice the file vertically (by number of fields included) or by engaging different filters, which slice the data horizontally (by number of rows included). 
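The two ways of slicing can be sketched in plain Python. Everything here (the field names, the `apply_filter` and `build_offering` helpers) is illustrative, not Syndetic’s actual data model:

```python
# Hypothetical sketch: a "package" slices a dataset vertically (a subset
# of fields), while a "filter" slices it horizontally (a subset of rows).

dataset = [
    {"company_id": 1, "name": "Acme", "revenue": 5_000_000, "employees": 120},
    {"company_id": 2, "name": "Globex", "revenue": 900_000, "employees": 15},
    {"company_id": 3, "name": "Initech", "revenue": 2_500_000, "employees": 60},
]

# A package keeps a subset of fields (always including the primary ID field).
financials_package = ["company_id", "revenue"]

def apply_filter(rows, field, minimum):
    """Horizontal slice: keep only rows meeting the filter criterion."""
    return [r for r in rows if r[field] >= minimum]

def build_offering(rows, fields):
    """Vertical slice: keep only the fields in the chosen package."""
    return [{k: r[k] for k in fields} for r in rows]

# Companies with more than 50 employees, financials package only:
rows = apply_filter(dataset, "employees", 51)
offering = build_offering(rows, financials_package)
print(offering)
# [{'company_id': 1, 'revenue': 5000000}, {'company_id': 3, 'revenue': 2500000}]
```

The same primary ID field appears in every package, which is what lets a buyer recombine packages from the same dataset later.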

Creating packages

A package is a data offering that you create from a dataset. For example, if you have a dataset about companies that contains 100 fields covering all types of metadata on those companies (information about their financials, their employees, and their products), you may choose to make three packages out of the single dataset: one that includes fields 1–20, which all relate to financials; one that includes fields 21–30, which cover employees; and one that includes fields 31–100, which relate to products. The important thing to note is that each package must rely on the same primary ID field (e.g., company ID) in order to create different packages from the same dataset.

Note: It is ok to have only one package, where you are effectively making the entire dataset available at once.

When you create a package, you will be asked to describe the package (e.g. “Financial information on companies, including market cap, revenue, and PE ratio”), set the ID field, and configure which fields you would like to set as filters. Filters allow your customers to slice the data so that they can purchase only data on companies with more than 1,000 employees, for example. Or only technology companies. You get the idea.


Syndetic allows you to set pricing for every package that you create. Think of your shop as an e-commerce site, and price your data as you would any other type of product. You can update pricing at any time, but you cannot offer different prices to different customers. Products can be classified as either subscription products, which have a price per month, or one-time products, which have a price per row. To mark a data product as a subscription product, simply toggle to Subscription product when creating or editing your data product.
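As a rough sketch of the two pricing models, a quote might be computed along these lines. The `quote` helper and all numbers are hypothetical, not Syndetic’s actual pricing logic:

```python
# Hypothetical sketch of the two pricing models: a flat monthly price for
# subscription products, and price-per-row for one-time products.

def quote(matched_rows: int, price_per_row: float, subscription: bool,
          monthly_price: float = 0.0) -> float:
    """Return the checkout price for a data product."""
    if subscription:
        # Subscription products charge a flat price per month,
        # regardless of how many rows the buyer's search matched.
        return monthly_price
    # One-time products charge per row returned by the buyer's search.
    return matched_rows * price_per_row

print(quote(2_400, 0.05, subscription=False))                      # 120.0
print(quote(2_400, 0.05, subscription=True, monthly_price=500.0))  # 500.0
```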

Set pricing per package

If you don’t want to publicly display pricing and instead only want to use your shop for lead capture, you can check this box in your shop settings. Note that doing so will disable all checkout features on your shop; customers will instead contact you, and you can build an order on their behalf.


Blurring refers to whether search results are blurred out for your customers instead of showing actual rows from your dataset. In general we recommend not blurring results, as this makes a customer less likely to purchase, except in a few cases:
 – If the dataset is very small (say under 100 rows), and search results are likely to return just a few rows
 – If your pricing per row is very high, i.e. each row of data is extremely valuable, you may not want customers to see even a few rows of data

Building an order on a customer’s behalf

When you are logged in as an Admin, you can view your own data shop at any time by clicking View Shop under any Data Product.

This lets you see what your customers see when they visit your shop. From here, you have the ability to build an order with any price for any customer. This is useful for customers who get special pricing, or to send a sample of your data to someone at no cost.

To do so, from your shop, run the search that you want to run on behalf of your customer and click Get Data.

Select the package(s) you want and click Build Order.

You’ll be taken to a page where you can enter all of the details for your customer’s order including the price and their name and email address. When you click Create Order, your customer will receive an email notifying them that the order has been placed and with an invoice to pay (unless the price is 0). Once payment is received the data will be released and they will receive a link to download the data.

Shop settings

The shop settings page is where you control the look and feel of your shop. Add your company logo, set the color scheme, and add an email address to receive notifications when leads come through your shop.

The custom domain is where your shop lives. You can link to your data shop from your company’s website, share it with customers directly as part of your sales process, or use it yourself on sales calls by sharing your screen as you walk your customers through your data offerings. Write to help@getsyndetic.com if you are having trouble wiring up your data shop to your company’s website.

  • Home page

If you have multiple datasets, your home page is where you will drive your customers to. You can add a title, subtitle, and hero graphic to your homepage.

  • Managing your customers

To see a list of customers who have purchased data from your shop and learn more about their purchases, click on the Orders link on the left panel. Here, you can see customer names, contact information, their search, and amount sold.

You can also view inbound leads that come through when customers click the Contact Sales button on your shop. You will also receive an email notification.

If you have any questions, write to us at help@getsyndetic.com.

Data vendor tear sheets and why they matter

In our last post, we explored how the meaning and importance of data dictionaries have changed over time. 20 years ago, a data dictionary referred to a list of field names and types spit out by a database. Now, a data dictionary serves as a sort of spec sheet for a dataset: it must be both accurate and attractive, putting the dataset in its best light so potential buyers understand what the data means and also why it’s valuable.

Let’s move one level up to the so-called vendor tear sheet. A tear sheet is a one-page summary document describing a data vendor and its dataset offering. It too must serve several functions: to describe the vendor at a high level, such as legal company name and contact information, but also to describe the data product as a whole. What type of data is it? Is it a time series dataset or not? How often is the data updated? How is it delivered? What are the legal and contractual requirements when buying the dataset? Does it contain any PII? And so on. At Syndetic we’ve read hundreds of vendor tear sheets, spoken with dozens of data buyers, and come to the conclusion that, just as with data dictionaries, it’s time for tear sheets to be reformed.
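The attributes listed above lend themselves to a simple structured format. Here is a hypothetical sketch of a tear sheet as a JSON-serializable structure; the field names are illustrative and are not the FISD standard:

```python
import json

# A minimal, hypothetical tear sheet covering the attributes described
# above: vendor identity plus product-level facts a buyer needs up front.
tear_sheet = {
    "vendor": {
        "legal_name": "Example Data Co.",
        "contact": "data@example.com",
    },
    "product": {
        "name": "Retail Foot Traffic",
        "data_type": "foot traffic",
        "time_series": True,
        "update_frequency": "weekly",
        "delivery": ["s3", "zip download"],
        "contains_pii": False,
        "license_terms": "annual, non-exclusive",
    },
}

# Serializing to JSON makes the tear sheet easy to exchange, version,
# and diff, which is what makes conformance monitoring practical.
print(json.dumps(tear_sheet, indent=2))
```

A standard set of keys like this is what lets a buyer compare vendors side by side, or check a delivery against the stated `update_frequency`.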

Recently a group of researchers in the machine learning community published a paper called Datasheets for Datasets, proposing a standardized process for documenting datasets. They cite inspiration from the electronics industry, in which electronic parts “no matter how simple or complex” are accompanied by a datasheet outlining their characteristics, recommended use cases, test results, and so on. The authors propose an analogous standard in machine learning, where every dataset is accompanied by a datasheet outlining its motivation, composition, collection process, and recommended uses.

We’d like to expand this recommendation beyond the machine learning community to argue that any dataset exchanged between two companies should be accompanied by a standard datasheet, or tear sheet. Both sides of the market benefit. For data providers, completing the tear sheet encourages reflection on how their data is sourced, delivered, maintained, and used. Often a tear sheet also acts as an SLA between buyer and seller, with the data provider promising to notify their customer if, for example, the data is updated less frequently than it is stated in the tear sheet. The burden to monitor conformance to tear sheets today still falls predominantly upon the buyer. For data buyers, a standard tear sheet makes it easier to compare data vendors and datasets with each other. It encourages accountability and transparency within the industry. It also gives the buyer something to refer to if the data doesn’t seem to conform to the specs outlined in the tear sheet after purchase. In this way, it acts as another contract between data buyer and seller – not legally binding like the data licensing agreement signed by the parties, but equally important to the success of the ongoing relationship.

Managing tear sheets is becoming increasingly complex for both data providers and data buyers, which makes the adoption of a standard all the more urgent. Account managers at data providers must manage tear sheets across all sales channels: direct to customer, and indirect through the data aggregators that have proliferated in the past 12 months. Just keeping track of which tear sheets have been sent to whom can be a nightmare. Similarly, data buyers must manage the tear sheets they receive from vendors and view in the aggregators. Important points about the dataset are often lost in SharePoint attachments or in the personal notes of an employee. Standards, and a centralized tear sheet management system, make it much more likely that these attributes don’t get lost.

Some industries are starting to adopt their own tear sheet standards, such as the one adopted by the Alternative Data Council within the financial services industry. We are a member of the council, and we built the recommended FISD standard into Syndetic so that our customers, data providers who often sell into financial services, can create and manage their tear sheets from one central system. You can create one for free here.

We expect that other industries will follow the lead of financial services and move to adopt their own best practices and standards for data tear sheets. We will be closely following the standards as they evolve. If you know of an industry working to adopt a standard data tear sheet, drop us a line at tearsheets@getsyndetic.com.

Why does everyone hate their data dictionary?

As a cofounder of Syndetic, I’ve talked to a lot of people about their data dictionaries. At this point, probably dozens of people, ranging from data governance managers at large enterprises to founders of early-stage tech companies. And every single one of them hates their data dictionary. When I say hates, I mean that they say something like “Ugh. I won’t even show it to you. It’s an embarrassment.”

Why does everyone hate their data dictionary? A sort of meta-spreadsheet, a data dictionary on its face sounds like a relatively simple thing. It is a document describing the meaning of a dataset. Typically this includes field names and types (e.g. string, text, varchar) and maybe some annotations that describe the lineage of the data (where did it come from) and the business definition. But as with many workflows that are captured in spreadsheets, things can go awry very quickly.

  1. They are difficult to maintain.

The first person to create a data dictionary for their company usually has great intentions. They may be the first data scientist hired there, or the first data governance professional. They are diligent and organized, and dedicated to the mission of ferreting out every last bit of information about their information. They meticulously craft a spreadsheet (or google sheet) that contains the best information available to them at the time. They double and triple-check it for accuracy. But then, of course, things go off the rails.

Maintaining a data dictionary is not a full-time job. And so, the person who created it cannot be expected to be thinking about this document at all times. They go back to doing their day job, and bit by bit, changes start to happen. Engineers change schemas without thinking to alert the person who created (or now maintains) the data dictionary. Data salespeople call their prospects and walk them through the fields of the dataset, but realize that the annotations don’t quite make sense for their prospect’s use case. So they make a copy of the spreadsheet, change the annotation, and send it out. Product teams buy a system to capture data that used to be captured manually. And the dictionary very quickly gets out of date, often before anyone even realizes.

It is dangerous to have any company asset that is so dependent on one person in the organization, in this case the creator of the dictionary. If that person leaves the organization, all history of the document often goes with them, and a new person in that role may be tempted to just scrap it and start over. But then the problem repeats.

  2. They are necessary, but not sufficient, to fully explain the data.

Oftentimes, datasets are shared with business analysts or other non-technical people inside organizations who are tasked with assessing whether the data they are being provided is useful. For these people, receiving a data dictionary containing a bunch of field names and types is the standard. But it doesn’t really help them make the assessment they need: whether the data is of high quality, whether it is of better quality than what can be purchased from a different provider, or whether it has improved or degraded in quality over time. For these people, a data dictionary is often filed away, and they turn to other means of assessing the data’s quality. Can you send me a sample of 100 rows? What are the coverage rates for each field? What are the most common values I’m likely to see?

If it all looks good, and they start receiving the data, often they will move on to the next data provider and next assessment. Rarely do teams have the resources to monitor their incoming data files on an ongoing basis for anomalies, like a sudden increase in null values in a particular field. Even more rarely do teams conduct regular data assessments to ask for new sample sets or statistics on the data. They simply move on.
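The kind of ongoing monitoring described above, such as catching a sudden increase in null values in a field, can be sketched in a few lines. The helper names, threshold, and data are hypothetical:

```python
# Sketch of an ongoing data-quality check: flag fields whose null rate
# jumps between two deliveries of the same dataset.

def null_rates(rows):
    """Fraction of None values per field across all rows."""
    fields = rows[0].keys()
    return {f: sum(r[f] is None for r in rows) / len(rows) for f in fields}

def flag_anomalies(previous, current, jump=0.10):
    """Fields whose null rate rose by more than `jump` since last delivery."""
    prev, curr = null_rates(previous), null_rates(current)
    return [f for f in curr if curr[f] - prev.get(f, 0.0) > jump]

last_month = [{"price": 100, "city": "NYC"}, {"price": 90, "city": "Boston"}]
this_month = [{"price": 100, "city": None}, {"price": None, "city": None}]

print(flag_anomalies(last_month, this_month))  # ['price', 'city']
```

Run against each incoming file, a check like this turns the one-time assessment into the continuous monitoring that teams rarely staff for.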

Data as a product is very different from an application or a service, because its value depends on many variables besides whether the data is good or not. For example, the usability of the data is extremely important. You can have the most complete dataset in the world on, say, university rankings, but if the data is not usable, it is worthless. By usable, I really mean that it can be easily joined with other datasets. And that’s because people in the market to buy data on university rankings aren’t just curious whether Stanford is ranked #1 again this year. They want to answer questions that require the data to be joined with data on, say, student populations, geography, or fundraising. Rarely is a dataset so valuable in isolation. Data providers should understand this, and work as hard as possible to make their data easy to combine with other datasets.
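To make the joinability point concrete, here is a minimal sketch of joining a rankings dataset to enrollment data on a shared key. All data and field names are made up:

```python
# Sketch of why joinability matters: the rankings dataset only answers
# interesting questions once it joins cleanly to other data on a shared key.

rankings = [
    {"university": "Stanford", "rank": 1},
    {"university": "MIT", "rank": 2},
]
enrollment = [
    {"university": "MIT", "students": 11_900},
    {"university": "Stanford", "students": 17_000},
]

# A simple inner join on the shared "university" key.
by_name = {r["university"]: r for r in enrollment}
joined = [
    {**r, "students": by_name[r["university"]]["students"]}
    for r in rankings
    if r["university"] in by_name
]
print(joined)
```

If the two datasets spelled university names differently, the join would silently drop rows, which is exactly the usability failure the paragraph above describes.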

Another way data as a product is special is that it is (usually) a collection of facts. However the data was collected, if a dataset contains information about property transactions, there is an objective truth to the amounts of those transactions. A prospective buyer of that dataset is primarily concerned with whether the data is actually accurate. If it’s not, it’s not only worthless, but also potentially very damaging to that company’s business, as decisions (such as pricing) will be made in reliance on that data. It is in every data provider’s interest to invest as many resources as possible in the accuracy of its data, with rigorous testing and monitoring.

  3. They’re ugly.

The standard in data dictionaries is the good old Excel spreadsheet, closely followed by a Word document that has been saved as a PDF. It’s curious to me that for all of the time and money companies spend on product marketing, they do not do a good job of marketing their actual product, which is the data itself. Software companies often pride themselves on design and on making their applications as user-friendly and intuitive as possible. But when they receive an inquiry about their data, they send over a spreadsheet. Surely there is a better way.

  2. They cause confusion within the organization.

As with any workflow trapped in a spreadsheet, its users often don’t know whether they can trust it. Before sending the dictionary to an important prospect, a salesperson may look it over and ask a few people in engineering or product if it’s still accurate. They are unlikely to know. If a current customer has a problem with the data, say a broken file, and calls up the data provider’s support team, that team is going to check the actual file that was sent to the customer, not the data dictionary. And so you have a reference document that is not really reliable, which sows confusion among the many teams that need to work closely together to support the product. Confusion means time wasted that could be spent on more valuable things.

Hate your data dictionary? Drop us a line at inquiries@getsyndetic.com.

Introducing Syndetic

At Syndetic we make software for data companies. Syndetic literally means connective; it comes from the Greek syndetikos, meaning to bind together. We chose this name because data is connective. While it has become a cliche that data is the new oil, we see data as the connective tissue that binds companies to each other. They exchange it in the course of transacting; they ingest it to power their businesses; and they sell it (or give it away) to add value to the greater ecosystem. We’re starting today with tools for companies that sell data, a business that is often misunderstood. Data-as-a-service is relatively new, but more and more companies are offering a data product alongside their core business because, in the course of building that business, they have built a valuable dataset as well.

Companies that sell data need to do two things in addition to building their dataset:

  1. Convey the meaning of their data
  2. Convey its value

Achieving these two things is surprisingly difficult. Why? For one thing, there is a lack of tools in the market designed specifically for DaaS, which means there are a lot of hacked-together solutions out there. For another, data is by its nature fluid; building software for a thing that is constantly changing is hard. Lastly, competition is increasing, as more data companies are founded in every vertical and more incumbents launch data products.

So when a salesperson at a data company calls up a prospect and says “I have some really valuable data I’d like to sell you,” the first thing the prospect is likely to ask is “Okay, what kind of data is it? And how do I know that it’s valuable to me?” And then the salesperson will say, “Let me send you our data dictionary, which explains our data schema, and a sample of the data so you can see what you think.”

Today, data dictionaries are almost always spreadsheets. Some companies keep a spreadsheet in a folder on a shared drive, some use a Google sheet, and some use a Word document that consists of a list of field names, types, and their business meaning. Who manages that spreadsheet often depends on the size of the company and the structure of its data organization – at a tech company or a startup it might be a data scientist, but at a large enterprise it might be a data steward or the person responsible for a data governance program.

Some companies have lots of process around the spreadsheet – there is an owner who is in charge of the whole spreadsheet, or maybe of certain tabs or fields. If someone else who uses the spreadsheet needs to make an update, they send a request to the owner or submit a ticket through a project management tool. There is some approval process to make the change. Only certain people are allowed to send the spreadsheet out to prospects or current customers.

When a field name or definition changes, chaos reigns. Who is in charge of updating the spreadsheet? Who is responsible for letting the customer know so their data pipeline doesn’t break?
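One way out of this trap (a sketch under assumed field names, not a description of any particular product) is to keep the dictionary as structured data and check it mechanically against the files that actually ship, so drift is caught by a script instead of by a customer:

```python
# Hypothetical sketch: a data dictionary kept as structured data rather than
# a spreadsheet, plus a check that it still matches the file actually shipped.

data_dictionary = [
    {"name": "parcel_id", "type": "string", "description": "Unique parcel identifier"},
    {"name": "sale_amount", "type": "integer", "description": "Sale price in USD"},
    {"name": "sale_date", "type": "date", "description": "Closing date"},
]

def out_of_sync(dictionary, file_columns):
    """Columns present in one place but not the other (empty list means in sync)."""
    documented = {field["name"] for field in dictionary}
    actual = set(file_columns)
    return sorted(documented ^ actual)  # symmetric difference

# The file sent to the customer drifted: a column was renamed upstream.
shipped_columns = ["parcel_id", "sale_amount", "closing_date"]
```

Running a check like this on every delivery answers the question the salesperson can’t: whether the dictionary is still accurate right now.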

This is why we are introducing Syndetic: a web platform for conveying the meaning and value of your datasets. We are purpose-built for DaaS companies, so you don’t need to hack another tool into working for data. We know that data is fluid. We know that what engineering does affects the business side, and vice versa. We know that, depending on the use case, the same field may mean different things to different people.

Go to www.getsyndetic.com to get started – upload your current data dictionary, or create one from scratch.

Allison and Steve, Cofounders