We want to make it as easy as possible for you to start creating shareable data dictionaries in Syndetic. When you first create your account, you are brought to this screen:
There are two workflows: one for users who already maintain a data dictionary and would like to import it into Syndetic, and one for users who are starting from scratch and want to get started sharing information about their datasets.
At Syndetic, we think of everything in terms of datasets. A dataset is the slice of data that you want to explain to another person. These may be different bundles of data that you sell, or they may be different categories of data that you work with for internal purposes. For example, let’s say that your company sells data on financial institutions. You may have one dataset related to asset managers, another related to prime brokers, and a third related to stock exchanges. Each of these datasets needs to be individually packaged into a product, which means it needs to be explained and marketed. Together, the explanations of all of your datasets make up your data dictionary.
If you already have a data dictionary, simply click the Upload your data dictionary button and send it to us. We’ll take your dictionary in whatever format it lives in now (Excel spreadsheet, Google Sheet, Word document, PDF), break it down into its component datasets, and load them into Syndetic for you. We turn around dictionaries in 1-2 business days. Once your dictionary is loaded, we’ll send you an email explaining how to manage and start sharing your datasets.
If you don’t have a data dictionary, you hate the one you have (most people do!), or you want to start fresh, click Create a dataset.
Now you’ll be prompted to give a name and description to your dataset. Remember, this is to identify a slice of data that you want to share with another person. Use a name and description that you think will be most helpful in explaining the dataset to someone who is not intimately familiar with your database. You can always change it later.
Once you’ve described your dataset, you’ll be brought to the screen to upload your data extracts. A data extract is the data itself; it is required in order to get the automatically generated statistics (like coverage rates, top values, and character ranges) and automatic samples. You can use Syndetic without loading a data extract, but it is not nearly as valuable. As we like to say around here, statistics are worth 1,000 spreadsheets! We want to make it as easy as possible for the recipient of your dictionary to get a sense of the shape of the data you are sharing. These simple statistics (along with a small sample set) are the best way to convey the meaning and value of your data.
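To make these statistics concrete, here is a minimal Python sketch of how a coverage rate, top values, and a character range could be computed for each column of a CSV extract with a header row. It is an illustration only, not how Syndetic computes its statistics: we assume "coverage" means the share of non-blank values and "character range" means the shortest and longest value length, and the `profile_columns` function is hypothetical.

```python
import csv
import io
from collections import Counter


def profile_columns(text: str, top_n: int = 3) -> dict:
    """Compute simple per-column statistics for a CSV extract whose first row is the header.

    Assumed definitions (illustrative, not Syndetic's own):
      coverage     - share of rows where the value is non-blank
      top_values   - the most common non-blank values, with their counts
      length_range - (shortest, longest) non-blank value length in characters
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    stats = {}
    for column in reader.fieldnames or []:
        # Cells missing from short rows come back as None; treat them as blank.
        values = [(row.get(column) or "").strip() for row in rows]
        filled = [v for v in values if v]
        lengths = [len(v) for v in filled]
        stats[column] = {
            "coverage": len(filled) / len(values) if values else 0.0,
            "top_values": Counter(filled).most_common(top_n),
            "length_range": (min(lengths), max(lengths)) if lengths else None,
        }
    return stats


extract = "name,state\nAcme,NY\nGlobex,\nAcme,CA\n"
for column, column_stats in profile_columns(extract).items():
    print(column, column_stats)
```

Even on this three-row example, the numbers say more than a column list would: the `state` column is only partly filled, and `Acme` dominates the `name` column.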
So click on Upload data extract and you will be brought here.
The self-serve version of Syndetic accepts CSV-formatted files with headers in the first row, including CSV files hosted on the internet.
If you have an Excel spreadsheet, you can click File > Save As and select CSV from the file format drop-down menu to re-save your file in CSV format. Note that a CSV file holds only one sheet: if your workbook has multiple tabs, Excel saves only the active tab, so save each tab you want to upload as its own CSV file.
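Before uploading, it can help to sanity-check the file. The sketch below uses Python’s standard `csv` module to confirm that the first row looks like a usable header (no blank or duplicate column names) and that every row has the same number of columns as the header. These checks and the `check_extract` function are hypothetical; they are not Syndetic’s own upload validation.

```python
import csv
import io


def check_extract(text: str) -> list[str]:
    """Return a list of problems found in a CSV extract (an empty list means it looks OK).

    Assumes the first row is the header row. Illustrative checks only.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    # Every header cell should have a name, and names should be unique.
    if any(not name.strip() for name in header):
        problems.append("header row has a blank column name")
    if len(set(header)) != len(header):
        problems.append("header row has duplicate column names")
    # Every data row should have as many cells as the header; blank lines are skipped.
    for row_no, row in enumerate(rows[1:], start=2):
        if row and len(row) != len(header):
            problems.append(f"row {row_no} has {len(row)} columns, expected {len(header)}")
    return problems


print(check_extract("id,name\n1,Acme\n2,Globex\n"))
print(check_extract("id,id\n1\n"))
```

Row numbers here count CSV records, not physical lines, so a quoted value containing a line break still counts as one row.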
The Enterprise version of Syndetic lets you connect your database directly, so you can pull extracts as often as you’d like and get automatically refreshed statistics and samples. We can connect to any database that speaks ODBC/JDBC, the standard interfaces for database connections. Contact us for more info. Even with the Standard version of Syndetic, the statistics and samples refresh automatically every time you upload a new data extract.
Once you’ve loaded your data, you can get started on the fun part: annotating your fields. We think this part is also the most important, because it’s where you convey the meaning behind the data. Where did this dataset come from? What does it mean to your customer? How do you want to market this dataset? The answers may be customer-specific, because different customers may use the same dataset for entirely different purposes. Knowing your customer’s intended use case and describing the dataset for that use case is key. To start annotating, click on Manage fields.
When you click on a field, you are shown the common values and statistics for that field to help you annotate it. When you click Edit for that field, you are taken to this page.
Because these annotations will appear in your final, published dictionary, think about how you want to describe this field to your customer.
- Display name is useful if your data contains hard-to-read or especially long field names that you can’t change, e.g., sales_id_quart_20170412.
- Description is meant for describing the field from a business perspective to your customer. Keep it relatively short so that it fits neatly in your dictionary summary table on the published page.
- Lineage is meant for describing where the data comes from. Is it system-generated by your app? Was it entered manually by a field sales rep? Does it come from a database you integrated into your tech stack years ago from a company you merged with? Data lineage is important for understanding the history behind this data and why it is what it is.
- Notes are meant for any extra information about this field that may be helpful to your customer in understanding this field in particular. It’s almost like an introduction to the field, and will show up in bright blue in the published page.
For every field you have the option to publish or hide the statistics and values for that field. You also have the option to hide that field from the sample sets that are auto-generated by Syndetic. This may be useful for fields that contain particularly sensitive information. If you choose to hide a field from the sample sets by unchecking that checkbox, on the published page it will look like this:
From the main datasets page, you can click on Preview at any time to view your dictionary live and see how it will appear to other people you share the link with. This should help with your annotations along the way.
Any changes you make to your dictionary using the management layer are immediately reflected on the published page; just refresh the page to see them.
We want annotating to be collaborative, which is one of the many benefits of using Syndetic over a spreadsheet. You may want to involve data engineers, customer-facing relationship managers, or salespeople in your annotation process. To add users to your organization, click on Add a User.
When you add a user, you set their email and password, and an email will be sent to their address notifying them that they’ve been added to your organization. You can set permissions for each user that determine their capabilities:
- Allow this user to manage other users means this user can create other users for your organization.
- Allow this user to create and edit objects means this user can add datasets, annotate fields, and edit content.
By default, anyone who signs up from the Syndetic home page creates their own organization. If you want to sign up your whole team, only one person should create an account from the home page and then invite the other members of the team. For enterprise packages, contact us.
From your organization’s homepage, you can also manage your billing and click Add a Logo to include your logo on all of your published dictionaries. Make sure to add a logo so you get a snappy published dictionary page like this!