Datatokens 2: Non-Fungible, Fungible and Composable Datatokens
For data baskets, limited-edition datasets, priced compute pipelines, and more
[Related: Datatokens Part 1, Part 3]
Introduction
Ocean Protocol aims to kickstart an open Data Economy. Datatokens are a key part. Part 1 of this series gave the base premise — we can tokenize data access control — and described how this gives a new approach to data custody, via crypto wallets.
This article builds on that foundation. It asks:
- What are driving use cases for fungible datatokens? How would we implement them?
- What are driving use cases for hierarchically-organized (composable) datatokens? How would we implement them?
We will see that by answering these questions, we unlock a huge variety of new applications: data baskets, limited-edition datasets, priced compute pipelines, and more.
Together, all types of datatokens form the lifeblood of an open Data Economy.
Context: The Open Data Economy
Ocean Protocol aims to unlock data for AI to help democratize AI, and kickstart an open Data Economy. Just as we (the blockchain community) have been opening up the Money Economy, we aim to do the same with a Data Economy:

What might a transparent, permissionless Data Economy look like? Once again, we can draw inspiration from the Token Economy. The Token Economy has a base layer (reserve currency, app platform, funding platform) running on a blockchain substrate. It has a last-mile for utility — dapps — and an economic last-mile: token custody, token exchanges / other DeFi, and mining. It’s tokens that connect everything in the Token Economy. Each token typically has some monetary characteristics, which is befitting for a Money Economy.
In short, the foundations for the open Money Economy are a blockchain substrate and tokens. The heart and the lifeblood, respectively.
Similarly, we see that the foundations for the open Data Economy are a data-oriented blockchain substrate and datatokens. The heart and the lifeblood, respectively.

Within this Data Economy, the data-oriented blockchain substrate is about access control. It is built into Ocean Protocol, whose v1.0 shipped in July 2019. Here’s the architecture of this substrate.

The second key part of the Data Economy — the lifeblood — is datatokens. The previous post (Part 1) gave the base premise (tokenize access control) and focused on crypto wallets for data custody. This post examines three variants of datatokens and related use cases: ERC721 (non-fungible), ERC20 (fungible), and ERC998 (composable).
Non-fungible Datatokens (NFDTs), via ERC721
NFDTs Use Cases
- Improve data custody, where only a single entity has access. Crypto NFT wallets become data wallets.
- Existing NFT use cases. NFDTs can flow through existing crypto infrastructure for existing NFT use cases: NFT marketplaces, financial supply chains (e.g. with Centrifuge), and more.
Fungible Datatokens (FDTs), via ERC20
FDTs Use cases
Every non-fungible datatoken is its own unique snowflake. However, there are cases where you might want identical data access tokens, aka fungible datatokens (FDTs).

Here are some use cases for FDTs:
- Improve data custody and data management, where more than one entity might have access. Sharing data (giving permission) is done by sending 1.0 datatokens to someone else. ERC20 wallets become data wallets.
- Leverage more crypto infrastructure. This is probably the biggest use case for fungible datatokens. Most crypto wallets and exchanges work for fungible tokens (ERC20) but not non-fungible ones (ERC721). The same goes for other infrastructure, such as DAOs and CEXes.
- DeFi applications. Data DEXes, stablecoins, lending, derivatives, insurance, and more. Part 3 of this series explores the myriad data * DeFi use cases, which mostly use fungible datatokens.
- Limited edition datasets. You can use artificial scarcity to help price discovery, akin to NYC taxi medallions. Imagine 1000 “limited edition” datatokens each providing access to a valuable data feed. These tokens could be bought and sold on an order-book based exchange to provide price discovery. (Related: limited-edition digital art like our past ascribe work.)
- Popularity-based pricing. Consider if you want the price of a dataset to go up as token popularity (# buyers) goes up, and down as popularity goes down, according to a pre-set schedule. This could be implemented with a bonding curve where the x-axis is # datatokens minted, and the y-axis is the price in OCEAN, DAI, or ETH. In this setting, there is no bound on the supply of datatokens.
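The pre-set schedule in the last use case can be sketched as a simple bonding curve. Here is a toy linear curve in Python; the base price and slope are illustrative assumptions, not Ocean Protocol parameters:

```python
def datatoken_price(supply: int, base: float = 1.0, slope: float = 0.5) -> float:
    """Price of the next datatoken at the current minted supply.

    Linear bonding curve: the price rises as more tokens are minted
    (popularity up) and falls as tokens are burned (popularity down).
    `base` and `slope` are illustrative constants; the price unit could
    be OCEAN, DAI, or ETH.
    """
    return base + slope * supply

print(datatoken_price(0))    # first buyer pays the base price: 1.0
print(datatoken_price(100))  # after 100 mints, the price is 51.0
```

Any monotonically increasing curve (e.g. polynomial or sigmoid) gives the same qualitative behavior: price tracks popularity, with no hard cap on supply.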
ERC721 → ERC20: Re-Fungible Datatokens
Even if a datatoken starts non-fungible (ERC721), it can be wrapped with a fungible token (ERC20) to be made “re-fungible”. Here, the ERC20 token would include a bonding curve.
- Each time a person wants to buy a fungible datatoken (an ERC20 token to access a given dataset), that token is minted (moving right on the x-axis of the bonding curve). The price of each new buy follows the shape of the bonding curve; typically this is monotonically increasing, which means the price goes up with each new purchase.
- Each time someone sells their fungible datatoken, the token is burned (moving left on the curve). The price of the token goes down after each sell.
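These mint-on-buy / burn-on-sell mechanics can be captured in a few lines. Below is a toy model in Python with a hypothetical linear curve — a sketch of the idea, not the on-chain ERC20 implementation:

```python
class ReFungibleWrapper:
    """Toy ERC20-style wrapper around a non-fungible datatoken,
    priced by a linear bonding curve (illustrative constants)."""

    def __init__(self, base: float = 1.0, slope: float = 0.5):
        self.base = base
        self.slope = slope
        self.supply = 0  # datatokens currently minted

    def price(self) -> float:
        # Price of the next token at the current supply.
        return self.base + self.slope * self.supply

    def buy(self) -> float:
        cost = self.price()
        self.supply += 1          # mint: move right on the curve
        return cost

    def sell(self) -> float:
        self.supply -= 1          # burn: move left on the curve
        return self.price()       # refund at the new, lower point

w = ReFungibleWrapper()
print(w.buy())   # 1.0 — first buyer pays the base price
print(w.buy())   # 1.5 — price rose after the first mint
print(w.sell())  # 1.5 — burn refunds one step down the curve
```

Note the asymmetry: a buy pays the price *before* minting, while a sell refunds the price *after* burning, so early buyers profit if demand grows.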
Composable Datatokens (CDTs), via ERC998
CDTs Use Cases
Here are some reasons that we might want to hierarchically organize (compose) datatokens:
- Streaming data. Consider a stream of data, where every interval of ten minutes there’s another chunk of data. You want to package and sell the last 24 hours worth of data as a single token.
- Many data sources. Consider 100 data streams from 100 unique Internet-of-Things (IoT) devices. You want to package and sell one interval’s worth of 100 data chunks.
- Data baskets. Consider: you’re an asset manager with ninja data science skills. You’ve assembled an awesome group of 1000 datasets that each have individual (but small) value. You want to sell this group of data as a single asset to others who are growing their asset base and want to diversify by holding data assets. These data assets could be static or dynamic (streaming) data services.
- Data indexes. Consider a future with thousands of potentially investable data streams. You track the top 100 and make it easy for others to invest in those as a single asset, similar to today’s index funds.
- Data frames. You have a huge dataset, but you only want to give access to a subset of that dataset to others. You’ve specified the subset using a Pandas dataframe (Pandas is a mainstay of Python data science tooling).
- Priced compute pipelines. Consider: You’re a data scientist and have put together a series of steps — a compute pipeline — to train a private AI model. You want to make that pipeline available to others. It needs to operate on private data, i.e. you can’t see the training data or intermediate results. How do you easily pay for the total set of services across the pipeline?
- Annotating metadata including reputation or quality. You’re a data marketplace that wants to give more information about a dataset: what reputation your users have given it, what its quality is according to your marketplace’s quality measures, or whether it is input training data vs. output. You want this data to be on-chain, but you don’t control the dataset’s metadata field.

On Implementing Composable Datatokens (CDTs)
ERC998 is a standard for composable tokens. Each item in the basket can be an ERC20, ERC721, or ERC998 token.
We can implement a composable datatoken (CDT) by simply collecting together any existing fungible (ERC20) or non-fungible (ERC721) datatokens into an ERC998 token. We can build larger hierarchies by collecting together ERC998s in addition to ERC20 and ERC721 datatokens.
Top-down or bottom-up. ERC998 allows for top-down composition, where the holder of the root-node ERC998 controls the rest of the tree. It also allows for bottom-up composition, where a token can “attach itself” to other tokens. For example, metadata could be attached to a raw blob of data.
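A minimal sketch of top-down composition, in plain Python rather than ERC998 Solidity (all class and field names here are hypothetical): a composable token owns child tokens, so whoever holds the root controls access to the whole tree.

```python
from dataclasses import dataclass, field

@dataclass
class Datatoken:
    """A leaf datatoken — stands in for an ERC20 or ERC721 access token."""
    token_id: str

@dataclass
class ComposableDatatoken:
    """Stands in for an ERC998 top-down composable: it owns its children,
    so transferring the root transfers the whole tree."""
    token_id: str
    children: list = field(default_factory=list)

    def attach(self, token):
        """Add a child token (a leaf, or another composable for deeper trees)."""
        self.children.append(token)

    def flatten(self):
        """All leaf datatokens reachable from this root."""
        leaves = []
        for child in self.children:
            if isinstance(child, ComposableDatatoken):
                leaves.extend(child.flatten())
            else:
                leaves.append(child)
        return leaves

# Build a small hierarchy: a basket holding a stream of chunks plus a static dataset.
basket = ComposableDatatoken("basket")
stream = ComposableDatatoken("stream-1")
stream.attach(Datatoken("chunk-0"))
stream.attach(Datatoken("chunk-1"))
basket.attach(stream)
basket.attach(Datatoken("static-dataset"))
print([t.token_id for t in basket.flatten()])
# ['chunk-0', 'chunk-1', 'static-dataset']
```

Bottom-up composition inverts the pointer: the child (e.g. a metadata token) records which parent it is attached to, rather than the parent owning the child.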
Here’s how each use case above is handled.
(1) Streaming data. The chunk of each time interval is a non-fungible datatoken (NFDT). Then a composable datatoken (CDT) collects together 24 hours’ worth of NFDTs.
(2) Many data sources. Each independent datasource is a NFDT. A CDT collects them together.
(3) Data baskets. An ERC998-based CDT collects together sub-tokens (NFDTs, FDTs, and smaller CDTs) into an asset of value. This might also use other financial basket protocols like Set Protocol or Melon Protocol.
(4) Data indexes. One way to implement this is like (3), using top-down ERC998. An owner of the top-level ERC998 token would own all the sub-tokens in the basket (think Set Protocol). Alternatively, one could create a new data index token and attach it to each of the datatokens, bottom-up style (ERC998). To make it tradeable, the bottom-up token would be attached to collateral (think UMA Protocol), or put into a prediction market (e.g. Gnosis).
(5) Data frames. A lower-level datatoken holds the whole dataset. Then a higher-level CDT holds permissions to just the subset, where the specific subset is described in a Pandas dataframe stored in the higher-level CDT metadata.
(6) Compute pipelines. The pipeline might look like: input raw training data X/y → clean the data → store cleaned data → build model → store model → input raw test data X_test → run predictions → store predicted result y_test. This is an interleaving of data service → compute service → data service → etc. It could be executed as an Ocean Service Execution Agreement (SEA) where Ocean orchestrates the steps. Here’s the extra-interesting part: each compute & data service is itself tokenized. Then there’s an ERC998 composable token that holds each of those tokens, along with metadata about how to connect them (e.g. SEA style).
(7) Annotating metadata, e.g. reputation, quality. Your marketplace would do this by creating a new token holding the extra metadata, then attaching that new token to the existing dataset token using ERC998 bottom-up approach. Alternatively, you could use a “tagging” standard.
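For the data-frames case (5), the subset can be recorded declaratively in the higher-level CDT’s metadata and materialized at access time. Here is a sketch over a toy table in plain Python; the spec schema and field names are hypothetical, and in practice the spec could be a serialized Pandas dataframe selection:

```python
# Toy dataset: the lower-level datatoken grants access to all of it.
full_dataset = [
    {"sensor": "s1", "temp": 21.5, "humidity": 40},
    {"sensor": "s2", "temp": 19.0, "humidity": 55},
    {"sensor": "s3", "temp": 23.1, "humidity": 35},
]

# Subset spec stored in the higher-level CDT's metadata (hypothetical schema).
cdt_metadata = {"rows": [0, 2], "columns": ["sensor", "temp"]}

def resolve_subset(dataset, spec):
    """Materialize only the rows and columns the CDT grants access to."""
    return [
        {col: dataset[i][col] for col in spec["columns"]}
        for i in spec["rows"]
    ]

subset = resolve_subset(full_dataset, cdt_metadata)
print(subset)
# [{'sensor': 's1', 'temp': 21.5}, {'sensor': 's3', 'temp': 23.1}]
```

The key property is that the holder of the higher-level CDT never needs access to the rest of the dataset — the access-control layer enforces the spec.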
Note: while we use ERC998 as an example, the idea of CDTs works for other protocols too.
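The tokenized pipeline in (6) can likewise be sketched as an ordered list of service tokens plus metadata on how to chain them. All token names and prices below are made up for illustration; real orchestration would go through Ocean’s SEA machinery:

```python
# Each step of the pipeline is itself a tokenized service (data or compute).
# The composable token holds the step tokens, in execution order, plus
# pricing; buying it pays for the whole interleaved chain at once.
pipeline_cdt = {
    "token_id": "pipeline-1",           # hypothetical identifiers
    "steps": [
        {"token": "data-raw-Xy",   "kind": "data",    "price": 2.0},
        {"token": "compute-clean", "kind": "compute", "price": 1.0},
        {"token": "data-cleaned",  "kind": "data",    "price": 0.5},
        {"token": "compute-train", "kind": "compute", "price": 5.0},
        {"token": "data-model",    "kind": "data",    "price": 0.5},
    ],
}

def pipeline_total_price(cdt):
    """One purchase covers every data and compute service in the chain."""
    return sum(step["price"] for step in cdt["steps"])

print(pipeline_total_price(pipeline_cdt))  # 9.0
```

This is what makes the pipeline sellable as a single asset: the buyer pays the composable token’s total price without ever seeing the private training data or intermediate results.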
Summary
The following image summarizes the relationships among the datatoken types.

Conclusion
Ocean Protocol aims to kickstart an open Data Economy, and datatokens are a key part. This article described how we can have non-fungible, fungible, and hierarchically-organized (composable) datatokens: NFDTs, FDTs, and CDTs. These tokens unlock a broad range of new applications, such as data baskets, limited-edition datasets, chunks of streamed data, compute pipelines, and more.
Acknowledgements
Thanks to Matt Lockyer and my colleagues at BigchainDB for feedback on this article.
Datatokens Series & Related Media
This blog post is one of a series:
- “Datatokens 1: Data Custody. Data access control, meet crypto wallets & data DAOs.” [link]
- “Datatokens 2: Non-Fungible, Fungible and Composable Datatokens. For data baskets, limited-edition datasets, priced compute pipelines, and more.” [this article]
- “Datatokens 3: Data and Decentralized Finance (Data * DeFi). Data as Collateral. Data DEXes, Data Loans, Data Stablecoins, and Datatokens in Financial Supply Chains.” [link]
Related media:
- This talk at Outlier Ventures Diffusion, Oct 2019 in Berlin, explored how datatokens could be used. This talk at RadicalxChange, Nov 2019 in Berlin, riffed on it further.
- This blog post (and related talk) describes datatokens in the context of mobility.
Updates
- [Aug 12, 2020] Changed from needing a “base” datatoken to be ERC721, to the possibility that a base could be ERC721 or ERC20. This better reflects the Ocean V3 release which implements datatokens as ERC20s.
- [Aug 17, 2020] Changed “data token” to “datatoken” (no space) to be consistent with Ocean V3 release.
Follow Ocean Protocol via our Newsletter and Twitter; chat with us on Telegram or Discord; and build on Ocean starting at our docs.