The Data Value-Creation Loop

Theory & practice on data flywheels and supply chains, with a case study in Ocean Predictoor

Trent McConaghy
Ocean Protocol

--

The Data Value-Creation Loop. Create value from data, make $ from that value, and reinvest. *Data* is at each step of the loop.
Contents

1. Introduction
2. The Data Value-Creation Loop
3. Using the Loop to Analyze Opportunities
4. From a Loop to a Data Supply Chain
5. Money, and Data Supply Chains
6. Embedding in a Tokenized Ecosystem
7. Case Study: Ocean Predictoor
8. Conclusion

Complementary media: [video][slides-PDF][GSlides]

Summary

How does one survive and thrive in the open data economy? The heart of this is making money, of course. But that’s too simple to be useful. We need a better framework to think about this, one that can help guide selection of verticals, of specific product ideas, and refinement of those ideas.

This article introduces such a framework: the Data Value-Creation Loop. And unrolled, it’s a data supply chain. This article provides theory, along with guidance about how to put the theory into practice.

This article wraps up with a case study: how Ocean Protocol has used this framework to create Ocean Predictoor.

1. Introduction

The core infrastructure is in place for an open data economy. Dozens of teams are building on it. But it’s not 100% obvious how teams get to success. We ask:

How do people *sustain* and *thrive* in the emerging open data economy?

The goal: from nothing, to (barely) sustaining, and finally to thriving. Money matters.

The answer can be simple:

Ensure that they can make money!

But what does that mean? What specific actions can be taken? We need to dive deeper. The next question is:

*How* do people make money in the open data economy?

Here are first-cut answers, from our vantage as a data economy tool-provider (Ocean): give teams affordances to monetize by selling data & running marketplaces, give more flexibility in collecting fees, and more.

These are all needed. And (for Ocean) they’ve been built, and they’re being used. But this is not enough. Just because a team can take payments, doesn’t mean the payments are coming! So we must go further.

We need a better framework to think about this, which can help guide selection of verticals, of specific product ideas, and refinement of those ideas. This article introduces such a framework, shows how it can be used in general, and shows how Ocean Protocol has used it, including the case study of Ocean Predictoor.

2. The Data Value-Creation Loop

2.1 Introduction to the loop

To the question of “How do people make money in the open data economy?”, our answer is:

Create value from data, make money from that value, and loop back and reinvest this value creation into further growth

We call this the Data Value-Creation Loop. The figure at the top illustrates.

2.2 Details per Step in Loop

Let’s go through the steps of the loop.

  • At the top, the user gets data by buying it or spending $ to create it.
  • Then, they build an AI model from the data.
  • Then, they make predictions. E.g. “ETH will rise in the next 5 minutes”.
  • Then, they choose actions. E.g. “buy ETH”.
  • In executing these actions, the data scientist (or org) will make $ on average.
  • The $ earned is put back into buying more data, and other activities. And the loop repeats.
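To make the loop concrete, here’s a minimal Python sketch. Every function here is an illustrative stand-in (not an Ocean API), and the toy “model” is just the average price drift; the point is the shape of the loop: each step consumes and produces data, and the $ earned funds the next iteration.

```python
# Minimal sketch of the Data Value-Creation Loop.
# All functions are hypothetical stand-ins, not Ocean APIs.

def buy_data(budget: float) -> list[float]:
    """Step 1: spend $ to acquire raw data. Stubbed with toy prices."""
    return [1.0, 2.0, 3.0, 4.0]  # e.g. recent ETH prices

def build_model(prices: list[float]) -> float:
    """Step 2: build an AI model from the data. Toy model: mean drift."""
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    return sum(diffs) / len(diffs)

def predict(drift: float) -> str:
    """Step 3: make a prediction, e.g. ETH direction in the next 5 min."""
    return "up" if drift > 0 else "down"

def act_and_earn(prediction: str, stake: float) -> float:
    """Steps 4-5: choose an action (e.g. buy ETH) and realize $."""
    edge = 0.01  # placeholder: assume a small positive edge on average
    return stake * (1 + edge)

budget = 100.0
for epoch in range(3):                     # step 6: the loop repeats
    prices = buy_data(budget)
    drift = build_model(prices)
    signal = predict(drift)
    budget = act_and_earn(signal, budget)  # $ is reinvested
    print(f"epoch {epoch}: predicted {signal}, budget now ${budget:.2f}")
```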

In this loop, dapp builders can help their users make money; data scientists can earn directly; and crypto enthusiasts can catalyze the first two if incentivized properly (e.g. to curate valuable data).

2.3 Data is Throughout the Loop

Data is at each step of the loop. And it’s not just training data. Training data is at the first step; there are also model parameters as data, and predictions as data.

3. Using the Loop to Analyze Opportunities

3.1 Does it *actually* close the data value-creation loop?

As a baseline: if your project idea can’t even close the loop, then it’s dead-on-arrival. In this case, you need to go back to the drawing board. You need to improve your idea to close the loop, or you need to explore other ideas.

Be careful to ensure that you can actually close the loop. It’s extremely seductive to tell yourself that you’ve closed the loop, so that you can feel like you’ve met the constraint and can move on. Don’t fool yourself. Ask yourself: “if this thing is live, do the parts fit together where there can truly be $ flowing through?” Red team yourself. Ask, will the machine work?

3.2 Key criteria for opportunities

Let’s say you’ve filtered down to verticals / project ideas that have a realistic chance of closing the loop. There are perhaps dozens of verticals or hundreds of possible opportunities. Then, how do you select which to pursue? We’ve found two measuring sticks help the most.

Key criteria:

  1. How quickly can users go through the loop? Some verticals or projects may take 5–10 years to go through, such as drug discovery. Others may take 10 seconds, for example DeFi trading. This spectrum of time scales spans 10 orders of magnitude! Choose wisely.
  2. What’s the expected $? This is the total addressable market (TAM) times the probability of success. The latter depends on your product, technology, go-to-market, competition, and so on. How much of an edge do you have? Or, how much of an edge could Ocean tech give you?
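To make these two criteria concrete, here’s a back-of-envelope scorer in Python. All the TAM, probability, and loop-time numbers are invented for illustration (they are not Ocean’s estimates); one crude way to combine the criteria is expected $ per second of loop time.

```python
# Back-of-envelope opportunity scoring. Numbers are illustrative only.
# expected $ = TAM x p(success); loop_s = seconds per loop iteration.

verticals = [
    # (name, TAM in $B, p(success), loop_s)
    ("DeFi trading",         5.0, 0.02,  10),
    ("LLMs / modern AI",    20.0, 0.005, 3600),
    ("Medicine",            50.0, 0.001, 10 * 365 * 24 * 3600),  # ~10 yr
    ("Decentralized music",  0.1, 0.02,  24 * 3600),
]

# Rank by expected $ per second of loop time (one crude combined score).
for name, tam_b, p, loop_s in sorted(
        verticals, key=lambda v: v[1] * v[2] / v[3], reverse=True):
    expected = tam_b * 1e9 * p
    print(f"{name:20s} expected ${expected:>13,.0f}   loop {loop_s:>11,} s")
```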

3.3 Using the loop to analyze verticals / projects

We can then use these two criteria as two dimensions to analyze verticals and ideas. For any given data application, the loop should be fast, with serious $ opportunity. We analyzed dozens of possible verticals according to these criteria. The image below illustrates.

Analysis of candidate verticals and projects via criteria of (a) expected $ and (b) speed of iterations through the Data Value-Creation Loop

Here are some examples from the image.

  • Small $, slow. Traditional music is small $ and slow, because incumbents like Universal dominate by controlling the back catalogue.
  • Large $, slow. Medicine is large $ but slow, due to the approval process.
  • Small $, fast. Decentralized music is fast but small $ (for now! Fingers crossed).

We want: large $, fast. Here are the standouts.

  • Decentralized Finance (DeFi) is a great fit. One can loop at the speed of blocks (or faster), and trade volumes have serious $.
  • LLMs and modern AI are close: one can loop quickly, and with the right application, make $. The challenge is: what’s the right application?

It’s ok for you to disagree with the specific assessments here! We encourage you to go ahead and do your own analysis, embedding your own ideas:)

3.4 Impact of Data Value-Creation Loop on Ocean Strategy

First: at Ocean, we encourage our ecosystem collaborators to close the data value-creation loops, with maximum $ and speed.

Second: we follow our own advice for internal projects too. Accordingly, a lot of our focus is on DeFi and LLMs / modern AI. Ocean Predictoor is a prime example, as this article later elaborates. These are natural for us because Ocean is natively a Web3 and an AI project.

Loops, then scale. Once one or two fast & high $ data value-creation loops have been established on Ocean, where people are sustainably making money, we’ll likely adjust our activities to scale those loops up.

Ubiquity. Our aim is to grow over the long-term, until Ocean is ubiquitous as a tool to level the playing field on data and AI.

4. From a Loop to a Data Supply Chain

4.1 Supply chain example: coffee supply chain

The image below shows a (simple) supply chain for coffee. Coffee farmers in e.g. Costa Rica grow coffee beans. They sell them to distributors. Via ≥1 distributors the beans eventually make it to a coffee roastery, which roasts the beans. Finally the roastery sells the beans to a retail coffee outlet.

Sometimes one organization handles >1 step. E.g. Starbucks not only sells coffee at retail; it also roasts its own beans and handles some of the distribution.

Coffee supply chain (simple version)

4.2 From Loop, to Data Supply Chain

What if we unroll the data value-creation loop? We get a data supply chain. The image below illustrates.

In the data supply chain, data is at each step of the chain. As in the loop, the data at each step is different: raw data, models, predictions. The image below illustrates.

Data supply chain. There’s data at each step, until the last mile.
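One way to read the chain is as typed handoffs: each stage consumes one data product and emits the next, until the last mile, which emits an action rather than data. A minimal sketch with illustrative types (not Ocean schemas):

```python
# Typed handoffs in a data supply chain. Types are illustrative only.
from dataclasses import dataclass

@dataclass
class RawData:        # first mile: e.g. historical price candles
    prices: list[float]

@dataclass
class ModelParams:    # mid-chain: trained model parameters
    drift: float

@dataclass
class Predictions:    # second-last mile: the final *data* product
    direction: str    # "up" or "down"

def train(raw: RawData) -> ModelParams:
    diffs = [b - a for a, b in zip(raw.prices, raw.prices[1:])]
    return ModelParams(drift=sum(diffs) / len(diffs))

def infer(params: ModelParams) -> Predictions:
    return Predictions(direction="up" if params.drift > 0 else "down")

def act(pred: Predictions) -> float:
    """Last mile: an action (a toy trading position), not a data product."""
    return 1.0 if pred.direction == "up" else -1.0

position = act(infer(train(RawData(prices=[1.0, 2.0, 3.0]))))
print(f"position: {position:+.1f}")
```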

5. Money, and Data Supply Chains

5.1 Money, and Coffee Supply Chains

Consider the coffee supply chain. While the product flows forward (bean to cup), the money flows backward. It starts at the last mile where retail customers spend $ to buy their product at e.g. Starbucks. Then it disperses backwards through the roasters, distributors, and finally to coffee farmers. The image below illustrates.

Coffee supply chain: flow of $

5.2 Is there actually $ in the coffee last mile?

There’s a baseline constraint: if there are no last-mile customers, then there’s no $ for the rest of the chain. If there were no Starbucks, hipster cafes, or greasy-spoon diners, then no $ would be entering the system. And no amount of heroics from farmers or distributors or roasters would lead to sustainable $.

5.3 Who makes the serious money in coffee?

Coffee is already an established industry. There are many last miles. Everyone in the chain does get paid to keep doing their thing. But, who’s making the serious $ ? Let’s do a quick analysis.

  • At the last-mile side: at Starbucks, one cup of large iced coffee costs $4.00.
  • At the first-mile side: a Costa Rican farmer gets $23 for 1 kg of coffee beans. 1 kg of coffee beans makes about 140 cups. Therefore it’s $0.16 per cup.

Coffee supply chain: who makes the serious $ ?
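The arithmetic behind those two bullets, as a quick sanity check:

```python
# Back-of-envelope: farmer's share of a $4.00 retail cup of coffee.
retail_price_per_cup = 4.00    # Starbucks large iced coffee, $
farmer_price_per_kg = 23.00    # Costa Rican farmer, $ per kg of beans
cups_per_kg = 140

farmer_per_cup = farmer_price_per_kg / cups_per_kg
print(f"farmer earns ${farmer_per_cup:.2f} per cup")       # ~$0.16
share = farmer_per_cup / retail_price_per_cup
print(f"that is {share:.1%} of the retail price")          # ~4.1%
```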

In coffee, the last mile makes the most, the first mile the least. The last mile (e.g. Starbucks) is closest to the final customer, charges more, and has more room to charge a higher margin. The first mile is typically in a commodity game, where differentiation is difficult and it’s a race to the bottom for margins. It’s the reddest of oceans.

It’s way easier to make serious $ as a Starbucks than as a Costa Rican coffee farmer. (Let alone a Canadian wheat farmer, from personal experience!)

This generalizes. In most supply chains, the most opportunity for serious $ is at the last mile.

5.4 Money, and Data Supply Chains

Let’s map our learnings from coffee supply chains to the nascent concept of data supply chains.

The data product flows forward as data being operated upon; and the money flows backward. The money starts at the last mile where users take action on prediction data to make $ (e.g. making a successful DeFi trade). Then it disperses backwards through the data supply chain, to the raw data suppliers. The image below illustrates.

Data supply chain: flow of $

5.5 Is there actually $ in the data last mile?

As a baseline: if there are no last-mile customers opening their wallets for $, then there’s no $ for the rest of the data supply chain. There’s no $ for people running inference on models to make predictions, no $ for AI/ML people building the models, and no $ for the raw data suppliers.

This constraint is the supply-chain extension of the prior constraint, “can you close the loop?” It emphasizes the source of $ even more.

This baseline constraint sounds obvious, right? Amazingly, not in practice. We’ve seen too many data projects that have not even considered where the $ might come from. And sometimes we’ve been guilty of it ourselves 🙈.

Furthermore, you can’t just LARP that you’ve considered making $, and be done with it. Ask, as with the loop: “if this thing is live, do the parts fit together where there can truly be $ flowing through?” Red team yourself. Ask: will the machine work?

If you can’t honestly answer “yes” to the above, then go back to the drawing board. Improve your idea, or explore other ideas, until the machine works.

5.6 Who makes the serious money in data? What’s the most valuable data?

While coffee is already an established industry, data generally is only partly there, and decentralized data specifically is only nascent. But we can hypothesize: where are the best opportunities for serious $ ?

Recall: in coffee supply chains, the serious $$ is in the last mile, right next to the final customer. We hypothesize that it’s the same for data supply chains. In this case, the $$ from the “final customer” is the result of actions taken from data. For example, in DeFi trading, the $$ is from a successful trade, as a result of actions taken from prediction data.

Some of the $$ from the last mile (take action, make $) then goes backwards to makers of predictions, then model parameters, then raw data. The image below illustrates.

Data supply chain: who makes the serious $ ?
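Here’s a hedged sketch of that backward flow. The split fractions below are invented purely for illustration; the takeaway is the shape: each upstream step earns a fraction of the step after it, so $ concentrates near the last mile.

```python
# Illustrative backward flow of $ through a data supply chain.
# Split fractions are made up; only the decreasing shape matters.

last_mile_profit = 1000.0  # $, e.g. from a successful DeFi trade

# Fraction of each step's revenue paid to the step upstream of it.
upstream_share = [
    ("take action, make $", 0.20),  # pays the prediction makers
    ("predictions",         0.30),  # pays the model builders
    ("model parameters",    0.30),  # pays the raw-data suppliers
]

revenue = last_mile_profit
for step, share in upstream_share:
    paid = revenue * share
    print(f"{step:22s} keeps ${revenue - paid:8.2f}, pays ${paid:7.2f} upstream")
    revenue = paid
print(f"{'raw data':22s} keeps ${revenue:8.2f}")
# Output shape: 800.00 > 140.00 > 42.00 > 18.00 -- last mile earns most.
```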

The last mile is “take action, make $” and doesn’t have a data product. The second-last mile is “predict” and does have a data product (predictions). Therefore, our model is: the most valuable data is predictions.

And of course the predictions must be accurate, or they’re useless. Often, the more accurate, the more valuable.

6. Embedding the Loop in a Tokenized Ecosystem

Q: I’m part of a token project with a data orientation. How do I connect its higher-level token design with the Data Value-Creation Loop?

A: You can embed the Data Value-Creation Loop as a subsystem / component within the higher-level token design.

Let’s illustrate via examples. The Web3 Sustainability Loop is a system design for long-term growth of Web3 projects. It’s been widely taught through Token Engineering Academy. Ocean Protocol, Boson Protocol, and others use it. Even Ethereum implicitly uses it, via EIP-1559. How can that loop connect with the Data Value-Creation Loop?

Here’s how: there may be ≥1 Data Value-Creation Loops inside the Web3 Sustainability Loop. The image below illustrates for the case of Ocean Protocol. The data loops are in the ecosystem box.

7. Case Study: Ocean Predictoor

7.1 Introduction

The previous sections were theory along with guidance about how to put the theory into practice. This section describes how Ocean Protocol has acted on this theory and guidance for itself, to create Ocean Predictoor.

7.2 What product to focus on?

First, we chose predictions as the data type. Most valuable, remember?

Then, we chose DeFi as the vertical. It had the highest expected $ and speed of iterations. It didn’t hurt that Ocean has always worked closely with DeFi tools: AMMs for data DEXes, OCEAN → veOCEAN locking & curation, OCEAN-backed stable asset H2O, and more.

Within DeFi, we filtered to ideas that can actually close the loop, where there’s a last mile that makes $ and then uses it to buy predictions. For example, traders already make $ from trading, and have demonstrated that they’re willing to buy data (e.g. exchange data from Kaiko).

Applying these constraints, potential last-mile customers included traders, MEV searchers, lenders, and liquidity providers. Among those, traders and MEV searchers have the fastest speed of iterations, with high expected $ (depending on product). MEV requires sub-block latencies, which Ocean tech hasn’t been optimized for. Hence, we focused on traders.

From that, we quickly zoomed into ideas around a “prediction system” that could evolve into a “trading system”. The image below left illustrates. And below right shows how we saw the Data Value-Creation Loop could close.

Left: choosing the DeFi vertical. Right: closing the Data Value-Creation Loop: traders make $ with predictions as alpha, bought from predictoors.

We envisioned that the people making predictions (“predictoors”) would take care of all the previous steps in the data supply chain as well. This meant that there would only be two key actors in the system: predictoors and traders. The image below illustrates.

Data Supply Chain

7.3 High-level Token Engineering steps

With the product focus in mind, here are the high-level TE steps that we took.

7.4 Setting up rules of the game

We experimented with several token games, with prototypes in Python. We realized the challenge of keeping predictions private, so we explored several prototypes for that, choosing Oasis Sapphire in the end. Then we built the system: smart contracts, and Python code (for the users, and for us).

The image below shows the game that was chosen and built. Predictoor docs have details.

The Predictoor game that was chosen and built.
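To give a feel for the mechanics, here’s a toy, self-contained sketch of one epoch of a Predictoor-style game: predictoors stake on up/down predictions, traders see the stake-weighted aggregate, and once the true outcome is known, correct predictoors recover their stake plus a pro-rata share of feed sales and the incorrect predictoors’ stake. This is a simplification for intuition; the numbers are invented, and Predictoor docs have the actual rules.

```python
# Toy single-epoch simulation of a Predictoor-style game.
# Simplified, with invented numbers -- see Predictoor docs for real rules.

predictions = [
    # (predictoor, predicted direction, stake)
    ("alice", "up",   100.0),
    ("bob",   "up",    50.0),
    ("carol", "down",  30.0),
]
feed_sales = 20.0  # paid by traders for this epoch's prediction feed

# Traders see the stake-weighted aggregate signal.
stake_up = sum(s for _, d, s in predictions if d == "up")
stake_dn = sum(s for _, d, s in predictions if d == "down")
print(f"aggregate signal: {'up' if stake_up > stake_dn else 'down'} "
      f"({stake_up:.0f} vs {stake_dn:.0f} staked)")

true_direction = "up"  # revealed at epoch close

# Correct predictoors split feed sales + the incorrect predictoors' stake,
# pro rata to their own stake, on top of getting their stake back.
winners = [(p, s) for p, d, s in predictions if d == true_direction]
losers_stake = sum(s for _, d, s in predictions if d != true_direction)
pot = feed_sales + losers_stake
winning_stake = sum(s for _, s in winners)
for p, s in winners:
    payout = s + pot * s / winning_stake
    print(f"{p}: staked {s:.0f}, receives {payout:.2f}")
```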

Predictoor is a multi-sided platform: traders buying prediction feeds, and predictoors submitting predictions. It needed to solve the empty network problem. Our solution: we do the AI/ML legwork to get our own prediction bots running. That way there’s always at least some predictions. Beyond that, the game can take over.

With the Python code interface, the webapp running, and the AI/ML prediction bots running, we launched Predictoor.

7.5 Make the game easy to play

We narrowed our target users (predictoors, traders) to data scientists that know Python and aren’t scared of running Web3 in Python. This meant that their main interface could be libraries, or even easier, github repos that they could git clone or fork. We went with the latter.

Yet we also built a webapp, shown below right. Why? Because we wanted target users to quickly grok what the product would do, then proceed to the code interface.

We built the predictoor.ai webapp to help newcomers quickly grok the product

7.6 Kickstart the network with token incentives

We’d already solved the “fully empty” network problem by running our own AI/ML prediction bots. We knew we wanted more people to play the prediction game. How: incentives for people to run accurate predictoor bots even if initial organic sales to traders were low.

We considered several designs and arrived at a simple one: run a bot that steadily buys feeds, with those sales making their way to predictoors. We rolled it out as Predictoor Data Farming.
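For the shape of that design, here’s a hedged sketch: a bot with a fixed per-epoch budget that buys every feed each epoch, so sales revenue reaches predictoors even before organic trader demand arrives. The function and feed names are hypothetical; the real implementation is Predictoor Data Farming.

```python
# Sketch of a steady feed-buyer bot. All names here are hypothetical;
# the real design is Predictoor Data Farming.
import time

FEEDS = ["ETH/USDT-5m", "BTC/USDT-5m"]  # example feed names
BUDGET_PER_EPOCH = 10.0                 # spend per epoch, across feeds
EPOCH_SECONDS = 300                     # 5-minute epochs

def buy_feed(feed: str, amount: float) -> None:
    """Hypothetical stand-in for an on-chain feed purchase."""
    print(f"bought {feed} for {amount:.2f}")

def run_buyer_bot(n_epochs: int) -> None:
    per_feed = BUDGET_PER_EPOCH / len(FEEDS)
    for _ in range(n_epochs):
        for feed in FEEDS:
            buy_feed(feed, per_feed)  # sales flow through to predictoors
        time.sleep(EPOCH_SECONDS)     # wait for the next 5-min epoch

# run_buyer_bot(n_epochs=3)
```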

In the 6 months that followed, these incentives led to >$400M in monthly staking volume, 385K monthly transactions, and steadily increasing accuracy. It’s working :)

Predictoor Data Farming

7.7 Optimize

We are optimizing our own internal modeling and trading: “eating our own dog food” gives us empathy for the user and enables rapid iterations to improve. We have steadily improved the flow for predictoors and traders, within the READMEs. And we have added a simulator, a CLI, and analytics. The image below illustrates.

The main predictoor & trader interface is the READMEs. We have added a simulator, CLI, and analytics

8. Conclusion

How does one survive and thrive in the open data economy? The heart of this is making money, of course. But that’s too simple to be useful. We need a better framework to think about this, one that can help guide selection of verticals, of specific product ideas, and refinement of those ideas.

This article introduced such a framework: the Data Value-Creation Loop. And unrolled, it’s a data supply chain. It provided theory, along with guidance about how to put the theory into practice.

Finally, it described how Ocean Protocol has acted on this theory and guidance for itself, to create Ocean Predictoor. Predictoor has promising traction.

Notes

I (Trent) originally published this content in “Ocean Protocol Update || 2023”, Mar 10, 2023.

To make the content evergreen-useful and for easier reference, I extracted and adapted it into this standalone article, published on Sep 7, 2023.

On May 2, 2024, I gave a talk about this loop for Token Engineering Academy. For that talk, I finally put much more of the ideas to paper (so to speak), and added the Ocean Predictoor work of the previous 14 months. Here are the GSlides and the video from that talk.

On May 4, 2024, I drew on the recent talk’s content to update this article itself. Tweet.

About Ocean Protocol

Ocean was founded to level the playing field for AI and data. Ocean tools enable businesses and individuals to trade tokenized data assets seamlessly to manage data all along the AI model life-cycle. Ocean-powered apps include enterprise-grade data exchanges, data science competitions, and data DAOs. Follow Ocean on Twitter or TG, and chat in Discord.

In Ocean Predictoor, people run AI-powered prediction bots or trading bots on crypto price feeds to earn $. Predictoor has over $800 million in monthly volume, just six months after launch with a roadmap to scale foundation models globally. Follow Predictoor on Twitter.

Data Farming is Ocean’s incentives program.
