Ocean Compute-to-Data in JupyterLab

A demo of remote execution of compute algorithms on datasets

Samer Sallam
Ocean Protocol

--

[update Nov 2021: this blog post is obsolete. The updated flow is described in the ocean.py README. You can follow it from a basic Python shell or from a Jupyter notebook.]

Ocean Protocol is committed to incentivizing the sharing of data for the benefit of Data Science and AI (Artificial Intelligence) practitioners. With the initial release, we enabled blockchain-based escrow payments and service agreements to facilitate the exchange of datasets through an automated access-control system built on Ethereum smart contracts.

Some data providers have strict privacy requirements that prevent their data from being accessed or downloaded directly by third-party consumers. Such privacy concerns make data providers reluctant to share their data. To mitigate this, instead of allowing data consumers to download the data directly, it is possible to bring the code, or compute, to the data and perform the analysis or modeling wherever the data resides.

Ocean’s v2 release includes Compute-to-Data, a general-purpose system for remote execution of compute algorithms on datasets. Here we present an example of how to use this system in a JupyterLab environment using a Python notebook. This example uses the squid-py SDK, which supports all the features of Ocean Protocol.

A previous blog post introduced the basics of Manta Ray, Ocean Protocol for Data Scientists. In this post, we focus on the Compute-to-Data feature and demonstrate a shortened version of the complete notebook found here, which can be run in the JupyterLab instance at datascience.oceanprotocol.com.

Let’s dive in

We start by creating an Ocean instance and new accounts: a publisher account to publish the dataset in Ocean Protocol as a new asset, and a consumer account to order the compute service and run the algorithm.

from squid_py import Ocean, Config
from mantaray_utilities.user import create_account

# OCEAN_CONFIG_PATH and faucet_url are preconfigured in the notebook environment
configuration = Config(OCEAN_CONFIG_PATH)
ocn = Ocean(configuration)

# Create new accounts for the publisher and the consumer, funded via the testnet faucet
publisher_acct = create_account(faucet_url, wait=True)
consumer_account = create_account(faucet_url, wait=True)
[Image: balance of the consumer account on the Ocean testnet.]
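To reproduce that check programmatically, here is a minimal sketch, assuming squid-py's accounts API exposes a balance helper with .eth and .ocn fields:

# Sketch: check the consumer's ETH and OCEAN balances (assumes
# ocn.accounts.balance() returns an object with .eth and .ocn fields).
balance = ocn.accounts.balance(consumer_account)
print(f'ETH: {balance.eth}, OCEAN: {balance.ocn}')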

Publish dataset with a compute service

Now that we have accounts set up and funded with some ETH and OCEAN tokens, let’s publish a dataset with a compute service.

First, we define the compute service attributes. The compute service is managed on a Kubernetes cluster and is accessed via the Brizo API, the data provider’s proxy HTTP server.

# Build the compute service to be included in the asset DDO
cluster = ocn.compute.build_cluster_attributes(
    'kubernetes', '/cluster/url'
)
# Container image the algorithm will run in
containers = [
    ocn.compute.build_container_attributes(
        'tensorflow/tensorflow',
        'latest',
        'sha256:cb57ecfa6ebbefd8ffc7f75c0f00e57a7fa739578a429b6f72a0df19315deadc'
    )
]
# Server spec: id, type, cpu, gpu, memory, disk, max execution time
servers = [
    ocn.compute.build_server_attributes(
        '1', 'xlsize', 16, 0, '16gb', '1tb', 2242244
    )
]
provider_attributes = ocn.compute.build_service_provider_attributes(
    'Azure', 'Compute power 1', cluster, containers, servers
)
attributes = ocn.compute.create_compute_service_attributes(
    price=13,
    timeout=3600,
    creator=publisher_acct.address,
    date_published=get_timestamp(),
    provider_attributes=provider_attributes
)
service_endpoint = Brizo.get_compute_endpoint(ocn.config)
template_id = ocn.keeper.template_manager.create_template_id(
    ocn.keeper.template_manager.SERVICE_TO_TEMPLATE_NAME['compute']
)
service_descriptor = ServiceDescriptor.compute_service_descriptor(
    attributes, service_endpoint, template_id
)

Brizo is the provider’s service proxy (a RESTful API). Its job is to serve all data-related service requests.

Next, we create the dataset Asset and register it on-chain and in the metadata store (Aquarius).

# Load example metadata from file
metadata = get_metadata_example()
ddo = ocn.assets.create(
    metadata,
    publisher_acct,
    [service_descriptor],
    providers=[provider_address],
    use_secret_store=False
)
# Verify the asset was created successfully, both in the metadata store and on-chain
asset = ocn.assets.resolve(ddo.did)
assert asset and asset.did == ddo.did
[Image: the new asset is successfully registered.]

Let’s take a peek at the contents of a compute service definition.

compute_service = ddo.get_service(ServiceTypes.CLOUD_COMPUTE)
pprint("Compute service definition: \n{}".format(
    json.dumps(compute_service.as_dictionary(), indent=2)))

>> Compute service definition:
{
  "type": "compute",
  "serviceEndpoint": "https://brizo.marketplace.dev-ocean.com/api/v1/brizo/services/compute",
  "attributes": {
    "main": {
      "name": "dataAssetComputingServiceAgreement",
      "creator": "0x3f4311103a92a6dE4647f56e9e585120caa821C7",
      "datePublished": "2020-05-27T20:49:20Z",
      "price": "13",
      "timeout": 3600,
      "provider": {
        "type": "Azure",
        "description": "Compute power 1",
        "environment": {
          "cluster": {
            "type": "kubernetes",
            "url": "/cluster/url"
          },
          "supportedContainers": [
            {
              "image": "tensorflow/tensorflow",
              "tag": "latest",
              "checksum": "sha256:cb57ecfa6ebbefd8ffc7f75c0f00e57a7fa739578a429b6f72a0df19315deadc"
            }
          ],
          "supportedServers": [
            {
              "serverId": "1",
              "serverType": "xlsize",
              "cpu": 16,
              "gpu": 0,
              "memory": "16gb",
              "disk": "1tb",
              "maxExecutionTime": 2242244
            }
          ]
        }
      }
    }
  },
  ...

A complete sample of a compute service lives in squid-py.

Ordering the compute service

Now that the dataset Asset is published with the compute service, let’s see how to order the compute service and run the algorithm.

# Order the compute service and submit payment
agreement_id = ocn.compute.order(
    ddo.did,
    consumer_account,
    provider_address=provider_address
)
# Wait (up to 30 seconds) for the provider to approve the compute request
compute_approval_event = ocn.keeper.compute_execution_condition.subscribe_condition_fulfilled(
    agreement_id, 30, None, [], wait=True, from_block=0
)

Receiving the “compute_execution_condition” event indicates that the provider has acknowledged the user’s request for the compute service and that the user can start running compute jobs.
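Since the subscription above waits at most 30 seconds, it is worth guarding against a missed event before proceeding. A minimal sketch, assuming the call returns None on timeout:

# Guard against a timeout: the subscription is assumed to return None
# if the provider does not fulfil the condition within 30 seconds.
if compute_approval_event is None:
    raise RuntimeError('Compute approval event not received; cannot start jobs yet.')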

Running the computation

To start the computation, we prepare the algorithm metadata, which includes the raw algorithm code as well as the information required to run it.

Alternatively, the algorithm and its metadata can be published as an Asset in Ocean Protocol, just as the dataset was published above. Running a published algorithm then only requires sending the algorithm ID (the Asset DID); see squid-py. A short sketch of this variant appears below, after the compute job is started.

# Prepare the algorithm script and description
algorithm_text = get_algorithm_example()
# Build the algorithm metadata object to use in the compute request
algorithm_meta = AlgorithmMetadata(
    {
        'language': 'python',
        'rawcode': algorithm_text,
        'container': {
            'tag': 'latest',
            'image': 'amancevice/pandas',
            'entrypoint': 'python $ALGO'
        }
    }
)

The “$ALGO” macro in the entrypoint attribute is replaced with the path to the executable algorithm file.
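For a concrete picture of the raw code being sent, here is a minimal pandas script of the kind returned by get_algorithm_example(). The /data/inputs and /data/outputs mount paths, and the CSV input format, are assumptions about the compute environment, not guarantees from this example:

# Hypothetical algorithm script passed as 'rawcode' above.
# /data/inputs and /data/outputs are assumed mount points in the
# provider's compute environment; the dataset is assumed to be a CSV.
import glob
import pandas as pd

# Pick up whichever dataset file the provider mounted for this job.
input_files = glob.glob('/data/inputs/**/*', recursive=True)
df = pd.read_csv(input_files[0])

# Write a simple summary where the job collects output files.
df.describe().to_csv('/data/outputs/summary.csv')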

And now with all parameters ready, we can start the compute job using the above algorithm.

job_id = ocn.compute.start(
    agreement_id,
    ddo.did,
    consumer_account,
    algorithm_meta=algorithm_meta
)
[Image: compute job started using the algorithm above.]
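If the algorithm had instead been published as an Asset, as mentioned earlier, the job could be started by DID. A hedged sketch, assuming ocn.compute.start accepts an algorithm_did keyword and that alg_ddo is the DDO of the published algorithm asset:

# Sketch: run a previously published algorithm by its DID instead of
# passing raw code. alg_ddo is assumed to be the published algorithm's DDO.
job_id = ocn.compute.start(
    agreement_id,
    ddo.did,
    consumer_account,
    algorithm_did=alg_ddo.did
)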

The start request returns a job_id, which can be used to query the status and results of the running job.

status = ocn.compute.status(agreement_id, job_id, consumer_account)
[Image: status of the compute job that was started.]

At this point we need to wait until the job is completed. We can keep checking the job status until we get a “Job completed” status message.
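A simple polling loop can handle that wait. This is a sketch; the exact status strings are defined by the provider, so matching on “completed” is an assumption:

import time

# Poll the job status until it reports completion. Matching on the
# word 'completed' is an assumption about the provider's status strings.
while True:
    status = ocn.compute.status(agreement_id, job_id, consumer_account)
    print(status)
    if 'completed' in str(status).lower():
        break
    time.sleep(10)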

Finally, the results are ready.

result = ocn.compute.result(agreement_id, job_id, consumer_account)
print(f"The result is here: {result.get('urls')[0]}")

The result URLs above can be used to download the result files directly.
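For example, the first result file could be fetched with a plain HTTP request. This is a sketch, assuming the returned URLs are directly downloadable in this demo setup:

import requests

# Download the first result file; the URL is assumed to need no
# extra authentication in this demo.
url = result.get('urls')[0]
response = requests.get(url)
response.raise_for_status()
with open('result_file', 'wb') as f:
    f.write(response.content)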

You can try out the notebook example at datascience.oceanprotocol.com. Log in with your GitHub account; once JupyterLab has loaded, open the s05_compute_to_data notebook and have fun.

Follow Ocean Protocol on Twitter, Telegram, LinkedIn, GitHub & Newsletter for project updates and announcements. And chat directly with other developers on Gitter.
