How to navigate the exchange

Navigating a new interface can be daunting, so we're here to make the Valyu exchange easier to understand. Our goal is to break it down for you and provide practical tips on how to use it efficiently.

Here is what you will see when you first land on the exchange website:

This is the open data view of the exchange, where you can browse the thousands of open datasets we have available and find the ones suited to your use case, whether that is training, fine-tuning, or RAG. Let's break down the key components:

  • Side-bar - Here you can filter the datasets based on your needs. Use the AI-powered filter search to find exactly what you are after.

  • Top-bar:

    • Navigation - At the top left we have the navigation to other (currently in beta) sections of the exchange.

    • Search - This AI-powered search bar searches the full datacard of every dataset, so you can search by category, name, language, etc.

    • Sign in - Click here to request beta access to unreleased parts of the exchange.

  • Datacards - All of the (filtered) datacards are displayed in the centre. For more details on how to make sense of these datacards, click here.

Once you find a dataset you like, click on it to bring up the provenance view:

Provenance view

Once you have selected a dataset, you will be brought to the provenance dashboard. Let's break down what you're seeing:

  • Provenance view - In the centre we have the provenance view. This shows the full lifecycle of the data. To the left of the dataset we have its origins:

    • Parent datasets (available on the exchange, or an identifier for datasets that aren't on the exchange)

    • Sources, e.g. Wikipedia, Reddit, ...

    • Models, where the data has been generated in part by a machine learning model

    • For full documentation on the datacards click here

  • To the right of the dataset we have its children:

    • Models trained on the dataset

    • Datasets derived from the current one

  • Side-bar - Here we have navigation to the characteristics and license pages of the dataset. More on these below.

  • Top-bar - This shows the dataset name with the datacard score beside it. If the dataset is right for your use case, click on the download dropdown menu to download it.
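To make the layout of the provenance view more concrete, here is a minimal Python sketch of the information it displays. This is only an illustration: the `ProvenanceRecord` class, its field names, and the example values are all assumptions made for this sketch, not part of the exchange or any Valyu API.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical stand-in for what the provenance view displays."""

    name: str
    # Origins: shown to the left of the dataset in the view.
    parent_datasets: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)
    generating_models: list[str] = field(default_factory=list)
    # Children: shown to the right of the dataset in the view.
    trained_models: list[str] = field(default_factory=list)
    derived_datasets: list[str] = field(default_factory=list)

    def origins(self) -> list[str]:
        """Everything the dataset was built from."""
        return self.parent_datasets + self.sources + self.generating_models

    def children(self) -> list[str]:
        """Everything built on top of the dataset."""
        return self.trained_models + self.derived_datasets


# Placeholder values for illustration only.
record = ProvenanceRecord(
    name="example-dataset",
    parent_datasets=["example-parent"],
    sources=["wikipedia", "reddit"],
    generating_models=["example-generator"],
    trained_models=["example-model"],
    derived_datasets=["example-subset"],
)
print(record.origins())
print(record.children())
```

Reading the view this way, the left-hand side answers "where did this data come from?" and the right-hand side answers "what has been built on it?".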

Once you have verified the history of the dataset, you will want to take a closer look at the data itself. To do so, navigate to characteristics:

Characteristics view

The characteristics view gives you an overview of the topics included in the dataset, the languages used, text metrics (or video/audio/time series for other modalities), freshness, and more.

Let's break down what we are seeing:

  • Dataset Quality - This star diagram has the following properties:

    • Freshness

    • Documentation

    • Openness (with respect to licenses)

    • Downloads (Hugging Face)

    • Properties (currently undefined, please give us feedback on what you'd like to see)

  • Other metrics:

    • Description

    • Topics

    • Metrics

    • Metadata

    • Identifier

If this dataset is still ticking all the boxes, it is time to move to the licenses page:

License page

Here we see all of the licenses that apply to the dataset. Each license has its own license card that gives an overview of:

  • What the license means you can do with the dataset

  • What the license prohibits you from doing

  • Tips on staying compliant

  • A summary of the license

It will also highlight any conflicts between licenses.
