Hello world!

Let’s create a simple datakit to add some numbers together.

Prerequisites

Installing the CLI

First, we need to install the dk CLI tool, which is distributed as the opendatacli package.

Terminal window
python --version # Ensure you have a Python version >= 3.11 installed
pip install opendatacli

If everything is working, you should see the following output followed by a help message when you type dk:

Terminal window
# Usage: dk [OPTIONS] COMMAND [ARGS]...
# ...

Building the default execution container

This only needs to be done once. In the future, this container will be available to download from Docker Hub, and you will be able to skip this step.

Terminal window
git clone https://github.com/opendatastudio/python-run-base
cd python-run-base
./build.sh

Creating a new datakit

Let’s create a new datakit. The dk CLI tool provides a convenient command to do this:

Terminal window
dk new helloworld

This will create a new datakit inside a directory called helloworld-datakit.

This is what your new datakit should look like:

  • helloworld-datakit/
    • helloworld/
      • algorithm.json
      • algorithm.py
    • datakit.json

This simple starter datakit contains an algorithm that takes a single numerical input and multiplies it by 2.
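
For orientation, the starter algorithm.py probably looks something like the sketch below. This is an assumption inferred from the description above and from the main function shown later in this guide, not a copy of the generated file, so the actual contents may differ.

```python
def main(x):
    """Hypothetical starter algorithm: double a single numeric input."""
    # Assumption: outputs are returned as a dict keyed by output variable
    # name, matching the "result" output declared in algorithm.json.
    return {
        "result": x * 2,
    }


if __name__ == "__main__":
    # Call the function directly, outside the dk CLI.
    print(main(42))  # {'result': 84}
```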

Your first run

Let’s run your new datakit. First, initialise the default run:

Terminal window
cd helloworld-datakit
dk init

This will create a directory called helloworld.run in the root of your datakit directory.

  • helloworld-datakit/
    • helloworld/
    • helloworld.run/
      • run.json
    • datakit.json

This directory stores all the information about your run so that others can easily reproduce the same analysis. run.json stores the run configuration: the input and output values of your analysis and their associated options. The CLI updates this file every time you execute the run.

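By default, the input variable x is set to the default value specified in algorithm.json, which is 42. You can set x to something else if you’d like, using the command below. The rest of this guide assumes x keeps its default value of 42, so skip this step if you want your output to match: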

Terminal window
dk set x 9001

Now we can execute the run:

Terminal window
dk run

And view the result:

Terminal window
dk show x
# ╭─ x ─╮
# │ 42 │
# ╰─────╯
dk show result
# ╭─ result ─╮
# │ 84 │
# ╰──────────╯

Our algorithm took the value of input variable x, 42, multiplied it by 2 to get 84, and stored the resulting value in the result variable.

Adding an input

Let’s modify our algorithm to take two variables and add them together.

First, open up helloworld/algorithm.json:

helloworld/algorithm.json
{
  "name": "helloworld",
  "title": "New algorithm",
  "profile": "datakit-algorithm",
  "code": "algorithm.py",
  "container": "opendatastudio/python-run-base:latest",
  "signature": {
    "inputs": [
      {
        "name": "x",
        "title": "X",
        "description": "An input variable",
        "type": "number",
        "null": false,
        "default": {
          "value": 42
        }
      }
    ],
    "outputs": [
      {
        "name": "result",
        "title": "Result",
        "description": "An output variable",
        "type": "number",
        "null": true,
        "default": {
          "value": null
        }
      }
    ]
  }
}

Here you can see the definitions of our two existing variables: the input x and the output result. To add a second input variable, we need to add another input definition to the inputs list:

helloworld/algorithm.json
{
  "name": "helloworld",
  ...
  "signature": {
    "inputs": [
      {
        "name": "x",
        "title": "X",
        "description": "An input variable",
        "type": "number",
        "null": false,
        "default": {
          "value": 42
        }
      },
      {
        "name": "y",
        "title": "Y",
        "description": "Another input variable",
        "type": "number",
        "null": false,
        "default": {
          "value": 100
        }
      }
    ],
    "outputs": [
      ...
    ]
  }
}
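
A malformed edit to this file (a missing comma between the two input objects, say) is easy to make, so a quick check in plain Python can help. The sketch below is not part of the dk CLI. It parses an inline copy of the inputs list with the standard json module, and the helper name input_defaults is made up for this example. To check the real file, you would load helloworld/algorithm.json with json.load instead.

```python
import json

# Inline copy of the edited "signature.inputs" section, trimmed to the
# fields this check reads. For the real file, use json.load on
# helloworld/algorithm.json instead of this string.
ALGORITHM_JSON = """
{
  "name": "helloworld",
  "signature": {
    "inputs": [
      {"name": "x", "type": "number", "default": {"value": 42}},
      {"name": "y", "type": "number", "default": {"value": 100}}
    ]
  }
}
"""


def input_defaults(algorithm):
    """Map each input variable name to its declared default value."""
    return {
        item["name"]: item["default"]["value"]
        for item in algorithm["signature"]["inputs"]
    }


if __name__ == "__main__":
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    defaults = input_defaults(json.loads(ALGORITHM_JSON))
    print(defaults)  # {'x': 42, 'y': 100}
```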

Save and close helloworld/algorithm.json. We will need to reset and initialise the run again to add this new variable to the run configuration:

Terminal window
dk reset # This deletes any existing runs
dk init

Now we can set and view the new input variable value:

Terminal window
dk show y
# ╭─ y ─╮
# │ 100 │
# ╰─────╯
dk set y 200
# ╭─ y ─╮
# │ 200 │
# ╰─────╯

Next, we need to modify the algorithm code to use this new input:

helloworld/algorithm.py
def main(x, y):
    """An algorithm that adds two numbers together"""
    return {
        "result": x + y,
    }
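
Since main is an ordinary Python function, it can be exercised directly before going through dk run. The sketch below repeats the function so that it runs on its own, and uses the values from this guide (x at its default of 42, y set to 200). Calling main by hand like this is a convenience check, not something the dk CLI requires.

```python
def main(x, y):
    """An algorithm that adds two numbers together"""
    return {
        "result": x + y,
    }


if __name__ == "__main__":
    # Same inputs as the run in this guide: x = 42 (default), y = 200.
    outputs = main(42, 200)
    print(outputs["result"])  # 242
```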

Running the new algorithm

Now we’re ready to execute:

Terminal window
dk run
dk show result
# ╭─ result ─╮
# │ 242 │
# ╰──────────╯

Our algorithm added the values of x (42) and y (200) together to get 242 and stored this result in the helloworld.run run configuration.

Next, we will learn how to work with tabular data through resources.