Hello world!
Let’s create a simple datakit to add some numbers together.
Prerequisites
Installing the CLI
First, we need to install the datakitcli tool.
python --version # Ensure you have a Python version >= 3.11 installedpip install opendatacli
If everything is working, you should see the following output followed by a help
message when you type dk
:
# Usage: dk [OPTIONS] COMMAND [ARGS]...# ...
Building the default execution container
This only needs to be done once. In the future, this container will be available on DockerHub to download and you can skip this step.
git clone https://github.com/opendatastudio/python-run-basecd python-run-base./build.sh
Creating a new datakit
Let’s create a new datakit. The dk
CLI tool provides a convenient command to
do this:
dk new helloworld
This will create a new datakit inside a directory called helloworld-datakit
.
This is what your new datakit should look like:
Directoryhelloworld-datakit/
Directoryhelloworld/
- algorithm.json
- algorithm.py
- datakit.json
This simple starter datakit contains an algorithm that takes a single numerical input and multiplies it by 2.
Your first run
Let’s run your new datakit. First, initialise the default run:
cd helloworld-datakitdk init
This will create a directory called helloworld.run
in the root of your datakit
directory.
Directoryhelloworld-datakit/
Directoryhelloworld/
- …
Directoryhelloworld.run/
- run.json
- datakit.json
This directory stores all the information about your run so that others can
easily reproduce the same analysis. run.json
stores the run configuration -
the input and output values of your analysis and their associated options. This
is updated by the CLI every time you execute the run.
By default, the input variable x
is set to the default value specified in
algorithm.json
. You can set the value of x
to something else if you’d like:
dk set x 9001
Now we can execute the run:
dk run
And view the result:
dk show x# ╭─ x ─╮# │ 42 │# ╰─────╯
dk show result# ╭─ result ─╮# │ 84 │# ╰──────────╯
Our algorithm took the value of input variable x
, 42, multiplied it by 2 to
get 84, and stored the resulting value in the result
variable.
Adding an input
Let’s modify our algorithm to take two variables and add them together.
First, open up helloworld/algorithm.json
:
{ "name": "helloworld", "title": "New algorithm", "profile": "datakit-algorithm", "code": "algorithm.py", "container": "opendatastudio/python-run-base:latest", "signature": { "inputs": [ { "name": "x", "title": "X", "description": "An input variable", "type": "number", "null": false, "default": { "value": 42 } } ], "outputs": [ { "name": "result", "title": "Result", "description": "An output variable", "type": "number", "null": true, "default": { "value": null } } ] }}
Here you can see the definitons of our two existing variables, the input x
and
the output result
. To add a second input variable, we need to add another
input definition to the inputs
list:
{ "name": "helloworld", ... "signature": { "inputs": [ { "name": "x", "title": "X", "description": "An input variable", "type": "number", "null": false, "default": { "value": 42 } }, { "name": "y", "title": "Y", "description": "Another input variable", "type": "number", "null": false, "default": { "value": 100 } } ], "outputs": [ ... ] }}
Save and close helloworld/algorithm.json
. We will need to initialise the
datakit again to add this new variable to the run configuration:
dk reset # This deletes any existing runsdk init
Now we can set and view the new input variable value:
dk show y# ╭─ y ─╮# │ 100 │# ╰─────╯
dk set y 200# ╭─ y ─╮# │ 200 │# ╰─────╯
Next, we need to modify the algorithm code to use this new input:
def main(x, y): """An algorithm that adds two numbers together""" return { "result": x + y, }
Running the new algorithm
Now we’re ready to execute:
dk run
dk show result# ╭─ result ─╮# │ 242 │# ╰──────────╯
Our algorithm added the values of x
and y
together to get 242 and stored
this result in the helloworld.run
run configuration.
Next, we will learn how to work with tabular data through resources
.