Dagster Guide
OSO uses Dagster to perform all data orchestration in our backend infrastructure. Most of the time, contributors will be working in Dagster to connect new data sources. If the source data already exists in OSO, take a look at contributing models via SQLMesh.
To set up Dagster in your development environment, check out the getting started guide.
Data Architecture
As much as possible, we store data in an Iceberg cluster,
making it the center of gravity of the OSO data lake.
Thus, data ingest will typically go into Iceberg tables.
pyoso is the best way to query any data in these tables.
Standardizing on Iceberg will make it significantly easier to decentralize OSO infrastructure
in the future.
Job schedules
All automated job schedules can be found on our public Dagster dashboard.
Currently, our main data pipeline runs once per week on Sundays.
Alert system
Dagster alert sensors are configured in
warehouse/oso_dagster/factories/alerts.py.
Right now, alerts are reported to #alerts in the
OSO Discord server.
Secrets Management
When you are creating new data sources for OSO, you may need to handle secrets (e.g. passwords, access keys, DB connection strings).
DO NOT CHECK SECRETS INTO THE REPOSITORY!
Instead, please use the OSO SecretResolver to properly handle your secrets.
Local secrets
While you are developing your code,
the right place to store secrets is in your root .env file.
The OSO SecretResolver organizes secrets by
(prefix, group, key).
Dagster will automatically load secrets from your environment using
the naming convention PREFIX__GROUP__KEY.
By default, all secrets in Dagster use the prefix dagster.
For example, we store all Clickhouse secrets under the clickhouse group.
Thus, these are the environment variables we'd set for Clickhouse:
DAGSTER__CLICKHOUSE__HOST=
DAGSTER__CLICKHOUSE__USER=
DAGSTER__CLICKHOUSE__PASSWORD=
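To make the naming convention concrete, here is a minimal sketch of how a (prefix, group, key) triple maps to an environment variable name. The helpers secret_env_var and read_local_secret are hypothetical names used only for illustration; they are not part of the OSO SecretResolver, and the upper-casing is an assumption inferred from the Clickhouse example above.

```python
import os


def secret_env_var(group: str, key: str, prefix: str = "dagster") -> str:
    """Build the environment variable name PREFIX__GROUP__KEY.

    Hypothetical helper: upper-casing each part is an assumption
    based on the DAGSTER__CLICKHOUSE__* examples in this guide.
    """
    return "__".join(part.upper() for part in (prefix, group, key))


def read_local_secret(group: str, key: str, prefix: str = "dagster") -> str:
    """Look up a secret in the process environment (e.g. loaded from .env)."""
    name = secret_env_var(group, key, prefix)
    try:
        return os.environ[name]
    except KeyError:
        # Name the missing variable so it is easy to add to the .env file.
        raise KeyError(f"Missing secret: set {name} in your .env file") from None


if __name__ == "__main__":
    # Show which variable a given secret is expected to live in.
    print(secret_env_var("clickhouse", "password"))
```

A lookup that fails names the exact variable that is missing, which can help when a new group or key has not yet been added to the root .env file.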
You can reference a secret using SecretReference
and resolve it using SecretResolver.
from ..utils import SecretReference, SecretResolver

# `secret_resolver` is a SecretResolver instance, accepted as a Dagster resource.
password_ref = SecretReference(group_name="clickhouse", key="password")
password = secret_resolver.resolve_as_str(password_ref)
To get a reference to the SecretResolver,
accept it as a Dagster resource.
See
definitions.py and
clickhouse.py for examples.
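Because the resolver is passed in rather than created inside your code, logic that needs secrets can be exercised with a stand-in during local testing. The sketch below is only an illustration of that pattern: FakeSecretReference, FakeSecretResolver, and clickhouse_settings are hypothetical names, and they mirror only the group_name, key, and resolve_as_str names shown above. It does not use the real OSO classes or Dagster's resource machinery.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeSecretReference:
    """Hypothetical stand-in mirroring SecretReference(group_name=..., key=...)."""

    group_name: str
    key: str


class FakeSecretResolver:
    """Hypothetical stand-in that resolves secrets from an in-memory dict."""

    def __init__(self, secrets: dict[tuple[str, str], str]) -> None:
        self._secrets = secrets

    def resolve_as_str(self, ref: FakeSecretReference) -> str:
        return self._secrets[(ref.group_name, ref.key)]


def clickhouse_settings(secret_resolver) -> dict[str, str]:
    """Collect the Clickhouse secrets through whichever resolver is passed in."""
    return {
        key: secret_resolver.resolve_as_str(
            FakeSecretReference(group_name="clickhouse", key=key)
        )
        for key in ("host", "user", "password")
    }


if __name__ == "__main__":
    resolver = FakeSecretResolver(
        {
            ("clickhouse", "host"): "localhost",
            ("clickhouse", "user"): "dev",
            ("clickhouse", "password"): "not-a-real-password",
        }
    )
    # Print only the setting names, never the secret values.
    print(sorted(clickhouse_settings(resolver)))
```

In real assets the resolver comes from Dagster as described above; the stand-in only shows why taking it as a parameter keeps secret handling out of the asset logic itself.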
Production secrets
When you are ready to run your assets in production, please reach out to the core OSO team on Discord. We will arrange a secure way to add your secrets to our production keystore.
Restating SQLMesh models
To learn how to restate SQLMesh models, check the SQLMesh ops guide.
Seed Data
To test the integration between Dagster assets and SQLMesh models, see the Seed Data section.