Fundraising/techops/docs/analytics stack

From Wikitech

Analytics Stack Components

System: Trino
Description: Trino is a query engine used to process analytics data in FR-Tech's data lake. The current implementation consists of one coordinator node and three worker nodes, each on its own host.
Host:Port: fransc2001:8443 (coordinator); fransw200[1-3] (workers)
Log Location: <host>/var/lib/trino/trino-data/var/log. Query history is available in Trino or MariaDB under mariadb.trino.trino_queries.

System: Dagster
Description: Dagster is an orchestrator used to schedule jobs that load or manipulate data.
Host:Port: fran2001:3000
Log Location: syslog, tagged with 'dagster'. Note that in vb the logs print to stdout rather than syslog, since we aren't using systemd there.

System: dbt
Description: dbt (data build tool) is an open source framework for modeling data. dbt is essentially an orchestrator for SQL commands that ensures they run in an order that respects each data model's upstream and downstream dependencies.
Host:Port: fran2001
Log Location: syslog for the Dagster materialization logs. General dbt logs are in /srv/dagster_data/dbt_log/dbt.log; individual run logs are in /srv/dagster_data/dbt_target/. Check the DBT_LOG_PATH and DBT_TARGET_PATH env vars, respectively. When running dbt locally, define these variables in your .analytics.env file.

System: Hive Metastore
Description: Hive Metastore holds the metadata that Iceberg and Trino use to map Trino tables to file locations in minIO.
Host:Port: fransc2001:9083 (or the same host as the Trino coordinator)
Log Location: syslog

System: Metabase
Description: Metabase is a Business Intelligence tool used to visualize and analyze data. FR Analytics is in the process of migrating from Apache Superset to Metabase.
Host:Port: fran2001:9081
Log Location: syslog

System: Superset
Description: Superset is a Business Intelligence tool. We are migrating from Superset to Metabase.
Host:Port: fran2001:9080
Log Location: syslog

System: minIO
Description: minIO is an object storage tool that holds our data lake; the physical data files are stored there.
Host:Port: franio200[1-3]:9000
Log Location: not currently logging

System: MariaDB
Description: MariaDB is the database CiviCRM uses to hold all its data.
Host:Port: frdb2003
Log Location: syslog and /srv/sqldata/{hostname}-slow.log

System: dlt
Description: dlt (data load tool) is an open source framework for loading data to and from various APIs and databases.
Host:Port: fran2001
Log Location: syslog, but not currently tagged. All dlt logs are probably tagged with 'dagster', since dlt scripts run through Dagster.
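
Several Host:Port entries above use a bracketed shorthand such as fransw200[1-3]. The sketch below shows one way to expand that shorthand into individual host names. It assumes the brackets denote an inclusive numeric range, which the table implies but does not state outright, and the function name expand_hosts is hypothetical rather than part of any FR-Tech tooling.

```python
import re

# ASSUMPTION: "[1-3]" means the inclusive numeric range 1, 2, 3.
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_hosts(pattern):
    """Expand the first "[a-b]" range in a host pattern into a list of hosts.

    Patterns without a bracketed range are returned unchanged, as a
    one-element list. Only the first range in the pattern is expanded.
    """
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]

    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"descending range in host pattern: {pattern!r}")

    prefix = pattern[:match.start()]
    suffix = pattern[match.end():]
    # Keep the zero padding of the range's lower bound, e.g. "[01-03]".
    width = len(match.group(1))
    return [f"{prefix}{n:0{width}d}{suffix}" for n in range(start, end + 1)]
```

For example, expand_hosts("fransw200[1-3]") yields fransw2001, fransw2002, and fransw2003, and a port suffix as in "franio200[1-3]:9000" is carried over to each expanded entry.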

On any host, syslog is in /var/log/syslog.
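
Because most components log to the shared syslog file, it can help to pull out only the lines carrying a given tag, such as 'dagster'. The following is a minimal sketch, not an official tool. It assumes each line follows either the traditional syslog layout ("Mon DD HH:MM:SS host tag[pid]: message") or the same layout with a single ISO 8601 timestamp token; the actual format on these hosts is not stated here. The function name lines_tagged is hypothetical.

```python
import re

# ASSUMPTION: lines look like
#   "Mar  3 10:15:00 host tag[pid]: message"          (traditional), or
#   "2024-03-03T10:15:00+00:00 host tag[pid]: message" (ISO timestamp).
# The "[pid]" part is optional in both cases.
_SYSLOG_LINE = re.compile(
    r"^(?:\S+\s+\d+\s+[\d:]+|\d{4}-\d\d-\d\dT\S+)"  # timestamp
    r"\s+(?P<host>\S+)"                              # host name
    r"\s+(?P<tag>[^\s:\[]+)"                         # program tag
    r"(?:\[(?P<pid>\d+)\])?:\s?"                     # optional pid, then colon
    r"(?P<message>.*)$"
)


def lines_tagged(lines, tag):
    """Yield the message part of every syslog line whose tag equals `tag`.

    Lines that do not match the assumed layout are skipped silently.
    """
    for line in lines:
        match = _SYSLOG_LINE.match(line.rstrip("\n"))
        if match and match.group("tag") == tag:
            yield match.group("message")
```

To use it against a real host, open /var/log/syslog and pass the file object as `lines`, with "dagster" as the tag. Since dlt logs are described above as probably tagged with dagster, the same call would likely surface them as well.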

Analytics How-to's

Fundraising/techops/docs/analytics stack/how to guides

Troubleshooting

Fundraising/techops/docs/analytics stack/troubleshooting

How Environment Variables Get Passed and Parameterized

Fundraising/techops/docs/analytics stack/env vars
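
As one concrete case, the dbt entry in the components table names two environment variables, DBT_LOG_PATH and DBT_TARGET_PATH, that determine where dbt logs and run output land. The sketch below resolves both locations from an environment mapping. Treating the documented /srv/dagster_data paths as fallbacks when a variable is unset is an assumption made for illustration, and the function name dbt_log_locations is hypothetical.

```python
import os
from pathlib import PurePosixPath

# Paths taken from the components table. ASSUMPTION: they are used here as
# fallbacks when the corresponding variable is unset or empty.
DOCUMENTED_LOG_DIR = "/srv/dagster_data/dbt_log"
DOCUMENTED_TARGET_DIR = "/srv/dagster_data/dbt_target"


def dbt_log_locations(env=None):
    """Return where to look for dbt logs, given an environment mapping.

    `env` defaults to the current process environment. Passing a plain
    dict makes the function easy to try out without touching os.environ.
    """
    env = os.environ if env is None else env
    log_dir = PurePosixPath(env.get("DBT_LOG_PATH") or DOCUMENTED_LOG_DIR)
    target_dir = PurePosixPath(env.get("DBT_TARGET_PATH") or DOCUMENTED_TARGET_DIR)
    return {
        # The general dbt log file, dbt.log, sits inside the log directory.
        "general_log": log_dir / "dbt.log",
        # Individual run output sits under the target directory.
        "run_artifacts": target_dir,
    }
```

When running dbt locally, the values would come from whatever your .analytics.env file defines, provided those variables have been loaded into the environment before the function is called.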