Data analysis
Analyze with DuckDB
Thanks to DuckDB, the data collected by wallowa can be analyzed in several ways, including:
- Query using SQL, the ASOF join (see GitHub Pull Request duration for an example), full text search, Ibis, Polars, Vaex, and DataFusion
- Explore using Jupyter Notebooks, DBeaver, Tableau, and YouPlot
- Export to Parquet, Parquet on S3, CSV, JSON, Excel, Pandas, and Apache Arrow
Follow the DuckDB guides to learn more.
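As one illustration of the export options listed above, DuckDB's `COPY ... TO` statement can write a table or a query result to a file. This is a minimal sketch that assumes a DuckDB session is already connected to the wallowa database and uses the `wallowa_raw_data` table described under Tables. The output file names are hypothetical.

```sql
-- Export the whole raw-data table to a Parquet file.
-- 'wallowa_raw_data.parquet' is a hypothetical output path.
COPY wallowa_raw_data TO 'wallowa_raw_data.parquet' (FORMAT PARQUET);

-- Export a query result (here, everything except the payload itself)
-- to CSV with a header row. The file name is again hypothetical.
COPY (
    SELECT id, created_at, loaded_at, data_source, data_type
    FROM wallowa_raw_data
) TO 'wallowa_raw_data_summary.csv' (FORMAT CSV, HEADER);
```

Parquet keeps column types and compresses well, so it is usually the better choice when the export will be read back by another analysis tool. CSV is convenient for quick inspection.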
Tables
There is only one table in wallowa so far.
wallowa_raw_data
This table stores the raw JSON payloads from the APIs that data is fetched from. Queries can use the DuckDB JSON extension to extract the data of interest from the payloads. See:
- Shredding Deeply Nested JSON, One Vector at a Time for a demo and tutorial of the DuckDB functionality
- GitHub Pull Request duration for an example using data from the GitHub Pulls API
sql
CREATE SEQUENCE seq_wallowa_raw_data;
CREATE TABLE IF NOT EXISTS wallowa_raw_data (
    id INTEGER PRIMARY KEY DEFAULT NEXTVAL('seq_wallowa_raw_data'),
    created_at TIMESTAMP DEFAULT now() NOT NULL,
    loaded_at TIMESTAMP,
    "data_source" VARCHAR,
    data_type VARCHAR,
    metadata JSON,
    "data" VARCHAR
);
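Given this schema, a query can pull individual fields out of the JSON text stored in the `data` column with the JSON extension's `json_extract_string` function. The sketch below makes several assumptions that the schema alone does not confirm: that GitHub payloads are stored with `data_source = 'github'` and `data_type = 'pulls'`, and that each row holds a single pull request object. The first query shows which values are actually present, so the filter in the second can be adjusted to match.

```sql
-- List the sources and payload types that are actually stored,
-- so the filter values assumed below can be checked first.
SELECT data_source, data_type, count(*) AS payloads
FROM wallowa_raw_data
GROUP BY data_source, data_type
ORDER BY payloads DESC;

-- Extract a few fields from each payload.
-- Assumptions: the filter values 'github' and 'pulls' are hypothetical,
-- and each row's "data" is one pull request object from the GitHub Pulls API.
SELECT
    TRY_CAST(json_extract_string("data", '$.number') AS INTEGER) AS pr_number,
    json_extract_string("data", '$.title') AS title,
    json_extract_string("data", '$.state') AS state,
    TRY_CAST(json_extract_string("data", '$.created_at') AS TIMESTAMP) AS opened_at,
    TRY_CAST(json_extract_string("data", '$.merged_at') AS TIMESTAMP) AS merged_at
FROM wallowa_raw_data
WHERE data_source = 'github'
  AND data_type = 'pulls';
```

`TRY_CAST` returns NULL instead of raising an error when a field is missing or cannot be converted, which helps when payloads vary in shape. If a row stores an array of objects rather than a single object, the paths would need to change accordingly.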