After aggregating a massive amount of data to train our machine learning model, it becomes crucial to monitor and visualize how that data moves through the system — from ingestion to processing and storage. To ensure observability, we adopt an open-source and cost-efficient stack built on RisingWave and Grafana.
📦 Why This Stack?
| Component | Purpose | Why it's Chosen |
| --- | --- | --- |
| RisingWave | Stream-native database for real-time data | PostgreSQL compatible |
| Grafana | Visualization and dashboarding | Open-source, highly extensible |
The figure below provides a good mental picture of what is happening.
Fig 1. Pushing features to Grafana
RisingWave can be seamlessly integrated into a Kubernetes cluster using the Helm-based deployment script provided below. Although this setup is primarily intended for development environments, it closely mirrors the configuration required for a production-grade deployment, making it a reliable starting point.
#!/bin/bash
# Add the RisingWave Helm chart repository (with forced update to ensure the latest version is used)
helm repo add risingwavelabs https://risingwavelabs.github.io/helm-charts/ --force-update
# Update your local Helm chart repository cache
helm repo update
# Install or upgrade RisingWave in the Kubernetes cluster
# - Creates the 'risingwave' namespace if it doesn't exist
# - Waits for all resources to be ready before exiting
# - Applies configuration overrides from the provided YAML file
helm upgrade --install --create-namespace --wait risingwave risingwavelabs/risingwave \
--namespace=risingwave \
-f manifests/risingwave-values.yaml
Once this script runs successfully, you can port-forward the service to your local machine and log in to the RisingWave frontend as shown below.
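A minimal sketch of that step, assuming the default Helm release/service name (risingwave) and RisingWave's default SQL port, database, and user:

# Forward RisingWave's Postgres-compatible frontend to localhost
# (the service name and port 4567 assume default Helm chart values)
kubectl port-forward svc/risingwave 4567:4567 -n risingwave

# In another terminal, connect with any Postgres client;
# RisingWave's defaults are database "dev" and user "root"
psql -h localhost -p 4567 -d dev -U root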
Depending on your setup, you can either have RisingWave pull data from your Kafka producer by defining a source, or push data into it directly by writing to a table; a sketch of the pull approach follows below.
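As an illustration, a Kafka source can be declared from the SQL console along the following lines. The topic name, broker address, and column schema here are placeholders for whatever your producer actually emits, and older RisingWave releases use a ROW FORMAT JSON clause instead of FORMAT PLAIN ENCODE JSON:

-- Hypothetical schema: adjust the columns to match your Kafka messages
CREATE SOURCE IF NOT EXISTS trades (
    symbol     VARCHAR,
    price      DOUBLE PRECISION,
    volume     DOUBLE PRECISION,
    trade_time TIMESTAMPTZ
) WITH (
    connector = 'kafka',
    topic = 'crypto-trades',                     -- placeholder topic name
    properties.bootstrap.server = 'kafka:9092',  -- placeholder broker address
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;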
We can now deploy Grafana to the Kubernetes cluster, which can be done just as easily with the script below.
#!/bin/bash
# Add the Grafana Helm chart repository
helm repo add grafana https://grafana.github.io/helm-charts
# Install or upgrade Grafana in the 'monitoring' namespace (created if missing),
# waiting for resources to be ready and applying overrides from the values file
helm upgrade --install --create-namespace --wait grafana grafana/grafana --namespace=monitoring --values manifests/grafana-values.yaml
If everything is applied correctly, the result should look like the image shown below once you port-forward Grafana from your Kubernetes cluster.
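Assuming the release name and namespace used in the script above, and that grafana-values.yaml does not override the chart's default admin credentials, the credential lookup and port-forward might look like this:

# Retrieve the admin password generated by the Grafana Helm chart
kubectl get secret grafana -n monitoring -o jsonpath="{.data.admin-password}" | base64 --decode; echo

# Forward the Grafana service (port 80 by default) to localhost:3000
kubectl port-forward svc/grafana 3000:80 -n monitoring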
Now that our feature store data source is connected, we can proceed to build the Grafana dashboard.
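One way to establish that connection is to provision RisingWave declaratively as a PostgreSQL data source in manifests/grafana-values.yaml, since the Grafana Helm chart supports data source provisioning through its values file. The service address, database, and user below assume a default in-cluster RisingWave install and should be adjusted to your environment:

# Sketch of a data source entry in manifests/grafana-values.yaml (values are assumptions)
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: RisingWave
        type: postgres          # RisingWave speaks the Postgres wire protocol
        url: risingwave.risingwave.svc.cluster.local:4567
        database: dev
        user: root
        jsonData:
          sslmode: disable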
Setting up the dashboard is straightforward, but it requires a solid understanding of SQL to query the right data and derive meaningful insights. In our case, we aimed to visualize candlestick (OHLC) values as they are streamed into the feature store in real time, providing immediate feedback on how data is evolving and supporting data-driven model development.
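As a sketch of the kind of query involved, a per-minute OHLC materialized view over the hypothetical trades source shown earlier might look like the following; first_value and last_value are RisingWave aggregate functions, and the table and column names are assumptions:

-- Roll raw trades up into one-minute candlesticks for Grafana to chart
CREATE MATERIALIZED VIEW IF NOT EXISTS ohlc_1m AS
SELECT
    symbol,
    window_start,
    first_value(price ORDER BY trade_time) AS open,
    max(price)                             AS high,
    min(price)                             AS low,
    last_value(price ORDER BY trade_time)  AS close,
    sum(volume)                            AS volume
FROM TUMBLE(trades, trade_time, INTERVAL '1 MINUTE')
GROUP BY symbol, window_start;

A Grafana candlestick or time series panel can then read straight from this view through the PostgreSQL data source, for example with SELECT window_start AS time, open, high, low, close FROM ohlc_1m WHERE symbol = 'BTC-USD' ORDER BY time (the symbol value here is just an example).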
Conclusion
In this setup, we’ve demonstrated how to establish a robust data infrastructure by streaming data from Kafka into RisingWave and visualizing it with Grafana. This pipeline provides real-time visibility into the flow and structure of our data—laying a strong foundation for downstream machine learning tasks.
In our next episode, we will build on this foundation by constructing a regression model aimed at predicting cryptocurrency volume based on the rich dataset we've captured and monitored.