Skip to main content
Share

How to Use Reductstore as a Data Sink for Kafka

· 8 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Kafka Data Sink Kafka stream saved in ReductStore database

In this guide, we will explore the process of storing Kafka messages that contain unstructured data into a time series database.

Apache Kafka is a distributed streaming platform capable of handling high throughput of data, while ReductStore is a databases for unstructured data optimized for storing and querying along time.

ReductStore allows to easily setup a data sink to store blob data for applications that need precise time-based querying or a robust system optimized for edge computing that can handle quotas and retention policies.

This guide builds upon an existing tutorial which provides detailed steps for integrating a simple architecture with these systems. To get started, revisit "Easy Guide to Integrating Kafka: Practical Solutions for Managing Blob Data" if you need help setting up the initial infrastructure.

You can also find the code for this tutorial in the kafka_to_reduct demo on GitHub.

Share

Release v1.8.0: Introducing Data Replication

· 4 min read
Alexey Timin
Software Engineer - Database, Rust, C++

We are pleased to announce the release of the latest minor version of ReductStore, 1.8.0. ReductStore is a time series database designed for storing and managing large amounts of blob data.

To download the latest released version, please visit our Download Page.

What's New in ReductStore v1.8.0

In this release, we're introducing a crucial feature for any database - data replication. Now, you can create a server-side task that "subscribes" to new records written to a bucket and forwards them to another bucket. This bucket can be located on the same instance or a remote one. Since all databases implement replication differently based on their specializations, let's examine how ReductStore tackles this.

Share

Kafka Integration Tutorial for Blob Data

· 12 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Kafka ReductStore Example Sensor data processed and labeled by AI, stored in ReductStore, with metadata relayed to Kafka

In this tutorial, we will walk through a simple and practical setup for integrating Kafka with ReductStore to handle unstructured data streams from edge devices. We'll cover the basics of setting up Kafka and ReductStore using Docker, creating Kafka topics in Python, and managing blob data and metadata.

If you are new to Kafka and ReductStore, here's a quick summary of the technology:

  • Apache Kafka is a distributed streaming platform to share data between applications and services in real-time.
  • ReductStore is a time-series database for blob data, optimized for edge computing and complements Kafka by providing a data storage solution for files larger than 1MB–Kafka's maximum message size.

In our example, we will deploy a simple architecture with a single instance of Kafka and ReductStore running on a local machine. We will demonstrate how to create Kafka topics, write data to ReductStore, and forward metadata to Kafka.

For an easy start, you can follow along by cloning the reduct-kafka-example repository containing all the code snippets and Docker Compose files used in this tutorial.

Share

Implementing Data Streaming in PyTorch from Remote DB

· 9 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

PyTorch Training Diagram PyTorch training loop with data streaming from remote device

When training a model, we aim to process data in batches, shuffle data at each epoch to avoid over fitting, and leverage Python's multiprocessing for data fetching through multiple workers.

The reason that we want to use multiple workers is that GPUs are capable of handling large amounts of data concurrently; however, the bottleneck often lies in the time-consuming task of loading this data into the system.

Moreover, the challenge is even trickier when there is simply too much data to store the whole dataset on disk and we need to stream data from a remote database such as ReductStore.

In this blog post, we will go through a full example and setup a data stream to PyTorch from a playground dataset on a remote database.

Let's dig in!

Share

Keeping MQTT Data History with Node.js

· 6 min read
Alexey Timin
Software Engineer - Database, Rust, C++

The MQTT protocol is widely used in IoT applications because of its simplicity and ability to connect different data sources to applications using a publish/subscribe model. While many MQTT brokers support persistent sessions and can store message history as long as an MQTT client is not available, there may be cases where data needs to be stored for a longer period. In such cases, it is recommended to use a time series database. There are many options available, but if you need to store unstructured data such as images, sensor data, or Protobuf messages, consider using ReductStore. It is a time series database specifically designed for storing large amounts of blob data and optimized for IoT and edge computing.

ReductStore provides client SDKs for many programming languages to integrate it into your infrastructure. In this example, we will use the client SDK for JavaScript.

Let's make a simple application to understand how to keep a history of MQTT messages using ReductStore and Node.js.

Share

Open-Source Alternatives to Landing AI

· 7 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Photo by Luke Southern Photo by Luke Southern on Unsplash

In the thriving world of IoT, integrating MLOps for Edge AI is important for creating intelligent, autonomous devices that are not only efficient but also trustworthy and manageable.

MLOps—or Machine Learning Operations—is a multidisciplinary field that mixes machine learning, data engineering, and DevOps to streamline the lifecycle of AI models.

In this field, important factors to consider are:

  • explainability, ensuring that decisions made by AI are interpretable by humans;

  • orchestration, which involves managing the various components of machine learning in production–at scale; and

  • reproducibility, guaranteeing consistent results across different environments or experiments.

Share

Implementing AI for Real-Time Anomaly Detection in Images

· 8 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Photo by Randy FathPhoto by Randy Fath on Unsplash

The journey of taking an open-source artificial intelligence (AI) model from a laboratory setting to real-world implementation can seem daunting. However, with the right understanding and approach, this transition becomes a manageable task.

This blog post aims to serve as a compass on this technical adventure. We'll demystify key concepts, and delve into practical steps for implementing anomaly detection models effectively in real-time scenarios.

Let's dive in and see how open-source models can be implemented in production, bridging the gap between research and practical applications.

Share

Release v1.7.0: Provisioning & Batch Writing

· 2 min read
Alexey Timin
Software Engineer - Database, Rust, C++

We are pleased to announce the release of the latest minor version of ReductStore, 1.7.0. ReductStore is a time series database designed for storing and managing large amounts of blob data.

To download the latest released version, please visit our Download Page.

What's new in 1.7.0?

ReductStore v1.7.0 introduces two new features that make it easier to provision resources and write data in batches, which can improve your performance and efficiency when using ReductStore for edge computing and AI applications.

Share

Benchmark against MinIO and InfluxDB

· 6 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Benchmarks don't lie, let's put the systems to the ultimate test.

Diagram of ReductStore vs MinIO and InfluxDB benchmark on Edge Device HX401ReductStore vs. MinIO & InfluxDB on Edge Device HX401

For anyone deeply immersed in the engineering world of Edge Computing, Computer Vision, or IoT, you'll want to read further to understand why a time series database for blob data is needed and where it stands out.

Enter our contest: First, we have ReductStore—a time series database for blob data—specifically designed for edge devices.

Its counterpart? The duo of MinIO and InfluxDB, each optimized for their niche in blob storage and time-series data respectively.

In a head-to-head comparison, which system really leads in performance and wins the speed race?

Let's roll up our sleeves and deep-dive into this benchmarking analysis to separate fact from fiction.

Share

Release v1.6.0: license and client SDK for Rust

· 3 min read
Alexey Timin
Software Engineer - Database, Rust, C++

We are pleased to announce the release of the latest minor version of ReductStore, 1.6.0. ReductStore is a time series database designed for storing and managing large amounts of blob data.

To download the latest released version, please visit our Download Page.

What is new in 1.6.0?

Business Source License (BUSL-1.1)

We have updated the ReductStore license to the Business Source License (BUSL-1.1). This license permits free usage of the database for development, research, and testing purposes. Furthermore, it can be used in a production environment for free, provided that the Aggregate Financial Capacity of the company is less than $2,000,000 for the previous year. For additional information, please refer to here.

We believe that the new license strikes a good balance between freedom and revenue generation. This balance is necessary to maintain and improve our technology, and to bring benefits to its users.