3 Ways to Store Computer Vision Data

· 7 min read
Alexey Timin
Software Engineer - Database, Rust, C++

When it comes to computer vision, data storage is a critical component. You need to store images for model training, as well as processing results for model validation. There are a few ways to go about this, each with its own advantages and disadvantages. In this post, we'll look at three ways to store data in computer vision applications: a file system, S3-like object storage, and ReductStore. We'll also discuss some of the pros and cons of each option.

A Simple Computer Vision Application

For demonstration purposes, we’ll use a simple computer vision application which is connected to a CV camera and runs on an edge device:

Computer Vision Application

The camera driver captures images from the CV camera every second and forwards them to the model, which then detects objects and displays the results in the user interface.

Your images and results need to be stored for training and validation purposes. The customer may also wish to view images featuring anomalous objects. These requirements present the challenge of maintaining a history of blob or unstructured data.
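
The first and simplest option is the file system: write each frame and its detection results under a timestamped name so records sort chronologically. Here is a minimal sketch in Python; the directory layout and the JSON side-car format are illustrative choices, not prescribed by the application:

import json
import time
from pathlib import Path

def store_frame(image_bytes: bytes, detections: list, root: Path = Path("data")) -> None:
    """Save a frame and its detection results side by side, keyed by a
    microsecond timestamp so filenames sort chronologically."""
    ts = time.time_ns() // 1_000  # microseconds since epoch
    folder = root / "cv_camera"
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{ts}.jpeg").write_bytes(image_bytes)
    (folder / f"{ts}.json").write_text(json.dumps(detections))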

Performance Comparison: ReductStore vs. MinIO

· 7 min read
Alexey Timin
Software Engineer - Database, Rust, C++

In this article, we will compare two data storage solutions: ReductStore and MinIO. Both offer on-premise blob storage, but they approach it differently. MinIO provides traditional S3-like blob storage, while ReductStore is a time series database designed to store a history of blob data. We will focus on scenarios that require storing and accessing a history of unstructured data, such as images from a computer vision camera, vibration sensor data, or binary packages common in industrial settings.

Handling Historical Data

S3-like blob storage is commonly used to store data of different formats and sizes in the cloud or internal storage. It can also accommodate historical data as a series of blobs. A simple approach is to create a folder for each data source and save objects with timestamps in their names:

bucket
 |
 |---cv_camera
       |---1666225094312397.jpeg
       |---1666225094412397.jpeg
       |---1666225094512397.jpeg
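
With boto3 (or the MinIO client, which speaks the same S3 API), the write side of this layout looks roughly like the sketch below; the endpoint and credentials are placeholders for a local MinIO instance. Note the trade-off this layout implies: to query a time range later, you have to list object keys under the prefix and filter them client-side.

import time
import boto3

# Placeholder endpoint and credentials for a local MinIO instance.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

def store_blob(image_bytes: bytes) -> None:
    # Use a microsecond timestamp as the object name so keys sort by time.
    ts = time.time_ns() // 1_000
    s3.put_object(Bucket="bucket", Key=f"cv_camera/{ts}.jpeg", Body=image_bytes)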

How to Use ReductStore as a Data Sink for Kafka

· 8 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Kafka stream saved in a ReductStore database

In this guide, we will explore the process of storing Kafka messages that contain unstructured data into a time series database.

Apache Kafka is a distributed streaming platform capable of handling high data throughput, while ReductStore is a database for unstructured data, optimized for time-indexed storage and querying.

ReductStore makes it easy to set up a data sink that stores blob data for applications that need precise time-based querying, or a robust edge-optimized system with storage quotas and retention policies.
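
In outline, such a sink is just a consumer loop that copies each Kafka record into a ReductStore entry. A minimal sketch, assuming aiokafka for consumption and the reduct-py client; the topic, bucket, and entry names are illustrative, so check both SDKs for the exact signatures:

import asyncio
from aiokafka import AIOKafkaConsumer
from reduct import Client

async def run_sink():
    client = Client("http://localhost:8383", api_token="my-token")
    bucket = await client.create_bucket("kafka_sink", exist_ok=True)

    consumer = AIOKafkaConsumer("blob-topic", bootstrap_servers="localhost:9092")
    await consumer.start()
    try:
        async for msg in consumer:
            # Kafka timestamps are in milliseconds; ReductStore expects microseconds.
            await bucket.write("blob-topic", msg.value, timestamp=msg.timestamp * 1_000)
    finally:
        await consumer.stop()

asyncio.run(run_sink())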

This guide builds upon an existing tutorial which provides detailed steps for integrating a simple architecture with these systems. To get started, revisit "Easy Guide to Integrating Kafka: Practical Solutions for Managing Blob Data" if you need help setting up the initial infrastructure.

You can also find the code for this tutorial in the kafka_to_reduct demo on GitHub.

Release v1.8.0: Introducing Data Replication

· 4 min read
Alexey Timin
Software Engineer - Database, Rust, C++

We are pleased to announce the release of the latest minor version of ReductStore, 1.8.0. ReductStore is a time series database designed for storing and managing large amounts of blob data.

To download the latest released version, please visit our Download Page.

What's New in ReductStore v1.8.0

In this release, we're introducing a crucial feature for any database: data replication. Now, you can create a server-side task that "subscribes" to new records written to a bucket and forwards them to another bucket. This bucket can be located on the same instance or a remote one. Since all databases implement replication differently based on their specializations, let's examine how ReductStore tackles this.
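
As a sketch of what setting this up can look like from the Python SDK: we assume a create_replication call taking ReplicationSettings, as in the reduct-py documentation as we understand it, so verify the exact field names against your client version.

import asyncio
from reduct import Client, ReplicationSettings

async def main():
    # Assumed API: create a server-side task that forwards new records
    # from a local bucket to a bucket on a remote instance.
    async with Client("http://localhost:8383", api_token="my-token") as client:
        settings = ReplicationSettings(
            src_bucket="local-bucket",
            dst_bucket="remote-bucket",
            dst_host="https://remote.example.com",
            dst_token="remote-token",
        )
        await client.create_replication("my-replication", settings)

asyncio.run(main())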

Kafka Integration Tutorial for Blob Data

· 12 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Sensor data processed and labeled by AI, stored in ReductStore, with metadata relayed to Kafka

In this tutorial, we will walk through a simple and practical setup for integrating Kafka with ReductStore to handle unstructured data streams from edge devices. We'll cover the basics of setting up Kafka and ReductStore using Docker, creating Kafka topics in Python, and managing blob data and metadata.

If you are new to Kafka and ReductStore, here's a quick summary of the technology:

  • Apache Kafka is a distributed streaming platform for sharing data between applications and services in real time.
  • ReductStore is a time-series database for blob data, optimized for edge computing; it complements Kafka by storing files larger than 1 MB, Kafka's default maximum message size.

In our example, we will deploy a simple architecture with a single instance of Kafka and ReductStore running on a local machine. We will demonstrate how to create Kafka topics, write data to ReductStore, and forward metadata to Kafka.
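
For instance, creating a topic for the metadata programmatically might look like this with the kafka-python admin client (one of several options; the topic name and bootstrap address reflect our local setup, not anything fixed by the tutorial):

from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the single local broker and create a one-partition topic.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="metadata", num_partitions=1, replication_factor=1)
])
admin.close()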

For an easy start, you can follow along by cloning the reduct-kafka-example repository containing all the code snippets and Docker Compose files used in this tutorial.

Implementing Data Streaming in PyTorch from Remote DB

· 9 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

PyTorch training loop with data streaming from a remote device

When training a model, we aim to process data in batches, shuffle the data at each epoch to avoid overfitting, and leverage Python's multiprocessing to fetch data through multiple workers.

We want multiple workers because GPUs can consume large amounts of data concurrently; the bottleneck often lies in loading that data into the system fast enough to keep them busy.

Moreover, the challenge gets even trickier when the dataset is too large to fit on local disk and must be streamed from a remote database such as ReductStore.

In this blog post, we will go through a full example and set up a data stream to PyTorch from a playground dataset on a remote database.

Let's dig in!
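
As a preview of the idea, the sketch below wires an IterableDataset to a DataLoader so that each worker streams its own shard; fetch_records is a stand-in for the actual remote query (e.g. a time-range query against ReductStore) and just yields synthetic tensors here:

import torch
from torch.utils.data import IterableDataset, DataLoader

def fetch_records(shard: int, num_shards: int):
    """Stand-in for a remote query; yields synthetic (image, label) pairs."""
    for i in range(shard, 1024, num_shards):
        yield torch.randn(3, 224, 224), i % 10

class RemoteStreamDataset(IterableDataset):
    """Each DataLoader worker iterates over its own shard of the stream,
    so fetching from the remote store happens in parallel."""
    def __init__(self, num_shards: int):
        self.num_shards = num_shards

    def __iter__(self):
        info = torch.utils.data.get_worker_info()
        shard = info.id if info is not None else 0
        yield from fetch_records(shard, self.num_shards)

if __name__ == "__main__":  # guard required when num_workers > 0
    loader = DataLoader(RemoteStreamDataset(num_shards=4), batch_size=32, num_workers=4)
    for images, labels in loader:
        pass  # training step goes here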

Keeping MQTT Data History with Node.js

· 6 min read
Alexey Timin
Software Engineer - Database, Rust, C++

The MQTT protocol is widely used in IoT applications because of its simplicity and its publish/subscribe model, which makes it easy to connect different data sources to applications. While many MQTT brokers support persistent sessions and can retain message history while an MQTT client is offline, there may be cases where data needs to be stored for a longer period. In such cases, it is recommended to use a time series database. There are many options available, but if you need to store unstructured data such as images, sensor data, or Protobuf messages, consider using ReductStore. It is a time series database specifically designed for storing large amounts of blob data and optimized for IoT and edge computing.

ReductStore provides client SDKs for many programming languages to integrate it into your infrastructure. In this example, we will use the client SDK for JavaScript.

Let's make a simple application to understand how to keep a history of MQTT messages using ReductStore and Node.js.
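
The application itself uses the JavaScript SDK; as a language-neutral preview of the same loop, here is a Python sketch with paho-mqtt writing each message through ReductStore's HTTP API. The /api/v1/b/... path and the per-topic entry naming are our assumptions, so check the API reference before relying on them:

import time
import requests
import paho.mqtt.client as mqtt

REDUCT_URL = "http://localhost:8383"  # assumed local ReductStore instance

def on_message(client, userdata, msg):
    ts = time.time_ns() // 1_000  # ReductStore timestamps are in microseconds
    entry = msg.topic.replace("/", "_")  # one entry per MQTT topic (our convention)
    requests.post(
        f"{REDUCT_URL}/api/v1/b/mqtt/{entry}",
        params={"ts": ts},
        data=msg.payload,
    )

client = mqtt.Client()  # paho-mqtt 1.x constructor; 2.x also needs a CallbackAPIVersion
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("sensors/#")
client.loop_forever()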

Open-Source Alternatives to Landing AI

· 7 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Photo by Luke Southern on Unsplash

In the thriving world of IoT, integrating MLOps for Edge AI is important for creating intelligent, autonomous devices that are not only efficient but also trustworthy and manageable.

MLOps—or Machine Learning Operations—is a multidisciplinary field that mixes machine learning, data engineering, and DevOps to streamline the lifecycle of AI models.

In this field, important factors to consider are:

  • explainability, ensuring that decisions made by AI are interpretable by humans;

  • orchestration, which involves managing the various components of machine learning in production at scale; and

  • reproducibility, guaranteeing consistent results across different environments or experiments.

Implementing AI for Real-Time Anomaly Detection in Images

· 8 min read
Anthony Cavin
Data Scientist - ML/AI, Python, TypeScript

Photo by Randy Fath on Unsplash

The journey of taking an open-source artificial intelligence (AI) model from a laboratory setting to real-world implementation can seem daunting. However, with the right understanding and approach, this transition becomes a manageable task.

This blog post aims to serve as a compass on this technical adventure. We'll demystify key concepts and delve into practical steps for implementing anomaly detection models effectively in real-time scenarios.

Let's dive in and see how open-source models can be implemented in production, bridging the gap between research and practical applications.

Release v1.7.0: Provisioning & Batch Writing

· 2 min read
Alexey Timin
Software Engineer - Database, Rust, C++

We are pleased to announce the release of the latest minor version of ReductStore, 1.7.0. ReductStore is a time series database designed for storing and managing large amounts of blob data.

To download the latest released version, please visit our Download Page.

What's New in ReductStore v1.7.0

ReductStore v1.7.0 introduces two new features that make it easier to provision resources and write data in batches, which can improve your performance and efficiency when using ReductStore for edge computing and AI applications.
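
To illustrate the batch-writing feature, here is a minimal sketch using the Batch helper from the reduct-py client. The add and write_batch signatures are per the SDK documentation as we understand it, so verify them against your client version:

import asyncio
import time
from reduct import Client, Batch

async def main():
    client = Client("http://localhost:8383", api_token="my-token")
    bucket = await client.create_bucket("demo", exist_ok=True)

    # Collect several records and send them in a single HTTP request.
    batch = Batch()
    now_us = time.time_ns() // 1_000  # timestamps in microseconds
    for i in range(3):
        batch.add(now_us + i, data=f"record-{i}".encode())
    await bucket.write_batch("entry-1", batch)

asyncio.run(main())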