When it comes to computer vision, data storage is a critical component. You need to store images for model training, as well as the processing results for model validation. There are several ways to do this, each with its own advantages and disadvantages. In this post, we’ll look at three ways to store data in computer vision applications: a file system, S3-like object storage, and ReductStore. We’ll also discuss some of the pros and cons of each option.
For demonstration purposes, we’ll use a simple computer vision application which is connected to a CV camera and runs on an edge device:
The camera driver captures images from the CV camera every second and sends them to the model. The model detects something and shows the results in the user interface.
So far, this isn’t too complicated. Let’s see how we can deal with the data here.
If an application needs to save an image from a CV camera, it can simply write it to a hard drive. We can use the timestamp as a unique identifier, and organize folders and files so that we can later retrieve images by time interval.
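To make the idea concrete, here is a minimal sketch of such a layout in Python. The directory scheme (`year/month/day/hour`), the `.jpeg` extension and the function names are assumptions chosen for illustration, not something the application above prescribes. Timestamps are expected to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Microseconds since the Unix epoch (ts must be timezone-aware)."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def image_path(root: Path, ts: datetime) -> Path:
    """Hypothetical layout: <root>/YYYY/MM/DD/HH/<microseconds>.jpeg"""
    ts = ts.astimezone(timezone.utc)
    return root / ts.strftime("%Y/%m/%d/%H") / f"{to_micros(ts)}.jpeg"


def save_image(root: Path, ts: datetime, data: bytes) -> Path:
    """Write one image; the timestamp acts as its unique identifier."""
    path = image_path(root, ts)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


def find_images(root: Path, start: datetime, stop: datetime) -> list:
    """Return paths with start <= timestamp < stop, oldest first."""
    lo, hi = to_micros(start), to_micros(stop)
    # Simplification: this scans the whole tree. A real implementation
    # would only visit the hour folders that overlap the interval.
    hits = [p for p in root.rglob("*.jpeg") if lo <= int(p.stem) < hi]
    return sorted(hits, key=lambda p: int(p.stem))
```

Calling `save_image(Path("data"), datetime.now(timezone.utc), frame_bytes)` once per captured frame and `find_images(...)` later is enough for a prototype. Everything else, such as cleanup of old files, is left to the application.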
One advantage of this method is that it is very simple. You don’t need any additional components for your system. However, it also has a few drawbacks.
A more advanced approach is to use an object storage system for images. This allows us to organize our data in the storage engine just like folders and files, but access and manage it using an HTTP API.
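As a rough illustration, the sketch below builds time-based object keys and the HTTP requests an application might send to an S3-compatible service. The endpoint, bucket name and key scheme are hypothetical, and the requests are only constructed, not sent: a real S3-like service normally requires signed requests, so in practice you would use an SDK rather than raw HTTP.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://localhost:9000"  # hypothetical S3-compatible endpoint
BUCKET = "camera-images"            # hypothetical bucket name


def object_key(ts: datetime) -> str:
    """Key that mimics folders: images/YYYY/MM/DD/HH/MMSS_micros.jpeg"""
    ts = ts.astimezone(timezone.utc)
    return ts.strftime("images/%Y/%m/%d/%H/%M%S_%f.jpeg")


def build_put_request(ts: datetime, data: bytes) -> Request:
    """Path-style PUT that would upload one image (authentication omitted)."""
    url = f"{ENDPOINT}/{BUCKET}/{object_key(ts)}"
    headers = {"Content-Type": "image/jpeg"}
    return Request(url, data=data, method="PUT", headers=headers)


def build_list_url(ts: datetime) -> str:
    """URL that would list all objects stored for the hour containing ts."""
    prefix = ts.astimezone(timezone.utc).strftime("images/%Y/%m/%d/%H/")
    query = urlencode({"list-type": "2", "prefix": prefix})
    return f"{ENDPOINT}/{BUCKET}?{query}"
```

Because the keys sort by time, listing by prefix gives us a coarse time-interval query, much like walking folders on a file system.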
This is a more flexible approach than a simple file system and has some advantages:
Unfortunately, this solution is also not perfect.
ReductStore is an open source time series database for keeping a history of blobs. It is designed to solve the problem of data reduction and availability for AI/ML applications, where we have data of various sizes and formats continuously coming from data sources.
As you can see, the structure of our application is similar to the one with an S3-like storage system, but ReductStore works differently. Instead of storing blobs individually, it preallocates blocks of fixed size and writes multiple blobs to each block. This is a more efficient way to write and store data, especially when dealing with small blobs. This approach has the following advantages:
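To show what writing multiple blobs into preallocated blocks can look like, here is a toy sketch in Python. It is not ReductStore's actual on-disk format or API: the class name, the file naming and the in-memory index are all assumptions made for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
import os
from bisect import bisect_left


class BlockStore:
    """Toy store: each block is one fixed-size file that holds many blobs."""

    def __init__(self, directory, block_size=1024 * 1024):
        self.directory = directory
        self.block_size = block_size
        self.index = []     # (timestamp, block_id, offset, length) in time order
        self.block_id = -1  # no block opened yet
        self.offset = 0
        os.makedirs(directory, exist_ok=True)

    def _block_path(self, block_id):
        return os.path.join(self.directory, f"{block_id:08d}.blk")

    def _new_block(self):
        self.block_id += 1
        self.offset = 0
        with open(self._block_path(self.block_id), "wb") as f:
            # Set the file to its full length up front. On many file systems
            # this creates a sparse file rather than reserving disk space.
            f.truncate(self.block_size)

    def write(self, timestamp, blob):
        if len(blob) > self.block_size:
            raise ValueError("blob is larger than a block")
        if self.block_id < 0 or self.offset + len(blob) > self.block_size:
            self._new_block()
        with open(self._block_path(self.block_id), "r+b") as f:
            f.seek(self.offset)
            f.write(blob)
        self.index.append((timestamp, self.block_id, self.offset, len(blob)))
        self.offset += len(blob)

    def read_range(self, start, stop):
        """Yield (timestamp, blob) for start <= timestamp < stop."""
        first = bisect_left(self.index, (start,))
        for ts, block_id, offset, length in self.index[first:]:
            if ts >= stop:
                break
            with open(self._block_path(block_id), "rb") as f:
                f.seek(offset)
                yield ts, f.read(length)
```

With many small blobs, this layout creates far fewer files than one file per image, and a time-interval query becomes a lookup in a sorted index rather than a directory walk.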
All the approaches to storing historical data for images that we have discussed in this post have their own strengths and weaknesses, and can be useful in different situations. For example, using a file system can be a simple and effective way to store data during the prototyping or proof of concept stage of a computer vision application. It is easy to set up and use, and does not require any additional components or infrastructure. However, it may not be the most efficient or scalable solution in the long term.
Using an S3-like object storage system can be a good option if you already have this type of infrastructure in place, and if you need to store large amounts of data or access data from multiple locations. It offers scalability, durability, and security, which make it a good choice for many different applications. However, it may require additional setup and configuration, and may be more complex to use than a file system.
ReductStore is a specialized time series database for blobs of data in AI/ML applications. It is designed to address the challenges of data reduction and availability. It is a good option if you need to store data on an edge device, and if you want to use advanced features such as a real-time disk quota and high performance.