ReductStore Extensions
The ReductStore extension system allows you to extend its functionality with custom plugins. These can process data during querying on the storage side. For instance, extensions can be used to manipulate data in columnar formats such as CSV, scale images, search for text in blobs and perform other actions.
This documentation covers the basic concepts of the extension system and explains how to interact with it using query parameters.
Querying with Extensions
Users can interact with the extension system by using the ext
query parameter and the name of the extension in JSON format when querying the data.
The ext
parameter is available in all official ReductStore SDKs, but for the purposes of this example, we will use a JSON request body.
{
"ext": {
"select": {
"columns": [{ "index": 0 }]
}
}
}
This request uses the select
extension to select the first column of CSV records and return each one with only the first column.
However, the processing of the data is not the only thing that the extension can do. It can also return computed labels, which can contain the results of the processing or any other data that can be used in the query condition.
{
"ext": {
"select": {
"columns": [{ "index": 0, "as_label": "speed" }]
},
"when": {
"@speed": { "$gt": 10 }
}
}
}
This request uses the select
extension to select the first column of the CSV data and return only the first column for each row.
The when
condition will then filter the rows by the value of this column, which is now labeled @speed
.
In the end the query returns the records with filtered rows and only one column.
You should use the @
prefix for the computed labels in the when
condition.
This distinguishes the computed labels from the regular labels that are stored in the database.
Data Pipeline with Extensions
As you can see from the previous example, the extension can return computed labels that can be used in the query condition.
However, we also use the when
condition to filter records based on their labels before reading their content.
It would be inefficient to read all the records and pass them to the extension for processing, so the storage engine
has two filtering stages: the first stage filters the records based on their labels before passing them to the extension for processing.
The second stage involves filtering the records based on the computed labels after they have been processed by the extension.
Data pipeline with data processing stage for the extension system.
The above diagram shows the data pipeline for the extension system in response to the following query:
{
"ext": {
"select": {
"columns": [{ "index": 0, "as_label": "speed" }]
},
"when": {
"@speed": { "$gt": 10 } # filter rows after the data processing (here it's CSV rows)
}
},
"when": {
"&color": { "$eq": "green" } # filter records on the retrieval stage
}
}
The query engine uses the when
condition in the query to retrieve records based on the &color
label.
First, the engine filters the data based on the label. Then, it reads the record content from the disk and passes it to the extension for processing.
The extension then processes the data and assigns the computed label @speed
to each row, enabling the engine to use the @speed
label to filter the data.