ReductStore Extensions
The ReductStore extension system allows you to extend its functionality with custom plugins. These can process data during querying on the storage side. For instance, extensions can be used to manipulate data in columnar formats such as CSV, scale images, search for text in blobs and perform other actions.
This documentation covers the basic concepts of the extension system and explains how to interact with it using query parameters.
Querying with Extensions
Users can interact with the extension system by using the ext
query parameter and the name of the extension in JSON format when querying the data.
The ext
parameter is available in all official ReductStore SDKs, but for the purposes of this example, we will use a JSON request body.
{
"ext": {
"select": {
"columns": [{ "index": 0 }]
}
}
}
This request uses the select
extension to select the first column of the CSV data and return each record with only the first column.
However, the processing of the data is not the only thing that the extension can do. It can also return computed labels, which can contain the results of the processing or any other data that can be used in the query condition.
{
"ext": {
"select": {
"columns": [{ "index": 0, "as_label": "speed" }]
}
},
"when": {
"@speed": { "$gt": 10 }
}
}
This request uses the select
extension to select the first column of the CSV data and return only the first column for each record.
The when
condition will then filter the records by the value of this column, which is now labelled @speed
.
You should use the @
prefix for the computed labels in the when
condition.
This distinguishes the computed labels from the regular labels that are stored in the database.
Data Pipeline with Extensions
As you can see from the previous example, the extension can return computed labels that can be used in the query condition.
However, we also use the when
condition to filter records based on their labels before reading their content.
It would be inefficient to read all the records and pass them to the extension for processing, so the storage engine
has two filtering stages: the first stage filters the records based on their labels before passing them to the extension for processing.
The second stage involves filtering the records based on the computed labels after they have been processed by the extension.
Data pipeline with data processing stage for the extension system.
The above diagram shows the data pipeline for the extension system in response to the following query:
{
"ext": {
"select": {
"columns": [{ "index": 0, "as_label": "speed" }]
}
},
"when": {
"&color": { "$eq": "green" },
"@speed": { "$gt": 10 }
}
}
The query engine detects that the when
clause contains a condition with a computed label, @speed
, and ignores it during the data retrieval stage.
The engine filters the data based on the &color
label, and only then reads the record content from the disk and passes it to the extension for processing.
The extension then processes the data and assigns the computed label @speed
to the record, allowing the engine to use it in the second filtering stage.
The pipeline's output is records that match both conditions based on labels and computed labels.
The detection of computed labels occurs recursively within nested conditions. If a nested condition contains a computed label, the entire tree is ignored.