ReductStore Extensions

The ReductStore extension system allows you to extend its functionality with custom plugins. These can process data during querying on the storage side. For instance, extensions can be used to manipulate data in columnar formats such as CSV, scale images, search for text in blobs and perform other actions.

This documentation covers the basic concepts of the extension system and explains how to interact with it using query parameters.

Querying with Extensions

To use an extension, add the #ext directive to the conditional query. Inside the directive, name the extension and pass its parameters as a JSON object:

{
  "#ext": {
    "select": {
      "columns": [{ "index": 0 }]
    }
  }
}

This request uses the select extension to keep only the first column (index 0) of each CSV record, so every returned record contains that column alone.
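To make the effect concrete, here is a minimal Python sketch that mimics what the query above does to a single CSV record. It is not how ReductStore implements the select extension. The function name select_columns and the sample record are hypothetical and serve only to illustrate the expected result.

```python
import csv
import io


def select_columns(record: str, indexes: list[int]) -> str:
    """Return the CSV text with only the columns at the given indexes kept."""
    reader = csv.reader(io.StringIO(record))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([row[i] for i in indexes])
    return out.getvalue()


# Hypothetical record content: three columns per row.
record = "12,north,ok\n7,south,ok\n30,east,fail\n"

# Mirrors "columns": [{ "index": 0 }] from the query above.
first_column = select_columns(record, [0])
print(first_column)
```

The record keeps all of its rows, but each row is reduced to the single selected column.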

Extensions can do more than transform data. They can also return computed labels, which may contain processing results or any other values useful for filtering entities within records (e.g., CSV rows, JSON objects). This makes it possible to filter data directly on the outcome of the processing.

{
  "#ext": {
    "select": {
      "columns": [{ "index": 0, "as_label": "speed" }]
    },
    "when": {
      "@speed": { "$gt": 10 }
    }
  }
}

This request uses the select extension to keep only the first column of the CSV data and to expose its value as the computed label @speed. The when condition then filters the rows by that label. As a result, the query returns records that contain a single column and only the rows whose value is greater than 10.
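The row-level filtering can be sketched in plain Python as follows. This is an illustration of the described behavior under stated assumptions, not the extension's actual code. The function select_and_filter, the dictionary used to hold the computed label, and the sample record are all hypothetical.

```python
import csv
import io


def select_and_filter(record: str, index: int, threshold: float) -> list[str]:
    """Keep one column, treat its value as a computed label, and keep only
    the rows whose label is greater than the threshold."""
    kept = []
    for row in csv.reader(io.StringIO(record)):
        value = row[index]                     # the selected column
        computed = {"@speed": float(value)}    # label computed for this row
        if computed["@speed"] > threshold:     # "@speed": { "$gt": 10 }
            kept.append(value)
    return kept


# Hypothetical record content: the first column holds the speed.
record = "12,north,ok\n7,south,ok\n30,east,fail\n"

rows = select_and_filter(record, index=0, threshold=10)
print(rows)
```

The row with speed 7 is dropped because it does not satisfy the condition, and the remaining rows carry only the selected column.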

Info: Use the @ prefix for computed labels in the when condition. This distinguishes computed labels from the regular labels that are stored in the database.

Data Pipeline with Extensions

As the previous example shows, an extension can return computed labels that can be used in the query condition. The query condition is also used to filter records by their stored labels before their content is read. Reading every record and passing it to the extension would be inefficient, so the storage engine filters in two stages. The first stage filters records by their stored labels before they are passed to the extension. The second stage filters on the computed labels after the extension has processed the data.

[Figure: ReductStore extension data pipeline, including the data processing stage for the extension system.]

The above diagram shows the data pipeline for the extension system in response to the following query. The &color condition filters records at the retrieval stage, and the when condition inside #ext filters rows (here, CSV rows) after the data processing:

{
  "&color": { "$eq": "green" },
  "#ext": {
    "select": {
      "columns": [{ "index": 0, "as_label": "speed" }]
    },
    "when": {
      "@speed": { "$gt": 10 }
    }
  }
}

The query engine first uses the &color condition to select records by their stored label. It then reads the content of the matching records from disk and passes it to the extension. The extension processes the data and assigns the computed label @speed to each row, which the engine then uses to filter the rows.
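The two filtering stages can be simulated with a short Python sketch. It is a simplified model of the pipeline described here, not ReductStore's engine. The in-memory records list, the run_pipeline function, and the reads counter are hypothetical and exist only to show that record content is read after the label filter has been applied.

```python
import csv
import io

# Hypothetical stored records: stored labels plus CSV content.
records = [
    {"labels": {"color": "green"}, "content": "12,a\n7,b\n"},
    {"labels": {"color": "red"}, "content": "50,c\n"},
    {"labels": {"color": "green"}, "content": "3,d\n30,e\n"},
]


def run_pipeline(records):
    reads = 0  # counts how many record bodies were actually read
    results = []
    for rec in records:
        # Stage 1: filter on stored labels before touching the content.
        if rec["labels"].get("color") != "green":
            continue
        reads += 1
        # Processing: keep column 0; its value acts as the computed label.
        speeds = [row[0] for row in csv.reader(io.StringIO(rec["content"]))]
        # Stage 2: filter rows on the computed label.
        results.append([s for s in speeds if float(s) > 10])
    return results, reads


results, reads = run_pipeline(records)
print(results, reads)
```

In this model the red record is skipped without its content being parsed, which is the reason for placing the label filter ahead of the extension.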