Version: 1.9.x

Data Querying

ReductStore is a time series database that provides efficient data retrieval capabilities. This guide explains how to query data from ReductStore using the CLI, HTTP API, and SDKs.

Concepts

ReductStore retrieves data efficiently by batching multiple records within a specified time interval into a single HTTP request. This is beneficial when managing large volumes of data, because it reduces both the number of requests and the overall latency.

The query process is designed as an iterator, returning a batch of records in each iteration. This method allows data to be processed in segments, an approach that is useful when managing large datasets.
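The iterator idea can be illustrated with a plain Python generator. This is only a sketch of the general pattern, not ReductStore's implementation; `iter_batches` and `batch_size` are hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches holding at most `batch_size` records."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# Seven records are processed in three segments instead of one large list.
batches = list(iter_batches(range(7), batch_size=3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because each batch is produced on demand, the consumer never needs to hold the whole dataset in memory.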

While it is possible to retrieve a record by its timestamp, this method is less efficient than querying by a time range because it does not benefit from batching. However, it can be useful for retrieving specific versions of non-time series records, such as AI models, configurations, or file versions, when timestamps are used as identifiers.

Query Parameters

Data can be queried using the ReductStore CLI, SDKs or HTTP API. The query parameters are the same for all interfaces and include:

  • Start and end time range
  • List of label values to include
  • List of labels to exclude
  • Limit on the number of records to return
  • Head flag to retrieve metadata only

There are also more advanced parameters available in the SDKs and HTTP API, such as:

  • Query TTL (Time to Live)
  • Continuous flag to keep the query open for continuous data retrieval
  • Poll interval to specify the time interval for polling data in continuous mode

Time Range

The time range is defined by the start and end parameters. Records with a timestamp equal to or greater than start and less than end are included in the result. If the start parameter is not set, the query starts from the beginning of the entry. If the end parameter is not set, the query continues to the end of the entry. If neither start nor end is set, the query returns the entire entry.
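The half-open interval rule (start inclusive, end exclusive) can be expressed as a small predicate. This is an illustrative sketch only; `in_time_range` is a hypothetical helper and the integer timestamps are made-up values.

```python
from typing import Optional


def in_time_range(
    timestamp: int, start: Optional[int] = None, end: Optional[int] = None
) -> bool:
    """Return True if start <= timestamp < end; an unset bound is unbounded."""
    if start is not None and timestamp < start:
        return False
    if end is not None and timestamp >= end:
        return False
    return True


timestamps = [10, 20, 30, 40]

# The record at 20 is included (equal to start), the one at 40 is not (equal to end).
selected = [t for t in timestamps if in_time_range(t, start=20, end=40)]
print(selected)  # [20, 30]

# With no bounds, every record of the entry is selected.
everything = [t for t in timestamps if in_time_range(t)]
```

Since the end bound is exclusive, consecutive ranges such as [20, 40) and [40, 60) can be queried without returning the same record twice.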

Include Labels

The include parameter filters the records by the specified label values. Only those that match ALL the specified label values are included in the result.

Exclude Labels

The exclude parameter filters records based on specified labels. Any records matching ALL of these labels will be omitted from the results.
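The "match ALL" rule for both parameters can be sketched as follows. This models the semantics described above in plain Python and is not ReductStore code; `matches_all`, `passes_filters`, and the sample labels are hypothetical.

```python
from typing import Dict, Optional

Labels = Dict[str, str]


def matches_all(labels: Labels, condition: Labels) -> bool:
    """True if the record carries every label in `condition` with the same value."""
    return all(labels.get(key) == value for key, value in condition.items())


def passes_filters(
    labels: Labels,
    include: Optional[Labels] = None,
    exclude: Optional[Labels] = None,
) -> bool:
    """Keep a record if it matches all include labels and not all exclude labels."""
    if include and not matches_all(labels, include):
        return False
    if exclude and matches_all(labels, exclude):
        return False
    return True


records = [
    {"sensor": "a", "status": "ok"},
    {"sensor": "a", "status": "bad"},
    {"sensor": "b", "status": "bad"},
]

kept = [
    r for r in records
    if passes_filters(r, include={"sensor": "a"}, exclude={"status": "bad"})
]
print(kept)  # [{'sensor': 'a', 'status': 'ok'}]

# A record is dropped only when it matches every exclude label:
# no record has both sensor "b" and status "ok", so nothing is excluded here.
untouched = [
    r for r in records
    if passes_filters(r, exclude={"sensor": "b", "status": "ok"})
]
```

Note that a record matching only some of the exclude labels stays in the result.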

Limit Records

The limit parameter restricts the number of records returned by a query. If the dataset has fewer records than the specified limit, all records are returned.
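As a rough illustration, a limit behaves like truncating an iterator. The helper name `take` is hypothetical, and the sketch says nothing about how ReductStore applies the limit internally.

```python
from itertools import islice
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def take(records: Iterable[T], limit: Optional[int] = None) -> List[T]:
    """Return at most `limit` records; with no limit, return all of them."""
    if limit is None:
        return list(records)
    return list(islice(records, limit))


first_three = take(range(5), limit=3)   # stops after three records
all_of_them = take(range(2), limit=10)  # fewer records than the limit
print(first_three, all_of_them)  # [0, 1, 2] [0, 1]
```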

TTL (Time-to-Live)

The ttl parameter determines the time-to-live of a query. The query is automatically closed when the specified time has elapsed since it was created. This prevents memory leaks by limiting the number of open queries. The default TTL is 60 seconds.
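The expiry rule can be sketched with a small class that compares the elapsed time since creation against the TTL. This is a simplified model, not the server's implementation; `QueryHandle`, `clock`, and `is_expired` are hypothetical names, and the injectable clock exists only so the behavior can be demonstrated without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class QueryHandle:
    """Hypothetical model of an open query with a time-to-live in seconds."""

    ttl: float = 60.0  # default taken from the text above
    clock: Callable[[], float] = time.monotonic
    created_at: float = field(init=False)

    def __post_init__(self) -> None:
        # The TTL is measured from the moment the query is created.
        self.created_at = self.clock()

    def is_expired(self) -> bool:
        return self.clock() - self.created_at >= self.ttl


# Drive the handle with a fake clock instead of sleeping for a minute.
now = [0.0]
query = QueryHandle(ttl=60.0, clock=lambda: now[0])

now[0] = 59.9
still_open = not query.is_expired()

now[0] = 60.0
expired = query.is_expired()
```

Under this model, a client that needs longer than the TTL to consume a query would have to request a larger ttl value when creating it.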

Head Flag

The head flag is used to retrieve only metadata. When set to true, the query returns the records' metadata, excluding the actual data. This parameter is useful when you want to work with labels without downloading the content of the records.

Continuous Mode

The continuous flag is used to keep the query open for continuous data retrieval. This mode is useful when you need to monitor data in real time. The SDKs provide a poll_interval parameter to specify the time interval for polling data in continuous mode. The default interval is 1 second.
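Conceptually, continuous mode is a polling loop: ask for new records, yield whatever arrived, wait for the interval, and repeat. The sketch below shows that pattern with asyncio. It is not the SDK's code; `poll_new_records`, `fetch`, `interval`, and `max_polls` are hypothetical, and `max_polls` exists only so the example terminates.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def poll_new_records(
    fetch: Callable[[], List[str]], interval: float, max_polls: int
) -> AsyncIterator[str]:
    """Yield records returned by `fetch`, sleeping `interval` seconds between polls."""
    for _ in range(max_polls):
        for record in fetch():
            yield record
        await asyncio.sleep(interval)


# Simulated source: each poll returns the records that arrived since the last one.
pending = [["a"], [], ["b", "c"]]


def fetch_pending() -> List[str]:
    return pending.pop(0) if pending else []


async def collect() -> List[str]:
    return [
        record
        async for record in poll_new_records(fetch_pending, interval=0.01, max_polls=3)
    ]


received = asyncio.run(collect())
print(received)  # ['a', 'b', 'c']
```

A shorter interval lowers the delay before new records are seen at the cost of more frequent requests.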

Typical Data Querying Cases

This section provides guidance on implementing typical data querying cases using the ReductStore CLI, SDKs, or HTTP API. All examples are designed for a local ReductStore instance, accessible at http://127.0.0.1:8383 using the API token 'my-token'.

For more information on setting up a local ReductStore instance, see the Getting Started guide.

Querying Data by Time Range

The most common use case is to query data within a specific time range:

import time
import asyncio
from reduct import Client, Bucket


async def main():
    # Create a client instance, then get or create a bucket
    client = Client("http://127.0.0.1:8383", api_token="my-token")
    bucket: Bucket = await client.create_bucket("my-bucket", exist_ok=True)

    # Write a record so that the query below has something to return
    ts = time.time()
    await bucket.write("py-example", b"Some binary data", ts)

    # Query records in the "py-example" entry of the bucket
    async for record in bucket.query("py-example", start=ts, end=ts + 1):
        # Print meta information
        print(f"Timestamp: {record.timestamp}")
        print(f"Content Length: {record.size}")
        print(f"Content Type: {record.content_type}")
        print(f"Labels: {record.labels}")

        # Read the record content
        content = await record.read_all()
        assert content == b"Some binary data"


asyncio.run(main())

Querying Record by Timestamp

The simplest way to query a record by its timestamp is to use the read method provided by the ReductStore SDKs or HTTP API:

import time
import asyncio
from reduct import Client, Bucket


async def main():
    # Create a client instance, then get or create a bucket
    client = Client("http://127.0.0.1:8383", api_token="my-token")
    bucket: Bucket = await client.create_bucket("my-bucket", exist_ok=True)

    # Write a record so that there is something to read back
    ts = time.time()
    await bucket.write("py-example", b"Some binary data", ts)

    # Read the single record stored at timestamp ts in the "py-example" entry
    async with bucket.read("py-example", ts) as record:
        # Print meta information
        print(f"Timestamp: {record.timestamp}")
        print(f"Content Length: {record.size}")
        print(f"Content Type: {record.content_type}")
        print(f"Labels: {record.labels}")

        # Read the record content
        content = await record.read_all()
        assert content == b"Some binary data"


asyncio.run(main())

Using Labels to Filter Data

Filtering data by labels is another common use case. You can use the include and exclude parameters to filter records by specific label values. With include, only records that match all of the specified label values are returned. Conversely, with exclude, records that match all of the specified label values are omitted from the result.

For example, consider a data set with annotated photos of celebrities. We want to retrieve the first 10 photos of celebrities taken in 2006, excluding those of Rowan Atkinson:

import asyncio
from reduct import Client, Bucket


async def main():
    # Create a client instance, then get an existing bucket
    client = Client("http://127.0.0.1:8383", api_token="my-token")
    bucket: Bucket = await client.get_bucket("example-bucket")

    # Query 10 photos from the "imdb" entry that were taken in 2006
    # and are not labeled with the name "Rowan Atkinson"
    async for record in bucket.query(
        "imdb",
        limit=10,
        include={"photo_taken": "2006"},
        exclude={"name": "b'Rowan Atkinson'"},
    ):
        print("Name", record.labels["name"])
        print("Photo taken", record.labels["photo_taken"])
        print("Gender", record.labels["gender"])
        # Download the record content (discarded in this example)
        _ = await record.read_all()


asyncio.run(main())