Under the Hood: The Architecture of Aviso's High-Performance Time Series Database

Sep 9, 2025

At Aviso, our Revenue Intelligence platform is built on a foundation of data. We process billions of data points—from CRM records and sales activities to calendar events and communication logs—to provide unparalleled forecasting accuracy and deal guidance. The temporal nature of this data is its most critical attribute; we don't just care about a deal's current value, but its entire lifecycle, its velocity, and the patterns that emerge over time. To power these complex, time-centric analytics at scale, we built a purpose-driven Time Series Database (TSDB) from the ground up.

Off-the-shelf solutions often present a trade-off: either optimize for high-ingestion rates or for complex, ad-hoc analytical queries. We needed both. Our platform demands a system capable of ingesting millions of events per second while simultaneously serving sub-second queries on high-cardinality data spanning years. This blog post is a deep dive into the architectural decisions, data structures, and algorithms that constitute the core of Aviso's TSDB, enabling us to turn vast temporal datasets into actionable revenue insights.

Why Not Traditional Databases?

Why didn’t we just use an RDBMS or an off-the-shelf TSDB like InfluxDB, TimescaleDB, or Prometheus?

Challenges with RDBMS:

  • Relational storage engines struggle to sustain high-volume, append-heavy ingestion.

  • Indexes on timestamp + entity combinations grow unbounded.

  • Range queries degrade as data volume scales.

Challenges with generic TSDBs:

  • Optimized for metrics ingestion at second-level resolution (e.g., IoT or monitoring).

  • Lack of support for hierarchical entities (e.g., opportunity → account → region).

  • Limited ability to handle late-arriving, corrected business data.

  • Missing hooks for machine learning feature generation.

Aviso’s TSDB was born out of necessity: to represent, compress, and query enterprise time series data in ways that off-the-shelf solutions could not. 

We started with four guiding principles:

  1. Immutable Append-Only Design: Every update to an opportunity, forecast, or pipeline metric is an event in time. Instead of overwriting records, we append changes. This preserves the lineage of sales evolution, crucial for explainability and auditing.

  2. Hierarchical Time Series Model: Data isn’t flat. Opportunities roll up into accounts, accounts into regions, and regions into global forecasts. Our TSDB supports multi-level indexing natively.

  3. Compression Without Compromise: Enterprise data is “bursty.” We combine delta encoding with Gorilla-style compression to reduce storage while maintaining query performance.

  4. ML-Readiness: From the ground up, the database integrates with Aviso’s forecasting engine, exposing sliding windows, lag operators, and anomaly queries directly at the query layer.
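The first principle—append-only, never overwrite—can be illustrated with a small sketch. This is not Aviso's actual implementation; `OpportunityTimeline` and its methods are hypothetical names, and it assumes events arrive in timestamp order:

```python
class OpportunityTimeline:
    """Append-only event log for a single opportunity (illustrative sketch).

    Updates are never overwritten; each change is a timestamped event, so the
    state at any past moment can be reconstructed for auditing and explainability.
    """

    def __init__(self):
        self._events = []  # (timestamp, field, value) tuples, append-only

    def append(self, ts, field, value):
        # This simplified sketch assumes events arrive in timestamp order.
        self._events.append((ts, field, value))

    def state_at(self, ts):
        """Replay events up to `ts` to reconstruct the opportunity's state."""
        state = {}
        for ev_ts, field, value in self._events:
            if ev_ts > ts:
                break
            state[field] = value
        return state

# Usage: trace how a deal evolved over its lifecycle.
tl = OpportunityTimeline()
tl.append(100, "amount", 50_000)
tl.append(200, "stage", "negotiation")
tl.append(300, "amount", 75_000)
assert tl.state_at(250) == {"amount": 50_000, "stage": "negotiation"}
assert tl.state_at(300)["amount"] == 75_000
```

Because no event is ever destroyed, "why did the forecast change?" reduces to diffing two replayed states.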

Core Architectural Principles

At Aviso, we designed our Time Series Database with three guiding principles: fast write performance, efficient storage, and flexible queries. These principles ensure that our system can handle the realities of enterprise sales data—large, dynamic, and constantly changing—while still giving business users and data teams the ability to ask complex questions at scale.

Data Model: Built for Flexibility

Unlike traditional monitoring databases that track a fixed set of metrics (like CPU load or memory usage), sales data is highly dimensional. Every deal, rep, customer, and product adds a new dimension.

To avoid rigid schemas and “metric explosion,” we adopted a tag-based model. Each data point includes:

  • Metric Name (e.g., deal_amount, activity_count)

  • Timestamp (precise to nanoseconds)

  • Value (numeric measurement)

  • Tags (context like {deal_id: abc-123, region: emea, rep_id: xyz-789})

This model means a unique time series is defined by its metric and tag set. The benefit? We can slice, filter, and group data with SQL-like flexibility, without having to predefine rigid schemas.
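In practice, "metric plus tag set" can be canonicalized into a single series key. A minimal sketch (the `series_key` helper and key format are illustrative, not Aviso's wire format): sorting the tags makes the key order-independent, so the same logical series always maps to the same identity.

```python
def series_key(metric, tags):
    """Canonical series identity: metric name plus sorted tag pairs.

    Sorting guarantees that {region: emea, deal_id: abc-123} and
    {deal_id: abc-123, region: emea} identify the same series.
    """
    pairs = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}{{{pairs}}}"

k1 = series_key("deal_amount", {"deal_id": "abc-123", "region": "emea"})
k2 = series_key("deal_amount", {"region": "emea", "deal_id": "abc-123"})
assert k1 == k2 == "deal_amount{deal_id=abc-123,region=emea}"
```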

Storage Engine: Optimized for Sales Data

Sales data arrives continuously, is corrected frequently (as newly appended versions, per our immutable model), and is queried heavily. For this workload, we use an architecture that excels at write-heavy, append-only workloads.

  • In-Memory Buffer (Memtable): New writes are captured in memory and made immediately available for queries.

  • Durability with WAL: Every write is also logged on disk to ensure nothing is lost, even in case of failure.

  • Sorted Disk Files (SSTables): Once the memory buffer fills up, data is flushed to disk in sorted, compressed blocks. Background processes keep these files organized for fast queries.

The result: sales data can be ingested at a massive scale while remaining queryable almost instantly.
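The three components above form a classic LSM-tree write path. Here is a toy sketch of how they fit together—`MiniLSM` and its in-memory stand-ins for the WAL and SSTables are purely illustrative, with none of the real system's concurrency, recovery, or compaction:

```python
class MiniLSM:
    """Toy LSM write path: a WAL for durability, a memtable for fresh writes,
    and sorted flushed runs standing in for on-disk SSTables."""

    def __init__(self, memtable_limit=4):
        self.wal = []          # stands in for the sequential on-disk log
        self.memtable = {}     # series_key -> list of (ts, value), queryable
        self.sstables = []     # each flush produces one sorted, immutable run
        self.limit = memtable_limit

    def write(self, key, ts, value):
        self.wal.append((key, ts, value))                      # durability first
        self.memtable.setdefault(key, []).append((ts, value))  # instantly queryable
        if sum(len(pts) for pts in self.memtable.values()) >= self.limit:
            self.flush()  # in the real system this runs in the background

    def flush(self):
        run = sorted((k, ts, v)
                     for k, pts in self.memtable.items()
                     for ts, v in pts)
        self.sstables.append(run)
        self.memtable.clear()

db = MiniLSM(memtable_limit=2)
db.write("deal_amount{region=emea}", 100, 50_000)
db.write("deal_amount{region=emea}", 200, 75_000)
assert db.sstables and db.memtable == {}  # limit hit, buffer flushed to a sorted run
```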

Sharding and Indexing: Scaling Out Gracefully

To scale horizontally, we partition data by time windows (daily, weekly, or quarterly). Most business queries—like “last quarter’s pipeline”—only touch a small number of these shards, dramatically reducing I/O.

Within each shard, we use inverted indexes to make queries blazingly fast. For example, if you ask:

SELECT avg(deal_amount) 
FROM sales 
WHERE region='emea' AND forecast_category='commit'

The system quickly identifies which series have region=emea and forecast_category=commit, intersects those sets, and fetches only the relevant data blocks. This avoids scanning the entire dataset and ensures sub-second responses, even across billions of data points.
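The lookup described above amounts to maintaining postings lists per `tag=value` pair and intersecting them. A minimal sketch (`InvertedIndex` is a hypothetical name; real postings lists are compressed, e.g. as roaring bitmaps):

```python
class InvertedIndex:
    """Maps each tag=value pair to the set of series IDs carrying it;
    a multi-tag filter then reduces to a set intersection."""

    def __init__(self):
        self.postings = {}  # "tag=value" -> set of series ids

    def add(self, series_id, tags):
        for k, v in tags.items():
            self.postings.setdefault(f"{k}={v}", set()).add(series_id)

    def lookup(self, **filters):
        sets = [self.postings.get(f"{k}={v}", set()) for k, v in filters.items()]
        return set.intersection(*sets) if sets else set()

idx = InvertedIndex()
idx.add(1, {"region": "emea", "forecast_category": "commit"})
idx.add(2, {"region": "emea", "forecast_category": "pipeline"})
idx.add(3, {"region": "amer", "forecast_category": "commit"})
# Only series 1 matches both predicates from the query above.
assert idx.lookup(region="emea", forecast_category="commit") == {1}
```

Only the data blocks for the surviving series IDs are ever read from disk.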

The Journey of a Write

Let’s trace what happens when a new data point—say, an updated deal amount—arrives:

  1. Ingestion: The point is received at the API and routed to the correct shard (based on timestamp).

  2. Write-Ahead Log: The data is written sequentially to disk to guarantee durability.

  3. Memtable Insert: It’s placed into memory, sorted and ready for queries.

  4. Acknowledgement: The system immediately acknowledges the write, usually within a few milliseconds.

  5. Background Flush: Later, the in-memory buffer is compressed and written to disk for long-term storage.

This pipeline balances speed, durability, and efficiency, ensuring real-time data availability without sacrificing long-term scalability.
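Condensed into code, steps 1 through 4 look roughly like the sketch below. The `shard_for` and `ingest` helpers are hypothetical, and the in-memory lists stand in for real durable storage:

```python
def shard_for(ts, shard_width=86_400):
    """Route a point to a daily shard by integer-dividing its epoch timestamp."""
    return ts // shard_width

def ingest(point, shards, wal):
    """Steps 1-4 of the write path, condensed: route, log, buffer, acknowledge.
    Step 5 (flushing to compressed disk blocks) happens later, in the background."""
    sid = shard_for(point["ts"])               # 1. route by timestamp
    wal.append(point)                          # 2. sequential WAL append
    shards.setdefault(sid, []).append(point)   # 3. memtable insert, now queryable
    return {"ok": True, "shard": sid}          # 4. immediate acknowledgement

wal, shards = [], {}
ack = ingest({"ts": 1_700_000_000, "metric": "deal_amount", "value": 75_000},
             shards, wal)
assert ack["ok"] and ack["shard"] == 1_700_000_000 // 86_400
```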

Storage Efficiency: Compression at Scale

Sales time series data is highly compressible. We apply multiple techniques to shrink data by more than 10x:

  • Timestamp Compression: We store only the differences between timestamps, and even the differences of those differences, which are typically tiny.

  • Value Compression: Using lightweight XOR-based compression, we capture just the changing bits between successive values.

  • Tag Compression: Since tags repeat frequently (region=emea, stage=commit), we replace strings with small dictionary references.

The payoff: reduced storage costs and faster queries due to smaller I/O footprints.
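The first two techniques can be demonstrated in a few lines. This sketch only computes the residuals that a Gorilla-style encoder would then bit-pack; the actual variable-length bit encoding is omitted:

```python
import struct

def delta_of_delta(timestamps):
    """Regularly spaced timestamps collapse to near-zero second-order deltas,
    which bit-pack into almost nothing."""
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

def xor_residuals(values):
    """Successive similar floats XOR to mostly-zero bit patterns; Gorilla-style
    encoders store only the small window of bits that actually changed."""
    as_bits = [struct.unpack("<Q", struct.pack("<d", v))[0] for v in values]
    return [b ^ a for a, b in zip(as_bits, as_bits[1:])]

# Hourly samples: every delta-of-delta is zero.
assert delta_of_delta([0, 3600, 7200, 10800, 14400]) == [0, 0, 0]
# An unchanged value XORs to exactly zero.
assert xor_residuals([99.5, 99.5, 100.0])[0] == 0
```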

The Journey of a Read

Now let’s walk through a query example:

SELECT percentile(deal_velocity, 95) 
FROM deal_flow 
WHERE stage='negotiation' AND last_updated > now() - 90d 
GROUP BY time(1w), rep_id

Here’s how the system answers it:

  1. Parsing & Planning: The query is broken down—time range, filters, aggregation (95th percentile), and grouping (week, rep).

  2. Shard Pruning: Only shards from the last 90 days are touched; older data is skipped.

  3. Index Lookup: The system consults the inverted index to find all series tagged with stage=negotiation.

  4. Data Fetching: Relevant compressed blocks are retrieved from memory and disk.

  5. Decompression & Aggregation: Data is decompressed in-stream and grouped into weekly buckets per sales rep.

  6. Merge & Finalize: Partial results from shards are combined, producing the final, customer-ready result.

Because the database is designed around time-based partitioning and tag indexing, even queries scanning billions of points return in hundreds of milliseconds.
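Two of the steps above—shard pruning and the percentile aggregation—can be sketched as follows. Both helpers are illustrative (the real planner prunes against shard metadata, and the real engine uses streaming percentile estimation rather than a full sort):

```python
def prune_shards(shard_ids, start_ts, end_ts, shard_width=86_400):
    """Step 2: keep only the daily shards overlapping the query's time window."""
    lo, hi = start_ts // shard_width, end_ts // shard_width
    return [sid for sid in shard_ids if lo <= sid <= hi]

def percentile(values, p):
    """Nearest-rank percentile, a simple stand-in for step 5's aggregation."""
    ranked = sorted(values)
    idx = max(0, min(len(ranked) - 1, round(p / 100 * len(ranked)) - 1))
    return ranked[idx]

# A 3-day window touches 3 of 100 shards; the other 97 are never read.
assert prune_shards(range(100), start_ts=10 * 86_400, end_ts=12 * 86_400) == [10, 11, 12]
assert percentile([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 95) == 10
```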

Real-World Applications in Aviso

  1. Predictive Forecasting
    Sales leaders can query evolving deal states and feed them directly into forecasts.

  2. Pipeline Risk Monitoring
    Sudden drops in pipeline value trigger alerts in near real-time.

  3. What-if Simulations
    By replaying historical sequences, leaders test “what if we had pulled these levers?”

  4. Explainability
    The TSDB’s immutable lineage lets users trace why forecasts changed, crucial for trust in AI systems.

Closing Thoughts: The Pulse of Data is Time

Building a custom TSDB was not a trivial undertaking, but it has been fundamental to Aviso's success. By choosing an LSM-tree architecture, we optimized for the high-volume ingestion that is characteristic of enterprise-scale data. Our tag-based data model and inverted index provide the query flexibility needed for deep, ad-hoc revenue analysis. Finally, aggressive, specialized compression schemes make it economically feasible to store years of high-granularity data.

This architecture provides the performant, scalable, and durable foundation upon which our entire AI engine is built. It's what allows our machine learning models to be trained on rich historical context and our users to explore their revenue data in real-time.

The journey is far from over. We are continuously exploring enhancements, from more sophisticated compaction strategies and advanced query optimization to native support for complex event processing and direct integration with machine learning frameworks like TensorFlow and PyTorch. As the velocity and volume of revenue data continue to grow, our TSDB will evolve in lockstep, ensuring the Aviso platform remains at the cutting edge of revenue intelligence. Book a demo to learn more.
