KNOWLEDGE BASE
Glossary
Plain-English definitions of the data, AI and analytics terms used across the Dview platform and the broader industry.
A
2 terms
- Apache Iceberg Storage
- An open table format that adds database-grade reliability — schema evolution, hidden partitioning, time travel, and ACID transactions — on top of object storage like S3 or GCS. Iceberg is one of the foundational formats that makes the modern lakehouse possible.
- Aqua Dview Platform
- Dview's high-concurrency query engine. Aqua serves dashboards and ad-hoc analytics over centralized data with autoscaling, caching, and predictable performance — without per-query cost surprises.
B
1 term
C
1 term
D
11 terms
- Data Catalog Governance
- A searchable inventory of every dataset in the organization — table, file, dashboard — enriched with descriptions, owners, lineage, and quality scores. The catalog is the front door of any data platform: if a dataset is not in the catalog, it effectively does not exist.
- Data Fabric Architecture
- An architectural approach that weaves together storage, compute, governance, and access across multiple clouds and silos so that data can be queried and governed as if it were in one place. Dview's platform is a data fabric implementation.
- Data Lakehouse Architecture
- A storage architecture that combines the low-cost, open-format scale of a data lake with the transactional consistency, schema enforcement, and SQL performance of a warehouse. A lakehouse removes the legacy split in which raw data lived in a lake and analytical data was copied into a separate warehouse.
- Data Lineage Governance
- An auditable graph of how every column flows through every transformation — from the source system that produced it to the dashboard that displays it. Lineage powers impact analysis, compliance reporting, and confident debugging.
- Data Masking Governance
- Replacing sensitive values (credit card numbers, government IDs) with realistic-looking but non-identifying substitutes, either irreversibly (anonymization) or reversibly (tokenization), so that the data remains useful for analytics without exposing the underlying values.
- Data Mesh Architecture
- An organizational pattern where domain teams own their data products end-to-end, supported by a central self-serve platform. Mesh emphasizes ownership and decentralization; it is complementary to (not a replacement for) lakehouse architecture.
- Data Observability Quality
- Continuous monitoring of pipelines and datasets for freshness, volume, schema, distribution, and lineage anomalies. Where data quality asks 'is this row right?', observability asks 'is this entire pipeline behaving as expected right now?'
- Data Product Architecture
- A curated, versioned, documented dataset built and maintained with the same rigor as a software product — owner, SLA, contract, deprecation policy. Treating datasets as products is the central idea behind data mesh.
- Data Quality Quality
- How accurate, complete, consistent, timely, and unique a dataset is for its intended purpose. Modern platforms encode quality as automated tests on every pipeline run, blocking bad data from reaching downstream consumers.
- Data Warehouse Architecture
- A relational store optimized for analytical queries on structured data. Classic warehouses (Snowflake, BigQuery, Redshift) excel at SQL performance; lakehouses now offer comparable performance over open formats with more flexibility.
- DSense Dview Platform
- Dview's natural-language interface for enterprise data: business users ask questions in plain English and DSense returns SQL-grounded, citation-backed answers using any LLM, with VPC deployment and built-in security guardrails.
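The two modes named in the Data Masking entry above can be sketched in a few lines. This is an illustrative sketch, not Dview's implementation: `SECRET_KEY`, `anonymize`, and `TokenVault` are hypothetical names, and a keyed one-way hash stands in for the irreversible case (whether that counts as anonymization in a regulatory sense depends on the data and on who holds the key).

```python
import hashlib
import hmac
import secrets

# Hypothetical key for illustration; a real deployment would load it from a secrets manager.
SECRET_KEY = b"demo-key-not-for-production"


def anonymize(value: str) -> str:
    """Irreversible masking: a keyed one-way hash with no path back to the input."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


class TokenVault:
    """Reversible masking: random tokens plus a lookup table kept in a protected store."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same input always masks to the same output.
        if value not in self._token_by_value:
            token = "tok_" + secrets.token_hex(8)
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        # Only callers with access to the vault can recover the original value.
        return self._value_by_token[token]


vault = TokenVault()
card = "4111 1111 1111 1111"
masked_once = anonymize(card)
token = vault.tokenize(card)
```

Both outputs are stable for a given input, so joins and group-bys on the masked column still work; only the vault-backed token can be mapped back to the original.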
E
3 terms
- ELT (Extract, Load, Transform) Pipelines
- A data integration pattern that loads raw data into the target store first and transforms it there, leveraging the target's compute. ELT has largely displaced ETL for analytics workloads as warehouses and lakehouses became powerful enough to run the transformations themselves.
- Embedding AI / LLM
- A dense vector representation of text, images or other data where semantically similar inputs end up close together in the vector space. Embeddings power retrieval-augmented generation, semantic search, and recommendation.
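To make "close together in the vector space" from the Embedding entry concrete, here is a minimal sketch of cosine-similarity ranking. The three-dimensional vectors and phrase labels are made up for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings; the values are invented for this example.
EMBEDDINGS = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.8, 0.2, 0.1],
    "quarterly revenue": [0.0, 0.2, 0.9],
}


def nearest(query_vector, k=2):
    """Return the k stored phrases most similar to the query vector."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda name: cosine_similarity(query_vector, EMBEDDINGS[name]),
        reverse=True,
    )
    return ranked[:k]


top_matches = nearest([1.0, 0.0, 0.0])
```

A vector database performs the same ranking, but over millions of vectors and with approximate indexes instead of a full sort.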
F
2 terms
- Federated Query Query
- Running a single query that transparently spans multiple underlying data stores — for example joining a Postgres table to a Parquet file in S3 without moving the data. Federated query is the backbone of a data fabric.
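As a small-scale analogy for the Federated Query entry, SQLite can attach a second database to a connection and join across both in one statement. This is only a sketch of the idea: the `crm` alias, table names, and rows are hypothetical, and a real federated engine spans heterogeneous systems such as Postgres and Parquet on S3 rather than two SQLite databases.

```python
import sqlite3

# Two independent in-memory databases stand in for two separate data stores.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")

conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO main.orders VALUES (?, ?)", [(1, 20.0), (1, 5.0), (2, 12.0)]
)
conn.executemany(
    "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")]
)
conn.commit()

# One query spans both stores; no rows are copied from one database to the other.
totals = conn.execute(
    """
    SELECT c.name, SUM(o.amount)
    FROM main.orders AS o
    JOIN crm.customers AS c ON c.id = o.customer_id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
```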
H
1 term
L
1 term
- LLM (Large Language Model) AI / LLM
- A neural network trained on massive text corpora that can generate, summarize, translate and reason over natural language. In enterprise data, LLMs are typically used together with retrieval (RAG) so answers stay grounded in the customer's own data.
M
2 terms
- Materialized View Query
- A precomputed query result stored as a table and refreshed on a schedule or on data change. Materialized views trade storage and freshness for query speed, and are essential for high-concurrency dashboards.
- Metadata Governance
- Data about data: schemas, owners, descriptions, freshness, sample values, lineage, quality scores. Modern platforms treat metadata as a first-class citizen — a queryable asset in its own right.
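The storage-and-freshness trade-off in the Materialized View entry above can be shown with a precomputed table that is rebuilt on demand. SQLite has no native materialized views, so this sketch emulates one; `refresh_sales_by_region` and the table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("north", 10.0), ("north", 5.0), ("south", 7.5)],
)
conn.commit()


def refresh_sales_by_region(conn):
    """Rebuild the precomputed aggregate from the base table."""
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS sales_by_region")
        conn.execute(
            "CREATE TABLE sales_by_region AS "
            "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
        )


refresh_sales_by_region(conn)

# New data arrives after the last refresh.
conn.execute("INSERT INTO orders VALUES ('south', 2.5)")
conn.commit()

# Reads are cheap but stale until the next refresh.
stale = dict(conn.execute("SELECT region, total FROM sales_by_region"))
refresh_sales_by_region(conn)
fresh = dict(conn.execute("SELECT region, total FROM sales_by_region"))
```

Dashboards query `sales_by_region` instead of re-aggregating `orders` on every request; the cost is extra storage and results that lag the base table between refreshes.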
O
2 terms
- OLAP (Online Analytical Processing) Architecture
- Workloads that aggregate large amounts of historical data to answer analytical questions such as dashboards, reports, and ad-hoc analysis. OLAP systems are typically columnar and read-optimized, and they tolerate higher per-query latency than transactional (OLTP) systems.
P
3 terms
- Parquet Storage
- An open columnar file format that compresses well and lets query engines skip irrelevant columns and row groups. Parquet is the de facto on-disk format underneath most modern lakehouse tables.
- PII (Personally Identifiable Information) Governance
- Data that can identify an individual — name, email, government ID, IP address — directly or in combination with other fields. PII triggers regulatory obligations under GDPR, CCPA, India's DPDP Act, and similar laws.
- Pipeline Pipelines
- An ordered sequence of steps that move and reshape data from source to consumer. A modern pipeline is version-controlled, observable, idempotent, and exposes its lineage and quality state to downstream users.
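"Idempotent" in the Pipeline entry means that rerunning a step leaves the target in the same state as running it once. A minimal sketch, assuming a hypothetical `customers` table keyed by `id` and using SQLite's `INSERT OR REPLACE` as the upsert:

```python
import sqlite3


def load_batch(conn, rows):
    """Upsert by primary key so that replaying a batch cannot create duplicates."""
    with conn:  # one transaction per batch
        conn.executemany(
            "INSERT OR REPLACE INTO customers (id, email) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")

batch = [(1, "a@example.com"), (2, "b@example.com")]
load_batch(conn, batch)
load_batch(conn, batch)  # simulated retry after a failure

row_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

A plain `INSERT` would fail or duplicate rows on the retry; keying the write on a stable identifier is what makes the rerun safe.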
R
3 terms
- RAG (Retrieval-Augmented Generation) AI / LLM
- An architecture where an LLM is given the most relevant chunks of trusted data at query time, retrieved from a vector store or SQL warehouse. RAG is the standard pattern for grounding LLM answers in enterprise data.
- RBAC (Role-Based Access Control) Governance
- An access model where permissions are granted to named roles, and users inherit permissions by holding roles. RBAC scales better than per-user permissions and is the foundation of most enterprise data security postures.
- Row-Level Security Governance
- Access control that filters rows of a table based on the identity or attributes of the requesting user — for example, a regional manager only sees rows for their region. Critical for shared, multi-tenant analytics.
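The RBAC and Row-Level Security entries above compose naturally: roles decide whether a user may read a table at all, and a row filter decides which rows come back. The sketch below is illustrative only; `ROLE_PERMISSIONS`, `User`, and `read_sales` are hypothetical names, and production systems enforce these rules inside the query engine rather than in application code.

```python
from dataclasses import dataclass
from typing import Optional

# Permissions attach to roles, never directly to users.
ROLE_PERMISSIONS = {
    "regional_manager": {"sales:read"},
    "admin": {"sales:read", "sales:write"},
}

SALES = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 250},
    {"region": "north", "amount": 40},
]


@dataclass
class User:
    name: str
    roles: set
    region: Optional[str] = None


def has_permission(user: User, permission: str) -> bool:
    """RBAC check: a user holds a permission if any of their roles grants it."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)


def read_sales(user: User) -> list:
    """Row-level security: non-admins only see rows for their own region."""
    if not has_permission(user, "sales:read"):
        raise PermissionError(f"{user.name} may not read sales")
    if "admin" in user.roles:
        return list(SALES)
    return [row for row in SALES if row["region"] == user.region]


north_manager = User("Priya", {"regional_manager"}, region="north")
admin = User("Sam", {"admin"})
visible_to_manager = read_sales(north_manager)
visible_to_admin = read_sales(admin)
```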
S
5 terms
- SCD (Slowly Changing Dimension) Modeling
- A modeling pattern for tracking how an entity (e.g. a customer's address) changes over time. Type 1 overwrites; Type 2 keeps history with effective-from / effective-to columns. Type 2 is the workhorse of dimensional modeling.
- Schema Drift Quality
- When the structure of incoming data changes unexpectedly — a column is renamed, a type widens, a field disappears. Detecting drift early is the difference between a five-minute heads-up and a broken dashboard at 9am.
- Schema Evolution Storage
- The ability of a table format to change its schema over time (adding columns, renaming, widening types) without breaking historical queries. Apache Iceberg, Delta Lake and Hudi all support schema evolution, though the set of changes each handles safely differs.
- Star Schema Modeling
- A dimensional model with a central fact table (orders, events) joined to surrounding dimension tables (customer, product, date). Star schemas are intuitive, queryable by BI tools, and the dominant pattern in analytical warehouses.
- Streaming Pipelines
- Processing data continuously as events arrive, typically with latency ranging from milliseconds to a few seconds. Streaming powers real-time fraud detection, alerting, and live dashboards; it is more operationally complex than batch and is best reserved for genuinely time-sensitive use cases.
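The Type 2 pattern from the SCD entry above (keep history with effective-from / effective-to columns) can be sketched as follows. The row layout and the `apply_scd2` helper are assumptions made for illustration; warehouses usually express the same logic as a SQL `MERGE`.

```python
from datetime import date


def apply_scd2(history, customer_id, new_address, change_date):
    """Close the customer's current row and append a new one if the address changed."""
    for row in history:
        if row["customer_id"] == customer_id and row["effective_to"] is None:
            if row["address"] == new_address:
                return history  # no change, so no new version
            row["effective_to"] = change_date  # close the current version
    history.append(
        {
            "customer_id": customer_id,
            "address": new_address,
            "effective_from": change_date,
            "effective_to": None,  # None marks the current version
        }
    )
    return history


history = [
    {
        "customer_id": 7,
        "address": "12 Lake Road",
        "effective_from": date(2022, 1, 1),
        "effective_to": None,
    }
]
apply_scd2(history, 7, "3 Hill Street", date(2024, 6, 1))
```

After the call, `history` holds two rows for customer 7: the old address closed on 2024-06-01 and the new address open-ended. A Type 1 update would instead overwrite `address` in place and lose the earlier value.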
T
1 term
V
2 terms
- Vector Database AI / LLM
- A database optimized for similarity search over high-dimensional vectors (embeddings). Vector DBs are the retrieval half of the RAG pattern, returning the most semantically relevant chunks for a query in milliseconds.
- VPC Deployment Security
- Running a service inside the customer's own Virtual Private Cloud so that data never leaves their network perimeter. VPC deployment is the preferred model for regulated industries, such as banks, asset management companies (AMCs), and healthcare providers, that cannot share data with vendor-managed multi-tenant environments.