Dview
KNOWLEDGE BASE

Glossary

Plain-English definitions of the data, AI and analytics terms used across the Dview platform and the broader industry.

A

2 terms
Apache Iceberg
Storage
An open table format that adds database-grade reliability — schema evolution, hidden partitioning, time travel, and ACID transactions — on top of object storage like S3 or GCS. Iceberg is one of the foundational formats that makes the modern lakehouse possible.
Aqua
Dview Platform
Dview's high-concurrency query engine. Aqua serves dashboards and ad-hoc analytics over centralized data with autoscaling, caching, and predictable performance — without per-query cost surprises.

B

1 term
Batch Processing
Pipelines
Running a data job over a bounded chunk of data on a schedule (hourly, daily, weekly) rather than continuously. Batch is simpler to reason about and cheaper than streaming, and remains the right choice for most analytical workloads.
See also: ETL, Streaming

C

1 term
CDC (Change Data Capture)
Pipelines
A technique for detecting and propagating only the rows that have changed in a source database, rather than re-copying entire tables. CDC dramatically reduces load on operational systems and makes near-real-time replication economical.
See also: ELT, Streaming
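As a rough illustration of what a CDC feed contains, the sketch below compares two snapshots of a table by primary key and emits only the changed rows. This is a simplification: production CDC tools usually read the database's transaction log instead of comparing snapshots, and the function name `diff_snapshots` and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

def diff_snapshots(previous: List[Row], current: List[Row], key: str = "id") -> List[Tuple[str, Row]]:
    """Return (operation, row) pairs describing what changed between two snapshots."""
    old = {row[key]: row for row in previous}
    new = {row[key]: row for row in current}
    changes: List[Tuple[str, Row]] = []
    for k, row in new.items():
        if k not in old:
            changes.append(("insert", row))
        elif row != old[k]:
            changes.append(("update", row))
    for k, row in old.items():
        if k not in new:
            changes.append(("delete", row))
    return changes

# Hypothetical sample data: one row unchanged, one updated, one inserted, one deleted.
previous = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Delhi"}, {"id": 3, "city": "Goa"}]
current = [{"id": 1, "city": "Pune"}, {"id": 2, "city": "Mumbai"}, {"id": 4, "city": "Kochi"}]
changes = diff_snapshots(previous, current)
```

Only three of the four rows appear in `changes`; the unchanged row is never re-sent, which is the property that keeps load on the source system low.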

D

11 terms
Data Catalog
Governance
A searchable inventory of every dataset in the organization — table, file, dashboard — enriched with descriptions, owners, lineage, and quality scores. The catalog is the front door of any data platform: if a dataset is not in the catalog, it effectively does not exist.
Data Fabric
Architecture
An architectural approach that weaves together storage, compute, governance, and access across multiple clouds and silos so that data can be queried and governed as if it were in one place. Dview's platform is a data fabric implementation.
Data Lakehouse
Architecture
A storage architecture that combines the cheap, open-format scale of a data lake with the transactional consistency, schema enforcement, and SQL performance of a warehouse. Lakehouses replace the legacy split between operational lakes and analytical warehouses.
Data Lineage
Governance
An auditable graph of how every column flows through every transformation — from the source system that produced it to the dashboard that displays it. Lineage powers impact analysis, compliance reporting, and confident debugging.
Data Masking
Governance
Replacing sensitive values (credit cards, IDs) with realistic-looking but non-identifying tokens — either irreversibly (anonymization) or reversibly (tokenization) — so that the data remains useful for analytics without exposing the underlying secret.
See also: PII, RBAC
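The two masking modes can be sketched side by side. The example below is a minimal illustration using only the Python standard library: a one-way keyed hash stands in for irreversible masking, and an in-memory lookup table stands in for a token vault. The names `anonymize` and `TokenVault` are hypothetical, and a real deployment would keep the key and the vault in a secured service.

```python
import hashlib
import hmac
import secrets

def anonymize(value: str, key: bytes) -> str:
    """Irreversible masking: a keyed one-way hash, truncated for readability."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

class TokenVault:
    """Reversible masking: random tokens with a private lookup table."""

    def __init__(self) -> None:
        self._forward = {}   # original value -> token
        self._reverse = {}   # token -> original value

    def tokenize(self, value: str) -> str:
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        return self._reverse[token]

key = secrets.token_bytes(32)
vault = TokenVault()
card = "4111111111111111"      # a well-known test card number, not real data
masked = anonymize(card, key)  # stable for joins, cannot be reversed from the output alone
token = vault.tokenize(card)   # can be mapped back only by whoever holds the vault
```

Both outputs are stable for the same input, so masked columns can still be joined and counted.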
Data Mesh
Architecture
An organizational pattern where domain teams own their data products end-to-end, supported by a central self-serve platform. Mesh emphasizes ownership and decentralization; it is complementary to (not a replacement for) lakehouse architecture.
Data Observability
Quality
Continuous monitoring of pipelines and datasets for freshness, volume, schema, distribution, and lineage anomalies. Where data quality asks 'is this row right?', observability asks 'is this entire pipeline behaving as expected right now?'.
Data Product
Architecture
A curated, versioned, documented dataset built and maintained with the same rigor as a software product — owner, SLA, contract, deprecation policy. Treating datasets as products is the central idea behind data mesh.
Data Quality
Quality
How accurate, complete, consistent, timely, and unique a dataset is for its intended purpose. Modern platforms encode quality as automated tests on every pipeline run, blocking bad data from reaching downstream consumers.
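A minimal sketch of quality checks acting as a gate on a pipeline run, assuming two hypothetical rules (required columns are present, and a key column is unique). The function names and the order fields are illustrative only.

```python
from typing import Dict, List

def check_quality(rows: List[Dict], required: List[str], unique_key: str) -> List[str]:
    """Return a list of human-readable failures; an empty list means the batch passed."""
    failures = []
    seen = set()
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) in (None, ""):
                failures.append(f"row {i}: missing {col}")
        k = row.get(unique_key)
        if k in seen:
            failures.append(f"row {i}: duplicate {unique_key}={k}")
        seen.add(k)
    return failures

def publish(rows: List[Dict]) -> int:
    """Refuse to publish a batch that fails any check, so bad data never reaches consumers."""
    failures = check_quality(rows, required=["order_id", "amount"], unique_key="order_id")
    if failures:
        raise ValueError("quality gate failed: " + "; ".join(failures))
    return len(rows)  # stand-in for writing the batch downstream

good_batch = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 5.5}]
bad_batch = [{"order_id": 1, "amount": 10.0}, {"order_id": 1, "amount": None}]
```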
Data Warehouse
Architecture
A relational store optimized for analytical queries on structured data. Classic warehouses (Snowflake, BigQuery, Redshift) excel at SQL performance; lakehouses now offer comparable performance over open formats with more flexibility.
DSense
Dview Platform
Dview's natural-language interface for enterprise data: business users ask questions in plain English and DSense returns SQL-grounded, citation-backed answers using any LLM, with VPC deployment and built-in security guardrails.

E

3 terms
ELT (Extract, Load, Transform)
Pipelines
A data integration pattern that loads raw data into the target store first and transforms it there, leveraging the target's compute. ELT has largely displaced ETL for analytical workloads as warehouses and lakehouses became powerful enough to do the transforming themselves.
Embedding
AI / LLM
A dense vector representation of text, images or other data where semantically similar inputs end up close together in the vector space. Embeddings power retrieval-augmented generation, semantic search, and recommendation.
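"Close together" is commonly measured with cosine similarity. The sketch below uses hand-written 3-dimensional vectors purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, and the values here are made up.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: "invoice" and "bill" point in similar directions,
# "giraffe" points elsewhere.
invoice = [0.9, 0.1, 0.0]
bill = [0.8, 0.2, 0.1]
giraffe = [0.0, 0.1, 0.9]

related = cosine_similarity(invoice, bill)
unrelated = cosine_similarity(invoice, giraffe)
```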
ETL (Extract, Transform, Load)
Pipelines
The original data integration pattern: pull data from sources, transform it on a separate engine, then load the result into the warehouse. ETL is still appropriate when target compute is constrained or transformations are heavy.
See also: ELT, Pipeline

F

2 terms
Federated Query
Query
Running a single query that transparently spans multiple underlying data stores — for example joining a Postgres table to a Parquet file in S3 without moving the data. Federated query is the backbone of a data fabric.
Fiber
Dview Platform
Dview's no-code data pipeline product. Fiber connects 100+ source systems to a centralized lakehouse with auto-schema sync, CDC, and analytics-ready delivery — without engineers writing custom connectors.

H

1 term
Hallucination
AI / LLM
When a large language model generates confident output that is factually wrong or unsupported by its inputs. RAG and grounding-on-data techniques are the primary defense against hallucination in enterprise applications.
See also: LLM, RAG

L

1 term
LLM (Large Language Model)
AI / LLM
A neural network trained on massive text corpora that can generate, summarize, translate and reason over natural language. In enterprise data, LLMs are typically used together with retrieval (RAG) so answers stay grounded in the customer's own data.

M

2 terms
Materialized View
Query
A precomputed query result stored as a table and refreshed on a schedule or on data change. Materialized views trade storage and freshness for query speed, and are essential for high-concurrency dashboards.
See also: OLAP, Aqua
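The storage-and-freshness trade-off can be seen in a small sketch. SQLite has no native materialized views, so the example below emulates one with a plain table that is rebuilt on demand; the table names and the `refresh_sales_by_region` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (region TEXT, amount REAL);
INSERT INTO orders VALUES ('north', 100), ('north', 50), ('south', 70);
""")

def refresh_sales_by_region(conn: sqlite3.Connection) -> None:
    """Rebuild the precomputed aggregate from the base table."""
    conn.executescript("""
    DROP TABLE IF EXISTS sales_by_region;
    CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
    """)

refresh_sales_by_region(conn)

# A new order arrives after the refresh: the precomputed result is now stale.
conn.execute("INSERT INTO orders VALUES ('south', 30)")
stale = dict(conn.execute("SELECT region, total FROM sales_by_region"))

# Refreshing brings the stored result back in line with the base table.
refresh_sales_by_region(conn)
fresh = dict(conn.execute("SELECT region, total FROM sales_by_region"))
```

Dashboards reading `sales_by_region` never pay for the aggregation, at the cost of seeing data only as fresh as the last refresh.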
Metadata
Governance
Data about data: schemas, owners, descriptions, freshness, sample values, lineage, quality scores. Modern platforms treat metadata as a first-class citizen — a queryable asset in its own right.

O

2 terms
OLAP (Online Analytical Processing)
Architecture
Workloads that aggregate large amounts of historical data to answer analytical questions, such as dashboards, reports, and ad-hoc analysis. OLAP systems are typically columnar, read-optimized and tolerant of higher latency.
OLTP (Online Transaction Processing)
Architecture
Workloads that read and write small amounts of data with strict consistency and low latency — placing an order, recording a payment. OLTP systems are row-oriented and write-optimized.
See also: OLAP, CDC

P

3 terms
Parquet
Storage
An open columnar file format that compresses well and lets query engines skip irrelevant columns and row groups. Parquet is the de-facto on-disk format underneath most modern lakehouse tables, although table formats such as Apache Iceberg can also store data as ORC or Avro.
PII (Personally Identifiable Information)
Governance
Data that can identify an individual — name, email, government ID, IP address — directly or in combination with other fields. PII triggers regulatory obligations under GDPR, CCPA, India's DPDP Act, and similar laws.
Pipeline
Pipelines
An ordered sequence of steps that move and reshape data from source to consumer. A modern pipeline is version-controlled, observable, idempotent, and exposes its lineage and quality state to downstream users.

R

3 terms
RAG (Retrieval-Augmented Generation)
AI / LLM
An architecture where an LLM is given the most relevant chunks of trusted data at query time, retrieved from a vector store or SQL warehouse. RAG is the standard pattern for grounding LLM answers in enterprise data.
RBAC (Role-Based Access Control)
Governance
An access model where permissions are granted to named roles, and users inherit permissions by holding roles. RBAC scales better than per-user permissions and is the foundation of most enterprise data security postures.
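A minimal sketch of the model, assuming hypothetical role names, user names, and permission strings: permissions attach to roles, and a user's effective permissions are the union of the roles they hold.

```python
from typing import Dict, Set

# Permissions are granted to roles, never directly to users.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "analyst": {"read:sales"},
    "engineer": {"read:sales", "write:pipelines"},
}

# Users hold one or more roles.
USER_ROLES: Dict[str, Set[str]] = {
    "asha": {"analyst"},
    "ravi": {"analyst", "engineer"},
}

def is_allowed(user: str, permission: str) -> bool:
    """A user is allowed if any role they hold carries the permission."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

Granting a new permission to every analyst is a one-line change to `ROLE_PERMISSIONS`, which is why the model scales better than per-user grants.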
Row-Level Security
Governance
Access control that filters rows of a table based on the identity or attributes of the requesting user — for example, a regional manager only sees rows for their region. Critical for shared, multi-tenant analytics.
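The regional-manager example can be sketched as a filter applied before any rows reach the user. The user attributes (`regions`, `is_admin`) and the sample rows are hypothetical; in a real platform the filter would typically be enforced by the query engine, not by application code.

```python
from typing import Dict, List

def visible_rows(rows: List[Dict], user: Dict) -> List[Dict]:
    """Return only the rows this user is entitled to see."""
    if user.get("is_admin"):
        return list(rows)
    allowed = set(user.get("regions", []))
    return [row for row in rows if row["region"] in allowed]

sales = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 70},
    {"region": "north", "amount": 50},
]
north_manager = {"name": "asha", "regions": ["north"]}
admin = {"name": "root", "is_admin": True}
```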

S

5 terms
SCD (Slowly Changing Dimension)
Modeling
A modeling pattern for tracking how an entity (e.g. a customer's address) changes over time. Type 1 overwrites; Type 2 keeps history with effective-from / effective-to columns. Type 2 is the workhorse of dimensional modeling.
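A minimal sketch of a Type 2 update, assuming a hypothetical in-memory history table with `effective_from` and `effective_to` fields, where `effective_to = None` marks the current version.

```python
from datetime import date
from typing import Dict, List

def apply_scd2(history: List[Dict], customer_id: int, new_address: str, as_of: date) -> List[Dict]:
    """Close the current version (if the value changed) and append a new one."""
    current = next(
        (r for r in history if r["customer_id"] == customer_id and r["effective_to"] is None),
        None,
    )
    if current is not None and current["address"] == new_address:
        return history  # nothing changed, keep the current version open
    if current is not None:
        current["effective_to"] = as_of  # close the old version
    history.append({
        "customer_id": customer_id,
        "address": new_address,
        "effective_from": as_of,
        "effective_to": None,
    })
    return history

history: List[Dict] = []
apply_scd2(history, 7, "12 Lake Rd", date(2023, 1, 1))
apply_scd2(history, 7, "40 Hill St", date(2024, 6, 1))
apply_scd2(history, 7, "40 Hill St", date(2024, 7, 1))  # same address: no new version
```

A Type 1 update would instead overwrite `address` in place, losing the fact that the customer ever lived at the first address.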
Schema Drift
Quality
When the structure of incoming data changes unexpectedly — a column is renamed, a type widens, a field disappears. Detecting drift early is the difference between a five-minute heads-up and a broken dashboard at 9am.
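A simple way to catch drift is to compare the schema a pipeline expects with the schema it actually observes. The sketch below assumes schemas are represented as hypothetical column-name-to-type dictionaries; note that a rename shows up as one removed column plus one added column.

```python
from typing import Dict, List

def detect_drift(expected: Dict[str, str], observed: Dict[str, str]) -> Dict[str, List[str]]:
    """Report columns that were added, removed, or changed type."""
    expected_cols, observed_cols = set(expected), set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "type_changed": sorted(
            col for col in expected_cols & observed_cols if expected[col] != observed[col]
        ),
    }

expected = {"id": "int", "email": "string", "amount": "int"}
observed = {"id": "int", "email_address": "string", "amount": "bigint"}
drift = detect_drift(expected, observed)
```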
Schema Evolution
Storage
The ability of a table format to change its schema over time — adding columns, renaming, widening types — without breaking historical queries. Apache Iceberg, Delta Lake and Hudi all provide this safely.
Star Schema
Modeling
A dimensional model with a central fact table (orders, events) joined to surrounding dimension tables (customer, product, date). Star schemas are intuitive, queryable by BI tools, and the dominant pattern in analytical warehouses.
See also: SCD, OLAP
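A small star schema can be sketched with SQLite from the Python standard library. The table and column names (`fact_orders`, `dim_customer`, `dim_product`) are hypothetical; the point is the shape of the query, where the fact table joins outward to each dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, segment TEXT);
CREATE TABLE dim_product  (product_id  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    amount      REAL
);
INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'enterprise');
INSERT INTO dim_product  VALUES (10, 'hardware'), (11, 'software');
INSERT INTO fact_orders  VALUES (100, 1, 10, 20.0), (101, 2, 11, 500.0), (102, 2, 10, 80.0);
""")

# Measures come from the fact table; grouping attributes come from the dimensions.
rows = conn.execute("""
SELECT c.segment, p.category, SUM(f.amount) AS revenue
FROM fact_orders AS f
JOIN dim_customer AS c ON c.customer_id = f.customer_id
JOIN dim_product  AS p ON p.product_id  = f.product_id
GROUP BY c.segment, p.category
ORDER BY c.segment, p.category
""").fetchall()
```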
Streaming
Pipelines
Processing data continuously as events arrive, typically with latency of seconds or less. Streaming powers real-time fraud detection, alerting, and live dashboards; it is more operationally complex than batch and is best reserved for genuinely time-sensitive use cases.

T

1 term
Text-to-SQL
AI / LLM
Translating a natural-language question into an executable SQL query against the right tables. Modern text-to-SQL combines schema retrieval, LLM generation, and validation against a semantic layer to keep results trustworthy.
See also: DSense, LLM

V

2 terms
Vector Database
AI / LLM
A database optimized for similarity search over high-dimensional vectors (embeddings). Vector DBs are the retrieval half of the RAG pattern, returning the most semantically relevant chunks for a query in milliseconds.
See also: Embedding, RAG
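The core operation, returning the stored items nearest to a query vector, can be sketched with a brute-force in-memory index. This is an illustration only: the class `TinyVectorIndex` and the 3-dimensional vectors are hypothetical, and real vector databases generally rely on approximate nearest-neighbor indexes instead of scanning every vector.

```python
import math
from typing import List, Sequence, Tuple

def _normalize(vector: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

class TinyVectorIndex:
    """Stores unit-length vectors and ranks them by dot product (cosine similarity)."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, List[float]]] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        self._items.append((text, _normalize(vector)))

    def search(self, query: Sequence[float], k: int = 2) -> List[str]:
        q = _normalize(query)
        scored = [(sum(a * b for a, b in zip(q, v)), text) for text, v in self._items]
        scored.sort(reverse=True)  # highest similarity first
        return [text for _, text in scored[:k]]

index = TinyVectorIndex()
index.add("refund policy", [0.9, 0.1, 0.0])
index.add("shipping times", [0.1, 0.9, 0.0])
index.add("holiday calendar", [0.0, 0.1, 0.9])

# A made-up query vector that points mostly in the "refund" direction.
top = index.search([0.8, 0.3, 0.0], k=2)
```

In a RAG setup, the returned chunks would then be passed to the LLM as context for answering the question.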
VPC Deployment
Security
Running a service inside the customer's own Virtual Private Cloud so that data never leaves their network perimeter. VPC deployment is the gold standard for regulated industries, such as banks, asset management companies (AMCs), and healthcare providers, that cannot share data with vendor-managed multi-tenant environments.
See also: DSense, PII