Baselinr Quality Studio

Quality Studio is Baselinr's no-code web interface for configuring and managing your entire data quality setup. Configure connections, tables, profiling settings, validation rules, drift detection, and more, all through a visual interface. The Quality Studio also provides monitoring and analysis of profiling results, drift alerts, run history, and metrics across multi-warehouse environments.

📚 Demo Documentation

The Quality Studio supports a demo mode that runs entirely on Cloudflare Pages, with no database dependencies.

🎮 Try the Demo

👉 Try Quality Studio Demo →

Experience the Quality Studio with realistic sample data. The demo showcases all features including:

  • Configuration management
  • Profiling results visualization
  • Drift detection alerts
  • Validation results
  • Root cause analysis
  • Metrics dashboards

Note: The demo uses pre-generated sample data and runs in read-only mode.

🎯 Features

Core Features

  • No-Code Configuration: Set up your entire data quality configuration through visual forms, with no YAML or JSON required
  • Configuration Management: Visual editors for connections, storage, tables, profiling, validation rules, drift detection, and more
  • Visual & YAML Editor: Split-view editor with real-time sync between visual forms and YAML configuration
  • Run History: View past profiling runs with filtering and search
  • Profiling Results: Detailed table and column-level metrics visualization
  • Drift Detection: Monitor data drift events with severity indicators
  • Validation Results: View and manage data quality validation results
  • Root Cause Analysis: AI-powered correlation of anomalies with pipeline runs and upstream issues
  • Metrics Overview: Aggregate KPIs and trends
  • Multi-Warehouse Support: PostgreSQL, Snowflake, MySQL, BigQuery, Redshift, SQLite
  • Export Functionality: Export data in JSON format (CSV export is listed on the roadmap)
  • AI Chat Assistant: Conversational interface for data quality investigation

Technical Stack

Frontend:

  • Next.js 14 (App Router)
  • React 18
  • Tailwind CSS
  • Recharts for visualizations
  • TanStack Query for data fetching
  • Lucide React for icons

Backend:

  • FastAPI
  • SQLAlchemy
  • Pydantic
  • PostgreSQL

๐Ÿ“ Project Structureโ€‹

dashboard/
โ”œโ”€โ”€ backend/ # FastAPI backend
โ”‚ โ”œโ”€โ”€ main.py # API endpoints
โ”‚ โ”œโ”€โ”€ models.py # Pydantic models
โ”‚ โ”œโ”€โ”€ database.py # Database client
โ”‚ โ”œโ”€โ”€ chat_models.py # Chat API models
โ”‚ โ”œโ”€โ”€ chat_routes.py # Chat API routes
โ”‚ โ”œโ”€โ”€ requirements.txt # Python dependencies
โ”‚ โ””โ”€โ”€ sample_data_generator.py
โ”œโ”€โ”€ frontend/ # Next.js frontend
โ”‚ โ”œโ”€โ”€ app/ # App router pages
โ”‚ โ”‚ โ”œโ”€โ”€ page.tsx # Quality Studio overview
โ”‚ โ”‚ โ”œโ”€โ”€ runs/ # Run history page
โ”‚ โ”‚ โ”œโ”€โ”€ drift/ # Drift alerts page
โ”‚ โ”‚ โ”œโ”€โ”€ tables/ # Table details page
โ”‚ โ”‚ โ”œโ”€โ”€ chat/ # AI Chat page
โ”‚ โ”‚ โ””โ”€โ”€ metrics/ # Metrics page
โ”‚ โ”œโ”€โ”€ components/ # Reusable components
โ”‚ โ”‚ โ”œโ”€โ”€ Sidebar.tsx
โ”‚ โ”‚ โ”œโ”€โ”€ KPICard.tsx
โ”‚ โ”‚ โ”œโ”€โ”€ RunsTable.tsx
โ”‚ โ”‚ โ”œโ”€โ”€ DriftAlertsTable.tsx
โ”‚ โ”‚ โ”œโ”€โ”€ FilterPanel.tsx
โ”‚ โ”‚ โ””โ”€โ”€ chat/ # Chat components
โ”‚ โ”‚ โ”œโ”€โ”€ ChatContainer.tsx
โ”‚ โ”‚ โ”œโ”€โ”€ ChatInput.tsx
โ”‚ โ”‚ โ””โ”€โ”€ ChatMessage.tsx
โ”‚ โ”œโ”€โ”€ types/ # TypeScript types
โ”‚ โ”‚ โ”œโ”€โ”€ lineage.ts
โ”‚ โ”‚ โ””โ”€โ”€ chat.ts
โ”‚ โ”œโ”€โ”€ lib/ # Utilities
โ”‚ โ”‚ โ””โ”€โ”€ api.ts # API client
โ”‚ โ””โ”€โ”€ package.json
โ””โ”€โ”€ README.md # This file

🚀 Quick Start

Prerequisites

  • Node.js 18+ and npm/yarn
  • Python 3.10+
  • PostgreSQL database (Baselinr storage)
  • Existing Baselinr installation (Phase 1)

1. Backend Setup

cd dashboard/backend

# Install dependencies
pip install -r requirements.txt

# Set environment variables (or put them in a .env file)
export BASELINR_DB_URL=postgresql://baselinr:baselinr@localhost:5433/baselinr
export API_HOST=0.0.0.0
export API_PORT=8000

# Generate sample data (optional)
python sample_data_generator.py

# Start the backend server
python main.py
# Or with uvicorn:
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Backend will be available at: http://localhost:8000

2. Frontend Setup

cd dashboard/frontend

# Install dependencies
npm install
# or
yarn install

# Create .env.local file with:
# NEXT_PUBLIC_API_URL=http://localhost:8000

# Start the development server
npm run dev
# or
yarn dev

Frontend will be available at: http://localhost:3000

🔌 API Endpoints

Quality Studio Metrics

  • GET /api/dashboard/metrics?warehouse=&days=30 - Get aggregate metrics

Run History

  • GET /api/runs?warehouse=&schema=&table=&status=&days=30 - List profiling runs
  • GET /api/runs/{run_id} - Get detailed run results
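
The list endpoints take their filters as query parameters. As a minimal sketch, the helper below builds a URL for GET /api/runs using only the standard library. The helper name `build_runs_url`, the choice to omit unset filters rather than send them empty, the default of 30 days, and the example value `snowflake` are assumptions made for illustration; check backend/main.py for the values the API actually accepts.

```python
from urllib.parse import urlencode


def build_runs_url(base_url, *, warehouse=None, schema=None, table=None,
                   status=None, days=30):
    """Build a GET /api/runs URL, leaving out filters that are not set.

    Hypothetical helper: omitting unset filters is an assumption; the
    endpoint listing shows them as empty values (e.g. "warehouse=").
    """
    params = {
        "warehouse": warehouse,
        "schema": schema,
        "table": table,
        "status": status,
        "days": days,
    }
    # Keep only the filters the caller supplied, in a stable order.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url.rstrip('/')}/api/runs?{query}"


# Example: runs from the last 7 days for one warehouse (value is illustrative).
print(build_runs_url("http://localhost:8000", warehouse="snowflake", days=7))
```

The resulting URL can be fetched with any HTTP client once the backend is running.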

Drift Detection

  • GET /api/drift?warehouse=&table=&severity=&days=30 - List drift alerts

Table Metrics

  • GET /api/tables/{table_name}/metrics?schema=&warehouse= - Get table metrics

Warehouses

  • GET /api/warehouses - List available warehouses

Export

  • GET /api/export/runs?format=json&warehouse=&days=30 - Export runs
  • GET /api/export/drift?format=json&warehouse=&days=30 - Export drift

Chat (AI Assistant)

  • GET /api/chat/config - Get chat configuration status
  • POST /api/chat/message - Send a message to the chat agent
  • GET /api/chat/history/{session_id} - Get chat history for a session
  • DELETE /api/chat/session/{session_id} - Clear a chat session
  • GET /api/chat/tools - List available chat tools
  • GET /api/chat/sessions - List active chat sessions
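
As a rough sketch of calling the chat endpoint from a script, the snippet below prepares a POST request for /api/chat/message with the standard library. The request schema is not documented here, so the JSON field names `message` and `session_id` are hypothetical; the real field names are defined in backend/chat_models.py.

```python
import json
from urllib.request import Request


def build_chat_request(base_url, message, session_id=None):
    """Prepare (but do not send) a POST request to /api/chat/message.

    ASSUMPTION: the body fields "message" and "session_id" are placeholders
    for illustration, not the confirmed API schema.
    """
    body = {"message": message}
    if session_id is not None:
        body["session_id"] = session_id
    return Request(
        f"{base_url.rstrip('/')}/api/chat/message",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("http://localhost:8000", "Show me high severity drift events")
print(req.get_method(), req.full_url)
```

Passing the prepared request to `urllib.request.urlopen` sends it; that step needs a running backend with chat enabled.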

📊 Sample Data

To populate the Quality Studio with sample data for testing:

cd dashboard/backend
python sample_data_generator.py

This generates:

  • 100 profiling runs across all warehouse types
  • Column-level metrics for each run
  • Drift events for ~30% of runs

🎨 Customization

Theme Colors

Modify tailwind.config.ts to customize colors:

colors: {
  primary: {
    500: '#0ea5e9', // Main brand color
    // ...
  },
}

Adding New Pages

  1. Create a new page in frontend/app/your-page/page.tsx
  2. Add navigation link in components/Sidebar.tsx
  3. Create API endpoint in backend/main.py if needed

🔗 Integration with Baselinr Phase 1

The dashboard connects to the Baselinr storage database to read:

  • baselinr_runs: Run history and metadata
  • baselinr_results: Column-level metrics
  • baselinr_events: Drift detection events
  • baselinr_table_state: Incremental profiling metadata (snapshot IDs, last decisions)

Ensure your Baselinr Phase 1 installation has created these tables.
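
One way to check for these tables before starting the dashboard is sketched below. To stay self-contained it runs against an in-memory SQLite database standing in for the storage database; the helper names and the single-column table definitions are made up for the demo. Against PostgreSQL, the list of existing tables would come from `information_schema.tables` instead.

```python
import sqlite3

# Table names taken from the list above.
REQUIRED_TABLES = (
    "baselinr_runs",
    "baselinr_results",
    "baselinr_events",
    "baselinr_table_state",
)


def missing_tables(existing):
    """Return the required Baselinr tables that are absent from `existing`."""
    present = set(existing)
    return [name for name in REQUIRED_TABLES if name not in present]


def list_sqlite_tables(conn):
    """List table names in a SQLite database (demo stand-in for the storage DB)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [row[0] for row in rows]


# Demo: only two of the four tables exist (column definitions are placeholders).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE baselinr_runs (run_id TEXT)")
demo.execute("CREATE TABLE baselinr_events (event_id TEXT)")
print(missing_tables(list_sqlite_tables(demo)))
# → ['baselinr_results', 'baselinr_table_state']
demo.close()
```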

๐Ÿณ Docker Setup (Optional)โ€‹

TODO: Add Docker Compose configuration for easy deployment

📈 Roadmap / Future Enhancements

  • Real-time updates via WebSockets
  • Advanced filtering and saved views
  • Custom dashboards per user
  • Alert notifications (email, Slack)
  • Figma-based design refinements
  • CSV export implementation
  • Pagination for large datasets
  • Dark mode support
  • User authentication

๐Ÿค Contributingโ€‹

This is an internal MVP. For feature requests or bug reports, please contact the Baselinr team.

๐Ÿ“ Environment Variablesโ€‹

Backend (.env)โ€‹

BASELINR_DB_URL=postgresql://user:password@host:port/database
API_HOST=0.0.0.0
API_PORT=8000
CORS_ORIGINS=http://localhost:3000

# Chat/AI Configuration (optional)
LLM_ENABLED=true
LLM_PROVIDER=openai # or "anthropic"
LLM_MODEL=gpt-4o-mini # or "claude-3-5-sonnet-20241022"
OPENAI_API_KEY=sk-your-api-key
# ANTHROPIC_API_KEY=sk-ant-your-api-key # if using Anthropic
CHAT_MAX_ITERATIONS=5
CHAT_MAX_HISTORY=20
CHAT_TOOL_TIMEOUT=30

# Or use a config file
BASELINR_CONFIG=/path/to/config.yml
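
For scripts that need the same settings, the variables above can be read into a typed object. This is a minimal sketch rather than the backend's own loader: the `BackendSettings` class and `load_settings` function are hypothetical, the defaults simply mirror the example values shown above, and requiring either BASELINR_DB_URL or BASELINR_CONFIG is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackendSettings:
    """Hypothetical container for the backend variables listed above."""
    db_url: Optional[str]
    config_path: Optional[str]
    api_host: str
    api_port: int


def load_settings(env=None):
    """Read backend settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    db_url = env.get("BASELINR_DB_URL") or None
    config_path = env.get("BASELINR_CONFIG") or None
    # ASSUMPTION: at least one of the two sources must be provided.
    if db_url is None and config_path is None:
        raise ValueError("Set BASELINR_DB_URL or BASELINR_CONFIG")
    return BackendSettings(
        db_url=db_url,
        config_path=config_path,
        api_host=env.get("API_HOST", "0.0.0.0"),   # example value from above
        api_port=int(env.get("API_PORT", "8000")),  # example value from above
    )


settings = load_settings({"BASELINR_DB_URL": "postgresql://user:password@host:5432/database"})
print(settings.api_host, settings.api_port)
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.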

Frontend (.env.local)

NEXT_PUBLIC_API_URL=http://localhost:8000
NODE_ENV=development

💬 Chat Feature

The Quality Studio includes an AI-powered chat assistant for data quality investigation.

Enabling Chat

  1. Set LLM_ENABLED=true in your environment
  2. Configure your LLM provider (OpenAI or Anthropic)
  3. Provide the appropriate API key
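
The three steps above can be turned into a small pre-flight check. The sketch below inspects an environment mapping and reports what is missing; the function name `chat_config_problems` is hypothetical, and treating LLM_PROVIDER as required (with no default) is an assumption about the backend.

```python
# Which API key each provider needs, per the environment variable listing.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def chat_config_problems(env):
    """Return a list of human-readable problems with the chat settings."""
    problems = []
    if env.get("LLM_ENABLED", "").strip().lower() != "true":
        problems.append("LLM_ENABLED is not set to true")
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    key_var = API_KEY_VARS.get(provider)
    if key_var is None:
        # ASSUMPTION: the provider must be named explicitly.
        problems.append(
            f"LLM_PROVIDER must be one of {sorted(API_KEY_VARS)}, got {provider!r}"
        )
    elif not env.get(key_var):
        problems.append(f"{key_var} is not set")
    return problems


print(chat_config_problems({"LLM_ENABLED": "true", "LLM_PROVIDER": "anthropic"}))
# → ['ANTHROPIC_API_KEY is not set']
```

Once the backend is running, GET /api/chat/config reports the chat configuration status as the server sees it.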

Chat Capabilities

The chat assistant can:

  • Query recent profiling runs
  • Investigate drift events and anomalies
  • Get table profiles and column statistics
  • Compare runs and analyze trends
  • Explore data lineage relationships
  • Search across tables

Example Queries

  • "What tables have been profiled recently?"
  • "Show me high severity drift events"
  • "Are there any anomalies I should investigate?"
  • "Compare the last two runs for the customers table"
  • "What's the trend for null rate in the email column?"
  • "What are the upstream sources for orders table?"

๐Ÿ› ๏ธ Developmentโ€‹

Backend Developmentโ€‹

cd backend
uvicorn main:app --reload --host 0.0.0.0 --port 8000

Frontend Development

cd frontend
npm run dev

Visit:

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000

📦 Production Build

Frontend

cd frontend
npm run build
npm start

Backend

cd backend
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

๐Ÿ› Troubleshootingโ€‹

Connection Errorsโ€‹

  • Ensure Baselinr database is running
  • Check BASELINR_DB_URL environment variable
  • Verify database tables exist (baselinr_runs, baselinr_results, baselinr_events)

No Data Showing

  • Run the sample data generator: python sample_data_generator.py
  • Or run Baselinr profiling: baselinr profile --config config.yml

CORS Errors

  • Check CORS_ORIGINS in backend includes frontend URL
  • Verify NEXT_PUBLIC_API_URL in frontend points to backend
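
A quick way to test the first point is to compare the origin the browser sends with the configured entries. The sketch below assumes CORS_ORIGINS is a comma-separated list and that entries are matched exactly, as common CORS middleware does; both helper names are hypothetical. Exact matching means a trailing slash, or `127.0.0.1` in place of `localhost`, is enough to cause a mismatch.

```python
from urllib.parse import urlsplit


def browser_origin(page_url):
    """Return scheme://host[:port] for a page URL, as sent in the Origin header."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc.lower()}"


def is_origin_allowed(frontend_url, cors_origins):
    """Check whether the frontend's origin appears verbatim in CORS_ORIGINS.

    ASSUMPTION: CORS_ORIGINS is comma-separated and compared exactly.
    """
    allowed = [entry.strip() for entry in cors_origins.split(",")]
    return browser_origin(frontend_url) in allowed


print(is_origin_allowed("http://localhost:3000/runs", "http://localhost:3000"))   # → True
print(is_origin_allowed("http://localhost:3000", "http://localhost:3000/"))       # → False
```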

📄 License

Internal use only - Baselinr Project