The Self-Hosted Analytics Stack in 2026: PHP + ClickHouse + Redis
When you're building an analytics platform that needs to ingest, store, and query millions of events per day, the stack you choose matters. A lot.
Most analytics platforms use Go + Postgres. It's the "safe" choice—Go is compiled, blazingly fast, and Postgres is reliable. But there are other choices, and they have real advantages for analytics workloads.
Statalog uses PHP + Laravel in the application layer, ClickHouse for analytics storage, MySQL for user data, and Redis for caching. It's not the most obvious stack, and it's worth explaining why—both to justify our architecture decisions and to help you decide what stack is right if you're building something similar.
This deep dive covers why each component was chosen, what it enables, and when you should use this stack instead of alternatives.
The Stack Breakdown
Here's what Statalog's architecture looks like:
┌─────────────────────────────────────────────┐
│            Browser / Tracker                │
│          (send pageview/event)              │
└────────────────┬────────────────────────────┘
                 │ HTTPS POST
                 ↓
┌─────────────────────────────────────────────┐
│       Laravel Application (PHP 8.3)         │
│  • API Routes (/collect, /api/stats)        │
│  • Queue Workers (Event Processing)         │
│  • Dashboard + User Management              │
└────────┬────────────────────────────────────┘
         │
         ├─→ Redis (cache, sessions, queues)
         ├─→ MySQL (user accounts, config)
         └─→ ClickHouse (events storage)
                 ↓
┌─────────────────────────────────────────────┐
│            ClickHouse Cluster               │
│  • 100M+ events/day                         │
│  • Compressed storage (1000x reduction)     │
│  • Analytical queries (fast aggregations)   │
└─────────────────────────────────────────────┘
Each layer handles a specific job:
- PHP/Laravel → Request handling, user auth, API, dashboard logic
- MySQL → Persistent data (users, sites, goals, billing)
- Redis → Fast cache, session dedup, background queues
- ClickHouse → Analytics data (events, sessions, aggregates)
Why PHP + Laravel
This is the controversial choice. In 2026, most startups use Go, Rust, or Node. PHP sounds like 2010.
But here's the reality:
Rapid Iteration
Laravel is arguably the fastest framework to build features in. The ecosystem is mature, the conventions are consistent, and the syntax is readable.
- Database migrations? php artisan make:migration (see the sketch below)
- API routes? Built-in
- Authentication? 5 lines of code
- Admin dashboard? Filament generates it for you
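To make the migrations point concrete, here's roughly what a generated migration looks like once filled in. A minimal sketch; the table and columns are illustrative, not Statalog's actual schema:

// database/migrations/..._create_sites_table.php
// Scaffolded by: php artisan make:migration create_sites_table
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('sites', function (Blueprint $table) {
            $table->id();
            $table->foreignId('user_id')->constrained()->cascadeOnDelete();
            $table->string('domain')->unique();
            $table->json('settings')->nullable();
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('sites');
    }
};

Run php artisan migrate and the schema change is versioned alongside the code.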
A feature that takes 2 days in Go takes 4 hours in Laravel. That matters when you're a startup competing against established analytics platforms.
Team Familiarity
Roughly 30% of web developers know PHP. That's one of the biggest developer pools outside of JavaScript. If you need to hire, you have a much larger talent pool than if you choose Go or Rust.
Laravel specifically has massive mindshare—it's the #1 PHP framework and by extension one of the top web frameworks globally. Documentation is world-class, and the community is active.
Eloquent ORM
Laravel's ORM is genuinely excellent. Once you've used Eloquent, working with raw SQL or Go's database/sql package feels archaic:
// Eloquent
$users = User::where('created_at', '>', now()->subDays(7))
    ->with('sites')
    ->orderBy('created_at', 'desc')
    ->paginate(50);

// vs. Go
rows, err := db.QueryContext(ctx, `
    SELECT users.id, users.email, sites.id, sites.name
    FROM users
    LEFT JOIN sites ON users.id = sites.user_id
    WHERE users.created_at > ?
    ORDER BY users.created_at DESC
    LIMIT 50 OFFSET ?
`, sevenDaysAgo, offset)
One is readable; the other is SQL boilerplate. For application logic, Eloquent wins.
Modern PHP (8.3)
PHP isn't your parents' PHP. PHP 8.3 has:
- JIT compilation → performance within spitting distance of Go for typical web workloads
- Typed properties → private int $eventCount; private string $siteId;
- Named arguments → sendEmail(to: $user->email, subject: 'Welcome')
- Attributes → metadata on classes/methods (similar to Python decorators)
- Match expressions → switch on steroids (combined in the sketch below)
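Here's a small sketch combining typed (constructor-promoted) properties, a match expression, and named arguments; the class is illustrative, not part of Statalog:

final class EventRecord
{
    public function __construct(
        private string $siteId,      // typed, constructor-promoted properties
        private int $eventCount = 0,
    ) {}

    public function label(): string
    {
        // match is an expression with strict comparison and no fall-through
        return match (true) {
            $this->eventCount === 0  => 'no traffic',
            $this->eventCount < 1000 => 'low traffic',
            default                  => 'high traffic',
        };
    }
}

// Named arguments make call sites self-documenting
$record = new EventRecord(siteId: '12345', eventCount: 250);
echo $record->label(); // "low traffic"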
Performance-wise, modern PHP handles millions of requests/day without breaking a sweat. Slack, Wikipedia, and Etsy all run PHP at scale.
The Trade-off
The trade-off: PHP is dynamically typed and interpreted (though with JIT compilation). It's not as fast as Go for CPU-bound work.
But analytics isn't CPU-bound. It's I/O-bound:
- Accept event → write to queue → return 200 OK
- The actual processing happens asynchronously in background workers
For this workload, PHP's slight performance disadvantage is irrelevant.
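In Laravel terms, that hot path can be a few lines: validate, queue, return. A minimal sketch, assuming a hypothetical ProcessEvent job class (shown later) and a Redis-backed queue connection:

// routes/api.php (illustrative)
use App\Jobs\ProcessEvent;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::post('/collect', function (Request $request) {
    $payload = $request->validate([
        'site_id'    => 'required|string',
        'session_id' => 'required|string',
        'event'      => 'required|string',
        'url'        => 'required|string',
    ]);

    // Hand off to a background worker via the Redis-backed queue
    ProcessEvent::dispatch($payload)->onQueue('events');

    return response()->noContent(); // 204, returned in milliseconds
});

The request never touches ClickHouse; all the expensive work happens in the worker.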
Why ClickHouse Over Postgres/TimescaleDB
This is the critical choice. Storage and querying are the foundation of an analytics platform.
The Problem with Postgres
Postgres is an OLTP (Online Transactional Processing) database—optimized for frequent small reads/writes with strong consistency guarantees.
Analytics is OLAP (Online Analytical Processing)—optimized for large aggregations with less need for consistency.
Using Postgres for analytics is like using a hammer for surgery. It works, but it's the wrong tool.
Specific problems with Postgres for 100M events/day:
- Storage bloat. A single event in Postgres takes ~500 bytes. 100M events = 50GB/day; 365 days = ~18TB/year. That's expensive.
- Query speed. Aggregating 100M rows for a dashboard query is slow. You need indexes on everything, and suddenly your writes slow down because every index needs updating.
- Write throughput. Postgres can handle ~10,000 writes/sec with tuning. Analytics platforms often exceed that.
- No columnar compression. Postgres stores row data essentially as-is. Time-series data (lots of repeated values) compresses incredibly well: ClickHouse achieves 100-1000x compression. Postgres can't match that.
TimescaleDB (Postgres Extension)
TimescaleDB is better than vanilla Postgres for time-series, but it's still based on Postgres's architecture:
- Better compression than vanilla Postgres (~10x), but not 100x
- Better write throughput (~20,000 rows/sec), but still limited
- Better query performance, but row-oriented storage is still slower than column-oriented for aggregations
It's a better tool than Postgres, but ClickHouse is still 5-10x more efficient.
Why ClickHouse
ClickHouse is a columnar OLAP database purpose-built for analytics:
1. Compression (1000x reduction)
ClickHouse stores data by column, not by row. This sounds boring but enables aggressive compression:
Row-oriented storage (Postgres):
Row 1: [event_id: 1, user_id: 100, site_id: 5, timestamp: 2026-04-24T10:00:00, event: pageview, url: /pricing]
Row 2: [event_id: 2, user_id: 101, site_id: 5, timestamp: 2026-04-24T10:00:01, event: pageview, url: /pricing]
Row 3: [event_id: 3, user_id: 102, site_id: 5, timestamp: 2026-04-24T10:00:02, event: pageview, url: /pricing]
Column-oriented storage (ClickHouse):
event_id: [1, 2, 3, 4, 5, 6, ...]
user_id: [100, 101, 102, 103, 104, 105, ...]
site_id: [5, 5, 5, 5, 5, 5, ...] ← Same value repeated 1M times
timestamp: [2026-04-24T10:00:00, 2026-04-24T10:00:01, ...]
event: [pageview, pageview, pageview, ...] ← Same value repeated 1M times
url: [/pricing, /pricing, /pricing, ...] ← Same value repeated 1M times
Notice site_id and event are the same for millions of rows. ClickHouse's compression recognizes this and encodes it as [5, repeated 1000000 times] instead of storing the value a million separate times.
Real numbers:
A year of analytics data:
- Postgres: 18TB
- ClickHouse: 18GB
That's a 1000x reduction: you save 99.9% of the storage cost.
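You don't have to take compression ratios on faith; ClickHouse reports per-column sizes in its system tables, so you can measure your own data (ratios will vary with its shape):

-- Per-column compression ratio for the events table
SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'events'
ORDER BY ratio DESC;

Low-cardinality columns like site_id and event show the most dramatic ratios; high-entropy columns like event_id compress far less.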
2. Query Speed
Aggregation queries are blazingly fast because ClickHouse only reads the columns it needs:
-- "How many pageviews did site_id=5 get?"
SELECT COUNT(*) FROM events WHERE site_id = 5;
ClickHouse reads only the site_id column (~1MB compressed for 1M events) and counts the matching rows. Total time: <100ms.
Postgres would read entire rows (~500MB for 1M events), filter by site_id, then count. Much slower.
3. Write Throughput
ClickHouse sustains millions of rows per second when inserts are batched. Events can be written one at a time, but batching is the idiomatic pattern:
// Batch 100 events before writing (far better than row-at-a-time inserts)
$events = [];
for ($i = 0; $i < 100; $i++) {
    $events[] = [
        'event_id'  => uniqid(),
        'site_id'   => '12345',
        'user_id'   => $userId,
        'event'     => 'pageview',
        'timestamp' => now(),
        'url'       => '/page',
    ];
}

// $clickhouse is whatever ClickHouse client library you use
$clickhouse->insert('events', $events);
Batched inserts like this are the idiomatic way to write to ClickHouse. Postgres struggles to match this throughput without similar batching (COPY or multi-row inserts).
4. Time-Series Native
ClickHouse has native support for time-series operations:
-- Get 1-hour aggregates for the last 7 days
SELECT
    toStartOfHour(timestamp) AS hour,
    COUNT(*) AS pageviews
FROM events
WHERE site_id = '12345'
  AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour DESC;
This query runs in <100ms even on years of data.
ClickHouse Limitations
ClickHouse isn't perfect:
- Slow mutations. Deleting or updating individual rows is slow (it rewrites parts of the table). This is why we never delete events; we just archive old data.
- No ACID transactions. Individual inserts are atomic, but transactions across tables don't exist. Not a problem for analytics (you write events to one table), but it matters if you use ClickHouse for other purposes.
- Learning curve. ClickHouse's syntax and concepts differ from traditional OLTP databases. TTL (time-to-live) rules, ReplacingMergeTree tables, and distributed queries are powerful but take time to learn.
- Distributed setup is complex. If you need ClickHouse replication and sharding across multiple servers, it's more involved than Postgres replication.
For Statalog's use case (append-only events, fast analytical queries), ClickHouse's limitations don't matter. The compression and query speed far outweigh the trade-offs.
Why Redis for Sessions
Redis is an in-memory key-value store. It's used for:
- Cache – Store frequently accessed data (site config, user preferences) without hitting the database
- Session deduplication – When an event arrives, check whether the session_id is already in Redis. If yes, reuse the session; if no, create a new one. This keeps session lookups O(1).
- Background queues – Laravel's queue system uses Redis to store background jobs (process events, send emails, etc.)
Why not Memcached? Redis has richer data structures (sets, lists, hashes) and persistence options. For a small analytics platform, the difference is negligible, but Redis's features make it more flexible.
Session Deduplication
This is the killer feature for analytics:
// Tracker receives event: POST /collect?session_id=abc123

// Check Redis: does session abc123 exist?
$sessionExists = $redis->exists("session:abc123");

if (!$sessionExists) {
    // New session: store it with a 30-minute (1800s) expiry
    $redis->setex("session:abc123", 1800, json_encode([
        'user_id'    => null,
        'first_page' => '/pricing',
        'referrer'   => 'google.com',
        'device'     => 'mobile',
    ]));
} else {
    // Existing session: reuse it
    $sessionData = $redis->get("session:abc123");
}
Without this, you'd need to query the database for every event to check if the session exists. At 100K events/sec, that's 100K database queries/sec. Terrible.
With Redis, it's 100K in-memory lookups/sec. Trivial.
Why MySQL for Config
MySQL stores user data:
- User accounts (email, password, subscription plan)
- Sites (site_id, domain, settings)
- Goals (goal definitions, conversion values)
- API keys
- Billing data
This data changes infrequently and needs strong consistency (ACID transactions). MySQL is perfect for this.
Could we use ClickHouse for everything? Not really. ClickHouse isn't great for:
- Small, random reads (user authentication)
- Transactions (billing operations)
- Frequent updates (user settings)
Data flow:
Event arrives → Check user quota in MySQL
→ Write to ClickHouse queue
→ Background worker pulls from queue
→ Insert into ClickHouse
→ Increment event count in MySQL
MySQL handles the transactional stuff. ClickHouse handles the analytics stuff.
Architecture Overview
Here's how data flows through the system:
1. Event Ingestion (Request Path)
Browser tracker sends event
↓
PHP API Route: POST /collect
↓
Check site_id & API key (MySQL)
↓
Check rate limits (Redis)
↓
Add to queue (Redis)
↓
Return 200 OK (instant)
This all happens in <10ms. The actual processing happens asynchronously.
2. Event Processing (Background Workers)
Queue worker (Supervisor-managed PHP process)
↓
Pull event from queue
↓
Enrich event (parse user agent, geo-locate, etc.)
↓
Check session (Redis)
↓
Write to ClickHouse
↓
Update aggregates (MySQL cache table)
↓
Loop back and pull the next event
Multiple workers run in parallel (10-100 depending on load). Each worker processes events asynchronously, so the API never blocks.
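A sketch of what such a job can look like, reusing the hypothetical ProcessEvent class from the ingestion sketch earlier; the 'clickhouse' container binding stands in for whatever client library you actually use:

// app/Jobs/ProcessEvent.php (illustrative)
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\Redis;

class ProcessEvent implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(private array $payload) {}

    public function handle(): void
    {
        // 1. Enrich: user-agent parsing, geo-lookup, etc. (stubbed here)
        $event = $this->payload + ['device_type' => 'desktop', 'country' => 'US'];

        // 2. Session dedup in Redis (30-minute window)
        $key = "session:{$event['session_id']}";
        if (!Redis::exists($key)) {
            Redis::setex($key, 1800, json_encode(['first_page' => $event['url']]));
        }

        // 3. Append to ClickHouse (batching omitted for brevity)
        app('clickhouse')->insert('events', [$event]);
    }
}

In production you'd buffer events and insert in batches, as discussed above.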
3. Dashboard Query (Request Path)
User opens dashboard
↓
Frontend requests /api/stats?site_id=12345&range=7days
↓
PHP API Route
↓
Check if cached (Redis)
↓
If cached: return cached data
↓
If not cached: query ClickHouse
↓
SELECT COUNT(*), COUNT(DISTINCT user_id), ...
FROM events
WHERE site_id = '12345'
AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY page_path
↓
ClickHouse returns results
↓
Cache results for 5 minutes (Redis)
↓
Return JSON to frontend
Dashboard queries hit ClickHouse directly. Caching keeps frequently accessed dashboards instant.
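The cache-aside step maps directly onto Laravel's Cache::remember. A sketch, again assuming a hypothetical 'clickhouse' client binding with a select() method:

use Illuminate\Support\Facades\Cache;

$stats = Cache::remember("stats:{$siteId}:7days", now()->addMinutes(5), function () use ($siteId) {
    // Runs only on a cache miss; cached results serve everyone else
    return app('clickhouse')->select(
        'SELECT page_path, count() AS pageviews
         FROM events
         WHERE site_id = {site:String}
           AND timestamp >= now() - INTERVAL 7 DAY
         GROUP BY page_path
         ORDER BY pageviews DESC',
        ['site' => $siteId]
    );
});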
Performance Metrics
How does this stack handle real load?
Statalog handles:
- 100M+ events/day from thousands of sites
- 1M+ events/hour during peak traffic
- API latency: <5ms (99th percentile) for event ingestion
- Dashboard queries: <100ms for 7-day aggregations over 1M events
- Storage: ~18GB/year per 100M events/day
Hardware:
- Single PHP application server (8 CPU, 32GB RAM)
- Single Redis server (same machine)
- Single MySQL server (different machine, 4 CPU, 16GB RAM)
- Single ClickHouse server (8 CPU, 64GB RAM)
This setup handles thousands of sites and tens of billions of events per year with room to spare. Scaling to millions of events/sec would require ClickHouse clustering and multiple application servers, but the fundamental architecture stays the same.
Cost Breakdown
Assuming 100M events/day:
Storage Costs
- ClickHouse storage: 18GB/year (1-year retention)
  - Cheaply stored on any disk or object storage (S3, etc.)
  - Cost: ~$0.50/month
- MySQL storage: 100GB/year (user data, aggregates, config)
  - Cost: ~$5/month on a managed service
- Total storage: ~$6/month
Compute Costs
- PHP application server: 8 CPU, 32GB RAM
  - AWS EC2 equivalent: ~$0.50/hour ≈ $360/month
- ClickHouse server: 8 CPU, 64GB RAM
  - AWS EC2 equivalent: ~$0.80/hour ≈ $580/month
- MySQL server: 4 CPU, 16GB RAM
  - AWS RDS equivalent: ~$200/month
- Redis server: shared with the PHP server, no extra cost
- Total compute: ~$1,140/month
Total Monthly Cost for 100M events/day: ~$1,146/month
Per-event cost: $1,146 ÷ (100M events/day × 30 days) ≈ $0.0000004 per event, or about 0.00004 cents. Effectively free at the margin.
Compare this to:
- Google Analytics: Free (but limited features, you own no data)
- Mixpanel: ~$100/month minimum, up to $10k/month at scale
- Amplitude: ~$200/month minimum, up to $5k/month at scale
For a self-hosted platform, that is dirt cheap. For users it's even cheaper, because they own the hardware or the cloud account; Statalog just provides the software.
When You Should Use This Stack
Good Use Cases
- Building an analytics platform (like Statalog)
  - High-volume event ingestion
  - Complex analytical queries
  - Multiple customers (multi-tenancy)
- Building a real-time dashboard platform
  - Heavy analytical workloads
  - Millions of events/day
  - Complex aggregations
- Building a data warehouse
  - Need cheap storage (compression)
  - Need fast analytical queries
  - High-volume, append-only data
- If you're already building with Laravel
  - PHP + Laravel is the fastest to iterate in
  - Existing Laravel expertise on the team
  - Don't need 100% peak performance (good enough is fine)
Bad Use Cases
- Building a real-time bidding platform (ultra-low latency needed)
  - Go or Rust would be better
- Building a financial transaction system (strong consistency needed)
  - PostgreSQL or Oracle would be better
  - ClickHouse doesn't have ACID transactions
- Building a simple CRUD app
  - Rails or Django is faster to build and cheaper to run
  - You don't need ClickHouse's compression
- If you have no PHP developers
  - Go or Node would be more familiar to most teams
Alternative Stacks
Go + Postgres (The "Safe" Choice)
Why it's popular:
- Go is blazingly fast (compiled, no GC pauses)
- Postgres is battle-tested and reliable
- Many successful startups use it (Segment's data pipeline, for example)
When to use it:
- Your team knows Go
- You value raw performance over iteration speed
- You want strong ACID consistency
Downside:
- Slower to iterate on features
- Postgres doesn't compress well for analytics
- Storage costs are much higher
Rust + DuckDB
Why it's interesting:
- Rust is among the fastest modern languages
- DuckDB is an in-process OLAP engine: like ClickHouse, but simpler (single-node only)
- Tiny deployment footprint
When to use it:
- You need extreme performance
- Single-node deployment is enough
- Your team loves Rust
Downside:
- Rust is hard to learn and iterate in
- Smaller web ecosystem than Go or Python
- Overkill for most analytics platforms
Python + Druid
Why it's interesting:
- Python is familiar to data engineers
- Druid is an open-source OLAP database (like ClickHouse)
- Good for data science workflows
When to use it:
- You have a data science team already
- You need Python for machine learning or analysis
- You want Druid's time-series features
Downside:
- Python is slower than Go or Rust
- Druid has a steeper learning curve than ClickHouse
- Less common in startups
Node + BigQuery
Why it's interesting:
- Node is fast and familiar to web developers
- BigQuery is managed (no infrastructure)
- Serverless scales automatically
When to use it:
- You want zero infrastructure management
- You have budget for BigQuery ($1000+/month)
- Your events are already on GCP
Downside:
- Expensive at scale
- BigQuery is slower than self-hosted ClickHouse for queries
- Vendor lock-in
The Statalog Philosophy
We chose PHP + Laravel + ClickHouse because:
- Speed to market matters – Laravel lets us ship features in days, not weeks
- Team familiarity matters – 30% of developers know PHP; hiring is easy
- Storage efficiency matters – ClickHouse compression saves 99% of storage cost
- Query speed matters – Analytical queries are fast even on years of data
- "Good enough" is fine – We don't need Go's extreme performance for web request handling
Is it the most "optimal" stack in some abstract sense? No. Is it the right stack for our constraints (team size, iteration speed, hardware cost)? Absolutely.
The lesson: choose the stack that matches your constraints, not the stack that's popular.
Deployment & Operations
Queue System (Supervisor)
Events are queued asynchronously. Supervisor manages worker processes:
[program:statalog-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /app/artisan queue:work redis --queue=events --sleep=3 --tries=3
autostart=true
autorestart=true
numprocs=20
redirect_stderr=true
stdout_logfile=/var/log/statalog-worker.log
This launches 20 workers, each pulling events from the Redis queue, processing them, and inserting into ClickHouse.
ClickHouse Schema
Events are stored in a table using the ReplacingMergeTree engine (a MergeTree variant that deduplicates rows sharing the same sorting key during background merges):
CREATE TABLE events (
    site_id        String,
    event_id       String,
    timestamp      DateTime,
    session_id     String,
    user_id        Nullable(String),
    event          String,
    page_path      String,
    referrer       String,
    device_type    String,
    viewport_width Int32,
    country        String,
    city           String
) ENGINE = ReplacingMergeTree()
ORDER BY (site_id, timestamp, event_id)
TTL timestamp + INTERVAL 365 DAY DELETE;
The TTL (time-to-live) automatically deletes data older than 365 days, keeping storage costs in check.
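If the retention policy changes later, the TTL can be altered in place; ClickHouse applies the new rule during subsequent background merges:

-- Extend retention to two years; old parts are rewritten lazily
ALTER TABLE events MODIFY TTL timestamp + INTERVAL 730 DAY DELETE;

-- Optionally force the new TTL onto existing parts (can be heavy)
ALTER TABLE events MATERIALIZE TTL;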
Learning Resources
If you want to build something similar:
- ClickHouse Docs: https://clickhouse.com/docs
- Laravel Docs: https://laravel.com/docs
- Redis Docs: https://redis.io/docs/
- Statalog Open Source: https://github.com/statalog/statalog (coming soon)
FAQ
Q: Why not just use Google BigQuery or Mixpanel?
A: If you're building an analytics product for customers, you need to own the data. Using a third-party platform means you're stuck with their data export process, pricing, and feature set. Building self-hosted gives you full control.
Q: Isn't PHP slow?
A: Modern PHP (8.3+) with JIT compilation is fast enough for I/O-bound workloads. API latency is <5ms. For a web application, that's plenty fast. CPU-intensive work (video processing, ML) would be slow, but analytics is I/O-bound.
Q: What if I don't want to manage ClickHouse?
A: ClickHouse Cloud offers managed ClickHouse: you pay for compute and storage usage instead of managing servers. For small volumes, it's often cheaper and simpler.
Q: Can I use ClickHouse without PHP?
A: Absolutely. ClickHouse has SDKs for Go, Node, Python, Rust, and more. The choice of PHP for the application layer doesn't lock you into using it for data processing. You could write workers in any language.
Q: What's the minimum scale for this stack to make sense?
A: At 1M events/day, this stack is overkill. Use Rails, Django, or Node with Postgres. At 100M events/day, this stack is necessary. At 10M events/day, it's a judgment call—Postgres might work, but ClickHouse will be cheaper long-term.
Q: Is this stack reliable in production?
A: Yes. ClickHouse powers analytics at some of the largest companies in the world (Yandex, Cloudflare, Uber, etc.). Laravel has been battle-tested for 15+ years. Redis is the standard for caching. This stack is boring, which is good for reliability.
Q: How do I deploy this?
A: Docker containers (Laravel app + workers), managed ClickHouse (or self-hosted with containers), managed MySQL (or self-hosted), managed Redis (or self-hosted). Most startups deploy on AWS EC2 or DigitalOcean.
Next Steps
- Read the ClickHouse documentation to understand columnar storage
- Explore Laravel queues for background job handling
- Check out the Redis documentation for session management
- Consider your scale: do you really need this stack, or should you start simpler?
Building an analytics platform is complex, but choosing the right stack makes it manageable. We chose PHP + ClickHouse because it balances performance, cost, and iteration speed.
If you're building something similar, this stack might be right for you too.
Questions? Contact us or view the docs.