The Self-Hosted Analytics Stack in 2026: PHP + ClickHouse + Redis
When you're building an analytics platform that needs to ingest, store, and query millions of events per day, the stack you choose matters. A lot.
Most analytics platforms use Go + Postgres. It's the "safe" choice—Go is compiled, blazingly fast, and Postgres is reliable. But there are other choices, and they have real advantages for analytics workloads.
Statalog uses PHP + Laravel in the application layer, ClickHouse for analytics storage, MySQL for user data, and Redis for caching. It's not the most obvious stack, and it's worth explaining why—both to justify our architecture decisions and to help you decide what stack is right if you're building something similar.
This deep dive covers why each component was chosen, what it enables, and when you should use this stack instead of alternatives.
The Stack Breakdown
Here's what Statalog's architecture looks like:
┌─────────────────────────────────────────────┐
│            Browser / Tracker                │
│          (send pageview/event)              │
└────────────────┬────────────────────────────┘
                 │ HTTPS POST
                 ↓
┌─────────────────────────────────────────────┐
│       Laravel Application (PHP 8.3)         │
│  • API Routes (/collect, /api/stats)        │
│  • Queue Workers (Event Processing)         │
│  • Dashboard + User Management              │
└────────┬────────────────────────────────────┘
         │
         ├─→ Redis (cache, sessions, queues)
         ├─→ MySQL (user accounts, config)
         └─→ ClickHouse (events storage)
                 ↓
┌─────────────────────────────────────────────┐
│            ClickHouse Cluster               │
│  • 100M+ events/day                         │
│  • Compressed storage (1000x reduction)     │
│  • Analytical queries (fast aggregations)   │
└─────────────────────────────────────────────┘
Each layer handles a specific job:
- PHP/Laravel → Request handling, user auth, API, dashboard logic
- MySQL → Persistent data (users, sites, goals, billing)
- Redis → Fast cache, session dedup, background queues
- ClickHouse → Analytics data (events, sessions, aggregates)
Why PHP + Laravel
This is the controversial choice. In 2026, most startups use Go, Rust, or Node. PHP sounds like 2010.
But here's the reality:
Rapid Iteration
Laravel is arguably the fastest framework to build features in. The ecosystem is mature, the conventions are consistent, and the syntax is readable.
- Database migrations? php artisan make:migration (see the sketch below)
- API routes? Built-in
- Authentication? 5 lines of code
- Admin dashboard? Filament generates it for you
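To make the migrations point concrete, here's roughly what a generated migration looks like once filled in. A minimal sketch; the table and columns are illustrative, not Statalog's actual schema:

// database/migrations/..._create_sites_table.php
// Scaffolded by: php artisan make:migration create_sites_table
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('sites', function (Blueprint $table) {
            $table->id();
            $table->foreignId('user_id')->constrained()->cascadeOnDelete();
            $table->string('domain')->unique();
            $table->json('settings')->nullable();
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('sites');
    }
};

Run php artisan migrate and the schema change is versioned alongside the code.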
A feature that takes 2 days in Go takes 4 hours in Laravel. That matters when you're a startup competing against established analytics platforms.
Team Familiarity
Roughly 30% of web developers know PHP. That's one of the biggest developer pools outside of JavaScript. If you need to hire, you have a much larger talent pool than if you choose Go or Rust.
Laravel specifically has massive mindshare—it's the #1 PHP framework and by extension one of the top web frameworks globally. Documentation is world-class, and the community is active.
Eloquent ORM
Laravel's ORM is genuinely excellent. Once you've used Eloquent, working with raw SQL or Go's database/sql package feels archaic:
// Eloquent
$users = User::where('created_at', '>', now()->subDays(7))
    ->with('sites')
    ->orderBy('created_at', 'desc')
    ->paginate(50);

// vs. Go
rows, err := db.QueryContext(ctx, `
    SELECT users.id, users.email, sites.id, sites.name
    FROM users
    LEFT JOIN sites ON users.id = sites.user_id
    WHERE users.created_at > ?
    ORDER BY users.created_at DESC
    LIMIT 50 OFFSET ?
`, sevenDaysAgo, offset)
One is readable; the other is SQL boilerplate. For application logic, Eloquent wins.
Modern PHP (8.3)
PHP isn't your parents' PHP. PHP 8.3 has:
- JIT compilation → performance within spitting distance of Go for typical web workloads
- Typed properties → private int $eventCount; private string $siteId;
- Named arguments → sendEmail(to: $user->email, subject: 'Welcome')
- Attributes → metadata on classes/methods (similar to Python decorators)
- Match expressions → switch on steroids (combined in the sketch below)
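Here's a small sketch combining typed (constructor-promoted) properties, a match expression, and named arguments; the class is illustrative, not part of Statalog:

final class EventRecord
{
    public function __construct(
        private string $siteId,      // typed, constructor-promoted properties
        private int $eventCount = 0,
    ) {}

    public function label(): string
    {
        // match is an expression with strict comparison and no fall-through
        return match (true) {
            $this->eventCount === 0  => 'no traffic',
            $this->eventCount < 1000 => 'low traffic',
            default                  => 'high traffic',
        };
    }
}

// Named arguments make call sites self-documenting
$record = new EventRecord(siteId: '12345', eventCount: 250);
echo $record->label(); // "low traffic"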
Performance-wise, modern PHP handles millions of requests/day without breaking a sweat. Slack, Wikipedia, and Etsy all run PHP at scale.
The Trade-off
The trade-off: PHP is dynamically typed and interpreted (though with JIT compilation). It's not as fast as Go for CPU-bound work.
But analytics isn't CPU-bound. It's I/O-bound:
- Accept event → write to queue → return 200 OK
- The actual processing happens asynchronously in background workers
For this workload, PHP's slight performance disadvantage is irrelevant.
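In Laravel terms, that hot path can be a few lines: validate, queue, return. A minimal sketch, assuming a hypothetical ProcessEvent job class (shown later) and a Redis-backed queue connection:

// routes/api.php (illustrative)
use App\Jobs\ProcessEvent;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::post('/collect', function (Request $request) {
    $payload = $request->validate([
        'site_id'    => 'required|string',
        'session_id' => 'required|string',
        'event'      => 'required|string',
        'url'        => 'required|string',
    ]);

    // Hand off to a background worker via the Redis-backed queue
    ProcessEvent::dispatch($payload)->onQueue('events');

    return response()->noContent(); // 204, returned in milliseconds
});

The request never touches ClickHouse; all the expensive work happens in the worker.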
Why ClickHouse Over Postgres/TimescaleDB
This is the critical choice. Storage and querying are the foundation of an analytics platform.
The Problem with Postgres
Postgres is an OLTP (Online Transactional Processing) database—optimized for frequent small reads/writes with strong consistency guarantees.
Analytics is OLAP (Online Analytical Processing)—optimized for large aggregations with less need for consistency.
Using Postgres for analytics is like using a hammer for surgery. It works, but it's the wrong tool.
Specific problems with Postgres for 100M events/day:
- Storage bloat. A single event in Postgres takes ~500 bytes. 100M events = 50GB/day; 365 days = ~18TB/year. That's expensive.
- Query speed. Aggregating 100M rows for a dashboard query is slow. You need indexes on everything, and suddenly your writes slow down because every index needs updating.
- Write throughput. Postgres can handle ~10,000 writes/sec with tuning. Analytics platforms often exceed that.
- No columnar compression. Postgres stores row data essentially as-is. Time-series data (lots of repeated values) compresses incredibly well: ClickHouse achieves 100-1000x compression. Postgres can't match that.
TimescaleDB (Postgres Extension)
TimescaleDB is better than vanilla Postgres for time-series, but it's still based on Postgres's architecture:
- Better compression than vanilla Postgres (~10x), but not 100x
- Better write throughput (~20,000 rows/sec), but still limited
- Better query performance, but row-oriented storage is still slower than column-oriented for aggregations
It's a better tool than Postgres, but ClickHouse is still 5-10x more efficient.
Why ClickHouse
ClickHouse is a columnar OLAP database purpose-built for analytics:
1. Compression (1000x reduction)
ClickHouse stores data by column, not by row. This sounds boring but enables aggressive compression:
Row-oriented storage (Postgres):
Row 1: [event_id: 1, user_id: 100, site_id: 5, timestamp: 2026-04-24T10:00:00, event: pageview, url: /pricing]
Row 2: [event_id: 2, user_id: 101, site_id: 5, timestamp: 2026-04-24T10:00:01, event: pageview, url: /pricing]
Row 3: [event_id: 3, user_id: 102, site_id: 5, timestamp: 2026-04-24T10:00:02, event: pageview, url: /pricing]
Column-oriented storage (ClickHouse):
event_id: [1, 2, 3, 4, 5, 6, ...]
user_id: [100, 101, 102, 103, 104, 105, ...]
site_id: [5, 5, 5, 5, 5, 5, ...] ← Same value repeated 1M times
timestamp: [2026-04-24T10:00:00, 2026-04-24T10:00:01, ...]
event: [pageview, pageview, pageview, ...] ← Same value repeated 1M times
url: [/pricing, /pricing, /pricing, ...] ← Same value repeated 1M times
Notice site_id and event are the same for millions of rows. ClickHouse's compression recognizes this and encodes it as [5, repeated 1000000 times] instead of storing the value a million separate times.
Real numbers:
A year of analytics data:
- Postgres: 18TB
- ClickHouse: 18GB
That's a 1000x reduction: you save 99.9% of the storage cost.
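You don't have to take compression ratios on faith; ClickHouse reports per-column sizes in its system tables, so you can measure your own data (ratios will vary with its shape):

-- Per-column compression ratio for the events table
SELECT
    name,
    formatReadableSize(data_uncompressed_bytes) AS uncompressed,
    formatReadableSize(data_compressed_bytes)   AS compressed,
    round(data_uncompressed_bytes / data_compressed_bytes, 1) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'events'
ORDER BY ratio DESC;

Low-cardinality columns like site_id and event show the most dramatic ratios; high-entropy columns like event_id compress far less.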
2. Query Speed
Aggregation queries are blazingly fast because ClickHouse only reads the columns it needs:
-- "How many pageviews did site_id=5 get?"
SELECT COUNT(*) FROM events WHERE site_id = 5;
ClickHouse reads only the site_id column (~1MB compressed for 1M events) and counts the matching rows. Total time: <100ms.
Postgres would read entire rows (~500MB for 1M events), filter by site_id, then count. Much slower.
3. Write Throughput
ClickHouse sustains millions of rows per second when inserts are batched. Events can be written one at a time, but batching is the idiomatic pattern:
// Batch 100 events before writing (far better than row-at-a-time inserts)
$events = [];
for ($i = 0; $i < 100; $i++) {
    $events[] = [
        'event_id'  => uniqid(),
        'site_id'   => '12345',
        'user_id'   => $userId,
        'event'     => 'pageview',
        'timestamp' => now(),
        'url'       => '/page',
    ];
}

// $clickhouse is whatever ClickHouse client library you use
$clickhouse->insert('events', $events);
Batched inserts like this are the idiomatic way to write to ClickHouse. Postgres struggles to match this throughput without similar batching (COPY or multi-row inserts).
4. Time-Series Native
ClickHouse has native support for time-series operations:
-- Get 1-hour aggregates for the last 7 days
SELECT
    toStartOfHour(timestamp) AS hour,
    COUNT(*) AS pageviews
FROM events
WHERE site_id = '12345'
  AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour DESC;
This query runs in <100ms even on years of data.
ClickHouse Limitations
ClickHouse isn't perfect:
- Slow mutations. Deleting or updating individual rows is slow (it rewrites parts of the table). This is why we never delete events; we just archive old data.
- No ACID transactions. Individual inserts are atomic, but transactions across tables don't exist. Not a problem for analytics (you write events to one table), but it matters if you use ClickHouse for other purposes.
- Learning curve. ClickHouse's syntax and concepts differ from traditional OLTP databases. TTL (time-to-live) rules, ReplacingMergeTree tables, and distributed queries are powerful but take time to learn.
- Distributed setup is complex. If you need ClickHouse replication and sharding across multiple servers, it's more involved than Postgres replication.
For Statalog's use case (append-only events, fast analytical queries), ClickHouse's limitations don't matter. The compression and query speed far outweigh the trade-offs.
Why Redis for Sessions
Redis is an in-memory key-value store. It's used for:
- Cache – Store frequently accessed data (site config, user preferences) without hitting the database
- Session deduplication – When an event arrives, check whether the session_id is already in Redis. If yes, reuse the session; if no, create a new one. This keeps session lookups O(1).
- Background queues – Laravel's queue system uses Redis to store background jobs (process events, send emails, etc.)
Why not Memcached? Redis has richer data structures (sets, lists, hashes) and persistence options. For a small analytics platform, the difference is negligible, but Redis's features make it more flexible.
Session Deduplication
This is the killer feature for analytics:
// Tracker receives event: POST /collect?session_id=abc123

// Check Redis: does session abc123 exist?
$sessionExists = $redis->exists("session:abc123");

if (!$sessionExists) {
    // New session: store it with a 30-minute (1800s) expiry
    $redis->setex("session:abc123", 1800, json_encode([
        'user_id'    => null,
        'first_page' => '/pricing',
        'referrer'   => 'google.com',
        'device'     => 'mobile',
    ]));
} else {
    // Existing session: reuse it
    $sessionData = $redis->get("session:abc123");
}
Without this, you'd need to query the database for every event to check if the session exists. At 100K events/sec, that's 100K database queries/sec. Terrible.
With Redis, it's 100K in-memory lookups/sec. Trivial.
Why MySQL for Config
MySQL stores user data:
- User accounts (email, password, subscription plan)
- Sites (site_id, domain, settings)
- Goals (goal definitions, conversion values)
- API keys
- Billing data
This data changes infrequently and needs strong consistency (ACID transactions). MySQL is perfect for this.
Could we use ClickHouse for everything? Not really. ClickHouse isn't great for:
- Small, random reads (user authentication)
- Transactions (billing operations)
- Frequent updates (user settings)
Data flow:
Event arrives → Check user quota in MySQL
→ Write to ClickHouse queue
→ Background worker pulls from queue
→ Insert into ClickHouse
→ Increment event count in MySQL
MySQL handles the transactional stuff. ClickHouse handles the analytics stuff.
Architecture Overview
Here's how data flows through the system:
1. Event Ingestion (Request Path)
Browser tracker sends event
↓
PHP API Route: POST /collect
↓
Check site_id & API key (MySQL)
↓
Check rate limits (Redis)
↓
Add to queue (Redis)
↓
Return 200 OK (instant)
This all happens in <10ms. The actual processing happens asynchronously.
2. Event Processing (Background Workers)
Queue worker (Supervisor-managed PHP process)
↓
Pull event from queue
↓
Enrich event (parse user agent, geo-locate, etc.)
↓
Check session (Redis)
↓
Write to ClickHouse
↓
Update aggregates (MySQL cache table)
↓
Loop back and pull the next event
Multiple workers run in parallel (10-100 depending on load). Each worker processes events asynchronously, so the API never blocks.
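A sketch of what such a job can look like, reusing the hypothetical ProcessEvent class from the ingestion sketch earlier; the 'clickhouse' container binding stands in for whatever client library you actually use:

// app/Jobs/ProcessEvent.php (illustrative)
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Support\Facades\Redis;

class ProcessEvent implements ShouldQueue
{
    use Dispatchable, Queueable;

    public function __construct(private array $payload) {}

    public function handle(): void
    {
        // 1. Enrich: user-agent parsing, geo-lookup, etc. (stubbed here)
        $event = $this->payload + ['device_type' => 'desktop', 'country' => 'US'];

        // 2. Session dedup in Redis (30-minute window)
        $key = "session:{$event['session_id']}";
        if (!Redis::exists($key)) {
            Redis::setex($key, 1800, json_encode(['first_page' => $event['url']]));
        }

        // 3. Append to ClickHouse (batching omitted for brevity)
        app('clickhouse')->insert('events', [$event]);
    }
}

In production you'd buffer events and insert in batches, as discussed above.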
3. Dashboard Query (Request Path)
User opens dashboard
↓
Frontend requests /api/stats?site_id=12345&range=7days
↓
PHP API Route
↓
Check if cached (Redis)
↓
If cached: return cached data
↓
If not cached: query ClickHouse
↓
SELECT COUNT(*), COUNT(DISTINCT user_id), ...
FROM events
WHERE site_id = '12345'
AND timestamp >= now() - INTERVAL 7 DAY
GROUP BY page_path
↓
ClickHouse returns results
↓
Cache results for 5 minutes (Redis)
↓
Return JSON to frontend
Dashboard queries hit ClickHouse directly. Caching keeps frequently accessed dashboards instant.
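The cache-aside step maps directly onto Laravel's Cache::remember. A sketch, again assuming a hypothetical 'clickhouse' client binding with a select() method:

use Illuminate\Support\Facades\Cache;

$stats = Cache::remember("stats:{$siteId}:7days", now()->addMinutes(5), function () use ($siteId) {
    // Runs only on a cache miss; cached results serve everyone else
    return app('clickhouse')->select(
        'SELECT page_path, count() AS pageviews
         FROM events
         WHERE site_id = {site:String}
           AND timestamp >= now() - INTERVAL 7 DAY
         GROUP BY page_path
         ORDER BY pageviews DESC',
        ['site' => $siteId]
    );
});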
Performance Metrics
How does this stack handle real load?
Statalog handles:
- 100M+ events/day from thousands of sites
- 1M+ events/hour during peak traffic
- API latency: <5ms (99th percentile) for event ingestion
- Dashboard queries: <100ms for 7-day aggregations over 1M events
- Storage: ~18GB/year per 100M events/day
Hardware:
- Single PHP application server (8 CPU, 32GB RAM)
- Single Redis server (same machine)
- Single MySQL server (different machine, 4 CPU, 16GB RAM)
- Single ClickHouse server (8 CPU, 64GB RAM)
This setup handles thousands of sites and tens of billions of events per year with room to spare. Scaling to millions of events/sec would require ClickHouse clustering and multiple application servers, but the fundamental architecture stays the same.
Cost Breakdown
Assuming 100M events/day:
Storage Costs
- ClickHouse storage: 18GB/year (1-year retention)
  - Cheaply stored on any disk or object storage (S3, etc.)
  - Cost: ~$0.50/month
- MySQL storage: 100GB/year (user data, aggregates, config)
  - Cost: ~$5/month on a managed service
- Total storage: ~$6/month
Compute Costs
- PHP application server: 8 CPU, 32GB RAM
  - AWS EC2 equivalent: ~$0.50/hour ≈ $360/month
- ClickHouse server: 8 CPU, 64GB RAM
  - AWS EC2 equivalent: ~$0.80/hour ≈ $580/month
- MySQL server: 4 CPU, 16GB RAM
  - AWS RDS equivalent: ~$200/month
- Redis server: shared with the PHP server, no extra cost
- Total compute: ~$1,140/month
Total Monthly Cost for 100M events/day: ~$1,146/month
Per-event cost: $1,146 ÷ (100M events/day × 30 days) ≈ $0.0000004 per event, or about 0.00004 cents. Effectively free at the margin.
Compare this to:
- Google Analytics: Free (but limited features, you own no data)
- Mixpanel: ~$100/month minimum, up to $10k/month at scale
- Amplitude: ~$200/month minimum, up to $5k/month at scale
For a self-hosted platform, that is dirt cheap. For users it's even cheaper, because they own the hardware or the cloud account; Statalog just provides the software.
When You Should Use This Stack
Good Use Cases
- Building an analytics platform (like Statalog)
  - High-volume event ingestion
  - Complex analytical queries
  - Multiple customers (multi-tenancy)
- Building a real-time dashboard platform
  - Heavy analytical workloads
  - Millions of events/day
  - Complex aggregations
- Building a data warehouse
  - Need cheap storage (compression)
  - Need fast analytical queries
  - High-volume, append-only data
- If you're already building with Laravel
  - PHP + Laravel is the fastest to iterate in
  - Existing Laravel expertise on the team
  - Don't need 100% peak performance (good enough is fine)
Bad Use Cases
- Building a real-time bidding platform (ultra-low latency needed)
  - Go or Rust would be better
- Building a financial transaction system (strong consistency needed)
  - PostgreSQL or Oracle would be better
  - ClickHouse doesn't have ACID transactions
- Building a simple CRUD app
  - Rails or Django is faster to build and cheaper to run
  - You don't need ClickHouse's compression
- If you have no PHP developers
  - Go or Node would be more familiar to most teams
Alternative Stacks
Go + Postgres (The "Safe" Choice)
Why it's popular:
- Go is blazingly fast (compiled, no GC pauses)
- Postgres is battle-tested and reliable
- Many successful startups use it (Segment's data pipeline, for example)
When to use it:
- Your team knows Go
- You value raw performance over iteration speed
- You want strong ACID consistency
Downside:
- Slower to iterate on features
- Postgres doesn't compress well for analytics
- Storage costs are much higher
Rust + DuckDB
Why it's interesting:
- Rust is among the fastest modern languages
- DuckDB is an in-process OLAP engine: like ClickHouse, but simpler (single-node only)
- Tiny deployment footprint
When to use it:
- You need extreme performance
- Single-node deployment is enough
- Your team loves Rust
Downside:
- Rust is hard to learn and iterate in
- Smaller web ecosystem than Go or Python
- Overkill for most analytics platforms
Python + Druid
Why it's interesting:
- Python is familiar to data engineers
- Druid is an open-source OLAP database (like ClickHouse)
- Good for data science workflows
When to use it:
- You have a data science team already
- You need Python for machine learning or analysis
- You want Druid's time-series features
Downside:
- Python is slower than Go or Rust
- Druid has a steeper learning curve than ClickHouse
- Less common in startups
Node + BigQuery
Why it's interesting:
- Node is fast and familiar to web developers
- BigQuery is managed (no infrastructure)
- Serverless scales automatically
When to use it:
- You want zero infrastructure management
- You have budget for BigQuery ($1000+/month)
- Your events are already on GCP
Downside:
- Expensive at scale
- BigQuery is slower than self-hosted ClickHouse for queries
- Vendor lock-in
The Statalog Philosophy
We chose PHP + Laravel + ClickHouse because:
- Speed to market matters – Laravel lets us ship features in days, not weeks
- Team familiarity matters – 30% of developers know PHP; hiring is easy
- Storage efficiency matters – ClickHouse compression saves 99% of storage cost
- Query speed matters – Analytical queries are fast even on years of data
- "Good enough" is fine – We don't need Go's extreme performance for web request handling
Is it the most "optimal" stack in some abstract sense? No. Is it the right stack for our constraints (team size, iteration speed, hardware cost)? Absolutely.
The lesson: choose the stack that matches your constraints, not the stack that's popular.
Deployment & Operations
Queue System (Supervisor)
Events are queued asynchronously. Supervisor manages worker processes:
[program:statalog-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /app/artisan queue:work redis --queue=events --sleep=3 --tries=3
autostart=true
autorestart=true
numprocs=20
redirect_stderr=true
stdout_logfile=/var/log/statalog-worker.log
This launches 20 workers, each pulling events from the Redis queue, processing them, and inserting into ClickHouse.
ClickHouse Schema
Events are stored in a table using the ReplacingMergeTree engine (a MergeTree variant that deduplicates rows sharing the same sorting key during background merges):
CREATE TABLE events (
    site_id        String,
    event_id       String,
    timestamp      DateTime,
    session_id     String,
    user_id        Nullable(String),
    event          String,
    page_path      String,
    referrer       String,
    device_type    String,
    viewport_width Int32,
    country        String,
    city           String
) ENGINE = ReplacingMergeTree()
ORDER BY (site_id, timestamp, event_id)
TTL timestamp + INTERVAL 365 DAY DELETE;
The TTL (time-to-live) automatically deletes data older than 365 days, keeping storage costs in check.
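If the retention policy changes later, the TTL can be altered in place; ClickHouse applies the new rule during subsequent background merges:

-- Extend retention to two years; old parts are rewritten lazily
ALTER TABLE events MODIFY TTL timestamp + INTERVAL 730 DAY DELETE;

-- Optionally force the new TTL onto existing parts (can be heavy)
ALTER TABLE events MATERIALIZE TTL;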
Learning Resources
If you want to build something similar:
- ClickHouse Docs: https://clickhouse.com/docs
- Laravel Docs: https://laravel.com/docs
- Redis Docs: https://redis.io/docs/
- Statalog Open Source: https://github.com/statalog/statalog (coming soon)
FAQ
Q: Why not just use Google BigQuery or Mixpanel?
A: If you're building an analytics product for customers, you need to own the data. Using a third-party platform means you're stuck with their data export process, pricing, and feature set. Building self-hosted gives you full control.
Q: Isn't PHP slow?
A: Modern PHP (8.3+) with JIT compilation is fast enough for I/O-bound workloads. API latency is <5ms. For a web application, that's plenty fast. CPU-intensive work (video processing, ML) would be slow, but analytics is I/O-bound.
Q: What if I don't want to manage ClickHouse?
A: ClickHouse Cloud offers managed ClickHouse: you pay for compute and storage usage instead of managing servers. For small volumes, it's often cheaper and simpler.
Q: Can I use ClickHouse without PHP?
A: Absolutely. ClickHouse has SDKs for Go, Node, Python, Rust, and more. The choice of PHP for the application layer doesn't lock you into using it for data processing. You could write workers in any language.
Q: What's the minimum scale for this stack to make sense?
A: At 1M events/day, this stack is overkill. Use Rails, Django, or Node with Postgres. At 100M events/day, this stack is necessary. At 10M events/day, it's a judgment call—Postgres might work, but ClickHouse will be cheaper long-term.
Q: Is this stack reliable in production?
A: Yes. ClickHouse powers analytics at some of the largest companies in the world (Yandex, Cloudflare, Uber, etc.). Laravel has been battle-tested for 15+ years. Redis is the standard for caching. This stack is boring, which is good for reliability.
Q: How do I deploy this?
A: Docker containers (Laravel app + workers), managed ClickHouse (or self-hosted with containers), managed MySQL (or self-hosted), managed Redis (or self-hosted). Most startups deploy on AWS EC2 or DigitalOcean.
Next Steps
- Read the ClickHouse documentation to understand columnar storage
- Explore Laravel queues for background job handling
- Check out the Redis documentation for session management
- Consider your scale: do you really need this stack, or should you start simpler?
Building an analytics platform is complex, but choosing the right stack makes it manageable. We chose PHP + ClickHouse because it balances performance, cost, and iteration speed.
If you're building something similar, this stack might be right for you too.
Questions? Contact us or view the docs.