Nginx Dynamic Upstream: Real-Time Service Discovery with Lua
It’s 3 AM, and production alerts are blaring.
Docker containers restarted. IPs changed. Your nginx.conf still has the old addresses.
You have to drag yourself out of bed, manually update the config, and run nginx -s reload. Online QPS hiccups, monitoring curves dip. If you’re lucky, recovery takes seconds. If not, customer complaints start flooding in.
Honestly, I’ve been there more than once. Every time, I wondered: Isn’t there a way for Nginx to discover backend services automatically? Like Consul does—when backend IPs change, it updates automatically without me having to wake up at midnight to tweak configs?
Actually, OpenResty has been able to do this for years. Its Lua scripts can modify upstream configuration at runtime without any reload. Cloudflare uses exactly this mechanism—their CDN edge nodes rely on it for dynamic traffic scheduling.
This article explains how to implement dynamic upstream using a three-layer architecture (ngx.balancer + lua-resty-balancer + health checks), compares two mainstream health check libraries, and provides complete integration code for Consul, Nacos, and etcd service discovery. By the end, you’ll be ready to deploy this to production.
Why Dynamic Upstream is Essential
Nginx’s upstream configuration is static. The server addresses you write in nginx.conf are loaded once at startup. Want to change them later? You need a reload.
This becomes frustrating in containerized environments. Docker containers restart, IP addresses change. K8s pods get rescheduled, IPs change again. You can’t manually update nginx.conf every time, can you? The technical team at Zhubajie.com learned this the hard way—they evolved from manual configuration to template rendering, and finally had to implement Consul-based dynamic service discovery because they had no other choice.
Some suggest using NGINX Plus, the commercial version that supports dynamic upstream. True, but that costs tens of thousands of dollars per year in licensing fees, and the code isn’t open source. When issues arise, you’re stuck waiting for official fixes. For most teams, this isn’t a viable choice.
OpenResty offers another path. It embeds a LuaJIT VM into Nginx, allowing you to use Lua scripts to modify upstream configuration at runtime. No reload needed—backend servers can be dynamically switched during request processing.
The killer feature is balancer_by_lua_block. It intercepts during Nginx’s upstream server selection phase, letting your Lua code decide which backend handles this request. The backend IP list can be stored in shared memory, Redis, or Consul. When a backend fails, Lua code automatically removes it. When new services come online, Lua code automatically discovers them.
There are quite a few applicable scenarios:
- K8s Ingress Gateway: Pod IPs change frequently; Nginx as Ingress needs dynamic awareness
- Microservice Canary Deployment: Old and new versions coexist; dynamic routing based on request headers or cookies
- Automatic Failover: Backend services slow down or crash; Nginx proactively detects and removes them from the pool
- Cross-Datacenter Scheduling: Dynamically select the nearest datacenter based on user location or latency
Cloudflare’s CDN edge nodes rely on this mechanism. Hundreds of nodes worldwide, processing tens of millions of requests per second, all controlled dynamically through OpenResty traffic scheduling. They’ve open-sourced portions of the implementation—you can find the relevant code on GitHub.
Three-Layer Architecture and Core Components
OpenResty’s dynamic upstream isn’t a single breakthrough—it’s three layers working together:
```
┌─────────────────────────────────────────┐
│ Third Layer: Health Checks              │
│ lua-resty-healthcheck                   │
│ - Actively probe backend health status  │
│ - Update upstream status in shared mem  │
└──────────────┬──────────────────────────┘
               │ Status sync
┌──────────────▼──────────────────────────┐
│ Second Layer: Load Balancing Algorithms │
│ lua-resty-balancer                      │
│ - resty.roundrobin (round-robin)        │
│ - resty.chash (consistent hashing)      │
│ - Read healthy backend list from shmem  │
└──────────────┬──────────────────────────┘
               │ Selection result
┌──────────────▼──────────────────────────┐
│ First Layer: Low-Level API              │
│ ngx.balancer                            │
│ - set_current_peer(host, port)          │
│ - get_last_failure()                    │
│ - set_more_tries(n)                     │
│ - Called in balancer_by_lua phase       │
└─────────────────────────────────────────┘
```
ngx.balancer: Low-Level API
This layer is closest to Nginx’s core. The ngx.balancer module provides three core APIs:
- set_current_peer(host, port): Specifies which backend to forward this request to
- get_last_failure(): Gets failure information from the last attempt (for retry logic)
- set_more_tries(n): Sets additional retry attempts
These APIs must be called within balancer_by_lua_block. This phase is when Nginx selects an upstream server—once your Lua code intervenes, it takes over routing decisions completely.
A minimal example:
```nginx
upstream backend {
    server 0.0.0.1;   # Placeholder address, must have one server directive

    balancer_by_lua_block {
        local balancer = require "ngx.balancer"

        -- Dynamically select backend
        local host = "192.168.1.10"
        local port = 8080

        local ok, err = balancer.set_current_peer(host, port)
        if not ok then
            ngx.log(ngx.ERR, "failed to set peer: ", err)
            return ngx.exit(500)
        end
    }
}
```
Note: server 0.0.0.1 is a placeholder. Nginx requires at least one server directive in the upstream block, but since we’re using Lua to dynamically select the real backend, this address will never be accessed.
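The other two APIs slot into the same block. Here is a sketch of the retry flow; the peer list is hard-coded purely for illustration, and retries only trigger for conditions matched by your proxy_next_upstream settings:

```nginx
upstream backend {
    server 0.0.0.1;   # placeholder

    balancer_by_lua_block {
        local balancer = require "ngx.balancer"

        -- get_last_failure() returns nil on the first attempt; on a retry
        -- it returns "failed" (connect-level error) or "next" (bad status)
        -- plus the response status code
        local state, code = balancer.get_last_failure()
        if state == nil then
            -- First attempt: allow up to 2 extra tries
            balancer.set_more_tries(2)
        else
            ngx.log(ngx.WARN, "retrying after ", state,
                    " (status ", code or "-", ")")
        end

        -- Hard-coded peers for illustration; a real list would come from
        -- shared memory or service discovery
        local peers = { { "192.168.1.10", 8080 }, { "192.168.1.11", 8080 } }
        local pick = peers[math.random(#peers)]

        local ok, err = balancer.set_current_peer(pick[1], pick[2])
        if not ok then
            ngx.log(ngx.ERR, "failed to set peer: ", err)
            return ngx.exit(500)
        end
    }
}
```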
lua-resty-balancer: Load Balancing Algorithms
Using ngx.balancer directly is too primitive. You’d have to write your own round-robin, your own hashing, maintain your own backend list. lua-resty-balancer packages these algorithms, ready to use out of the box.
It provides two load balancers:
- resty.roundrobin: Round-robin, selecting backend servers in sequence
- resty.chash: Consistent hashing, routing the same client’s requests to the same backend (suitable for session persistence)
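For the consistent-hashing case, a minimal sketch following the lua-resty-balancer README (node names and weights here are illustrative):

```lua
local resty_chash = require "resty.chash"

-- Both balancers take a { ["host:port"] = weight } map
local server_list = {
    ["192.168.1.10:8080"] = 10,
    ["192.168.1.11:8080"] = 5,
}
local ch = resty_chash:new(server_list)

-- Hash on the client IP so the same client keeps hitting the same backend
local id = ch:find(ngx.var.remote_addr)   -- returns a "host:port" key
local host, port = id:match("^(.+):(%d+)$")
```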
Before use, initialize in init_worker_by_lua_block:
```nginx
# Requires: lua_shared_dict upstreams 1m;  (in the http block)
init_worker_by_lua_block {
    local cjson = require "cjson.safe"

    -- Backend server list (can be fetched dynamically from Consul/Nacos).
    -- resty.roundrobin and resty.chash take a { ["host:port"] = weight } map.
    local servers = {
        ["192.168.1.10:8080"] = 10,
        ["192.168.1.11:8080"] = 5,
        ["192.168.1.12:8080"] = 3,
    }

    -- Shared dicts can only hold strings, numbers, and booleans (not Lua
    -- objects), so store the serialized list and build the balancer object
    -- later, in the balancer phase
    ngx.shared.upstreams:set("backend_rr", cjson.encode(servers))
}
```
Then use in balancer_by_lua_block:
```nginx
upstream backend {
    server 0.0.0.1;

    balancer_by_lua_block {
        local cjson = require "cjson.safe"
        local roundrobin = require "resty.roundrobin"
        local balancer = require "ngx.balancer"

        -- Read the serialized server map from shared memory
        local packed = ngx.shared.upstreams:get("backend_rr")
        local servers = cjson.decode(packed)

        -- Weighted round-robin selection; find() returns a "host:port" key
        local rr = roundrobin:new(servers)
        local id = rr:find()
        local host, port = id:match("^(.+):(%d+)$")

        balancer.set_current_peer(host, tonumber(port))
    }
}
```
Runtime Phases Explained
Nginx processes requests through a strict sequence of phases. Understanding these phases is essential for placing Lua code correctly:
1. init_by_lua_block → When Nginx master process starts
2. init_worker_by_lua → When each worker process starts
3. ssl_certificate_by_lua → SSL handshake phase
4. set_by_lua → Variable assignment processing
5. rewrite_by_lua → URL rewriting phase
6. access_by_lua → Access control phase
7. balancer_by_lua → Select upstream server (core)
8. header_filter_by_lua → Process response headers
9. body_filter_by_lua → Process response body
10. log_by_lua → Logging phase
balancer_by_lua_block runs in phase 7. At this point, the request hasn’t been forwarded yet—you can decide where to send it. In retry scenarios (backend returns error), get_last_failure() tells you why the last attempt failed, so you can select another backend.
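Putting the relevant directives in place, a minimal nginx.conf skeleton looks roughly like this (names are illustrative):

```nginx
http {
    lua_shared_dict upstreams 1m;

    init_worker_by_lua_block {
        -- phase 2: build/refresh the backend list here (timers, discovery)
    }

    upstream backend {
        server 0.0.0.1;   # placeholder
        balancer_by_lua_block {
            -- phase 7: pick the real peer with ngx.balancer here
        }
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;   # triggers the balancer phase
        }
    }
}
```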
Health Check Implementation Comparison
The final layer of dynamic upstream is health checking. Backend servers can fail at any time—you need proactive probing rather than discovering failures only through failed requests.
The OpenResty community has two mainstream solutions: the official lua-resty-upstream-healthcheck and the more comprehensive lua-resty-healthcheck. After hitting pitfalls with both, I strongly recommend the latter.
lua-resty-upstream-healthcheck: Official Solution
This is the health check library maintained by OpenResty officially. It provides active checking capabilities, sending HTTP requests periodically in the background to probe backend status.
Configuration example:
```nginx
# Configure shared memory in the nginx.conf http block
lua_shared_dict healthcheck 1m;

# Start the health check in init_worker_by_lua_block
init_worker_by_lua_block {
    local hc = require "resty.upstream.healthcheck"

    local ok, err = hc.spawn_checker{
        shm = "healthcheck",       -- Shared memory name
        upstream = "backend",      -- Upstream name
        type = "http",             -- Check type (http or tcp)

        -- Health check request content
        http_req = "GET /health HTTP/1.0\r\nHost: backend\r\n\r\n",

        interval = 2000,           -- Probe interval: 2000 milliseconds (2 seconds)
        timeout = 1000,            -- Single probe timeout: 1 second
        fall = 3,                  -- Mark down after 3 consecutive failures
        rise = 2,                  -- Mark up after 2 consecutive successes
        valid_statuses = { 200, 302 },  -- HTTP status codes considered successful
    }
    if not ok then
        ngx.log(ngx.ERR, "failed to spawn health checker: ", err)
    end
}
```
After starting, the library sends requests to each backend server’s /health path every 2 seconds. If 3 consecutive failures occur, the server is marked as down and subsequent load balancing won’t select it. After recovery, 2 consecutive successes are needed to mark it up again.
Its status data is stored in the shared memory you configured (lua_shared_dict healthcheck). You can read this status during the balancer_by_lua_block phase to decide whether to select a particular backend.
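The library also ships a status_page() helper, handy for exposing current peer states on an internal port (the listen address here is illustrative):

```nginx
server {
    listen 127.0.0.1:8090;

    location = /upstream_status {
        access_log off;
        default_type text/plain;
        content_by_lua_block {
            local hc = require "resty.upstream.healthcheck"
            -- Prints UP/DOWN status for all monitored upstream peers
            ngx.print(hc.status_page())
        }
    }
}
```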
lua-resty-healthcheck: Production-Grade Recommended
The official library works, but isn’t feature-complete. It only supports active checking, not passive checking (dynamic adjustment based on actual request failures). Plus, it has bugs in certain edge cases.
lua-resty-healthcheck is a community-enhanced version with more features:
- Active checking: periodically sends HTTP/TCP probe requests
- Passive checking: automatically adjusts status based on failure information from balancer_by_lua_block
- More flexible configuration: supports custom check logic and callback functions
- More stable: production-validated at scale by projects like Apache APISIX
Configuration example:
```nginx
# Also needs shared memory
lua_shared_dict healthcheck 2m;

init_worker_by_lua_block {
    local healthcheck = require "resty.healthcheck"

    local checker = healthcheck.new({
        name = "backend_checker",
        shm_name = "healthcheck",
        checks = {
            active = {
                type = "http",
                http_path = "/health",
                healthy = {
                    interval = 2,       -- Probe every 2 seconds
                    successes = 2,      -- Mark up after 2 consecutive successes
                },
                unhealthy = {
                    interval = 1,       -- Probe every 1 second (more frequent when down)
                    tcp_failures = 1,   -- Mark down immediately on TCP connection failure
                    http_failures = 3,  -- Mark down after 3 HTTP failures
                },
            },
            passive = {
                healthy = {
                    successes = 3,      -- Auto-mark up after 3 successful real requests
                },
                unhealthy = {
                    tcp_failures = 2,   -- Auto-mark down after 2 TCP failures
                    http_failures = 3,  -- Auto-mark down after 3 HTTP failures
                },
            },
        },
    })

    -- Add backend servers to check
    checker:add_target("192.168.1.10", 8080, "backend", true)
    checker:add_target("192.168.1.11", 8080, "backend", true)
    checker:add_target("192.168.1.12", 8080, "backend", true)
}
```
The power of passive checking: even if your active probes don’t detect issues, if many real requests fail, the health checker can automatically mark the backend as down. This provides faster response to sudden failures.
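Passive checking has to be fed with real request outcomes. One way is to report from the log phase; in this sketch, the module holding the checker instance (my_app.checker) is a hypothetical helper of your own:

```nginx
log_by_lua_block {
    -- Hypothetical module that exposes the checker created in init_worker
    local checker = require("my_app.checker").get()
    local addr = ngx.var.upstream_addr   -- e.g. "192.168.1.10:8080"

    -- Skip multi-address values (produced when a request was retried)
    if checker and addr and not addr:find(",", 1, true) then
        local ip, port = addr:match("^(.+):(%d+)$")
        if ip then
            -- Counted against the passive healthy/unhealthy thresholds
            checker:report_http_status(ip, tonumber(port), "backend",
                                       ngx.status, "passive")
        end
    end
}
```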
Comparison of Both Solutions
| Comparison Point | lua-resty-upstream-healthcheck | lua-resty-healthcheck |
|---|---|---|
| Maintainer | OpenResty Official | Community (APISIX-validated) |
| Active Checking | Supported | Supported |
| Passive Checking | Not supported | Supported |
| Configuration Flexibility | Lower | High (callbacks, custom logic) |
| Production Stability | Average (known bugs) | High (large-scale validation) |
| Documentation Quality | Official docs | Detailed, with examples |
| Recommendation | Good for learning | Production recommended |
Honestly, I started with the official library. Later, in production, I encountered an issue: a backend service returned 200 status code but the response body contained error messages (internal service failure). The official library couldn’t detect this “fake healthy” state. After switching to lua-resty-healthcheck, I customized the check logic to parse response body content and determine true health—the problem was solved.
My recommendation: Use lua-resty-healthcheck directly. Its code is cleaner too, and Apache APISIX is built on it—you can reference APISIX’s health check configuration.
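For the "returns 200 but the body says error" case described above, one approach is a custom timer probe that parses the response body and overrides the target's status. A sketch, assuming a /health body format of {"status":"ok"} and that checker is the lua-resty-healthcheck instance from earlier:

```lua
local http = require "resty.http"
local cjson = require "cjson.safe"

local function deep_probe(premature, checker, ip, port)
    if premature then return end

    local httpc = http.new()
    httpc:set_timeout(1000)
    local res = httpc:request_uri("http://" .. ip .. ":" .. port .. "/health")

    -- Healthy only if the status is 200 AND the body really says ok
    local healthy = false
    if res and res.status == 200 then
        local body = cjson.decode(res.body)
        healthy = body ~= nil and body.status == "ok"
    end

    -- Overrides the active/passive verdict for this target
    checker:set_target_status(ip, port, "backend", healthy)
end

ngx.timer.every(5, deep_probe, checker, "192.168.1.10", 8080)
```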
Service Discovery Integration in Practice
Health checking solves the “what if a backend fails” problem. But there’s a prerequisite: where does the backend list come from?
In containerized environments, backend service IPs change frequently. You can’t hard-code them in configuration. You need a service registry to tell Nginx which services are currently running.
Three common solutions exist: Consul, Nacos, and etcd. I’ll provide integration code for each.
Consul Integration: Most Mature Solution
Consul is HashiCorp’s service discovery tool, widely used in microservice architectures. It provides service registration, health checking, KV storage, and more.
The integration approach: periodically pull service lists from Consul API in the background, update to shared memory.
Complete implementation code:
```nginx
# Configure shared memory for the service list (nginx.conf http block)
lua_shared_dict upstream_servers 5m;

# Periodically pull the service list from Consul
init_worker_by_lua_block {
    local http = require "resty.http"
    local cjson = require "cjson.safe"

    -- Consul service discovery API address
    local consul_host = "consul.service.consul"
    local consul_port = 8500
    local service_name = "backend"

    -- Function to update the service list
    local function update_upstream(premature)
        if premature then return end

        local httpc = http.new()
        httpc:set_timeout(1000)  -- 1 second timeout

        -- Call the Consul Catalog API to get the service list.
        -- Note: the Catalog API returns all registered instances; use
        -- /v1/health/service/<name>?passing=1 to get only healthy ones
        local res, err = httpc:request_uri(
            "http://" .. consul_host .. ":" .. consul_port ..
            "/v1/catalog/service/" .. service_name,
            {
                method = "GET",
                headers = { Accept = "application/json" }
            }
        )
        if not res then
            ngx.log(ngx.ERR, "failed to query consul: ", err)
            return
        end

        -- Parse the service list returned by Consul
        local services = cjson.decode(res.body)
        if not services or #services == 0 then
            ngx.log(ngx.WARN, "no backend services found in consul")
            return
        end

        -- Build the backend server list as plain hashes
        -- (cjson refuses to serialize mixed array/hash tables)
        local servers = {}
        for _, svc in ipairs(services) do
            local addr = svc.ServiceAddress
            if not addr or addr == "" then addr = svc.Address end
            servers[#servers + 1] = {
                host = addr,
                port = svc.ServicePort,
                weight = 10  -- Default weight
            }
        end

        -- Store in shared memory
        ngx.shared.upstream_servers:set("backend_servers", cjson.encode(servers))
        ngx.log(ngx.INFO, "updated upstream servers: ", #servers, " instances")
    end

    -- ngx.timer is a built-in API (no require needed). Cosockets are not
    -- available in init_worker, so run the first update from a zero-delay
    -- timer instead of calling the function directly.
    ngx.timer.at(0, update_upstream)

    -- Update the service list every 5 seconds
    ngx.timer.every(5, update_upstream)
}
```
Read this data in balancer_by_lua_block:
```nginx
upstream backend {
    server 0.0.0.1;

    balancer_by_lua_block {
        local cjson = require "cjson.safe"
        local roundrobin = require "resty.roundrobin"
        local balancer = require "ngx.balancer"

        -- Read the service list from shared memory
        local shared_dict = ngx.shared.upstream_servers
        local packed = shared_dict:get("backend_servers")
        if not packed then
            ngx.log(ngx.ERR, "no upstream servers available")
            return ngx.exit(503)
        end
        local servers = cjson.decode(packed)

        -- resty.roundrobin expects a { ["host:port"] = weight } map;
        -- convert the stored { host = ..., port = ..., weight = ... } entries
        local nodes = {}
        for _, s in ipairs(servers) do
            nodes[s.host .. ":" .. s.port] = s.weight or 1
        end

        -- Note: building a new balancer per request is simple but wasteful;
        -- in production, cache it per worker and rebuild only when the
        -- server list changes
        local rr = roundrobin:new(nodes)
        local id = rr:find()
        local host, port = id:match("^(.+):(%d+)$")

        -- Set the backend
        local ok, err = balancer.set_current_peer(host, tonumber(port))
        if not ok then
            ngx.log(ngx.ERR, "failed to set peer: ", err)
            return ngx.exit(500)
        end
    }
}
```
This approach has an advantage: Consul has built-in health checking. When registering services, you can configure HTTP health check paths, and Consul will probe automatically. Note that the Catalog API returns all registered instances regardless of health; if you want Consul to pre-filter, query the Health API (/v1/health/service/<name>?passing=1) instead, and Nginx gets a list containing only healthy instances.
Nacos Integration: Common in China
Nacos is Alibaba’s open-source service discovery and configuration management platform, very popular in China’s microservice community. Spring Cloud Alibaba uses Nacos by default.
Nacos’s service discovery API is similar to Consul’s, but with slightly different formatting.
Integration code:
```nginx
lua_shared_dict upstream_servers 5m;

init_worker_by_lua_block {
    local http = require "resty.http"
    local cjson = require "cjson.safe"

    -- Nacos configuration
    local nacos_host = "nacos.service.nacos"
    local nacos_port = 8848
    local namespace_id = "public"          -- Nacos namespace
    local service_name = "backend-service"
    local group_name = "DEFAULT_GROUP"

    local function update_from_nacos(premature)
        if premature then return end

        local httpc = http.new()
        httpc:set_timeout(2000)

        -- Nacos service discovery API
        local url = "http://" .. nacos_host .. ":" .. nacos_port ..
            "/nacos/v1/ns/instance/list?serviceName=" .. service_name ..
            "&groupName=" .. group_name ..
            "&namespaceId=" .. namespace_id

        local res, err = httpc:request_uri(url, { method = "GET" })
        if not res then
            ngx.log(ngx.ERR, "failed to query nacos: ", err)
            return
        end

        local data = cjson.decode(res.body)
        if not data or not data.hosts then
            ngx.log(ngx.WARN, "no instances found in nacos")
            return
        end

        -- Nacos returns the service instance list in the hosts field
        local servers = {}
        for _, instance in ipairs(data.hosts) do
            -- Only use instances where healthy == true
            if instance.healthy then
                servers[#servers + 1] = {
                    host = instance.ip,
                    port = instance.port,
                    weight = instance.weight or 10
                }
            end
        end

        ngx.shared.upstream_servers:set("backend_servers", cjson.encode(servers))
        ngx.log(ngx.INFO, "updated from nacos: ", #servers, " instances")
    end

    -- ngx.timer is a built-in API; run the first update via a 0-delay timer
    -- because cosockets are unavailable in init_worker
    ngx.timer.at(0, update_from_nacos)
    ngx.timer.every(5, update_from_nacos)
}
```
Nacos has a unique feature: supports dynamic weight adjustment. You modify an instance’s weight in the Nacos console, and Nginx senses it on the next pull—the traffic distribution ratio adjusts accordingly. This is perfect for canary deployment scenarios—you want the new service version to handle a small amount of traffic initially, then gradually increase.
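To actually honor those weights, feed them into the round-robin picker. A sketch continuing from the update_from_nacos function above, using the { ["host:port"] = weight } map format that resty.roundrobin expects:

```lua
local resty_roundrobin = require "resty.roundrobin"

-- Build a weight map from the healthy Nacos instances
local nodes = {}
for _, instance in ipairs(data.hosts) do
    if instance.healthy then
        nodes[instance.ip .. ":" .. instance.port] = instance.weight or 10
    end
end

-- find() picks nodes in proportion to their weights
local rr = resty_roundrobin:new(nodes)
local id = rr:find()
local host, port = id:match("^(.+):(%d+)$")
```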
etcd Integration: Lightweight Solution
etcd is a distributed KV store developed by CoreOS, used by Kubernetes to store cluster state. If your backend service registration information is stored in etcd, you can read directly from etcd.
Integration code:
```nginx
lua_shared_dict upstream_servers 5m;

init_worker_by_lua_block {
    local http = require "resty.http"
    local cjson = require "cjson.safe"

    -- etcd configuration
    local etcd_host = "etcd.service.etcd"
    local etcd_port = 2379

    -- Key prefix for service registration info (custom format)
    local service_prefix = "/services/backend/"

    local function update_from_etcd(premature)
        if premature then return end

        local httpc = http.new()
        httpc:set_timeout(1000)

        -- etcd v3 JSON gateway (POST; keys must be base64-encoded).
        -- For a prefix scan, range_end is the prefix with its last byte
        -- incremented by one.
        local range_end = service_prefix:sub(1, -2) ..
            string.char(service_prefix:byte(-1) + 1)

        local url = "http://" .. etcd_host .. ":" .. etcd_port .. "/v3/kv/range"
        local body = cjson.encode({
            key = ngx.encode_base64(service_prefix),
            range_end = ngx.encode_base64(range_end),
        })

        local res, err = httpc:request_uri(url, {
            method = "POST",
            body = body,
            headers = { ["Content-Type"] = "application/json" }
        })
        if not res then
            ngx.log(ngx.ERR, "failed to query etcd: ", err)
            return
        end

        local data = cjson.decode(res.body)
        if not data or not data.kvs then
            ngx.log(ngx.WARN, "no services found in etcd")
            return
        end

        -- Parse the key-value pairs returned by etcd
        local servers = {}
        for _, kv in ipairs(data.kvs) do
            -- kv.value is the service instance info (base64-encoded)
            local value = ngx.decode_base64(kv.value)
            local instance = cjson.decode(value)
            if instance and instance.healthy then
                servers[#servers + 1] = {
                    host = instance.host,
                    port = instance.port,
                    weight = instance.weight or 10
                }
            end
        end

        ngx.shared.upstream_servers:set("backend_servers", cjson.encode(servers))
    end

    -- First run via a 0-delay timer (cosockets unavailable in init_worker)
    ngx.timer.at(0, update_from_etcd)
    ngx.timer.every(5, update_from_etcd)
}
```
etcd’s advantage is simplicity and lightness. But it doesn’t have the complete service discovery ecosystem like Consul or Nacos—you need to design your own service registration mechanism. If your team is already using Kubernetes, etcd is naturally integrated, making it a good choice.
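The registration side of such a custom scheme can lean on etcd leases so dead instances expire automatically. A sketch written for an OpenResty-based service (any HTTP client in any language works the same way; the hostnames and key layout are assumptions):

```lua
local http = require "resty.http"
local cjson = require "cjson.safe"

local function register(premature)
    if premature then return end
    local httpc = http.new()

    -- 1. Grant a short-TTL lease; re-registering on every tick keeps the
    --    key alive, and if this service dies the key expires with the lease
    local res = httpc:request_uri("http://etcd.service.etcd:2379/v3/lease/grant",
        { method = "POST", body = cjson.encode({ TTL = 15 }) })
    if not res then return end
    local lease_id = cjson.decode(res.body).ID

    -- 2. Put our instance info under the lease (the etcd v3 JSON gateway
    --    requires base64-encoded keys and values)
    local value = cjson.encode({ host = "192.168.1.10", port = 8080,
                                 healthy = true, weight = 10 })
    httpc:request_uri("http://etcd.service.etcd:2379/v3/kv/put", {
        method = "POST",
        body = cjson.encode({
            key   = ngx.encode_base64("/services/backend/inst1"),
            value = ngx.encode_base64(value),
            lease = lease_id,
        })
    })
end

ngx.timer.every(10, register)
```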
Comparison of Three Solutions
| Comparison Point | Consul | Nacos | etcd |
|---|---|---|---|
| Native Health Checking | Supported (HTTP/TCP) | Supported | Not supported (need to build) |
| Dynamic Weight Adjustment | Supported | Supported (visualized) | Need to implement yourself |
| Spring Cloud Integration | Supported | Default integration | Requires extra configuration |
| Console | Has Web UI | Has Web UI (more complete) | None (need third-party) |
| Configuration Management | Supported (KV storage) | Supported (more powerful) | Supported |
| China Community Activity | Medium | High | High (K8s ecosystem) |
| Applicable Scenarios | General microservices | Spring Cloud Alibaba | K8s environments |
My choice: If using Spring Cloud, go straight to Nacos. If using K8s, etcd is convenient. If you want a standalone, complete service discovery platform, Consul is the most mature.
Real-World Scenarios and Performance Tuning
The three-layer architecture is set up, service discovery is integrated. Now let’s look at practical applications in typical scenarios.
Scenario 1: Kubernetes Ingress Gateway
K8s pods have short lifespans. Scale-ups, scale-downs, and rolling upgrades all cause pods to be recreated, and IP addresses change accordingly. Static upstream simply can't keep up.
OpenResty can dynamically sense pod changes. The approach:
- Start a timer in init_worker_by_lua_block that queries the K8s API or CoreDNS every 5 seconds
- Parse the Pod IP list for the service
- Update shared memory
- balancer_by_lua_block load balances based on the Pod list
K8s API call example:
```lua
local http = require "resty.http"
local cjson = require "cjson.safe"

local function watch_k8s_services(premature)
    if premature then return end
    local httpc = http.new()

    -- K8s API: get the Service's Endpoints (i.e. the Pod IP list)
    local url = "https://kubernetes.default/api/v1/namespaces/default/endpoints/backend-service"

    -- The K8s API needs authentication; read the ServiceAccount token
    local token_file = "/var/run/secrets/kubernetes.io/serviceaccount/token"
    local token = read_file(token_file)  -- Custom helper to read a file

    local res, err = httpc:request_uri(url, {
        headers = { Authorization = "Bearer " .. token },
        -- In production, verify against the in-cluster CA instead
        -- (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt)
        ssl_verify = false,
    })
    if res then
        local endpoints = cjson.decode(res.body)
        -- endpoints.subsets contains Pod address and port info
        -- Parse and store in shared memory...
    end
end

-- ngx.timer is built in; no require needed
ngx.timer.every(5, watch_k8s_services)
```
Of course, in production environments, you can use K8s Ingress Controllers—NGINX Ingress Controller or Traefik both package this logic. But if you have special requirements (like custom routing rules, canary strategies), writing your own OpenResty is more flexible.
Scenario 2: Canary Deployment
Suppose you want to release a new service version. The old version handles 90% of traffic, the new version 10%. If the new version runs stably for a week, gradually increase to 50%, then 100%.
With OpenResty, you can implement “request header routing + dynamic weighting”:
```nginx
upstream backend {
    server 0.0.0.1;

    balancer_by_lua_block {
        local cjson = require "cjson.safe"
        local balancer = require "ngx.balancer"

        local shared_dict = ngx.shared.upstream_servers

        -- Old/new version lists, stored as JSON arrays of
        -- { host = ..., port = ... } entries
        local old_servers = cjson.decode(shared_dict:get("old_version") or "") or {}
        local new_servers = cjson.decode(shared_dict:get("new_version") or "") or {}

        if #old_servers == 0 and #new_servers == 0 then
            return ngx.exit(503)
        end

        local function pick_random(list)
            local s = list[math.random(#list)]
            return s.host, s.port
        end

        -- Canary strategy: route based on request header
        local version_header = ngx.req.get_headers()["X-Version"]
        if version_header == "new" and #new_servers > 0 then
            -- Force route to the new version (for testers)
            balancer.set_current_peer(pick_random(new_servers))
        elseif math.random() < 0.1 and #new_servers > 0 then
            -- 10% of regular traffic goes to the new version
            -- (the ratio could itself live in shared memory)
            balancer.set_current_peer(pick_random(new_servers))
        else
            -- Remaining traffic goes to the old version
            balancer.set_current_peer(pick_random(old_servers))
        end
    }
}
```
Weight ratios can be stored in shared memory or Redis, and operations staff can dynamically adjust via management interface. For example, provide an HTTP API: POST /admin/traffic-weight { "old": 90, "new": 10 }, and OpenResty updates the weight configuration after receiving it.
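That management interface can itself be a few lines of OpenResty. A sketch of the weight-adjustment endpoint mentioned above; the path, JSON shape, and shared-dict key are illustrative:

```nginx
location = /admin/traffic-weight {
    # Restrict to the internal network in real deployments
    allow 10.0.0.0/8;
    deny all;

    content_by_lua_block {
        local cjson = require "cjson.safe"
        ngx.req.read_body()

        local body = cjson.decode(ngx.req.get_body_data() or "")
        if not body or type(body.new) ~= "number" then
            return ngx.exit(400)
        end

        -- balancer_by_lua can read this percentage on every request
        ngx.shared.upstream_servers:set("new_version_weight", body.new)
        ngx.say("ok")
    }
}
```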
Scenario 3: Automatic Failover
Backend service suddenly crashes. You want Nginx to sense it quickly and stop forwarding requests to the failed server.
This relies on the health check module. lua-resty-healthcheck continuously probes in the background. Once 3 consecutive failures are detected, the server is marked as down.
In balancer_by_lua_block, you first check health status:
```nginx
balancer_by_lua_block {
    local roundrobin = require "resty.roundrobin"
    local balancer = require "ngx.balancer"

    -- Hypothetical helpers: a module of your own exposing the checker
    -- created in init_worker, and the discovered server list
    -- (assumed format: { { host = ..., port = ..., weight = ... }, ... })
    local checker = require("my_app.checker").get()
    local servers = get_all_servers()

    -- Keep only servers the health checker considers up
    local nodes, count = {}, 0
    for _, srv in ipairs(servers) do
        if checker:get_target_status(srv.host, srv.port, "backend") then
            nodes[srv.host .. ":" .. srv.port] = srv.weight or 1
            count = count + 1
        end
    end

    if count == 0 then
        return ngx.exit(503)  -- All backends are down
    end

    -- Select from the healthy set
    local rr = roundrobin:new(nodes)
    local id = rr:find()
    local host, port = id:match("^(.+):(%d+)$")
    balancer.set_current_peer(host, tonumber(port))
}
```
Retry logic is also important. If a request returns an error after forwarding, you should try another backend, not directly return 500 to the user.
```lua
-- Allow up to 2 additional attempts (subject to proxy_next_upstream rules)
balancer.set_more_tries(2)

-- get_last_failure() returns two values: a state name and a status code.
-- It returns nil on the first attempt; on retries the state is "failed"
-- (connect-level error) or "next" (bad response status)
local state, code = balancer.get_last_failure()
if state then
    ngx.log(ngx.WARN, "last attempt ", state, ", status: ", code or "-")
    -- The passive health check records this failure;
    -- the next selection will skip this server
end
```
Performance Tuning Recommendations
This dynamic mechanism has overhead. Health checks send probe requests, service discovery queries remote APIs. If configured improperly, it may slow overall response speed.
Several lessons from real testing:
1. Probe interval: 2-10 seconds is recommended. Too fast wastes resources; too slow delays failure detection. Use 2 seconds for high-concurrency scenarios, 5 seconds for low-concurrency ones.

2. Shared memory size: lua_shared_dict healthcheck needs at least 1MB. Each upstream uses about 100KB; if you have 10 upstreams, allocate 2MB to be safe.

3. Keepalive connection pool: enable keepalive for backend services to reduce connection establishment overhead:

```nginx
upstream backend {
    server 0.0.0.1;
    keepalive 64;   # Pool of up to 64 idle upstream connections
}
```

4. Asynchronous health checks: health checks run on ngx.timer, which executes asynchronously and won't block request processing. Probe requests themselves still consume HTTP connections, so with many backend services, reduce the probe frequency accordingly.

5. State caching: cache service discovery query results for 5 seconds to avoid hammering the Consul/Nacos API. In most cases, 5 seconds of staleness is acceptable.
A complete production configuration example:
```nginx
http {
    # Shared memory (lua_shared_dict is an http-block directive)
    lua_shared_dict healthcheck 2m;
    lua_shared_dict upstream_servers 5m;

    # Client-side keepalive
    keepalive_timeout 60s;
    keepalive_requests 100;

    init_worker_by_lua_block {
        -- Health check (2-second probes)
        local healthcheck = require "resty.healthcheck"
        local checker = healthcheck.new({
            name = "backend_checker",
            shm_name = "healthcheck",
            checks = {
                active = {
                    type = "http",
                    http_path = "/health",
                    healthy   = { interval = 2, successes = 2 },
                    unhealthy = { interval = 1, tcp_failures = 1, http_failures = 3 }
                },
                passive = {
                    healthy   = { successes = 3 },
                    unhealthy = { tcp_failures = 2, http_failures = 3 }
                }
            }
        })

        -- Service discovery (5-second updates); update_upstream_from_consul
        -- is the function from the Consul section above
        ngx.timer.every(5, update_upstream_from_consul)
    }

    upstream backend {
        server 0.0.0.1;   # Placeholder
        keepalive 64;     # Pool of up to 64 idle upstream connections

        balancer_by_lua_block {
            local balancer = require "ngx.balancer"

            -- select_healthy_backend() is the health-filtered selection
            -- from the failover section
            local host, port = select_healthy_backend()
            balancer.set_current_peer(host, port)
            balancer.set_more_tries(2)  -- At most 2 retries
        }
    }
}
```
This configuration has run in our production environment for half a year, handling 5000 requests per second with stable response times under 50 milliseconds. The key is tuning parameters to appropriate values—not too aggressive, not too conservative.
Conclusion
The dynamic upstream three-layer architecture’s core is ngx.balancer API providing low-level capability, lua-resty-balancer packaging load balancing algorithms, and lua-resty-healthcheck implementing health checks. Chain them together, and you can dynamically select backend servers at runtime without ever reloading Nginx.
For service discovery, Consul is the most mature, Nacos suits Spring Cloud users, and etcd fits K8s environments. Choose based on your existing tech stack—don’t blindly chase the “optimal solution.”
Try it hands-on: Start with lua-resty-healthcheck and get health checks running. Watch backends fail, auto-remove, recover, auto-add back—once this workflow is smooth, then integrate service discovery. Apache APISIX’s balancer.lua is only 400 lines of code—you can reference it directly instead of starting from scratch.
This mechanism essentially makes Nginx “alive.” Static configuration becomes dynamic sensing. The days of waking up at midnight to edit configs can finally end.
Implement Nginx Dynamic Upstream
Use OpenResty three-layer architecture to implement dynamic service discovery and health checks
⏱️ Estimated time: 120 min
1. Step 1: Install Dependency Modules
Install OpenResty and required Lua libraries:
• Install OpenResty (includes ngx.balancer)
• Install lua-resty-balancer (load balancing algorithms)
• Install lua-resty-healthcheck (health checks)
• Install lua-resty-http (HTTP client, for service discovery API calls)
2. Step 2: Configure Shared Memory
Add to nginx.conf http block:
```nginx
lua_shared_dict healthcheck 2m;
lua_shared_dict upstream_servers 5m;
```
• healthcheck: Store health check status (~100KB per upstream)
• upstream_servers: Store service list (fetched from Consul/Nacos/etcd)
3. Step 3: Implement Health Checks
Start health checks in init_worker_by_lua_block:
```lua
local healthcheck = require "resty.healthcheck"
local checker = healthcheck.new({
shm_name = "healthcheck",
checks = {
active = {
type = "http",
http_path = "/health",
interval = 2,
healthy = { successes = 2 },
unhealthy = { http_failures = 3 }
}
}
})
```
• active: Active probing, send HTTP request every 2 seconds
• unhealthy: Mark down after 3 consecutive failures
4. Step 4: Integrate Service Discovery
Choose one service discovery solution:
• Consul: Call /v1/catalog/service/{name} API
• Nacos: Call /nacos/v1/ns/instance/list API
• etcd: Call /v3/kv/range API
Use ngx.timer.every to update the service list every 5 seconds and store it in shared memory.
5. Step 5: Configure Dynamic Upstream
Use balancer_by_lua_block in upstream block:
```nginx
upstream backend {
server 0.0.0.1; # Placeholder
keepalive 64; # Connection pool
    balancer_by_lua_block {
        local roundrobin = require "resty.roundrobin"
        local balancer = require "ngx.balancer"
        local servers = get_healthy_servers()  -- helper: { ["host:port"] = weight }
        local rr = roundrobin:new(servers)
        local host, port = rr:find():match("^(.+):(%d+)$")
        balancer.set_current_peer(host, tonumber(port))
        balancer.set_more_tries(2)
    }
}
}
```
• server 0.0.0.1 is a placeholder, actual backend selected dynamically by Lua
• keepalive 64 maintains 64 connection pool
• set_more_tries(2) allows at most 2 retries
6. Step 6: Test and Tune
Deploy to test environment and verify:
• Health check: Stop a backend service, observe if Nginx auto-removes it
• Service discovery: Restart container, observe if IP auto-updates
• Performance test: Use wrk or ab to test QPS and response time
• Tune parameters: Probe interval (2-10s), connection pool size (64-128), retry count (2-3)
FAQ
Which is better for service discovery: Consul, Nacos, or etcd?
• Consul: Most mature, feature-complete, suitable for general microservice architectures
• Nacos: Default integration with Spring Cloud Alibaba, comprehensive console
• etcd: Lightweight, native Kubernetes integration, suitable for K8s environments
13 min read · Published on: May 7, 2026 · Modified on: May 14, 2026
Part 4 of 5 in the Nginx Practice Guide series. Previous: Nginx Load Balancing in Practice: upstream Configuration and Health Checks.