Troubleshooting

Symptoms collected from real-world walkthroughs. If you hit something not listed here, the most useful first diagnostic is always docker logs ogmara-l2 --tail 50 (or sudo journalctl -u ogmara-l2 -n 50 for source builds) — the error message almost always names the resource that failed.

Container crash-loops with Error: Permission denied (os error 13)

Two flavours, both rooted in --user $(id -u):$(id -g) meeting a host directory the operator's UID doesn't own:

  • Creating /etc/ogmara/ogmara.toml: /etc/ogmara is still root-owned from the sudo mkdir in Server Prep. Fix: sudo chown $(id -u):$(id -g) /etc/ogmara.
  • Creating RocksDB files in /data: /var/lib/ogmara/data is still root-owned (or owned by the ogmara system user from the source-build chown). Fix: sudo chown -R $(id -u):$(id -g) /var/lib/ogmara/data.

The container is on --restart unless-stopped so the next automatic restart (within ~10s) picks up the corrected permissions.
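Before waiting for the restart, it can help to confirm that both host directories really belong to the UID the container runs as. The following is a minimal sketch, not part of the Ogmara tooling: check_owner is a hypothetical helper name, and it assumes GNU stat as found on Linux hosts.

```shell
# Report whether a directory is owned by the given UID (default: the current user).
# Prints "ok <dir>" or "mismatch <dir> (owner UID <n>, expected <m>)".
check_owner() {
    dir=$1
    want=${2:-$(id -u)}
    # GNU stat (Linux); BSD/macOS stat would need `stat -f %u` instead.
    have=$(stat -c %u "$dir") || return 2
    if [ "$have" = "$want" ]; then
        echo "ok $dir"
    else
        echo "mismatch $dir (owner UID $have, expected $want)"
        return 1
    fi
}

# Typical use on the host, before starting the container:
#   check_owner /etc/ogmara
#   check_owner /var/lib/ogmara/data
```

A "mismatch" line for either directory points at the matching chown fix in the list above.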

Permission denied creating ./data or files appearing in /var/lib/ogmara/data/data/

ogmara-node init emits the source-build default data_dir = "./data", which inside the Docker container (WORKDIR=/data) resolves to /data/data. As of l2-node v0.47.2 the Docker entrypoint rewrites this to data_dir = "/data" automatically when generating a fresh config. If you have an older config from before v0.47.2, fix manually:

sed -i 's|^data_dir = "\./data"|data_dir = "/data"|' /etc/ogmara/ogmara.toml
docker restart ogmara-l2

curl http://127.0.0.1:41721/api/v1/health from the host returns empty

Same root cause as above for older configs: ogmara-node init emits listen_addr = "127.0.0.1" which binds the API to the container's loopback. Docker's -p 41721:41721 port forward delivers packets to the bridge interface (eth0) where nothing listens. Fix (and re-fix on each config regenerate prior to l2-node v0.47.2):

sed -i 's|^listen_addr = "127\.0\.0\.1"|listen_addr = "0.0.0.0"|' /etc/ogmara/ogmara.toml
docker restart ogmara-l2
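Both this fix and the data_dir fix target defaults that ogmara-node init writes for source builds. A small read-only check can show whether a config still carries either of them before you edit anything. This is a sketch only: find_docker_unfriendly_defaults is a hypothetical helper name, and it assumes each setting sits at the start of its own line, as the sed patterns in this section do.

```shell
# List settings in an ogmara.toml that still carry the source-build defaults
# which misbehave inside the Docker container. Prints "<line>:<setting>" per
# finding and prints nothing when both settings are already container-friendly.
find_docker_unfriendly_defaults() {
    conf=$1
    grep -nE '^(data_dir = "\./data"|listen_addr = "127\.0\.0\.1")' "$conf"
}

# Typical use:
#   find_docker_unfriendly_defaults /etc/ogmara/ogmara.toml
```

The function exits non-zero when nothing is found, so it can also be used directly in an if condition.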

Inside-container diagnostic that doesn't require touching the config: use a one-shot curlimages/curl container that shares the L2 node's network namespace — 127.0.0.1 in the ephemeral container is the L2 node's loopback:

docker run --rm --network container:ogmara-l2 curlimages/curl:latest \
  -s http://127.0.0.1:41721/api/v1/health

Dashboard subdomain shows bare authentication required instead of the Connect Wallet overlay

You loaded the wrong URL. The dashboard HTML (with the login overlay) is at the full path /admin/dashboard — visiting https://stats.yourdomain.com/ (root) or https://stats.yourdomain.com/admin/ (no trailing /dashboard) hits an auth-protected route and returns the plain-text authentication required body. Always use:

https://stats.yourdomain.com/admin/dashboard

Docker container says it auto-generates a config but no file appears in /etc/ogmara/

You're running a cached l2-node-latest image that pre-dates v0.46.1 (the version that added the auto-generate entrypoint). Check with docker image ls ogmara/ogmara:l2-node-latest --digests; if the listed image was not created on or after 2026-06-01, force a pull:

docker rm -f ogmara-l2
docker pull ogmara/ogmara:l2-node-latest
# then re-run the docker run from the Install — Docker page

Logs show Chain scanner rate-limited, will back off (HTTP 429 from Klever API)

Expected during fresh-node history replay. The Klever testnet/mainnet RPC is fronted by Cloudflare (error code 1015 is the Cloudflare rate-limit response) and a fresh node hammers it. The chain scanner backs off exponentially (5s → 10s → 20s → 40s) and resumes on its own. Sync just takes longer than ideal; no action required.
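To make the delays concrete, the doubling schedule quoted above can be reproduced in a few lines. This is an illustration of the arithmetic only, not the node's own implementation: backoff_schedule is a hypothetical helper, and any cap or jitter the real scanner applies is not modelled.

```shell
# Print the delays (in seconds) of a doubling backoff: start, 2*start, 4*start, ...
backoff_schedule() {
    delay=$1
    steps=$2
    out=""
    i=0
    while [ "$i" -lt "$steps" ]; do
        out="${out:+$out }$delay"
        delay=$((delay * 2))
        i=$((i + 1))
    done
    echo "$out"
}

backoff_schedule 5 4   # prints: 5 10 20 40
```

Four consecutive rate-limit responses therefore cost 75 seconds of waiting in total, which is why a fresh sync looks slow rather than stuck.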

Logs show Failed to query isNodeRegistered (treated as not-registered for now)

Benign. Your node hasn't run registerNode on the smart contract yet, so isNodeRegistered returns no data and the SC view's empty response confuses the decoder. The node treats it correctly as "not registered" and the warning is informational. If you want to register (so other nodes can discover you via getActiveNodes and you can participate in anchoring quorum), see the Admin Dashboard's Anchoring tab or the [anchoring.metadata] config block.

Logs show Snapshot bootstrap aborting: only 0/1 peer(s) reachable within discovery timeout, need at least 3

On a small testnet (e.g. only 2–3 active operators) the default quorum-min of 3 can't be met. The node falls back to chain-scanner-driven sync, which works but is slower. To allow snapshot bootstrap from fewer peers:

# In /etc/ogmara/ogmara.toml
[snapshot]
quorum_min_peers = 1   # default 3 — trades sybil resistance for faster small-network sync

Only do this on testnet or small private deployments — reducing the quorum lets a single hostile peer feed you bogus snapshot state.

Logs show Failed to trigger bootstrap: No known peers immediately at startup

Expected on a fresh node before the SC-discovery (tier 3) fan-out completes. The node has an empty bootstrap_nodes = [] by default and relies on getActiveNodes from the Ogmara KApp smart contract to discover peers. Within ~60 seconds you should see sc_discovery: fetched candidate set, fetching metadata followed by Connection established log lines. If those don't appear within a few minutes, check that [klever] node_url is reachable from inside the container.
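If you would rather not scan the log by eye, the two marker messages mentioned above can be checked mechanically. The sketch below is a hypothetical helper (discovery_status is not an Ogmara command) and assumes the log lines contain those messages verbatim.

```shell
# Read node log lines on stdin and report how far peer discovery has progressed.
# The two marker strings are the log messages quoted in the paragraph above.
discovery_status() {
    logs=$(cat)
    if printf '%s\n' "$logs" | grep -q 'Connection established'; then
        echo "connected"
    elif printf '%s\n' "$logs" | grep -q 'sc_discovery: fetched candidate set'; then
        echo "discovering"
    else
        echo "no-discovery-yet"
    fi
}

# Typical use:
#   docker logs ogmara-l2 --tail 500 2>&1 | discovery_status
```

A result of "no-discovery-yet" several minutes after startup is the cue to check [klever] node_url reachability.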

curl https://node.yourdomain.com/api/v1/health returns Apache's default HTML 404

Symptom: the response body is the literal "Not Found / The requested URL was not found on this server." HTML page (with "Server: Apache/2.4.x (Debian) Server at ... Port 443" in the footer), not a JSON error from the L2 node. This means Apache is returning 404 itself without forwarding the request to the proxy upstream.
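A quick way to tell the two cases apart in a script is to look at the first non-blank character of the body. This is a rough heuristic, not a parser: classify_body is a hypothetical helper, and it assumes the L2 node answers with a JSON object while the web server answers with an HTML page.

```shell
# Classify an HTTP response body read from stdin by its first non-blank character:
# "{" suggests JSON from the L2 node, "<" suggests an HTML page from the web server.
classify_body() {
    first=$(tr -d ' \t\r\n' | cut -c1)
    case $first in
        '{') echo "json" ;;
        '<') echo "html" ;;
        *)   echo "unknown" ;;
    esac
}

# Typical use:
#   curl -s https://node.yourdomain.com/api/v1/health | classify_body
```

An "html" result means the request never reached the node, so the reverse proxy is the place to look.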

Confirm the L2 node itself is healthy (bypass Apache entirely):

docker run --rm --network container:ogmara-l2 curlimages/curl:latest \
  -s http://127.0.0.1:41721/api/v1/health

If that returns the expected {"status":"ok",...} JSON, the L2 node is fine and Apache is the problem. Two common causes:

  • Certbot's -le-ssl.conf shadow vhost — an auto-generated SSL companion file without the ProxyPass rules is shadowing your real vhost. See Reverse Proxy → Certbot's -le-ssl.conf shadow vhost for the detection + disable steps.
  • The public-API vhost for node.yourdomain.com simply doesn't exist (you may have set up only the admin-dashboard subdomain in Admin Dashboard). Add a separate vhost per the Reverse Proxy template — both subdomains can co-exist.

Confirm the fix:

sudo apache2ctl -S 2>&1 | grep "namevhost node.yourdomain.com"
# Should show exactly ONE namevhost line per port (not two).

curl -s https://node.yourdomain.com/api/v1/health
# Should return JSON, not HTML.

My node has a red dot (unreachable) on ogmara.org/network.html even though the API works

The network page's reachability probe is a browser-side fetch() from ogmara.org against your https://node.yourdomain.com/api/v1/network/identity. That's a cross-origin request, and the browser enforces CORS: if your node's [api] cors_origins list doesn't include https://ogmara.org, the browser blocks the response body and the probe sees the failure as "did not respond within 5 seconds" (red dot).

Add https://ogmara.org to your cors_origins array:

sed -n '/^cors_origins/p' /etc/ogmara/ogmara.toml
# If the array doesn't include "https://ogmara.org", edit and add it:
sudo nano /etc/ogmara/ogmara.toml
# Change:
#   cors_origins = ["https://node.yourdomain.com"]
# To:
#   cors_origins = ["https://node.yourdomain.com", "https://ogmara.org"]

docker restart ogmara-l2

Hard-reload the network page (Ctrl+Shift+R / Cmd+Shift+R) to clear the in-memory probe cache. The dot turns green within ~5 seconds.
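If the dot stays red, it is worth confirming that the edit actually landed in the file the container reads. The sketch below checks the config on disk; has_cors_origin is a hypothetical helper and assumes cors_origins is written as a single-line array, as in the example above.

```shell
# Succeed if the cors_origins line of the given config lists the given origin.
# Assumes cors_origins is a single-line array at the start of a line.
has_cors_origin() {
    conf=$1
    origin=$2
    grep -E '^cors_origins[[:space:]]*=' "$conf" | grep -qF "\"$origin\""
}

# Typical use:
#   has_cors_origin /etc/ogmara/ogmara.toml https://ogmara.org && echo "origin listed"
```

This only inspects the file. Whether the running node honours it still depends on the docker restart shown above.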

After upgrading to l2-node v0.48.0+, the [network.presence] block is missing from ogmara.toml

The Docker entrypoint only auto-generates ogmara.toml the FIRST time a container starts — existing configs are never modified, so a config from a pre-v0.48.0 release won't gain the new [network.presence] section automatically. Without that block the presence subsystem stays off (default), and your node won't appear on the network page if you're not on-chain registered. Append the block manually:

sed -n '/^\[network\.presence\]/,/^\[/p' /etc/ogmara/ogmara.toml | head -3
# Empty? Append the block:
sudo tee -a /etc/ogmara/ogmara.toml <<'EOF'

[network.presence]
enabled = true
record_ttl_secs = 86400
rebroadcast_interval_secs = 21600
denylist = []
EOF

docker restart ogmara-l2
# Verify broadcasting=true after restart:
docker run --rm --network container:ogmara-l2 curlimages/curl:latest \
  -s http://127.0.0.1:41721/api/v1/network/presence | python3 -m json.tool | head -10

My node appears as two rows on ogmara.org/network.html — once as 3Y5BpfC... and once as 12D3KooW...

Same physical node in two different ID encodings: 3Y5BpfC... is the Ogmara short form (sha256-truncated-bs58 of the libp2p pubkey, returned by /api/v1/network/nodes), and 12D3KooW... is the standard libp2p PeerId (multihash of the same pubkey, used by presence-gossip records). Both derive from the same key but can't string-match. The website's dedup logic was patched in tandem with l2-node v0.48.1 (which adds a peer_id field to /api/v1/network/nodes) to collapse them into one row. If you see this on v0.48.0 or older: upgrade to v0.48.1+ and hard-reload the page.

/api/v1/network/presence shows broadcasting: true but records: [] — and other presence-enabled nodes have empty caches too

Three compounding bugs in early v0.48 releases. Up through v0.48.2, maybe_publish_initial_presence required ≥ 3 connected peers (later lowered to ≥ 1), AND the publish fired on ConnectionEstablished — which happens BEFORE the gossipsub subscription handshake completes. Combined with the original 6h rebroadcast_interval_secs default, freshly-restarted nodes were effectively invisible for hours. The publish would log error: NoPeersSubscribedToTopic, topic_subscribers: 0, set the "broadcast done" flag anyway, and never retry.

Two ways to recover:

  • Upgrade to v0.48.3+ (recommended). Threshold ≥ 1 peer, default rebroadcast 1h, AND the initial-broadcast now retries on the gossipsub::Event::Subscribed event — i.e. the moment we KNOW a peer is in the topic mesh, so NoPeersSubscribedToTopic can't recur. The done-flag also stays unset on publish failure so retries are unhindered.
  • Workaround for older versions: manually lower rebroadcast_interval_secs in ogmara.toml (e.g. to 600 = 10 minutes) and restart. Every node that does this re-publishes every 10 min regardless of mesh size; caches fill in < 20 min.

sed -i 's/^rebroadcast_interval_secs = .*/rebroadcast_interval_secs = 600/' /etc/ogmara/ogmara.toml
docker restart ogmara-l2
# Wait ~15 min, then verify both ends:
docker run --rm --network container:ogmara-l2 curlimages/curl:latest \
  -s http://127.0.0.1:41721/api/v1/network/presence | python3 -m json.tool | head -10
curl -s https://peer.example.org/api/v1/network/presence | python3 -m json.tool | head -10
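To turn that visual check into a pass/fail result, the response can be summarised by a small filter. This is a sketch under assumptions: presence_status is a hypothetical helper, and it assumes the response has top-level "broadcasting" and "records" fields, as the symptom in this section's title suggests. It matches text rather than parsing JSON, so treat it as a convenience only.

```shell
# Read a /api/v1/network/presence response on stdin and summarise it.
# Whitespace is stripped first so compact and pretty-printed JSON both match.
presence_status() {
    body=$(tr -d ' \t\r\n')
    case $body in
        *'"broadcasting":true'*) ;;
        *) echo "not-broadcasting"; return 1 ;;
    esac
    case $body in
        *'"records":[]'*) echo "broadcasting-empty-cache"; return 1 ;;
        *'"records":['*)  echo "broadcasting-with-records" ;;
        *)                echo "unknown-shape"; return 1 ;;
    esac
}

# Typical use:
#   curl -s https://peer.example.org/api/v1/network/presence | presence_status
```

Seeing "broadcasting-empty-cache" on both ends after the wait suggests the workaround has not taken effect yet on at least one side.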