
RAC Node Eviction: A Troubleshooting Checklist That Starts With "Why"


A node disappears from your cluster at 3am. crsctl stat res -t shows it down, the surviving node logged a reconfiguration, and someone is already asking whether you lost data. You didn’t — and that’s the entire point of eviction. The harder question is the one you actually have to answer: why, and will it happen again tonight.

This is a checklist for that second question. Not “restart Grid Infrastructure and hope” — a way to read the cluster’s own logs and land on the real cause: the interconnect, the voting disks, or a node that starved itself to death.

The short version. A RAC node is evicted when it can no longer prove it is healthy to the rest of the cluster — it stopped answering the network heartbeat over the interconnect (misscount, default 30s), it lost access to a majority of the voting disks (disktimeout, default 200s), or its own local guardian processes found ocssd hung. Eviction is not the bug; it is the cluster protecting your data from split-brain. The fix is always upstream — find which heartbeat failed, and why.

Why eviction exists: split-brain is worse than downtime

Picture the interconnect between two nodes going dark. Node 1 can’t see Node 2; Node 2 can’t see Node 1. Each concludes it is the lone survivor. Both keep opening the shared database and writing to the same datafiles on shared storage — with no coordination of locks or buffer state between them. That is split-brain, and it doesn’t cause downtime; it causes corruption, the kind you discover weeks later in a block that two nodes overwrote independently.

Eviction is Clusterware’s refusal to let that happen. When membership becomes uncertain, it forcibly removes nodes until exactly one consistent cluster remains. You trade one node’s availability for the integrity of the database. That is always the right trade — which is why the goal of troubleshooting is never “stop the evictions,” it’s “remove the condition that made membership uncertain.”

The three heartbeats — this is the whole mental model

Cluster Synchronization Services (CSS), via the ocssd daemon on each node, keeps three heartbeats alive. Understand these and most evictions diagnose themselves.

| Heartbeat | What it proves | Over | Timeout | A miss triggers |
|---|---|---|---|---|
| Network | “other nodes can still reach me” | private interconnect | misscount (≈30s) | suspected split-brain → the losing side is evicted |
| Disk | “I can still see the cluster’s source of truth” | voting disks | disktimeout (≈200s) | the node evicts itself |
| Local | “my own ocssd is alive and responsive” | cssdagent + cssdmonitor on the node | short, internal | reboot, or a rebootless restart of the stack |

The network heartbeat is sent every second across the interconnect. Miss it for misscount seconds and CSS assumes the node is gone or partitioned. The disk heartbeat is written to the voting files every second; lose access to a majority of them for disktimeout seconds and the node removes itself, because a node that can’t see the voting majority cannot safely claim membership. The local heartbeat is the subtle one: cssdagent and cssdmonitor watch ocssd on the same machine, so a node frozen by CPU starvation or an OS hang — where ocssd is alive but can’t get scheduled — gets put down by its own guardians.

flowchart TD
A[ocssd: heartbeats every 1s] --> B{Network heartbeat<br/>over interconnect?}
B -- "missed > misscount (~30s)" --> SB[Split-brain suspected]
B -- OK --> C{Disk heartbeat to<br/>majority of voting disks?}
C -- "lost > disktimeout (~200s)" --> EV[Node evicts itself]
C -- OK --> D{Local: is ocssd<br/>alive and responsive?}
D -- "hung / starved" --> RB[cssdagent reboots<br/>or restarts the stack]
D -- OK --> H[Healthy member]
SB --> V{Does my sub-cluster see a<br/>majority of voting disks?}
V -- "yes (larger / lowest node#)" --> H
V -- no --> EV
The eviction decision, per node. Any one heartbeat failing for its timeout is enough. Network-heartbeat loss escalates to a split-brain vote resolved by the voting disks.
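
The same decision can be written as a few lines of Python. This is only a sketch of the flowchart's logic, not how ocssd is implemented: the `NodeState` fields and the return labels are hypothetical names chosen for illustration, and the two timeouts are the defaults quoted above.

```python
from dataclasses import dataclass

MISSCOUNT_S = 30     # network-heartbeat timeout (default quoted in this post)
DISKTIMEOUT_S = 200  # voting-disk timeout (default quoted in this post)


@dataclass
class NodeState:
    """Hypothetical snapshot of one node's heartbeat health."""
    secs_since_network_hb: float       # time since peers last heard from this node
    secs_without_vote_majority: float  # time without access to a voting-disk majority
    ocssd_responsive: bool             # what the local guardians observe
    sees_vote_majority: bool           # consulted only when a split is suspected


def eviction_decision(s: NodeState) -> str:
    """Walk the three heartbeats in the same order as the flowchart."""
    if s.secs_since_network_hb > MISSCOUNT_S:
        # Split-brain suspected: the voting disks decide who stays.
        return "healthy" if s.sees_vote_majority else "evict"
    if s.secs_without_vote_majority > DISKTIMEOUT_S:
        return "evict"
    if not s.ocssd_responsive:
        return "reboot-or-restart"
    return "healthy"
```

For example, `eviction_decision(NodeState(45, 0, True, False))` returns `"evict"`: the node missed the network heartbeat for longer than misscount and is on the side that cannot see the voting majority.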

Split-brain resolution: who actually survives

When the cluster splits, CSS doesn’t flip a coin. A sub-cluster must see a majority of the voting disks to survive at all. Among sub-clusters that can, the larger one survives; on a true tie (e.g., a two-node cluster split clean down the middle), the side holding the lowest node number survives and the other is evicted. The losing nodes are fenced.

This is exactly why voting disks come in odd numbers (1, 3, 5) and should sit across independent failure groups: a node must reach more than half of them to stay in the cluster. Three voting files on three separate storage paths means a node can lose one path and still vote.
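
The arithmetic behind "more than half" is short enough to sketch in Python. The function names here are made up for illustration, and `surviving_subcluster` assumes every candidate partition already sees a voting majority; it only encodes the larger-then-lowest-node-number rule described above.

```python
def votes_needed(total_voting_files: int) -> int:
    """Strict majority: more than half of the voting files."""
    return total_voting_files // 2 + 1


def tolerated_losses(total_voting_files: int) -> int:
    """How many voting files a node can lose and still hold a majority."""
    return total_voting_files - votes_needed(total_voting_files)


def surviving_subcluster(subclusters: list[list[int]]) -> list[int]:
    """Larger partition wins; on a tie, the one with the lowest node number."""
    return min(subclusters, key=lambda nodes: (-len(nodes), min(nodes)))


for n in (1, 2, 3, 4, 5):
    print(n, "voting files -> can lose", tolerated_losses(n))
```

Running the loop shows why even counts buy nothing: four voting files tolerate one loss, exactly like three, and two tolerate none, exactly like one.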

The usual causes, ranked by how often they’re the culprit

  1. The private interconnect — the number-one cause, by a wide margin. Dropped or corrupted packets, a NIC flapping, a flaky switch port, or — the classic intermittent gremlin — an MTU / jumbo-frame mismatch where 9000-byte frames work until something fragments. A saturated interconnect (sharing a NIC with backup or application traffic) starves the heartbeat the same way a dead link does.
  2. Voting disk / storage I/O. A lost SAN path, multipath flapping, or storage latency that exceeds disktimeout makes a node unable to write its disk heartbeat. If the ASM disk group holding the voting files goes offline, every node that loses the majority self-evicts.
  3. Node hang / resource starvation. A node pinned at 100% CPU, or thrashing in swap, can’t schedule ocssd, so it misses heartbeats even though the machine is technically “up.” This looks like a network problem in the logs but is really a performance problem. (Diagnose the starvation itself the way you would any slow database, as in How to Read an AWR Report Without Drowning, alongside OS-level data.)
  4. Time synchronization drift. Large clock skew between nodes destabilizes membership. Grid Infrastructure runs the Cluster Time Synchronization Service (ctssd) in observer mode when NTP/chrony is configured, active mode when it isn’t; a broken time setup undermines both.
  5. Hardware faults and known bugs. A failing NIC/HBA, bad memory, or a Clusterware bug fixed in a later Release Update. Always check the eviction signature against current GI patches.
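
For the MTU mismatch in cause 1, one common way to test the path is a ping with fragmentation prohibited and a payload sized to exactly fill the frame. The sketch below only builds that command. It assumes IPv4 without IP options and the Linux iputils `ping` syntax (`-M do` sets don't-fragment); `node2-priv` is a hypothetical private hostname.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def max_icmp_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def mtu_probe_command(peer: str, mtu: int = 9000, count: int = 3) -> list[str]:
    """Build a Linux ping that fails if any hop cannot carry a full-size frame."""
    return ["ping", "-M", "do", "-s", str(max_icmp_payload(mtu)),
            "-c", str(count), peer]


print(" ".join(mtu_probe_command("node2-priv")))
```

If the 8972-byte probe fails while a 1472-byte probe (`mtu=1500`) succeeds, something on the path is not passing jumbo frames. Run it in both directions and across every private interface.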

Diagnosis: read the logs in this order

Work top-down. The first log tells you when and that; the rest tell you why. (Paths depend on your Oracle base and GI version: 12.1.0.2 and later use the ADR trace layout; 11.2 uses $GRID_HOME/log/<host>/....)

  1. GI alert log — start here. Find the eviction timestamp and the reconfiguration message. This anchors every other log to a moment in time.
  2. ocssd trace (ocssd.trc / ocssd.log) — the heartbeat story. Search for phrases like “missed checkin”, “Polling”, and eviction/kill messages around the alert-log timestamp.
  3. cssdagent / cssdmonitor logs — read these if the node rebooted; they record the local guardian’s decision to put the node down.
  4. OS messages (/var/log/messages, journalctl) — the reboot time, hardware/driver errors, and any OOM-killer activity. A reboot with no Clusterware reason in the GI logs points at the OS or hardware.
  5. Cluster Health Monitor (CHM / oclumon) and OSWatcher — the single most useful evidence: per-second CPU, memory, and network counters from the seconds before the eviction. If you don’t have these running, the smoking gun is already gone by morning.

| What you see in the logs | Likely cause | Confirm with |
|---|---|---|
| “missed checkin” / “Polling” on the interconnect | network loss or saturation | OSWatcher netstat/ifconfig, switch logs, MTU test |
| disk-heartbeat / voting-file I/O timeout | storage path loss or latency | multipath status, ASM disk state, SAN latency |
| node rebooted, ocssd “hung”, killed by cssdagent | CPU/memory starvation or OS hang | CHM/OSWatcher CPU & memory at eviction time |
| clock-skew / ctssd warnings | broken time sync | chronyc/ntpq, ctssd trace |
| reboot, nothing in GI logs | hardware fault or OS panic | /var/log/messages, IPMI/ILOM, vendor diagnostics |
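
The reading order above boils down to two moves: pin a time window around the eviction, then match what falls inside it against known signatures. Here is a rough Python sketch of that triage. The timestamp format, the window sizes, and the search phrases are all assumptions made for illustration; real Clusterware messages differ by version, so treat the phrases as starting points to adapt rather than exact strings.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed line prefix, adjust to your logs
TS_LEN = 19                       # length of a timestamp in TS_FORMAT

# (search phrases, likely cause), following the table above.
# The phrases are illustrative, not exact Clusterware message text.
SIGNATURES = [
    (("missed checkin", "polling"), "network loss or saturation"),
    (("voting file", "disk heartbeat"), "storage path loss or latency"),
    (("hung", "cssdagent"), "CPU/memory starvation or OS hang"),
    (("clock skew", "ctssd"), "broken time sync"),
]


def in_window(ts, eviction, before=timedelta(minutes=5), after=timedelta(minutes=1)):
    """True if ts falls in the window around the eviction timestamp."""
    return eviction - before <= ts <= eviction + after


def likely_causes(lines, eviction):
    """Return likely causes, in first-seen order, for lines near the eviction."""
    causes = []
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            continue  # line has no parsable timestamp prefix
        if not in_window(ts, eviction):
            continue
        text = line.lower()
        for phrases, cause in SIGNATURES:
            if any(p in text for p in phrases) and cause not in causes:
                causes.append(cause)
    return causes
```

An empty result for a node that did reboot corresponds to the last row of the table: nothing in the GI logs, so look at the OS and the hardware.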

Want to practice this? The RAC node-eviction forensics lab gives you five realistic scenarios — interconnect, voting disk, starvation, time drift, and one that isn’t an eviction at all — as raw logs to diagnose. No cluster required; it’s text and a grade.sh self-check.

Rebootless restart: why the node sometimes doesn’t reboot

On 11.2.0.2 and later, Grid Infrastructure tries a rebootless restart: instead of bouncing the whole OS, it attempts to gracefully stop and restart just the GI stack when the failure is inside the stack and I/O can be safely halted. When it can’t safely stop I/O — a kernel-level hang, for instance — it falls back to a full node reboot to guarantee fencing. So “the node restarted Clusterware but didn’t reboot” and “the node power-cycled” are two outcomes of the same protective logic, and the logs distinguish them.
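
Reduced to its inputs, that protective logic looks something like the following Python sketch. It is a simplification for reasoning about the two outcomes, not Oracle's actual code; the parameter names and return labels are hypothetical.

```python
def fencing_action(gi_version: tuple[int, ...],
                   failure_inside_stack: bool,
                   io_safely_halted: bool) -> str:
    """Choose between a rebootless restart and a full node reboot."""
    rebootless_supported = gi_version >= (11, 2, 0, 2)
    if rebootless_supported and failure_inside_stack and io_safely_halted:
        return "restart-gi-stack"
    # Anything else falls back to a reboot so that fencing is guaranteed.
    return "reboot-node"
```

The asymmetry is deliberate: every uncertain case lands on the reboot, because a node that cannot prove its I/O has stopped must not stay up.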

Fix and prevent, by cause

| Cause | Fix it now | Stop it recurring |
|---|---|---|
| Interconnect | repair the link/switch; verify MTU is consistent end-to-end | redundant private NICs (HAIP or bonding), a dedicated interconnect, validated jumbo frames, no app/backup traffic sharing it |
| Voting / storage | restore the storage path; check ASM disk-group state | odd number of voting files across independent failure groups; monitor I/O latency; healthy multipathing |
| Starvation | relieve the CPU/memory pressure | headroom on every node; don’t co-locate greedy workloads; deploy CHM; diagnose the load like any perf issue |
| Time sync | fix chrony/NTP on all nodes | keep time sync healthy cluster-wide, or let ctssd run active consistently |
| Bug / hardware | apply the relevant GI Release Update; replace the faulty part | stay current on GI RUs; monitor hardware health proactively |
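
Before changing anything, it helps to capture the current state with read-only commands. The sketch below groups a few by cause and runs them from Python. The grouping and the `run_checks` helper are my own illustration; the commands are standard GI and Linux tools, but availability and output vary by release and platform, so adjust the list to your environment and run it as a user with the Grid home on its PATH.

```python
import shutil
import subprocess

# Read-only inspection commands, grouped by the causes in the table above.
CHECKS = {
    "interconnect": [
        ["oifcfg", "getif"],
        ["crsctl", "get", "css", "misscount"],
    ],
    "voting/storage": [
        ["crsctl", "query", "css", "votedisk"],
        ["crsctl", "get", "css", "disktimeout"],
        ["multipath", "-ll"],
    ],
    "starvation": [
        ["oclumon", "dumpnodeview", "-allnodes"],
    ],
    "time sync": [
        ["crsctl", "check", "ctss"],
        ["chronyc", "tracking"],
    ],
}


def run_checks(cause: str) -> list[tuple[str, str]]:
    """Run each check for a cause and return (command, output) pairs."""
    results = []
    for cmd in CHECKS[cause]:
        label = " ".join(cmd)
        if shutil.which(cmd[0]) is None:
            results.append((label, "not found on PATH"))
            continue
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        results.append((label, proc.stdout.strip() or proc.stderr.strip()))
    return results
```

Saving the output of `run_checks("interconnect")` next to the incident notes gives you a before-picture to compare against once the fix is in.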

What teams get wrong

  • Treating the eviction as the failure. The node did its job. The bug is whatever made membership uncertain — chase that.
  • Raising misscount to “fix” it. This usually masks the real problem and widens the window where split-brain is possible. Oracle generally advises against changing it; reach for it last, if ever.
  • No CHM or OSWatcher running. Without per-second history, the CPU/network spike that caused the eviction is unrecoverable by the time you log in. Deploy them before the next incident — they are the difference between a root cause and a guess.
  • A single interconnect NIC with no redundancy — one cable or port becomes a cluster-wide outage.
  • Voting disks on the same fragile storage path as everything else — the tiebreaker shouldn’t share a failure domain with the data.

RAC is the layer that keeps a node failure from becoming an outage — see where it fits among the other HA options in The Oracle HA Decision Tree. And to drill the skill this whole post is about — reading the logs and naming the cause — work through the five scenarios in the no-Docker RAC node-eviction forensics lab. For a real cluster to break and recover (you’ll want a 32 GB+ host and your own Enterprise Edition binaries), Oracle’s official RAC-on-Docker and Vagrant projects are the supported, license-compliant path.

Frequently asked questions

What is node eviction in Oracle RAC?

Node eviction is when Oracle Clusterware forcibly removes a node from the cluster because that node can no longer prove it is healthy and reachable. It is a protective action that prevents split-brain — two nodes independently writing to the same shared database and corrupting it.

What is the most common cause of RAC node eviction?

Private interconnect problems are the most common cause: dropped or corrupted packets, a flapping NIC, a bad switch port, an MTU or jumbo-frame mismatch, or a saturated interconnect that shares bandwidth with other traffic. Storage and resource starvation are the next most common.

What is the difference between misscount and disktimeout?

Misscount is the network-heartbeat timeout (default about 30 seconds): how long a node can miss interconnect heartbeats before CSS treats it as gone. Disktimeout is the voting-disk timeout (default about 200 seconds): how long a node can fail to access a majority of voting disks before it evicts itself.

Does a node eviction always reboot the server?

Not always. On 11.2.0.2 and later, Grid Infrastructure attempts a rebootless restart — gracefully stopping and restarting only the GI stack when the failure is within the stack and I/O can be safely halted. It falls back to a full reboot when it cannot safely stop I/O, such as a kernel-level hang.

How do I find out why a node was evicted?

Read the logs in order: the Grid Infrastructure alert log for the eviction time, then the ocssd trace for the heartbeat misses, then the cssdagent and cssdmonitor logs if the node rebooted, then the OS messages, and finally Cluster Health Monitor and OSWatcher data for the CPU, memory, and network state in the seconds before the eviction.

How do voting disks prevent split-brain?

A node must be able to see a majority of the voting disks to remain a cluster member. When the cluster splits, the sub-cluster that sees the voting majority survives and the other is evicted. That is why voting disks are deployed in odd numbers across independent failure groups.

Can high CPU or memory pressure cause a node eviction?

Yes. A node pinned at 100% CPU or thrashing in swap may be unable to schedule the ocssd process, so it misses heartbeats even though the machine is technically up. This often appears as a network-heartbeat miss in the logs but is really a performance problem, and the local guardian processes may reboot the node.

Should I increase misscount to stop frequent evictions?

Generally no. Raising misscount usually masks the underlying problem and widens the window in which split-brain could occur. Oracle advises against changing it in most cases. Fix the root cause — interconnect, storage, time sync, or resource starvation — instead.
