Occasionally a Juniper SRX device running Junos will have a high CPU. Here are some tips for troubleshooting these incidents.
Check the routing engine (control plane)
View the CPU status with:
show chassis routing-engine.
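The relevant portion of the output looks roughly like this (the values are illustrative, chosen to match the scenario discussed below):

```
Routing Engine status:
    CPU utilization:
      User                      85 percent
      Background                 0 percent
      Kernel                    11 percent
      Interrupt                  0 percent
      Idle                       4 percent
```

The number to watch is the Idle line; everything above it adds up to the total utilization.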
Above you can see that the CPUs are 4% idle, which means they are 96% utilized. I would say anything over 90% is considered bad. Once the CPU gets to 100% utilization it will start dropping packets and possibly overheating.
Next you want to look further and see what processes are running high. Do this with the command
show system processes extensive.
Even under normal conditions, some processes will show well over 100% utilization. Junos does a poor job of summing these numbers on multi-core processors, which makes the output confusing. You can do
start shell and then
top -H to see the actual utilization per core.
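In practice the drill-down looks like this; top here is the underlying FreeBSD top, so -H expands each process into its individual threads (press q to quit):

```
user@srx> start shell
% top -H
```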
Analyze the processes
Now that you know which processes are running high, you can look into why.
If you see the process
httpd among the top three CPU consumers, chances are the web UI is having issues and needs to be restarted. Restarting this process only impacts users who are currently in the web UI of this SRX.
To restart the httpd process run the following command:
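The Junos web UI runs as the web-management service, so assuming a standard SRX configuration, the restart is done from operational mode like this:

```
restart web-management immediately
```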
This will immediately restart the process without confirmation. After doing so, look at
show chassis routing-engine repeatedly to see whether the idle percentage has climbed back above 30%. If so, that has fixed your problem.
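If your Junos version supports the refresh pipe, you can have the CLI re-run the command at an interval instead of retyping it (the 5-second interval is just an example):

```
show chassis routing-engine | refresh 5
```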
I very frequently see this process get stuck at a high percent. I’m not sure what causes it, but the fix is quick and easy so that’s nice.
If the process
eventd is running high (over 20%), it is probably worth investigating. This process handles events on the Juniper device itself, including:
- Storing internal syslog messages
- Sending syslog messages to another system
- Sending/responding to SNMP traps/polls
- Sampling handling
- Traceoptions handling
If this process is running high, check whether any of the features above are configured too aggressively. Perhaps too many traceoptions are enabled, or too much sampling is turned on. Try turning these off and see if the CPU returns to normal.
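A quick way to hunt for configured traceoptions is to search the flattened configuration; the deactivate target below is a hypothetical example of what you might find:

```
show configuration | display set | match traceoptions
configure
deactivate security flow traceoptions
commit
```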
There are two modes for security logs, event and stream. Switching to stream mode may reduce CPU utilization, because streamed logs are sent directly from the data plane rather than passing through the control plane.
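A minimal sketch of switching to stream mode; the stream name and IP addresses are placeholders you would replace with your own:

```
set security log mode stream
set security log source-address 192.0.2.1
set security log stream LOGSERVER format sd-syslog
set security log stream LOGSERVER host 192.0.2.10
```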
flowd_octeon seems to always run over 200%. This is normal; it usually isn't the problem, so look at the next-highest CPU hog as the culprit.
This process is responsible for packet handling, data processing, and flow processing, all of which happens on the data plane.
Check the packet forwarding engine (data plane)
The following two commands show us what's happening on the data plane.
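Exactly which commands apply depends on the platform; on SRX devices, SPU utilization is commonly checked with commands along these lines (the FPC slot number is an example, use the slot for your chassis):

```
show security monitoring fpc 0
show security monitoring performance spu
```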
If the CPU utilization here is low, then you don’t have a problem with the data plane.
The data plane (aka forwarding plane) is where the SRX decides what to do with the packet. This is where the SRX looks at the forwarding table and routing table to determine where to send the packet. If your CPU here is high, then it’s possible you are reaching the capacity of this device. Start looking at things like how many packets and bytes each interface is receiving and comparing it with the model specifications.
To examine the throughput of each interface use the following command:
show interfaces detail | match "link is Up| bps| pps" | except "0 bps|0 pps"
To examine the number of sessions use the following command:
show security flow statistics
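To compare the live session count directly against the platform maximum, the session summary is often more direct:

```
show security flow session summary
```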
Check the model for limitations here: