As Linux remains the backbone of most enterprise servers, cloud infrastructure, and DevOps pipelines, strong troubleshooting skills are critical for ensuring system reliability, performance, and security. Recruiters must identify professionals skilled in diagnosing and resolving Linux system issues efficiently to minimize downtime and operational risks.
This resource, "100+ Linux Troubleshooting Interview Questions and Answers," is tailored for recruiters to simplify the evaluation process. It covers topics from basic command-line troubleshooting to advanced system diagnostics and performance tuning, including networking, storage, and process management issues.
Whether hiring for System Administrators, DevOps Engineers, or Linux Support Engineers, this guide enables you to assess a candidate’s:
Log analysis (/var/log), process management (ps, top, htop), disk space issues (df, du), and service status (systemctl, service).
Network diagnostics (ping, netstat, ss, tcpdump, traceroute), performance bottleneck analysis (vmstat, iostat, sar), permissions and ownership issues (chmod, chown), and boot/recovery troubleshooting (GRUB, single-user mode).
For a streamlined assessment process, consider platforms like WeCP, which allow you to:
✅ Create customized Linux troubleshooting assessments aligned to your infrastructure and support needs.
✅ Include hands-on practical tasks, such as analyzing log files, fixing configuration errors, or resolving connectivity issues within a simulated terminal environment.
✅ Proctor assessments remotely with AI-based integrity safeguards.
✅ Leverage automated grading to evaluate command accuracy, problem-solving approach, and adherence to Linux best practices.
Save time, enhance technical vetting, and confidently hire Linux troubleshooting experts who can maintain system uptime, security, and performance from day one.
The dmesg command in Linux (commonly expanded as "diagnostic messages") is primarily used to print or control the kernel ring buffer. This ring buffer contains messages from the kernel, especially those logged during system boot, and can help in troubleshooting hardware issues, kernel panics, or driver problems. The messages include hardware detection logs (e.g., disk drives, network interfaces, USB devices), system resource allocation, and any errors or warnings related to the kernel or hardware devices.
Use Cases and Troubleshooting:
To use dmesg, simply type dmesg in the terminal to see all the system messages. You can also filter for specific issues like:
dmesg | grep -i error
dmesg | grep -i fail
Additionally, dmesg output is often written to system log files, and you can monitor it in real-time using:
dmesg -w
In Linux, the status of a service can be checked using various tools depending on the init system in use (systemd, SysVinit, or Upstart). In modern Linux distributions that use systemd, the command to check the status of a service is:
systemctl status <service-name>
For example, to check the status of the Apache HTTP server (named apache2 on Debian/Ubuntu and httpd on RHEL/CentOS), you would run:
systemctl status apache2
This command will provide the current status of the service, including whether it is active (running), inactive (stopped), or failed. Additionally, you can see the last log entries related to the service, which can help in troubleshooting if the service isn’t working as expected.
For older systems using SysVinit, you would use:
service <service-name> status
For example:
service apache2 status
If the service isn't running, you can try to restart it with:
systemctl restart apache2 # Using systemd
service apache2 restart # Using SysVinit
The top command is a real-time system monitoring tool in Linux that provides information about running processes, CPU usage (including time spent waiting on I/O), memory usage, load average, and more. It is particularly useful for troubleshooting performance issues or identifying processes that are consuming excessive resources (CPU, memory, etc.).
When you run top, you get an interactive view of system processes sorted by various resource usage metrics. Some key features include:
To use top for troubleshooting:
top
Logs in Linux are typically stored in the /var/log directory. This directory contains various log files for system events, services, applications, and hardware-related logs.
Some common log files include:
You can use various commands to view and monitor logs:
cat: To display a log file content.
cat /var/log/syslog
tail: To view the most recent lines of a log file. Using -f with tail will allow you to follow logs in real-time:
tail -f /var/log/syslog
grep: To search for specific entries in log files. For example, to check for SSH login attempts:
grep sshd /var/log/auth.log
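The grep approach above can be extended into a quick summary. The following is a minimal sketch, not a definitive tool: the function name count_failed_ssh is hypothetical, and it assumes the usual OpenSSH wording ("Failed password for ... from IP port N ssh2"). Log wording and file location can differ between distributions.

```shell
# Count failed SSH password attempts per source address.
# Reads sshd log lines on stdin and prints "COUNT ADDRESS", highest first.
# Assumption: the address is the word that follows "from" in each message.
count_failed_ssh() {
    grep 'Failed password' |
        awk '{
            for (i = 1; i < NF; i++)
                if ($i == "from") { print $(i + 1); break }
        }' |
        sort | uniq -c | sort -rn
}

# Typical use on a Debian/Ubuntu host (the path is distribution-specific):
#   sudo cat /var/log/auth.log | count_failed_ssh
```

An address with a high count is a candidate for closer inspection or a firewall rule.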
You can check available disk space using the df (disk free) command. This command provides an overview of disk usage across all mounted filesystems.
df -h
The -h flag displays the disk space in a human-readable format (e.g., KB, MB, GB). The output will show the total space, used space, available space, and mount points for each disk partition.
Example output:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        50G   12G   36G  25% /
/dev/sdb1       100G   85G   15G  85% /data
If you're investigating space issues on a particular directory, you can use the du command:
du -sh /path/to/directory
This will give you the total size of the specified directory and its contents.
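To turn the df output into a simple alert, a filter like the one below can flag filesystems above a usage threshold. This is a sketch under stated assumptions: the function name full_filesystems and the default threshold of 80% are illustrative choices, and mount points containing spaces are not handled.

```shell
# Print mount points whose usage is at or above a threshold (default 80).
# Reads df output on stdin; `df -P` is suggested because the POSIX format
# keeps each filesystem on one line with the percentage in column 5.
full_filesystems() {
    awk -v limit="${1:-80}" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)                 # "85%" -> "85"
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# Typical use:
#   df -P | full_filesystems 85
```

Any mount point it prints is a good starting place for a du -sh investigation.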
The free command is commonly used to check memory usage in Linux. It displays the total, used, free, shared, buffered, and cached memory on the system.
free -h
The -h flag makes the output human-readable (e.g., in MB or GB). Here's what the columns represent:
Example output:
       total  used  free  shared  buff/cache  available
Mem:     16G    4G    8G      1G          4G        10G
Swap:     2G    0G    2G
You can also use top or htop (a more user-friendly version) for real-time memory usage.
To troubleshoot network connectivity issues, there are several commands that can be useful:
ping: To check basic network connectivity between your machine and another device (IP address or hostname):
ping 8.8.8.8
ping google.com
ifconfig or ip a: To view network interfaces and their status (IP address, subnet mask, etc.).
ifconfig
ip a
netstat: To check open ports and network connections. Use it to verify that a service is actually listening on the expected address and port.
netstat -tuln
traceroute: To check the path packets take to reach a destination. It helps identify network bottlenecks or failures.
traceroute google.com
nslookup or dig: To check DNS resolution. This helps verify if DNS issues are causing connectivity problems.
nslookup google.com
dig google.com
The /etc/passwd file is a critical system file in Linux that stores information about all user accounts on the system. Each line in the file represents a user, with fields separated by colons (:). The file contains essential information for user authentication and account management.
The format is as follows:
username:password:UID:GID:comment:home_directory:shell
Example:
john:x:1001:1001:John Doe:/home/john:/bin/bash
This file is crucial for user account management, and any errors in /etc/passwd can cause login issues or system failures.
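Because a malformed line can break logins, it can help to check that an entry has the expected seven fields. The following sketch splits one entry on colons with awk; the function name describe_passwd_entry is hypothetical and the field order follows the format shown above.

```shell
# Describe one /etc/passwd entry and reject lines without seven fields.
# Field order: name:password:UID:GID:comment:home:shell
describe_passwd_entry() {
    printf '%s\n' "$1" | awk -F: '
        NF != 7 { print "malformed entry: expected 7 fields, got " NF; exit 1 }
        { printf "user=%s uid=%s gid=%s home=%s shell=%s\n", $1, $3, $4, $6, $7 }'
}

describe_passwd_entry 'john:x:1001:1001:John Doe:/home/john:/bin/bash'
# prints: user=john uid=1001 gid=1001 home=/home/john shell=/bin/bash
```

The function returns a non-zero status for a malformed line, so it can be used in a loop over the whole file.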
The ping command is used to test network connectivity between your system and another device (either by IP address or hostname). It works by sending ICMP Echo Request packets to the target and waiting for an ICMP Echo Reply. This helps determine if the target is reachable and if there are any network issues.
Common usages:
Basic Connectivity Check:
ping 8.8.8.8
Hostname Resolution:
ping google.com
To stop pinging, use Ctrl + C.
The ps (process status) command is used to display information about running processes on a Linux system. It provides details such as process IDs (PID), CPU usage, memory usage, user, and the command that started the process.
Common usages:
Display all processes:
ps aux
Filter for a specific process:
ps -ef | grep '[a]pache2' # The brackets stop grep from matching its own command line
In troubleshooting, ps is used to identify rogue processes, find the PID of a process to kill it, or check resource usage.
To restart a service in Linux, the command you use depends on the init system your distribution uses. The most common init systems today are systemd and SysVinit.
For systemd (used by most modern distributions, such as Ubuntu, CentOS, and Fedora):
sudo systemctl restart <service-name>
For example, to restart the Apache web server (apache2 on Ubuntu or httpd on CentOS):
sudo systemctl restart apache2 # On Ubuntu/Debian
sudo systemctl restart httpd # On CentOS/RHEL
For SysVinit (older systems or some distributions): You can use the service command:
sudo service <service-name> restart
Example:
sudo service apache2 restart
For Upstart (used in some older versions of Ubuntu):
sudo restart <service-name>
Note: Restarting a service typically stops the service and then starts it again. This is useful when you want to reload configuration files, resolve issues, or apply updates to the service.
To check for running processes in Linux, you can use a variety of commands:
ps (process status): The ps command shows the processes running on the system. By default, it shows only the current user's processes that are attached to the current terminal.
ps
To see all processes running on the system, including those from other users, use:
ps aux
top: The top command is an interactive, real-time process monitor that displays running processes along with their resource usage (CPU, memory, etc.).
top
htop: htop is an enhanced, user-friendly version of top with a more colorful, interactive interface. It shows similar information, but you can use the arrow keys to navigate through processes and sort them by resource usage.
htop
In Linux, syslog refers to a standard for logging system messages. It is a logging system that captures messages generated by the kernel, system services, and applications. The syslog system is crucial for system monitoring and troubleshooting, as it stores detailed logs of system events, errors, warnings, and other information.
The main points about syslog:
Example command to view syslog (the path shown is used on Debian/Ubuntu; RHEL-based systems typically log to /var/log/messages):
tail -f /var/log/syslog
To check if a specific port is open on a Linux system, you can use several tools:
ss (socket statistics): A modern replacement for netstat, it can be used to display open ports and socket connections.
ss -tuln | grep :<port-number>
For example, to check if port 80 (HTTP) is open:
ss -tuln | grep :80
netstat (Network Statistics): netstat can also be used to check for open ports. It is more widely used on older systems.
netstat -tuln | grep :<port-number>
lsof (List Open Files): This command lists open files and processes associated with them. It can be used to check if a specific port is in use by a process.
sudo lsof -i :<port-number>
For example, to check if port 22 (SSH) is open:
sudo lsof -i :22
nmap (Network Mapper): A more advanced tool, nmap is used for scanning open ports on your system or remote hosts.
nmap -p <port-number> localhost
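One caveat with the grep patterns above is that :80 also matches :8080. A stricter check can compare the port field exactly, as in this sketch. The function name port_is_listening is hypothetical, and the code assumes the column layout of ss -tuln, where the fifth column is "Local Address:Port".

```shell
# Succeed if a socket in `ss -tuln` output listens on exactly PORT.
# Taking the text after the last colon also handles IPv6 forms like [::]:80.
port_is_listening() {
    awk -v port="$1" 'NR > 1 {
        n = split($5, parts, ":")
        if (parts[n] == port) found = 1
    }
    END { exit (found ? 0 : 1) }'
}

# Typical use:
#   ss -tuln | port_is_listening 80 && echo "port 80 is listening"
```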
To view the system’s uptime in Linux, you can use the following commands:
uptime: The uptime command displays how long the system has been running, as well as the current time, number of users, and system load averages for the last 1, 5, and 15 minutes.
uptime
Example output:
15:32:51 up 3 days, 4:12, 2 users, load average: 0.01, 0.05, 0.02
top: The top command also shows the system’s uptime along with other performance metrics like CPU and memory usage.
top
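Load averages are easier to interpret relative to the number of CPUs. As a rough sketch, the function below divides a load value by the CPU count; the name load_status and the "above 1.0 per CPU means overloaded" rule of thumb are assumptions for illustration, not a fixed standard.

```shell
# Compare a load average with the number of CPUs.
# Usage: load_status LOAD NCPU
load_status() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        ratio = load / ncpu
        printf "%.2f per CPU: %s\n", ratio, (ratio > 1 ? "overloaded" : "ok")
    }'
}

# On a live Linux system (reads /proc/loadavg, uses nproc from coreutils):
#   load_status "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
load_status 6 4
```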
To check the version of the Linux kernel running on your system, you can use the uname command:
uname -r: This command shows the kernel version, including the major, minor, and patch level.
uname -r
Example output:
5.4.0-66-generic
hostnamectl: On systems using systemd, the hostnamectl command can also show the kernel version along with other system details.
hostnamectl
The free command provides an overview of the system’s memory usage, including total memory, used memory, free memory, shared memory, memory used by buffers and cache, and swap memory usage.
Here is the breakdown of the free command output:
To display memory usage in a human-readable format:
free -h
Example output:
       total  used  free  shared  buff/cache  available
Mem:     16G    4G    8G      1G          4G        10G
Swap:     2G    0G    2G
To troubleshoot an application that is not starting on Linux, follow these steps:
Standard output/error: Run the application from the terminal to see if it produces any errors:
./myapp
If the application is running as a service, use systemctl status to check if the service is failing:
systemctl status <service-name>
To check which user is currently logged into the system, you can use the following commands:
who: The who command shows information about users who are currently logged in.
who
Example output:
john tty1 2024-11-20 09:33 (:0)
w: The w command provides more detailed information about who is logged in and what they are doing, such as the idle time and the current processes.
w
Example output:
15:43:47 up 1 day, 3:22, 2 users, load average: 0.05, 0.10, 0.15
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
john tty1 :0 09:33 1.00s 0.13s 0.01s -bash
whoami: If you want to check the current user (the one you're logged in as), simply use:
whoami
To check CPU usage in Linux, you can use several commands:
top: The top command provides a real-time display of CPU usage, including per-process CPU usage.
top
mpstat: The mpstat command provides detailed CPU usage statistics. For example, to display CPU usage for all cores:
mpstat -P ALL
vmstat: The vmstat command shows CPU usage along with memory and process information.
vmstat 1
sar: The sar command (System Activity Report) can also be used to monitor CPU usage over time.
sar -u 1 5
The /var/log directory in Linux is where most of the system’s log files are stored. Logs are crucial for troubleshooting, monitoring system health, and auditing user activities. These logs capture important information about the system, kernel, services, and applications.
Key points about /var/log:
Logs can be checked using tools like cat, less, tail -f, or grep to filter log entries based on specific criteria (e.g., errors or warnings).
Example to view logs in real-time:
tail -f /var/log/syslog
In Linux, links allow you to create references to files, and there are two types: hard links and soft links (also known as symbolic links).
Hard Links:
Creating a hard link:
ln <original-file> <hard-link-name>
Soft (Symbolic) Links:
Creating a soft link:
ln -s <target-file> <symlink-name>
Example:
sudo ln -s /usr/bin/python3 /usr/bin/python # Writing to /usr/bin requires root
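The difference between the two link types is easiest to see when the original file is removed. The sketch below works in a temporary directory created with mktemp, so nothing on the system is touched; all file names are illustrative.

```shell
# Work in a throwaway directory.
demo=$(mktemp -d)
echo "hello" > "$demo/original.txt"

ln "$demo/original.txt" "$demo/hard.txt"   # second name for the same inode
ln -s original.txt "$demo/soft.txt"        # separate file that stores a path

ls -li "$demo"                             # first column is the inode number

rm "$demo/original.txt"
cat "$demo/hard.txt"                       # the data is still reachable
cat "$demo/soft.txt" 2>/dev/null || echo "soft.txt is now a dangling link"
```

In the ls -li listing, original.txt and hard.txt share an inode number, while soft.txt has its own.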
To troubleshoot hardware-related issues on Linux, you can use several tools to gather information about hardware components and check for errors:
dmesg: The dmesg command prints kernel messages, which often include hardware detection information and errors. It is helpful for identifying problems related to drivers, devices, or hardware initialization.
dmesg | grep -i error
lspci: This command lists all PCI devices (e.g., network cards, video cards, etc.) connected to your system.
lspci
lsusb: Use this to list USB devices connected to your system. This is useful for troubleshooting USB-related hardware issues.
lsusb
lshw: This command provides detailed information about all hardware components, including CPU, memory, disk, and network interfaces. It may require sudo to show all details.
sudo lshw
smartctl: For checking the health of hard drives or SSDs, the smartctl command (part of smartmontools) can show S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) data, which can help detect impending disk failures.
sudo smartctl -a /dev/sda
inxi: This is a powerful tool for displaying detailed information about your system hardware, including CPU, memory, and storage devices.
inxi -Fxz
To troubleshoot DNS resolution issues, follow these steps:
Check DNS configuration: Ensure the /etc/resolv.conf file contains the correct DNS servers. The file should have lines like:
nameserver 8.8.8.8
nameserver 8.8.4.4
Test DNS with nslookup or dig: Use the nslookup or dig command to query DNS servers directly and check if they resolve domain names correctly.
nslookup google.com
dig google.com
Check if systemd-resolved is running (for systems using systemd): If your system uses systemd, check if the systemd-resolved service is active:
systemctl status systemd-resolved
Check the firewall and routing: Ensure that your firewall settings or network routes are not blocking DNS queries. Use the iptables command to review firewall settings:
sudo iptables -L
Check /etc/nsswitch.conf: This file sets which sources are consulted for each lookup database, and in what order. For hostname resolution, ensure that the hosts line includes dns, for example:
hosts: files dns
The ifconfig (interface configuration) command is used to configure or display network interfaces in Linux. It shows the current state of the network interfaces, such as IP addresses, MAC addresses, network statistics, and more.
Some common uses of ifconfig:
Display network interfaces and their IP addresses:
ifconfig
Bring an interface up or down: To bring an interface up (e.g., eth0):
sudo ifconfig eth0 up
To bring an interface down:
sudo ifconfig eth0 down
Assign an IP address to an interface:
sudo ifconfig eth0 192.168.1.10
View network statistics:
ifconfig eth0
Note: In modern Linux systems, ifconfig is being replaced by the ip command, which is more powerful and flexible. For example, ip addr shows IP addresses:
ip addr show
The lsof command (List Open Files) is used to list information about files that are currently open by processes. In Linux, almost everything is treated as a file (e.g., devices, network connections, directories, etc.), so lsof is a powerful tool for finding open files, checking which process is using a file, and troubleshooting issues.
Common use cases for lsof:
List all open files:
lsof
List files opened by a specific user:
lsof -u username
Check which process is using a specific file:
lsof /path/to/file
Check for open network connections:
lsof -i
Find processes that are using a specific port:
lsof -i :8080
To view the system’s processes in real-time, use the top or htop commands:
top: The top command shows real-time information about system processes, including CPU and memory usage, as well as the process IDs (PIDs). It updates every few seconds.
top
htop: htop is an enhanced, user-friendly version of top with a colorful, interactive interface. It is more convenient and provides better process management features.
htop
To check for disk errors, use the following tools:
dmesg: Look for disk-related errors in the kernel logs.
dmesg | grep -i error
smartctl: Check the SMART status of a disk using the smartctl tool (part of the smartmontools package). This tool gives an early warning about potential disk failures.
sudo smartctl -a /dev/sda
fsck: Use fsck (file system check) to check and repair filesystem errors. This is useful if you suspect corruption on the filesystem. Run it only on an unmounted filesystem, because repairing a mounted one can cause further damage.
sudo fsck /dev/sda1
badblocks: Scan a disk for bad sectors.
sudo badblocks -v /dev/sda
To list all installed packages on a Linux system, use the following package manager commands:
On Debian/Ubuntu-based systems:
dpkg --get-selections
Or using apt:
apt list --installed
On Red Hat/CentOS-based systems:
rpm -qa
Or using yum or dnf:
yum list installed
or
dnf list installed
The chmod command is used to change the permissions of files or directories in Linux. File permissions determine who can read, write, or execute a file. Permissions are represented by three categories: owner, group, and others.
Syntax:
chmod [permissions] <file>
Examples:
Grant read, write, execute to the owner and read-only to group and others:
chmod 744 myfile
Grant execute permission to everyone:
chmod +x myfile
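To make the octal notation concrete, the sketch below translates a three-digit mode into the rwx string that ls -l displays. The function name octal_to_symbolic is hypothetical, and special bits (setuid, setgid, sticky) are deliberately out of scope.

```shell
# Translate a three-digit octal mode into the rwx string shown by `ls -l`.
octal_to_symbolic() {
    case $1 in
        [0-7][0-7][0-7]) ;;
        *) echo "expected three octal digits, got: $1" >&2; return 1 ;;
    esac
    result=""
    # Split the mode into single digits: owner, group, others.
    for digit in $(printf '%s' "$1" | sed 's/./& /g'); do
        case $digit in
            0) bits="---" ;; 1) bits="--x" ;; 2) bits="-w-" ;; 3) bits="-wx" ;;
            4) bits="r--" ;; 5) bits="r-x" ;; 6) bits="rw-" ;; 7) bits="rwx" ;;
        esac
        result="$result$bits"
    done
    printf '%s\n' "$result"
}

octal_to_symbolic 744   # prints rwxr--r--
```

Each digit is the sum of read (4), write (2), and execute (1), which is why 7 means rwx and 4 means read-only.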
To troubleshoot an application consuming excessive memory, you can take the following steps:
top
ps aux --sort=-%mem
free -h
ulimit -a
dmesg | grep -i oom
To check the system’s hostname in Linux, you can use the following commands:
hostname: The simplest way to display the hostname is by running the hostname command:
hostname
hostnamectl: On systems using systemd, the hostnamectl command can provide detailed information about the hostname and other related settings.
hostnamectl
Check /etc/hostname: The system hostname is usually stored in the /etc/hostname file, and you can view it using cat:
cat /etc/hostname
Check /etc/hosts: You can also check /etc/hosts for mappings of the system's hostname to IP addresses:
cat /etc/hosts
To disable a service from starting at boot, you can use the following commands, depending on your system's init system (SysVinit vs. systemd):
For systems using systemd: Use systemctl to disable the service from starting automatically at boot.
sudo systemctl disable <service-name>
Example to disable Apache:
sudo systemctl disable apache2
To ensure the service is not running, you can stop it with:
sudo systemctl stop <service-name>
For systems using SysVinit (older systems): Use chkconfig or update-rc.d to disable services.
sudo chkconfig <service-name> off
Or for Debian-based systems:
sudo update-rc.d <service-name> disable
The /etc/fstab file in Linux defines how disk partitions, file systems, and other devices should be mounted at boot time. It contains information such as the device name, mount point, file system type, and mount options.
Format of /etc/fstab:
<device> <mount-point> <filesystem-type> <options> <dump> <pass>
Example entry:
/dev/sda1 / ext4 defaults 1 1
The /etc/fstab file allows for automatic mounting of devices and file systems without manual intervention after boot.
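Since a bad fstab entry can prevent a clean boot, a quick read-only review of the active entries is often worthwhile. The following is a minimal sketch: the function name list_fstab_entries is hypothetical, and it only checks that the first four fields are present rather than validating devices or options.

```shell
# List active fstab entries, skipping comments and blank lines.
# Reads fstab-formatted text on stdin; exits non-zero if a line is too short.
list_fstab_entries() {
    awk '
        /^[ \t]*(#|$)/ { next }
        NF < 4 { printf "line %d: too few fields\n", NR; bad = 1; next }
        { printf "%s on %s type %s (%s)\n", $1, $2, $3, $4 }
        END { exit (bad ? 1 : 0) }'
}

# Typical use:
#   list_fstab_entries < /etc/fstab
```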
To check the status of SELinux (Security-Enhanced Linux) on your system:
Check with sestatus: The sestatus command provides detailed information about SELinux status.
sestatus
Check the /etc/selinux/config file: You can also check the SELinux configuration file to see if it’s set to enforcing, permissive, or disabled.
cat /etc/selinux/config
To resolve a full disk issue, follow these steps:
Check disk usage with df: Use the df command to view disk usage for all mounted file systems.
df -h
Identify large files or directories: Use the du (disk usage) command to identify large files or directories.
sudo du -sh /var/*
find / -type f -size +100M
sudo apt-get clean
sudo yum clean all
To find files in Linux, you can use the find command. The syntax is as follows:
find <path> -name <filename>
For example, to search for a file named myfile.txt starting from the root directory:
find / -name myfile.txt
Common find options:
Example to delete files older than 7 days:
find /path/to/directory -type f -mtime +7 -exec rm -f {} \;
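Before running a destructive find like the one above, it is safer to preview the matches with -print. The sketch below demonstrates this in a temporary directory; the file names are illustrative, and touch -d is assumed to be the GNU coreutils version.

```shell
# Build a throwaway directory with one fresh and one old file.
demo=$(mktemp -d)
touch "$demo/fresh.log"
touch -d '10 days ago' "$demo/stale.log"   # -d is a GNU touch option

# Dry run: list what -mtime +7 selects before swapping -print for -exec rm.
find "$demo" -type f -name '*.log' -mtime +7 -print
```

Only stale.log is listed, because -mtime +7 matches files last modified more than seven full days ago.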
To check if a process is running in the background, you can use the following methods:
Use ps: The ps command displays the status of running processes. To see all processes, including background jobs:
ps aux
Use jobs: If you started a process in the current shell session, you can use the jobs command to view background jobs.
jobs
Use pgrep: If you know the name of the process, you can use pgrep to check if it’s running. It will return the PID if the process is running.
pgrep <process-name>
To view detailed hardware information on a Linux system, you can use several commands:
lshw: The lshw (list hardware) command provides detailed information about the system’s hardware components, such as CPU, memory, storage devices, and network interfaces.
sudo lshw
lscpu: The lscpu command displays information about the CPU architecture.
lscpu
lsblk: Use lsblk to list all block devices (like hard drives, SSDs, partitions).
lsblk
inxi: The inxi command is a comprehensive tool to display system hardware information in a human-readable format.
inxi -Fxz
To check the system’s IP address:
Use ip command: The ip command is the modern way to check the system’s IP address.
ip addr show
Use ifconfig (deprecated on some systems): On older systems or systems not using ip, the ifconfig command can display IP information.
ifconfig
Check with hostname command: To view the system’s IP address associated with its hostname, use:
hostname -I
This command returns all IP addresses configured on the machine's network interfaces, excluding loopback.
To troubleshoot a server that is running slow in Linux, follow these steps:
CPU: Use top, htop, or mpstat (part of the sysstat package) to check for high CPU usage. Look for processes consuming more than their fair share of CPU.
top
Memory: Check memory usage with free or top. If the system is using swap space heavily, it could indicate memory pressure.
free -h
Disk I/O: Use iostat (also part of sysstat) to check for disk I/O bottlenecks. High wait times on disks can slow down the server.
iostat -xz 1
Check running processes: Use ps aux to find processes consuming excessive resources. Sort by CPU or memory usage.
ps aux --sort=-%cpu
Check disk space: Use df -h to check if the disk is full, especially the root (/) partition or /var where logs and other files might accumulate.
df -h
In Linux, system logs are essential for troubleshooting, and you can analyze and manage them using the following tools:
journalctl: For systems using systemd, logs are stored in the systemd journal, which you can access with journalctl. You can filter logs by service, date, or severity.
journalctl -u <service-name> # View logs for a specific service
journalctl --since "2024-11-01" # View logs since a certain date
journalctl -p err..alert # View logs with errors and more critical levels
Use cat, less, or grep to analyze these logs:
less /var/log/syslog
grep "error" /var/log/syslog
When a network interface is down on a Linux system, follow these steps:
Check the status of the network interface: Use the ip or ifconfig command to check the status of the network interface.
ip link show eth0
Bring the interface up: If the interface is down, try bringing it back up using ip or ifconfig.
sudo ip link set eth0 up
Or
sudo ifconfig eth0 up
Check network configuration: Verify the network interface's IP address, netmask, gateway, and DNS settings. Use ip addr or ifconfig to check the IP address configuration.
ip addr show eth0
Check the status of network services: Ensure that networking services are running (NetworkManager or systemd-networkd).
sudo systemctl status NetworkManager
Check for hardware issues: If the interface is still down, check for hardware-related issues with the dmesg or lspci commands.
dmesg | grep eth0
Check firewall settings: Ensure that no firewall rules are blocking the network interface.
sudo iptables -L
To troubleshoot a service failing to start on boot, follow these steps:
Check the service status: Use systemctl to check the status of the service and see if it provides any errors.
sudo systemctl status <service-name>
Examine logs: Check the logs for error messages. Use journalctl to view logs specific to that service.
journalctl -u <service-name>
Manually start the service: Try to start the service manually and observe any error messages.
sudo systemctl start <service-name>
To debug a stuck or unresponsive process in Linux:
Check the process status: Use ps to check the process's current status.
ps aux | grep <process-name>
Use strace: Attach strace to the process to trace system calls and signals. This can provide insight into where the process is hanging.
strace -p <pid>
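Kill the process: If all else fails, terminate the process. Send the default SIGTERM first so the process can clean up, and use kill -9 (SIGKILL) only if it does not exit.
kill <pid> # Polite request (SIGTERM)
kill -9 <pid> # Forced termination (SIGKILL), cannot be caught or ignored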
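The termination step can be scripted so that a forced kill is only used as a last resort. This is a sketch under stated assumptions: the function name stop_gracefully and the default five-second grace period are illustrative, and it needs permission to signal the target process.

```shell
# Ask a process to exit with SIGTERM, then force it with SIGKILL
# if it is still present after a grace period.
# Usage: stop_gracefully PID [SECONDS]
stop_gracefully() {
    target=$1
    limit=${2:-5}
    kill -TERM "$target" 2>/dev/null || return 1   # not running or not permitted
    waited=0
    while [ "$waited" -lt "$limit" ]; do
        kill -0 "$target" 2>/dev/null || return 0  # signal 0 only tests existence
        sleep 1
        waited=$((waited + 1))
    done
    kill -KILL "$target" 2>/dev/null
}

# Typical use:
#   stop_gracefully 12345 10
```

A process stuck in uninterruptible sleep (state D in ps) will not respond even to SIGKILL until the blocking I/O completes.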
strace is a diagnostic tool used to trace system calls and signals in a running process. It helps you understand what a program is doing behind the scenes, which is particularly useful for troubleshooting.
Trace a running process: Attach strace to a running process to see the system calls it is making.
strace -p <pid>
Trace a program from the start: You can also use strace to start a program and trace its execution from the beginning.
strace <command>
Redirect output to a file: To capture the trace output for further analysis, redirect it to a file.
strace -o trace.log <command>
Filter specific system calls: Use the -e option to filter specific system calls. For example, to trace only file-related system calls:
strace -e trace=file <command>
To investigate high CPU usage on a Linux server:
Use top or htop: Check which processes are consuming the most CPU. Sort by CPU usage with top.
top
Use ps with sorting: List processes sorted by CPU usage.
ps aux --sort=-%cpu | head
journalctl is used to query and display logs from the systemd journal, which collects logs from system services, the kernel, and other sources. It's an important tool for diagnosing issues in a systemd-based Linux environment.
View all logs:
journalctl
View logs for a specific service:
journalctl -u <service-name>
Filter by time: View logs since a specific date:
journalctl --since "2024-11-01"
Show logs in real-time:
journalctl -f
Filter by severity level: You can filter logs by severity levels (emerg, alert, crit, err, etc.):
journalctl -p err..alert
View logs for the current boot:
journalctl -b
To check the integrity of a file system in Linux, use the following steps:
Use fsck (File System Check): The fsck command checks and repairs file system inconsistencies. First, unmount the file system (if possible) and then run fsck.
sudo fsck /dev/sda1
Check file system with dmesg: You can also check dmesg for file system-related errors or corruption messages.
dmesg | grep -i ext4
File descriptor issues in Linux can occur when a process runs out of available file descriptors, causing it to fail to open new files or sockets. Here's how to identify and resolve such issues:
Check file descriptor usage: Use the lsof command to list the files a specific process has open, along with their file descriptors.
lsof -p <pid>
Check the limits for open files: The number of file descriptors a process can use is limited by system settings. Check the current limits using ulimit.
ulimit -n # Shows the maximum number of open file descriptors
Increase the limit: If the limit is too low, increase the maximum number of open file descriptors using ulimit.
ulimit -n 65536 # Raises the limit for the current shell session only, up to the hard limit
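To see how close a process is to its limit, the number of entries under /proc/PID/fd can be counted directly. The sketch below is Linux-specific, and the function name fd_count is hypothetical; inspecting another user's process requires root.

```shell
# Count the file descriptors a process currently holds.
# Each entry in /proc/PID/fd is one open descriptor (Linux only).
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Compare the current shell's usage with its soft limit.
echo "open: $(fd_count $$)  limit: $(ulimit -n)"
```

A count that keeps climbing toward the limit over time usually points to a descriptor leak in the application.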
The netstat (network statistics) command is a tool used to display various network-related information, such as active connections, listening ports, routing tables, interface statistics, and multicast memberships. This is useful for diagnosing networking issues.
Check listening ports:
netstat -tuln # Shows listening TCP and UDP ports with numeric addresses (no process information)
Check established connections:
netstat -tn # Show only TCP connections
Check network routing table:
netstat -r
Display network interface statistics:
netstat -i
Check for any open ports and which process is using them:
netstat -tulpen # Includes process ID (PID) of services listening on ports
netstat is now considered deprecated in favor of the ss (socket statistics) command, which is more efficient and provides similar functionality.
Excessive I/O wait can indicate disk bottlenecks or problems with storage devices. Here's how to identify it:
Use top or htop: The wa (I/O wait) value in the CPU summary line of top shows the percentage of CPU time spent waiting for I/O operations to complete. A consistently high value indicates an I/O bottleneck.
top
Use iostat: The iostat command (from the sysstat package) provides more detailed statistics on I/O performance and system load. Specifically, the %iowait field shows the percentage of time the CPU is waiting for I/O operations.
iostat -x 1
Check for slow disks using dstat: The dstat command provides real-time statistics for various system resources, including disk activity.
dstat -d
Check disk activity using iotop: The iotop command (similar to top but for I/O usage) allows you to monitor real-time disk activity. It shows which processes are causing high disk I/O.
sudo iotop
Examine dmesg logs: Check the dmesg logs for any disk errors or issues with your storage devices.
dmesg | grep -i error
To monitor and troubleshoot network traffic in Linux, you can use several tools:
iftop: iftop is a real-time command-line utility that displays bandwidth usage on an interface, broken down by connection. It helps you identify which hosts or IP addresses are using the most bandwidth; it does not attribute traffic to processes.
sudo iftop
netstat: As mentioned earlier, netstat can be used to display network connections, open ports, and network statistics.
netstat -tuln
nload: nload is another command-line tool that provides real-time traffic statistics for incoming and outgoing network traffic.
sudo nload
tcpdump: tcpdump is a powerful tool for capturing and analyzing network packets in real time. It allows you to see detailed network traffic and diagnose issues like dropped packets or improper packet routing.
sudo tcpdump -i eth0
ping and traceroute: Use ping to check network connectivity and traceroute to identify where packet loss or delays occur in the network path.
ping 8.8.8.8
traceroute google.com
ss: ss (socket statistics) is a modern alternative to netstat and is used to display detailed information about sockets and network connections.
ss -tuln
When troubleshooting a failed SSH connection:
Check the SSH service status: Ensure the SSH service (sshd) is running:
sudo systemctl status sshd
Review SSH logs: Check the /var/log/auth.log or /var/log/secure logs for any error messages related to SSH login attempts.
tail -f /var/log/auth.log
Verify SSH port and firewall: Ensure that SSH is running on the correct port (default: 22) and that no firewall (e.g., ufw or iptables) is blocking it.
sudo ufw status
sudo iptables -L
Test network connectivity: Ensure that the server is reachable via ping or other network tests.
ping <server-ip>
Check for correct SSH configuration: Inspect the /etc/ssh/sshd_config file for any misconfiguration that might prevent connections, such as incorrect PermitRootLogin or PasswordAuthentication settings.
sudo nano /etc/ssh/sshd_config
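When reviewing sshd_config by eye, it is easy to miss that sshd uses the first value it finds for each keyword, so a later duplicate line has no effect. The sketch below illustrates that rule on the global section only. It assumes whitespace-separated "Keyword value" lines and stops at the first Match block; parse_sshd_config is a hypothetical helper, and running sshd -T as root remains the authoritative way to see the effective configuration.

```python
def parse_sshd_config(text):
    """Return the global sshd settings, keeping the first value per keyword."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()  # keywords are case-insensitive
        if key == "match":
            break  # conditional blocks are out of scope for this sketch
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(key, value)  # first occurrence wins
    return settings
```

For example, passing the contents of /etc/ssh/sshd_config and looking up "permitrootlogin" or "passwordauthentication" shows which of several conflicting lines is actually in force.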
To manage services with systemd, use the systemctl command:
Check service status: To check the status of a service:
sudo systemctl status <service-name>
Start a service:
sudo systemctl start <service-name>
Stop a service:
sudo systemctl stop <service-name>
Restart a service:
sudo systemctl restart <service-name>
Enable a service to start at boot:
sudo systemctl enable <service-name>
Disable a service from starting at boot:
sudo systemctl disable <service-name>
View service logs: Use journalctl to view logs for a specific service:
journalctl -u <service-name>
List all active services:
sudo systemctl list-units --type=service
To troubleshoot a systemd service that is failing to start:
Check the service status: Use systemctl to get detailed information about the service:
sudo systemctl status <service-name>
Review service logs with journalctl: Use journalctl to check logs for error messages related to the service.
journalctl -u <service-name>
Check for missing dependencies: Verify that any dependent services are running using:
sudo systemctl list-dependencies <service-name>
Test starting the service manually: Attempt to start the service manually and watch for any error messages.
sudo systemctl start <service-name>
A coredump is a file that captures the memory of a running process when it crashes. It helps developers analyze the state of a program at the time of the crash, including stack traces, memory contents, and more.
Enable coredumps: Ensure coredumps are enabled. Check the current ulimit for core files:
ulimit -c
If it's set to 0, core dumps are disabled. You can raise the size limit for the current shell session:
ulimit -c unlimited
Analyze a coredump: To analyze a coredump, use gdb (GNU Debugger) to load the executable and the core file:
gdb /path/to/executable /path/to/corefile
Configure coredump handling: On modern systems using systemd, coredump handling is managed through the coredump.conf configuration file. You can configure where to store coredumps, the size limits, and more.
sudo nano /etc/systemd/coredump.conf
Use strace: Run the application with strace to trace system calls and identify what the application was doing when it crashed.
strace -f /path/to/application
Run under a debugger: Use gdb to run the application and catch crashes in real-time. This will allow you to inspect the stack trace and variables.
gdb /path/to/application
run
To analyze performance bottlenecks in Linux, follow these steps:
Check disk I/O performance: Use iostat, iotop, or dstat to analyze disk activity and identify I/O bottlenecks.
iostat -xz 1
Analyze memory usage: Use free, vmstat, or top to check if the system is swapping or using excessive memory. Swapping can cause significant performance degradation.
free -h
To trace the system calls made by a process, you can use tools like strace and ltrace.
Trace an already running process by its PID:
strace -p <PID>
Start a new process with strace to trace its system calls:
strace -f -o output.txt /path/to/your/application
Filter specific system calls (modern glibc opens files with openat rather than open, so include both):
strace -e trace=open,openat,read,write -p <PID>
Trace a process's library calls:
ltrace -p <PID>
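A trace written with strace -f -o output.txt can run to thousands of lines, so a quick tally of calls and failures is often the first thing worth looking at. This is a rough sketch, assuming the usual strace line shape of an optional PID prefix, the call name, its arguments, and "= result"; summarize_strace is an illustrative name, and unfinished or resumed calls are simply skipped.

```python
import re
from collections import Counter

# Optional "[pid 123] " or "123 " prefix, then the syscall name and "(".
_CALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")
# A failed call ends in "= -1 ERRNO_NAME (description)".
_FAILED = re.compile(r"=\s+-1\s+(E[A-Z0-9]+)")


def summarize_strace(text):
    """Count syscalls and (syscall, errno) failures in strace output."""
    calls, errors = Counter(), Counter()
    for line in text.splitlines():
        match = _CALL.match(line)
        if not match:
            continue  # signals, exit markers, resumed calls
        name = match.group(1)
        calls[name] += 1
        failed = _FAILED.search(line)
        if failed:
            errors[(name, failed.group(1))] += 1
    return calls, errors
```

A high count of ("openat", "ENOENT") pairs, for instance, often points at a missing configuration file or library path.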
To disable a service temporarily, you can stop it or prevent it from starting automatically at boot using systemd or service commands.
Stop a service temporarily (it won't start again until you manually start it):
sudo systemctl stop <service-name>
Disable a service from starting at boot:
sudo systemctl disable <service-name>
To enable it again at boot:
sudo systemctl enable <service-name>
To start the service again:
sudo systemctl start <service-name>
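On older SysV init systems without systemd, stop a service temporarily with the service command:
sudo service <service-name> stop
The service command has no option to disable a service at boot. Use chkconfig on RHEL-family systems or update-rc.d on Debian-family systems:
sudo chkconfig <service-name> off
sudo update-rc.d <service-name> disable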
vmstat (virtual memory statistics) is a command-line tool used to report information about processes, memory, paging, block IO, traps, and CPU activity. It’s extremely useful for diagnosing system performance issues, particularly those related to memory and CPU utilization.
General syntax:
vmstat [delay [count]]
Example usage (one report per second, five reports in total; the first report shows averages since boot):
vmstat 1 5
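When vmstat output is collected over a longer period, it helps to turn the rows into records and pick out the samples that show swapping. The sketch below assumes the default vmstat layout, where the column header line starts with "r" and every data row is purely numeric; parse_vmstat and swapping_samples are hypothetical helper names.

```python
def parse_vmstat(text):
    """Turn default vmstat output into a list of {column: int} rows."""
    columns, rows = None, []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "r":
            columns = tokens  # header line: r b swpd free ...
        elif columns and len(tokens) == len(columns) and all(t.isdigit() for t in tokens):
            rows.append(dict(zip(columns, map(int, tokens))))
    return rows


def swapping_samples(rows):
    """Rows where pages were swapped in (si) or out (so)."""
    return [row for row in rows if row["si"] > 0 or row["so"] > 0]
```

Sustained non-zero si/so values together with a high wa column suggest that memory pressure, not the disk itself, is the root of an apparent I/O problem.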
A kernel panic is a critical error in the Linux kernel that usually results in the system halting. Troubleshooting kernel panics involves the following steps:
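Kernel messages are available through dmesg or in /var/log/messages. Because dmesg only covers the current boot, also check the previous boot's kernel log after a panic-induced reboot (this requires a persistent journal):
dmesg | grep -i panic
journalctl -k -b -1 | grep -i panic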
Alternatively, check the full system log for kernel messages:
sudo less /var/log/syslog
If kernel crash dumps are enabled, kdump captures a vmcore file at panic time, which you can then analyze with the crash utility.
sudo crash /var/crash/vmcore /usr/lib/debug/boot/vmlinux-<version>
Kernel panics can be caused by recent updates or changes (e.g., updated drivers or kernel modules). Try booting a previous kernel to check if the panic persists. grub2-reboot takes a boot menu entry (title or index) and applies to the next boot only:
sudo grub2-reboot <menu_entry>
Swap space is used to extend the physical memory by using disk space when RAM is full. To manage swap space in Linux:
Use free to check swap usage:
free -h
Or, use swapon to display swap devices and their usage:
swapon --show
To add a swap file:
sudo dd if=/dev/zero of=/swapfile bs=1M count=1024 # 1 GB swap file
sudo chmod 600 /swapfile # Swap files must not be readable by other users
sudo mkswap /swapfile
sudo swapon /swapfile
To make the change permanent, add it to /etc/fstab:
/swapfile none swap sw 0 0
To resize swap partitions, first turn off swap:
sudo swapoff /dev/sdX
Resize the partition using gparted or fdisk, and then reactivate swap:
sudo mkswap /dev/sdX
sudo swapon /dev/sdX
To disable swap:
sudo swapoff /swapfile
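The swap figures that free reports come from /proc/meminfo, so swap pressure can be checked from a script as well. This is a minimal sketch, assuming the usual "Key: value kB" lines and the SwapTotal and SwapFree keys; the helper names are illustrative.

```python
def meminfo_kb(text):
    """Parse /proc/meminfo-style text into {key: value_in_kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_used_percent(text):
    """Percentage of swap in use, or 0.0 when no swap is configured."""
    info = meminfo_kb(text)
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return 100.0 * (total - info.get("SwapFree", 0)) / total
```

On a live system the input would be the contents of /proc/meminfo; a value that keeps climbing is a reason to look for the process consuming memory rather than simply adding more swap.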
LVM (Logical Volume Management) allows for flexible disk management by abstracting physical storage devices into logical volumes. It provides the ability to resize, extend, and manage storage volumes dynamically.
Key benefits of LVM:
Basic commands:
View existing LVM setup:
sudo lvdisplay
sudo vgdisplay
sudo pvdisplay
Extend a logical volume:
sudo lvextend -L +10G /dev/vg_name/lv_name
To extend or shrink an LVM volume, follow these steps:
First, ensure there is free space in the volume group:
sudo vgdisplay
To extend the logical volume (e.g., adding 10 GB):
sudo lvextend -L +10G /dev/vg_name/lv_name
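After extending, grow the filesystem to use the new space. resize2fs works on ext2/ext3/ext4; for XFS, use xfs_growfs on the mount point instead:
sudo resize2fs /dev/vg_name/lv_name
To shrink, unmount the filesystem and check it first, because ext4 cannot be shrunk while mounted and XFS cannot be shrunk at all. Then reduce the filesystem size before shrinking the LV:
sudo umount /dev/vg_name/lv_name
sudo e2fsck -f /dev/vg_name/lv_name
sudo resize2fs /dev/vg_name/lv_name 20G
Then, reduce the logical volume to the same size:
sudo lvreduce -L 20G /dev/vg_name/lv_name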
To configure and troubleshoot NFS issues:
Install the NFS server package:
sudo apt-get install nfs-kernel-server
Edit /etc/exports to share a directory:
sudo nano /etc/exports
/srv/nfs *(rw,sync,no_subtree_check)
Export the NFS shares:
sudo exportfs -a
Check if NFS server is running:
sudo systemctl status nfs-kernel-server
Test NFS mount:
sudo mount -t nfs server_ip:/srv/nfs /mnt
Check firewall settings: Ensure NFS-related ports (2049 for NFS, 111 for rpcbind) are open in the firewall:
sudo ufw allow from <client_ip> to any port nfs
To identify and resolve package dependency issues:
Check for missing dependencies:
sudo apt-get check
Fix broken packages:
sudo apt-get install -f
Check for dependency issues:
sudo yum check
Resolve dependencies:
sudo yum install <package-name>
For dpkg:
sudo dpkg --configure -a
For rpm:
sudo rpm --rebuilddb
To troubleshoot a slow network connection:
Check for high network traffic: Use iftop or nload to monitor real-time network traffic and identify heavy usage.
sudo iftop
Test with ping: Test network latency to the destination server:
ping <destination_ip>
Use traceroute: Identify where delays occur along the network path.
traceroute <destination_ip>
Check for network interface issues: Use ethtool to examine the interface speed and settings.
sudo ethtool eth0
When investigating a DNS issue, you should follow these steps:
Verify the contents of /etc/resolv.conf to ensure the correct DNS server is configured.
cat /etc/resolv.conf
Ensure the system can reach the DNS server. Use ping to check if the DNS server is reachable.
ping <DNS_server_IP>
Use nslookup or dig to query DNS servers directly:
nslookup google.com
dig google.com
The /etc/nsswitch.conf file controls the order in which services like DNS, local files, and NIS are used for name resolution. Ensure it’s properly configured:
cat /etc/nsswitch.conf
If you suspect the DNS server is the issue, temporarily use a public DNS server (e.g., Google’s DNS at 8.8.8.8 or Cloudflare's DNS at 1.1.1.1) and test the resolution:
sudo nano /etc/resolv.conf
# Add nameserver 8.8.8.8
Ensure that DNS queries (UDP and TCP port 53) are not blocked by the firewall. You can inspect the active rules with:
sudo firewall-cmd --list-all
sudo iptables -L -n
Review system logs for any DNS-related errors.
journalctl -xe | grep -i dns
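The resolv.conf check in the steps above can be automated when many hosts need auditing. The following sketch extracts the configured nameservers in order. It assumes the standard "nameserver <address>" lines and treats lines starting with # or ; as comments; parse_nameservers and ignored_nameservers are hypothetical names, and the limit of three servers reflects the glibc resolver's default.

```python
def parse_nameservers(text):
    """Return nameserver addresses from resolv.conf text, in file order."""
    servers = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        tokens = line.split()
        if tokens[0] == "nameserver" and len(tokens) >= 2:
            servers.append(tokens[1])
    return servers


def ignored_nameservers(servers, limit=3):
    """Entries beyond the resolver's limit, which are never queried."""
    return servers[limit:]
```

If the only entry is 127.0.0.53, the host is using the systemd-resolved stub, and the upstream servers have to be checked through that service instead.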
If a system is running out of memory, the following steps can help:
Use free, top, or htop to check memory and swap usage.
free -h
Use top or htop to identify processes consuming large amounts of memory.
top
Ensure that swap space is enabled and properly used. If not, consider adding more swap.
swapon --show
free -h
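Modify the overcommit memory settings to control how the kernel handles memory allocation. Mode 2 disables overcommit, so allocations fail once the commit limit is reached; test it before applying it in production. Writing to /etc/sysctl.conf requires root, so pipe through sudo tee instead of using a shell redirect:
echo "vm.overcommit_memory=2" | sudo tee -a /etc/sysctl.conf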
sudo sysctl -p
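To find the largest memory consumers without watching top interactively, the output of ps can be ranked by resident set size. This is a small sketch, assuming input produced by "ps -eo pid,rss,comm" (PID, RSS in kB, command name); top_memory_processes is an illustrative helper, not a standard tool.

```python
def top_memory_processes(ps_text, limit=5):
    """Return [(pid, rss_kb, name)] for the processes with the largest RSS."""
    processes = []
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        # The header line is skipped because its fields are not numeric.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            processes.append((int(parts[1]), int(parts[0]), parts[2].strip()))
    processes.sort(reverse=True)  # largest RSS first
    return [(pid, rss_kb, name) for rss_kb, pid, name in processes[:limit]]
```

Running it on snapshots taken a few minutes apart shows whether one process keeps growing, which is more telling than a single reading.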
Firewalld is the default firewall management tool on many modern Linux distributions. To check its status:
sudo systemctl status firewalld
You can also confirm that the firewalld daemon is running with:
sudo firewall-cmd --state
iptables is used for managing network traffic filtering and can be checked using:
sudo iptables -L
For per-rule packet and byte counters, use:
sudo iptables -L -v
List active firewalld rules:
sudo firewall-cmd --list-all
To ensure a process is restarted automatically if it fails, use systemd to manage the service. systemd provides mechanisms to restart services automatically upon failure.
Add or modify the Restart option in the [Service] section of the service configuration file:
[Service]
Restart=always
RestartSec=5
Reload and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart myapp.service
Enable the service to start at boot:
sudo systemctl enable myapp.service
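Before relying on automatic restarts, it can be worth confirming that the unit file really carries the intended settings. The sketch below reads the [Service] section with Python's configparser. It assumes a simple unit file without line continuations or drop-in overrides, and restart_policy is a hypothetical helper; the fallback of "no" mirrors systemd's default when Restart= is not set.

```python
import configparser


def restart_policy(unit_text):
    """Return (Restart, RestartSec) from unit file text."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # unit files only use '=' between key and value
        strict=False,        # keys such as Environment= may repeat
        interpolation=None,  # '%i' style specifiers must be left alone
    )
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(unit_text)
    if not parser.has_section("Service"):
        return None, None
    restart = parser.get("Service", "Restart", fallback="no")
    restart_sec = parser.get("Service", "RestartSec", fallback=None)
    return restart, restart_sec
```

A result of ("no", None) for a service that is expected to self-heal means the Restart option was never added or was placed in the wrong section.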
When troubleshooting a slow database, you can use the following tools:
top
Unlike file systems like NTFS in Windows, Linux file systems like ext4, XFS, and Btrfs handle fragmentation automatically and do not generally suffer from the same issues.
However, if you suspect fragmentation issues, here’s how to approach it:
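For ext4, run e2fsck on the unmounted file system; its summary reports the percentage of non-contiguous files:
sudo e2fsck -f /dev/sdX
You can check the fragmentation score with e4defrag -c and defragment files with e4defrag:
sudo e4defrag -c /dev/sdX
sudo e4defrag /dev/sdX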
The /etc/hosts file is used to map IP addresses to hostnames locally on the system. It is one of the first places the system checks when resolving domain names.
Format: The file consists of IP addresses followed by hostnames and optional aliases:
127.0.0.1 localhost
192.168.1.10 myserver.local myserver
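The format described above is simple enough to parse when checking for stale or conflicting entries. This is a minimal sketch, assuming that anything after # is a comment and that the first entry for a given name takes precedence; parse_hosts is an illustrative name.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name.lower(), address)  # first entry wins
    return mapping
```

Comparing the result for a hostname against what dig returns quickly reveals a local override that is masking the real DNS record.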
Identify the PID of the application using ps or top.
ps aux | grep <application_name>
Attach gdb to the running process:
sudo gdb -p <PID>
Once in gdb, use the backtrace command to get a stack trace of the application:
(gdb) backtrace
Use mount or df to check if the file system is mounted properly:
mount | grep /dev/sdX
df -h
Use dmesg or journalctl to look for any disk-related errors:
dmesg | grep -i error
If the file system is corrupted, unmount it and run fsck to fix the issues:
sudo umount /dev/sdX
sudo fsck /dev/sdX
Ensure the file system has enough free space to operate:
df -h
Kernel panics are severe errors in the Linux kernel that prevent the system from continuing operation. To resolve a kernel panic, follow these steps:
Review kernel panic logs using dmesg or system logs:
dmesg | less
journalctl -xe | grep -i panic
Kernel panics can be caused by hardware failures, such as bad RAM or failing hard drives. Run hardware diagnostic tools: smartctl (from smartmontools) for drive health and memtest86+ for memory. memtest86+ is not run from a shell; boot it from the GRUB menu or from bootable media.
sudo smartctl -a /dev/sda
Ensure your system is running the latest stable kernel and drivers, as kernel bugs are often fixed in newer versions.
sudo apt update && sudo apt upgrade
sudo apt-get install linux-image-<version>
To troubleshoot intermittent network connectivity issues:
Ensure the network interface is up and running using:
ip link show
ifconfig
Use ping to check if the server is losing packets or experiencing high latency:
ping -c 10 <destination_ip>
Check the system logs for network-related messages:
journalctl -xe | grep -i network
dmesg | grep -i eth
Use ethtool to check the link status, speed, and duplex settings of the interface. Flapping (i.e., frequent up/down events) shows up as repeated link up/down messages in dmesg.
sudo ethtool eth0
Check for active network connections and open ports to ensure the server is listening properly:
netstat -tuln
ss -tuln
Use traceroute or mtr to identify any hops causing delays or packet loss.
traceroute <destination_ip>
Ensure that iptables or firewalld is not blocking necessary traffic.
sudo iptables -L
sudo firewall-cmd --list-all
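Intermittent problems are easier to spot when ping results are recorded over time instead of read once. The following sketch extracts packet loss and average round-trip time from the summary that ping -c prints. It assumes the iputils output format ("N packets transmitted, M received, X% packet loss" and "rtt min/avg/max/mdev = ..."); parse_ping is a hypothetical helper, and other ping implementations may format the summary differently.

```python
import re

_LOSS = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)
_RTT = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)")  # min/avg/max


def parse_ping(output):
    """Return sent, received, loss percentage, and average RTT from ping output."""
    loss = _LOSS.search(output)
    if not loss:
        return None  # no summary found (e.g. ping was interrupted)
    rtt = _RTT.search(output)
    return {
        "sent": int(loss.group(1)),
        "received": int(loss.group(2)),
        "loss_percent": float(loss.group(3)),
        "avg_ms": float(rtt.group(2)) if rtt else None,
    }
```

Logging this result every minute from cron, for example, makes it possible to correlate loss spikes with the link events seen in the system logs.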
To identify and mitigate disk I/O bottlenecks:
Use iostat to monitor disk performance, including read/write operations and disk utilization.
iostat -x 1
iotop helps identify processes that are causing high disk I/O.
sudo iotop -o
Use vmstat to monitor virtual memory, processes, and I/O activity.
vmstat 1
Run smartctl to check for disk errors and SMART status.
sudo smartctl -a /dev/sda
Ensure that your file system is not overfilled, as it can lead to performance degradation.
df -h
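Use the appropriate I/O scheduler for your workload. On older kernels, deadline or noop can be better for SSDs, while cfq may be better for spinning disks; on current multi-queue kernels the equivalents are mq-deadline, none, and bfq. The scheduler file is a sysfs attribute, so read it with cat (the active scheduler is shown in brackets) and change it by writing a scheduler name to it rather than opening it in an editor:
cat /sys/block/sda/queue/scheduler
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler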
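The scheduler file under /sys lists the available schedulers on a single line and marks the active one with square brackets, for example "noop deadline [cfq]". The sketch below parses that line so the setting can be checked across many disks; parse_schedulers is an illustrative name.

```python
def parse_schedulers(text):
    """Return (active, available) from the contents of queue/scheduler."""
    active, available = None, []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name  # the bracketed entry is the one in use
        available.append(name)
    return active, available
```

Reading /sys/block/<device>/queue/scheduler for each disk and passing the contents to this function gives a quick inventory of which scheduler each device is using.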
To troubleshoot memory leaks:
Use tools like top, htop, or free to track memory usage over time.
free -h
top
valgrind can detect memory leaks and improper memory usage.
valgrind --leak-check=full ./your_application
Enable core dumps to capture application state when it crashes, then analyze the dump with gdb.
ulimit -c unlimited
gdb ./your_application core
pmap provides a detailed view of memory allocation for a process.
pmap -x <PID>
dstat is a versatile tool that provides real-time performance statistics, helping to diagnose bottlenecks in various subsystems like CPU, memory, I/O, and networking.
Run dstat with no arguments to display the default set of statistics.
dstat
Use flags to monitor specific resources. For example, to monitor CPU, disk, network, and paging activity (add -m for memory):
dstat -cdng
You can monitor network throughput with:
dstat -n
Combine options to focus on particular metrics, set the sampling interval, and export the results to CSV. For example, to sample every 5 seconds:
dstat -tcdng --output /tmp/dstat_output.csv 5
Attach gdb to the process and use info threads to examine the threads:
gdb -p <PID>
(gdb) info threads
Once you identify a problematic thread, use thread <thread_id> to focus on it:
(gdb) thread 2
Trace system calls made by the application, which can reveal issues like deadlocks or resource contention.
strace -p <PID>
Memory errors such as reads of uninitialized memory and invalid accesses can often be revealed by running the application under valgrind's memcheck tool. For race conditions, use the helgrind tool (--tool=helgrind) instead.
valgrind --tool=memcheck ./your_app
The load average represents the average system load over the last 1, 5, and 15 minutes:
uptime
top
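A load average only becomes meaningful once it is compared with the number of CPUs: a load of 4.0 is full utilization on a 4-core machine but heavy overload on a single core. The sketch below normalizes the three values per CPU. The function names and the threshold of 1.0 are assumptions chosen for illustration, not fixed rules.

```python
import os


def load_per_cpu(loadavg, cpu_count):
    """Divide the 1-, 5-, and 15-minute load averages by the CPU count."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return tuple(round(value / cpu_count, 2) for value in loadavg)


def is_overloaded(loadavg, cpu_count, threshold=1.0):
    """True when the 1-minute load per CPU exceeds the threshold."""
    return load_per_cpu(loadavg, cpu_count)[0] > threshold


def current_load_per_cpu():
    """Normalized load for the local machine (Unix only)."""
    return load_per_cpu(os.getloadavg(), os.cpu_count() or 1)
```

On Linux the load average also counts tasks blocked in uninterruptible I/O, so a high normalized value with an idle CPU usually points at storage rather than compute.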
Use ls -l to inspect file permissions and ownership:
ls -l /path/to/file
Correct file ownership with chown:
sudo chown user:group /path/to/file
Modify permissions with chmod:
sudo chmod 755 /path/to/file
Files may have ACLs that grant or restrict access beyond the traditional owner/group/other permissions. Use getfacl to check ACLs:
getfacl /path/to/file
If SELinux is enabled, check if security contexts are causing access restrictions:
ls -Z /path/to/file
Verify SELinux status with getenforce:
getenforce
Search the audit log (/var/log/audit/audit.log) for recent SELinux denials:
sudo ausearch -m avc -ts recent
Use sealert to analyze and suggest solutions for SELinux denials:
sealert -a /var/log/audit/audit.log
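The permission string that ls -l prints and the octal value passed to chmod are two views of the same mode bits. The sketch below converts between them with Python's standard stat module, which can help when auditing many files for risky settings. The function names are illustrative; ACLs and SELinux contexts are not covered and still need getfacl and ls -Z.

```python
import os
import stat


def describe_mode(st_mode):
    """Return (ls-style string, octal string) for a raw st_mode value."""
    return stat.filemode(st_mode), format(stat.S_IMODE(st_mode), "03o")


def world_writable(st_mode):
    """True when 'others' have write permission."""
    return bool(st_mode & stat.S_IWOTH)


def describe_path(path):
    """Mode, owner UID, and group GID for a path on the local system."""
    info = os.stat(path)
    symbolic, octal = describe_mode(info.st_mode)
    return {"mode": symbolic, "octal": octal, "uid": info.st_uid, "gid": info.st_gid}
```

For example, describe_path("/path/to/file") returns the same information as ls -l in a form that is easy to compare against an expected owner and mode.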