What I Found in 10 Linux Server Audits: Common Vulnerabilities
By Arun Valecha, AV Services | Linux Infrastructure Expert since 1999
Before I Begin
In 25 years of managing Linux servers for businesses across India, the US, and Europe, I have conducted hundreds of infrastructure audits. The servers belong to funded startups, D2C brands, digital agencies, manufacturing companies, and professional services firms. The industries are different. The tech stacks are different. The cities are different.
The vulnerabilities are almost always the same.
This article documents what I actually found across ten recent audits — anonymised, but unembellished. No theoretical risks. No academic CVE lists. Real findings from real production servers that real businesses were running, believing them to be reasonably secure.
Some of what follows will be uncomfortable reading if you are running a Linux server without active management. That is intentional. The purpose of this article is not to alarm you. It is to give you a precise picture of what neglected Linux infrastructure actually looks like from the inside — so you can make an informed decision about what to do about it.
If you recognise your own server in any of these findings, that is a good sign. It means you are paying attention. The businesses that should worry are the ones reading this and assuming it does not apply to them.
How These Audits Work
Each audit I conduct follows a consistent methodology. I request read-only SSH access — no write permissions, no root access during the audit phase. I examine the OS and kernel version, installed packages and their patch status, running services and open ports, user accounts and SSH configuration, firewall rules, cron jobs, backup configuration, and log files covering the preceding 30 to 90 days.
The audit takes between 4 and 8 hours depending on the complexity of the environment. The client receives a written report within 5 business days, with findings categorised by severity: Critical, High, Medium, and Low.
Across the ten audits referenced in this article, here is what I found.
Finding 1 — Root SSH Login Enabled on 8 of 10 Servers
This is the finding that surprises people the most, because it is also the most basic.
Eight out of ten servers had SSH configured to permit direct root login. This means anyone attempting to brute-force SSH access only needs to crack the password for a single, universally known username — root — to gain complete administrative control over the server.
Many stock server images permit direct root SSH login out of the box. A fresh DigitalOcean droplet, for example, has you log in as root by default, and plenty of VPS providers follow the same pattern. Most system administrators set up the server, do what they came to do, and move on without ever changing it.
What I found in practice: on three of these eight servers, the root account was protected only by a password. No SSH key requirement. Passwords ranging from 8 to 12 characters, set at server provisioning and never changed.
On one server, the auth.log showed 47,000 failed root login attempts over 30 days from 23 distinct IP addresses. The server owner had no idea. They had never looked at the auth.log.
The fix is straightforward — disable PasswordAuthentication, enforce key-based login, and set PermitRootLogin to no in sshd_config. It takes 10 minutes. It closes one of the most exploited attack vectors in existence.
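As a sketch, the relevant directives in /etc/ssh/sshd_config look like this:

```
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
```

One caution before applying them: make sure a non-root user with a working SSH key and sudo access exists first, validate the file with sshd -t, and confirm key-based login from a second session before closing the one you are in. Locking yourself out is the only way this 10-minute fix goes wrong.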
Finding 2 — Unpatched Kernels Running in Production
Seven of ten servers were running kernels that were between 6 months and 3 years behind the current stable release for their distribution.
This is not a theoretical risk. Kernel vulnerabilities with public CVEs and working exploits exist for every major release cycle. Running an unpatched kernel in production is the server equivalent of leaving your front door unlocked because you have not been burgled yet.
The most common reason I hear for not patching: “We were afraid a kernel update would break something.” This is a legitimate concern, but it is not a reason to avoid patching indefinitely. It is a reason to have a proper patch testing and rollback procedure — which none of the seven servers did.
One server was running a kernel from 2021 on an Ubuntu 20.04 installation. In the three years since that kernel was installed, 14 high-severity CVEs had been published for it, including two with CVSS scores above 8.0. The server was running a customer-facing e-commerce application processing live transactions.
The fix requires a patching schedule — not daily, not ad hoc, but a structured monthly cycle where packages are updated, kernel updates are applied, and the system is rebooted in a controlled maintenance window. On a properly managed server, this is routine. On an unmanaged server, it simply does not happen.
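A monthly maintenance window on a Debian or Ubuntu system might look roughly like the following; this is a sketch, and RHEL-family distributions would use dnf equivalents:

```
# Review what will change before applying anything.
apt-get update
apt-get --simulate upgrade

# Apply package and kernel updates inside the maintenance window.
apt-get upgrade -y
apt-get dist-upgrade -y

# Reboot only if the distribution flags that a restart is needed.
[ -f /var/run/reboot-required ] && reboot
```

The simulate step is the difference between a controlled cycle and an ad hoc one: you know what is changing before it changes.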
Finding 3 — fail2ban Not Installed, or Installed and Misconfigured
Nine of ten servers had no effective brute-force protection on SSH.
Three had fail2ban installed but not running — the service had crashed at some point and nobody had noticed. Two had fail2ban running but with the default settings: five failed attempts within a ten-minute window trigger a ten-minute ban. This is trivially circumvented by any modern brute-force tool, which simply throttles its attempts to stay below the threshold.
Four had nothing installed at all.
The consequence is visible in the logs. On one server, I found a single IP address from Eastern Europe that had been attempting SSH logins at a rate of approximately one attempt every 12 seconds for 19 consecutive days. That is over 130,000 login attempts from a single source. The server had not blocked it because nothing was configured to do so.
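This kind of pattern takes one pipeline to surface. The log lines below are illustrative samples; on a real server you would point the same pipeline at /var/log/auth.log:

```shell
# Sample auth.log entries standing in for the real file.
cat > /tmp/auth.sample <<'EOF'
Mar 01 02:11:09 web1 sshd[1201]: Failed password for root from 203.0.113.7 port 40022 ssh2
Mar 01 02:11:21 web1 sshd[1203]: Failed password for root from 203.0.113.7 port 40031 ssh2
Mar 01 02:12:02 web1 sshd[1210]: Failed password for invalid user admin from 198.51.100.4 port 51012 ssh2
Mar 01 02:12:40 web1 sshd[1214]: Failed password for root from 203.0.113.7 port 40044 ssh2
EOF

# Count failed SSH logins per source IP, busiest source first.
grep 'Failed password' /tmp/auth.sample \
  | grep -oE 'from ([0-9]{1,3}\.){3}[0-9]{1,3}' \
  | awk '{print $2}' \
  | sort | uniq -c | sort -rn
```

Five minutes with this output, once a month, would have caught the 19-day campaign on day one.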
On another server — the one mentioned in Finding 1 with 47,000 root login attempts — the source IPs were spread across 23 addresses in a pattern consistent with a distributed credential stuffing operation. fail2ban with default settings would not have stopped this because no single IP crossed the threshold.
Effective brute-force protection requires fail2ban configured with appropriate findtime, maxretry, and bantime values, combined with a recurring review of the banned IP list and the underlying auth.log. It also requires the service to be monitored — a stopped fail2ban is worse than no fail2ban, because it creates a false sense of security.
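A tighter jail configuration looks something like this; the values below are illustrative, not a universal recommendation, and assume a fail2ban version recent enough to accept time suffixes:

```
# /etc/fail2ban/jail.local
[sshd]
enabled  = true
maxretry = 3
findtime = 1h
bantime  = 24h
```

Lowering maxretry and widening findtime catches the slow, throttled scanners that sail under the defaults, and a 24-hour bantime makes each ban actually cost the attacker something.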
I wrote recently about an incident on my own server infrastructure where fail2ban itself caused a system outage by consuming all available CPU while scanning a bloated btmp log. Proper configuration includes log rotation to prevent exactly this failure mode.
Finding 4 — Open Ports Nobody Could Explain
Every audit includes a port scan and a conversation with the client about every open port found. On six of ten servers, there were open ports that the client’s technical team could not explain.
Not “open ports we forgot to close.” Open ports where nobody present — not the developer, not the CTO, not the freelancer who built the server two years ago — could say with confidence what was listening on that port or why.
On one server, port 8080 was open and serving a management interface for a monitoring tool that the company had trialled 18 months earlier and then stopped using. The trial software had been abandoned but never uninstalled. The management interface was accessible from the public internet, protected by a default username and password that had never been changed.
On another server, I found a Redis instance listening on its default port with no authentication configured. Redis with no authentication on a publicly accessible port is one of the most commonly exploited misconfigurations in production Linux environments — it allows an attacker to read all data in the cache, write arbitrary data, and in some configurations achieve remote code execution. This server was processing user session data.
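Locking down an instance like that takes three directives in the Redis configuration. A sketch, with a placeholder where your own secret goes:

```
# /etc/redis/redis.conf
bind 127.0.0.1 ::1
protected-mode yes
requirepass <long-random-password>
```

Binding to localhost is the important line: if the application runs on the same host, the port should never be reachable from outside at all, and the firewall should enforce the same thing independently.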
The principle of least privilege applies to network services exactly as it applies to user accounts. Every port that is open and not strictly necessary is an attack surface. Every service that is running and not actively used is a liability. An audit maps this surface. Ongoing management keeps it minimal.
Finding 5 — Backup Configuration That Would Not Survive a Real Disaster
All ten servers had some form of backup configured. This sounds reassuring until you look at the details.
On four servers, backups were writing to the same physical disk as the primary data. A disk failure — the single most common cause of data loss — would destroy both the primary data and the backup simultaneously. These clients believed they had backups. They did not.
On three servers, backups were running successfully to a remote destination, but the backup had never been tested. A backup is not a backup until you have successfully restored from it. One of these three servers had a backup configuration that had been silently failing for six weeks — the backup job ran, wrote a file, and reported success, but the file was being written to a path that no longer existed due to a storage reorganisation. The backup file was zero bytes. Nobody knew.
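A restore drill does not need to be elaborate. The sketch below uses temporary directories to stand in for the primary data, the backup destination, and the restore scratch area; the shape is what matters, not the paths:

```shell
set -eu

SRC=$(mktemp -d)       # stands in for the primary data
DEST=$(mktemp -d)      # stands in for the backup destination
RESTORE=$(mktemp -d)   # scratch area for the restore test

echo "order-1001" > "$SRC/orders.txt"

# 1. Take the backup and refuse to call an empty archive a success.
tar -czf "$DEST/backup.tar.gz" -C "$SRC" .
[ -s "$DEST/backup.tar.gz" ] || { echo "backup is zero bytes"; exit 1; }

# 2. Restore into a scratch directory, never over live data.
tar -xzf "$DEST/backup.tar.gz" -C "$RESTORE"

# 3. Compare restored data against the source, byte for byte.
diff -r "$SRC" "$RESTORE" && echo "restore test passed"
```

The zero-byte check alone would have caught the six-week silent failure described above on the first night it happened.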
Two servers had backups running but no retention policy configured. The backup destination was accumulating files indefinitely. One had 847 backup files spanning three years, consuming 1.9TB on a 2TB volume that was at 94% capacity. The backup job was about to start failing for lack of space.
One server had excellent backup configuration — automated, remote, tested monthly, with a clear retention policy and documented restore procedure. That server belonged to a client who had suffered a data loss incident four years earlier and rebuilt their infrastructure from scratch. Experience is an expensive teacher.
Finding 6 — Default Credentials on Management Services
Six of ten servers had at least one management service — a control panel, a database management tool, a monitoring dashboard, a mail interface — accessible via the web with default or trivially guessable credentials.
The most common offenders: phpMyAdmin with admin/admin or root with no password. Webmin with the default installation credentials. Grafana with admin/admin — which Grafana itself prompts you to change on first login, and which six out of ten people apparently do not.
One server had three separate management interfaces accessible from the public internet: phpMyAdmin, Webmin, and a WordPress admin panel, all with default or weak credentials. This server was running a B2C application with a live customer database.
Default credentials are not a sophisticated vulnerability. They do not require exploit code or specialised tools. They require only that an attacker try the obvious. In 2024, automated scanners probe every public IP address continuously, trying default credentials for every known management interface. If your service is listening and the credentials are default, it will be found and it will be exploited. The question is when, not whether.
Finding 7 — No Monitoring, or Monitoring That Nobody Watches
Eight of ten servers had no real-time monitoring. Two had monitoring configured but with alert destinations — email addresses — that were no longer actively checked.
The practical consequence of no monitoring is that you learn about problems from your users, not from your systems. Your customer service inbox becomes your alerting system. By the time a user reports that the application is slow or unavailable, the problem has typically been present for minutes or hours.
On one server, log analysis revealed a disk that had been running at 98% capacity for 11 days before the audit. The application had been intermittently failing to write logs, silently discarding data, for nearly two weeks. Nobody knew because nothing was monitoring disk usage.
On another server, a memory leak in the application was causing the server to consume all available RAM and begin swapping approximately every 72 hours. The server would become unresponsive, the application would time out, and at some point — the client was not sure exactly when or how — it would recover. This had been happening for approximately four months. The client had attributed it to “the server being a bit unreliable sometimes.”
Monitoring is not complicated. A basic setup covering CPU, memory, disk, and service health, with alerts routed to a channel that someone actually reads, catches the overwhelming majority of production incidents before they become outages. The absence of monitoring is a choice to be surprised.
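Even a shell function is better than nothing. The sketch below separates the check from the data source so the logic is testable; on a live server you would feed it the real figure from df:

```shell
# Minimal disk-usage check: takes a usage percentage and a threshold.
check_disk() {
  usage=$1
  threshold=$2
  if [ "$usage" -ge "$threshold" ]; then
    echo "ALERT: disk at ${usage}% (threshold ${threshold}%)"
  else
    echo "OK: disk at ${usage}%"
  fi
}

# On a live server, something like:
#   check_disk "$(df --output=pcent / | tail -1 | tr -dc '0-9')" 90
check_disk 98 90
```

Run from cron with the alert line routed to a channel someone reads, this one function would have flagged the Finding 7 server 11 days before the audit did.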
Finding 8 — Overprivileged User Accounts
Seven of ten servers had user accounts with more access than their function required.
The most common pattern: a developer account created during a debugging session 18 months ago, granted sudo access for the duration of that session, with the privileges never revoked. On three servers, former employees or former contractors still had active accounts. On one, an account belonging to a developer who had left the company 14 months earlier still had full sudo access and an active SSH key.
Access management is not a one-time task. It is an ongoing discipline. Every person who gains access to a production server should have the minimum access necessary for their specific task, and that access should be reviewed and revoked when the task is complete or the engagement ends. In practice, this almost never happens without a formal process to enforce it.
The risk is not only external. Former employees with lingering access represent an insider threat — not necessarily malicious, but a risk that the organisation no longer controls. A developer who left on bad terms, who still has an active SSH key to your production database server, is a liability that a simple audit would immediately surface.
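The starting point for an access review is a list of human accounts that can actually log in. The sample passwd data below is illustrative; on a real server the same awk runs against /etc/passwd itself, followed by a check of each account's authorized_keys and sudo membership:

```shell
# Sample /etc/passwd lines standing in for the real file.
cat > /tmp/passwd.sample <<'EOF'
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
arun:x:1000:1000::/home/arun:/bin/bash
olddev:x:1001:1001::/home/olddev:/bin/bash
backupsvc:x:1002:1002::/home/backupsvc:/usr/sbin/nologin
EOF

# Human accounts (UID >= 1000) with a real login shell.
awk -F: '$3 >= 1000 && $7 !~ /(nologin|false)$/ {print $1}' /tmp/passwd.sample
```

Every name that prints is a question: who is this, do they still need access, and when was it last reviewed? The 14-month-old account in the finding above would have been on this list the whole time.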
Finding 9 — Firewall Rules That Had Grown Organically Into Incoherence
Six of ten servers had firewall configurations that had been added to incrementally over months or years, without any corresponding removal of rules that were no longer needed.
The result is a firewall ruleset that nobody fully understands. Rules added for a service that was later decommissioned. Rules added to allow traffic from an IP address that has since changed. ALLOW rules that supersede DENY rules in ways that were unintentional. On two servers, I found rules that directly contradicted each other — a DENY rule for a port range followed by an ALLOW rule for a specific port within that range, with the net effect depending entirely on rule ordering that nobody had explicitly designed.
One server had 47 separate iptables rules. When I asked the CTO to walk me through the logic, he could explain 11 of them with confidence. The other 36 were unknown — present, active, potentially allowing or blocking traffic in ways that the business had no visibility into.
A firewall that nobody understands is not a security control. It is security theatre. The annual review of firewall rules — removing what is no longer needed, documenting what remains, and verifying that the net effect matches the intended policy — is one of the most impactful and most consistently neglected maintenance tasks in Linux infrastructure management.
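The review itself can be mechanical: keep a documented baseline ruleset under version control and diff the live rules against it. The rule files below are illustrative; on a real server the current file would come from iptables-save:

```shell
# The documented, intended policy.
cat > /tmp/baseline.rules <<'EOF'
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -j DROP
EOF

# What is actually running (on a live server: iptables-save).
cat > /tmp/current.rules <<'EOF'
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 443 -j ACCEPT
-A INPUT -p tcp --dport 8080 -j ACCEPT
-A INPUT -j DROP
EOF

# Any output here is undocumented drift that the review must explain or remove.
diff /tmp/baseline.rules /tmp/current.rules || true
```

With this in place, the 36 unexplained rules from the server above stop being a mystery: each one either gets added to the documented baseline with a reason, or deleted.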
Finding 10 — No Documented Recovery Procedure
Ten out of ten servers had no documented disaster recovery procedure.
This was the finding that was true without exception. Every server, every client, every industry. Not a single one had a written, tested procedure for what to do if the server failed completely and needed to be rebuilt from scratch.
The absence of documentation is not always negligence. In early-stage companies, the person who built the server carries the recovery knowledge in their head. This works until that person is unavailable — on leave, sick, or no longer with the company — at exactly the moment a disaster occurs, which is precisely when disasters tend to occur.
On one server, the original infrastructure engineer had left the company eight months before the audit. His replacement had inherited the server but had never been briefed on the backup locations, the configuration specifics, or the dependencies between services. If that server had failed on the day of the audit, the company’s best estimate was that recovery would take three to five days — possibly longer, because some of the configuration was documented only in the original engineer’s personal notes, which the company no longer had access to.
A disaster recovery procedure does not need to be a 50-page document. It needs to answer three questions: where are the backups, how do you restore them, and in what order do services need to be brought back up. One hour of documentation work, done once and reviewed quarterly, is the difference between a two-hour recovery and a two-day one.
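The skeleton of that one page can be as plain as this; the headings are the three questions, and everything else is detail:

```
disaster-recovery.md

1. Where are the backups?
   Host, path, how often they run, where the credentials live.
2. How do you restore them?
   The exact commands, copied from the last successful restore test.
3. In what order do services come back up?
   e.g. database, then application, then proxy, with a health check
   for each before starting the next.

Last reviewed: <date> by <name>
```

The review date at the bottom is not decoration. A recovery document that has not been re-read in a year is a document about a server that no longer exists.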
The Pattern Across All Ten Audits
Looking at the findings in aggregate, the pattern is consistent and clear.
None of these vulnerabilities required sophisticated attacks to exploit. None of them required zero-day exploits or advanced persistent threats. Every single finding — open ports, default credentials, unpatched kernels, no fail2ban, overprivileged accounts — is exploitable with tools that have been freely available for years and are used routinely by automated scanners operating at internet scale.
The servers were not compromised — at least, not in ways that were detectable at the time of the audits. But several of them were close. The brute-force attempts were ongoing. The open Redis instance had been accessible for months. The former employee’s SSH key was still active.
The common thread across all ten was not malicious intent or sophisticated adversaries. It was the absence of ongoing, structured management. Each of these servers had been built by a competent person who made reasonable decisions at the time. The problem was that nobody came back to review those decisions as the environment changed, as services were added and removed, as personnel turned over, as vulnerabilities were published for software that was no longer being updated.
Linux server security is not a project. It is not something you do once at launch and revisit when something breaks. It is an ongoing discipline — a monthly cycle of patching, reviewing, testing, and documenting — that keeps the gap between your server’s current state and the current threat landscape from growing into something exploitable.
What a Post-Audit Server Looks Like
For context, here is what the same server typically looks like 30 days after an audit and remediation engagement.
Root SSH login is disabled. Key-based authentication is enforced. fail2ban is configured, running, and monitored. The kernel is current. All packages are patched to the latest stable release. Open ports match a documented list of intentional services. No management interfaces are accessible from the public internet without VPN or IP restriction. User accounts have been audited and overprivileged access revoked. Former employee accounts are closed. Backups are running to a remote destination, have been tested with a successful restore, and have a retention policy. A monitoring setup is active with alerts going to a channel someone reads. A one-page disaster recovery procedure exists.
This is not an exotic standard. It is the baseline. It is what your production server should look like as a matter of course. Achieving and maintaining it requires not just the initial remediation, but a monthly cycle of review that keeps it there.
If Your Server Is on This List
If you recognised your own infrastructure in any of the ten findings above, the appropriate response is not panic. The appropriate response is an audit.
An audit tells you precisely where you stand. Not approximately, not based on assumptions — precisely, based on what is actually running on your actual server right now. From that baseline, remediation is straightforward. The fixes for every finding listed above are well-understood and implementable by any experienced Linux administrator.
What an audit will not do is fix itself. And the longer the gap between recognising the risk and addressing it, the larger the attack surface that automated scanners are probing, around the clock, every day.
Book a free 30-minute Infrastructure Audit with AV Services. No write access required. No obligation. You will know exactly where your server stands within five business days.
About the Author
Arun Valecha has managed Linux infrastructure for businesses across India, the US, and Europe since 1999. AV Services provides proactive Linux infrastructure retainers starting at ₹15,000 per month, covering ongoing security, maintenance, monitoring, and incident response. Certified partner of Pyramid Computer GmbH, Germany. Approved vendor for US-based technology companies since 2013.
Book a free Infrastructure Audit
Mumbai – India