How Badly Can AV Scanning Impact Your SharePoint Farm's Performance?

Let me first describe the issue and how we fixed it, and then cover the lessons we learned from it.

Issue we faced

A couple of days back, we received a large number of support incidents from users across the globe stating that they were not able to access the SharePoint portal. The issue was intermittent: the site would load fine at times and then suddenly stop loading. We decided to log in to each WFE server in the farm and identify the one that was returning the bad request. Listed below are the troubleshooting steps we took initially to identify the root cause:

  1. Tried loading the problematic site ourselves to check whether we could reproduce the issue.

  2. Once we confirmed that we could reproduce the issue, we changed the hosts file entry to point to each WFE in the farm in turn and tried to load the site. This was done to identify which WFE server was returning the bad request.

  3. During this process, we noticed some abnormal behavior on two servers (i.e. WFE1 & WFE2) of the SharePoint farm. The CPU/RAM utilization on these two servers was continuously hitting 100%, and because of that, all the requests going to them were failing. Both servers were almost unresponsive.

  4. We took a look at the Event Viewer and found many entries about the McAfee anti-virus software update process failing. I then opened the McAfee console to understand what was happening; as expected, I found many update failures. I pulled the McAfee logs and found many matching entries there as well.

  5. In addition, we noticed entries in the ULS logs about the SharePoint farm trying to run a configuration change by itself and invoking an upgrade process. The w3wp.exe SharePoint worker process was also consuming a lot of RAM.

  6. Given that we had noticed so many odd entries in the logs, we decided to reboot the server and see if that helped. It did, and the issue was resolved.

  7. However, the server reboot was just a week away, and we wanted to identify what exactly had triggered this, because we had noticed some odd entries in the SharePoint logs about an automatic configuration change and upgrade process. Hence, we decided to open a support case with Microsoft for a detailed RCA.
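
The hosts file change in step 2 can also be approximated with a small script that sends the request straight to each server's IP address while keeping the site's host name in the Host header. The following is a minimal Python sketch, not the procedure we actually used. All names in it are hypothetical, and it assumes the site answers over plain HTTP. A site that requires authentication will typically answer 401, which still shows that the server is responding.

```python
import time
import urllib.error
import urllib.request


def build_probe(wfe_ip, site_host, path="/"):
    """Build a request aimed at one WFE's IP that carries the site's host name."""
    return urllib.request.Request(
        f"http://{wfe_ip}{path}", headers={"Host": site_host}
    )


def fetch_status(request, timeout=15):
    """Return the HTTP status code, including error codes such as 401 or 503."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def probe_farm(wfes, site_host, fetch=fetch_status):
    """Probe every WFE once and return {name: (status_or_error, seconds)}."""
    results = {}
    for name, ip in wfes.items():
        started = time.monotonic()
        try:
            outcome = fetch(build_probe(ip, site_host))
        except OSError as err:  # timeouts, refused connections, DNS errors
            outcome = f"failed: {err}"
        results[name] = (outcome, round(time.monotonic() - started, 2))
    return results
```

For example, probe_farm({"WFE1": "10.0.0.11", "WFE2": "10.0.0.12"}, "portal.example.com") returns one (status, seconds) pair per server (the addresses and host name here are placeholders). A server that times out or takes far longer than its peers is the one to look at first.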

Now, let’s take a look at what Microsoft had to say about this issue.

Troubleshooting steps done by Microsoft

We captured the ULS logs for the exact time the issue was reported and shared them with Microsoft. Please note that this issue was not reproducible: it happened only once, so we could not demonstrate the behavior to Microsoft, and after the server reboot everything looked normal. Microsoft analyzed the logs, and this is what they found.

A huge performance issue was identified, as you can see in the logs below:

[Screenshot: logs showing the performance issue]

AppDomain recycling was happening very frequently (shown below).

[Screenshots: logs showing frequent AppDomain recycling]

AppDomain recycling was happening on both WFEs, as shown in the ULS logs screenshot below:

[Screenshot: ULS logs showing AppDomain recycling on both WFEs]

What we identified after analyzing the logs

Based on the above analysis, we identified that the root cause of this issue was AppDomain recycling happening very frequently. An AppDomain is an isolation boundary inside the w3wp.exe worker process of the web application. The AppDomain kept recycling, which forced the application to start up again each time, and that caused the performance problems in the environment.
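
To see how often such recycling shows up in a log, the matching entries can be counted per minute. The sketch below is a rough illustration rather than the analysis Microsoft performed. It assumes ULS log lines are tab-separated with the timestamp in the first column, and the search text "appdomain" is an assumption, because the exact message wording is not reproduced in this post.

```python
from collections import Counter


def recycle_counts(lines, needle="appdomain"):
    """Count log lines mentioning `needle`, grouped by minute.

    Assumes tab-separated lines whose first field is a timestamp such as
    "MM/DD/YYYY HH:MM:SS.ff"; `needle` is a hypothetical search term.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a tab-separated log entry
        if needle in line.lower():
            minute = fields[0].strip()[:16]  # keep "MM/DD/YYYY HH:MM"
            counts[minute] += 1
    return counts
```

Feeding it the lines of a log file (for example, with open(path, encoding="utf-8", errors="replace") as handle: recycle_counts(handle)) and looking at counts.most_common(5) gives a quick view of the minutes with the most matching entries.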

This AppDomain recycling can have one of the following two causes:

  • AV exclusions are not implemented in your SharePoint farm as described in the Microsoft article “Certain folders may have to be excluded from antivirus scanning when you use file-level antivirus software in SharePoint”.

  • Application restarts may occur in some situations when a process accesses the Web.config file in the root of the application, the Machine.config file, the Bin folder, or the Global.asax file.

    In our case, it was the first one: we had not excluded the necessary files/folders from AV scanning, so we decided to exclude the folders/files mentioned in the aforementioned article. These are SharePoint system files/folders, and they have to be excluded from AV scanning. Otherwise, when a scheduled full scan kicks off in your SharePoint farm, it will scan these files too, and that will impact the performance of the farm.
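
The second cause in the list above can be partly checked by looking at when the watched files were last modified. The following is a small sketch under the assumption that a recent modification time on Web.config, Global.asax or the Bin folder is worth investigating; it cannot detect read-only access by a scanner, and the paths passed to it are up to you.

```python
import os
import time


def recently_changed(paths, window_seconds=3600, now=None):
    """Return (path, age_in_seconds) for paths modified within the window.

    Paths that do not exist or cannot be read are skipped.
    """
    now = time.time() if now is None else now
    changed = []
    for path in paths:
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            continue  # missing or inaccessible path
        if age <= window_seconds:
            changed.append((path, round(age)))
    return changed
```

Running it against the web application's Web.config, Global.asax and Bin folder shortly after an incident shows whether any of them changed around that time.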

Lessons learned

If you’re planning to install antivirus software in your SharePoint farm, please make sure that all the folders mentioned in this article are excluded from scanning. These are SharePoint system files & folders, and every time the AV scan engine tries to scan them, it puts the farm at risk because the scanning process interferes with SharePoint’s operations.
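
One way to keep this from regressing is to compare the folders that must be excluded against the exclusions actually configured in the AV product. The sketch below is only an illustration of that comparison. How you export the configured exclusions depends on your AV software, and the authoritative folder list is the one in the Microsoft article, so both lists are inputs you have to supply.

```python
import ntpath


def _norm(path):
    """Normalize a Windows path for case-insensitive comparison."""
    return ntpath.normcase(ntpath.normpath(path))


def is_covered(folder, exclusions):
    """True if `folder` equals, or lies beneath, one of the excluded folders."""
    target = _norm(folder)
    for excluded in exclusions:
        base = _norm(excluded)
        if target == base or target.startswith(base.rstrip("\\") + "\\"):
            return True
    return False


def missing_exclusions(required, configured):
    """Return the required folders that no configured exclusion covers."""
    return [folder for folder in required if not is_covered(folder, configured)]
```

An empty result from missing_exclusions(required, configured) means every required folder is covered by a configured exclusion; anything it returns still needs to be added.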