Today we had a very strange situation. I usually come into the office around 7:30am, and as not many people are around yet, I get about an hour of quiet time to work on things that need concentration.
Around 8:00, a colleague from Finance came to my office and asked if I knew why he couldn’t access the ‘Accounting’ folder anymore. You have to know that our business runs its Intranet on a SharePoint 2013 instance that is configured to run across three servers. As it is very cumbersome for the users, especially accounting people who deal a lot with Excel files, to drag files around in SP, they map their ‘main’ SP folders through Windows Explorer with drive letters (e.g. F:\ ).
I started with the usual questions: “Have you rebooted your computer? Did you change your password recently?” and so on. As nothing seemed to be wrong, I fired up my IE browser, which normally opens straight on our Intranet home page, and alas, all I got was a typical SharePoint error message (the famous one with the yellow background).
As not many people from IT are in the office around this time, I started by looking at the SP front-end server first, opening a remote session into the system. The event log was full of red alerts of all sorts, but one that caught my eye was an ‘invalid login’ for one of the service accounts pointing to the SP database server. In particular, the errors mentioned the “SharePoint_Config” database, which led me to believe there was a serious issue on the SQL server where SP hosts all its DBs (and there are a ton, believe me… I think at least 20 or so).
As a SQL DB admin who manages quite a few SQL servers in the company, my first thought was to restart the SQL services, as nothing seemed unusual (disk space was OK, which is the first thing to check, and the services were all running). At this point I tried to reach our SP administrator, who was en route and not aware of the whole mess. In the meantime at least half a dozen employees had already opened Help Desk tickets about the non-working intranet. The SP administrator said that he wasn’t far from the office and that I should just try rebooting all the servers, as that would usually bring things back to normal 🙂
I didn’t get a chance to reboot all the servers, but one thing that caught my attention was that the SharePoint_Config database had a strange icon that did not look normal. As my SP admin had in the meantime reached his desk and called me, we started to look into the status of the SQL DB. We quickly found a blog post about the exact same issue and started working on it.
The SharePoint_Config database is a vital component of a SharePoint server farm, as it contains all the configuration for the various sites that make up the farm. In our case the database ran into an issue over the weekend and was put in ‘suspect’ mode by the SQL server, which simply cut off all access to the DB from any application, so the SP website could not run anymore.
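For readers who would like to catch this kind of state before the users do, here is a minimal Python sketch. It assumes you have already pulled `name` and `state_desc` from `sys.databases` with whatever driver you prefer; the function name `find_unavailable_databases` and the sample rows are purely my own illustration and not what we run in production.

```python
from typing import Iterable, List, Tuple


def find_unavailable_databases(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (name, state) for every database whose state is not ONLINE.

    `rows` is assumed to come from a query such as:
        SELECT name, state_desc FROM sys.databases
    """
    return [
        (name, state)
        for name, state in rows
        if state.strip().upper() != "ONLINE"
    ]


if __name__ == "__main__":
    # Hypothetical sample data, shaped like the query result above.
    sample_rows = [
        ("master", "ONLINE"),
        ("SharePoint_Config", "SUSPECT"),
        ("WSS_Content", "ONLINE"),
    ]
    for name, state in find_unavailable_databases(sample_rows):
        print(f"ALERT: database {name} is in state {state}")
```

Scheduling something like this every few minutes and sending the output by mail would probably have warned us during the weekend instead of on Monday morning.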
With the help of the blog post we were able to fix the corruption quickly (fortunately for us), and within an hour or so we had the server back in operation. As a last resort, we would have had to go to the server backups to restore the faulty database, which would probably have taken much more time, as the whole backup is huge.
Lessons learned from this event:
- the same could have happened to any Dynamics GP or CRM database; this is not something specific to SharePoint
- if you manage your own backup jobs for SQL DBs on the server, make sure you get an alert if anything unusual happens during the backup. In this case, the faulty DB could not be backed up, but the CommVault application that we use did not trigger any alert about that…
- I’ve programmed my own set of backup reports for all the SQL servers I’m responsible for (which was not the case for SharePoint), so every day I get an overview in my inbox of what was fully backed up
- if anything happens during my SQL maintenance jobs, I immediately get a notification on my cell/inbox.
- if you don’t manage your DB backups yourself (i.e. IT takes care of it), make sure you have some tools (like Idera, Apex or Red-Gate) to monitor the status of your SQL servers and DBs. A situation like this should not leave a server stranded for several hours, and proper monitoring could have spared us a stressful Monday morning in the office.
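To give an idea of what a daily backup report can check, here is a small Python sketch. It is not my actual report, just an illustration under a few assumptions: the last full backup time per database has already been read from `msdb.dbo.backupset` (column `backup_finish_date`, with `type = 'D'` for full backups), and the function name `find_stale_backups` and the 24-hour threshold are hypothetical choices.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def find_stale_backups(
    last_full_backup: Dict[str, Optional[datetime]],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """Return the names of databases with no full backup, or one older than max_age.

    `last_full_backup` maps a database name to the finish time of its most
    recent full backup, or None if no full backup was ever recorded.
    """
    stale = []
    for name, finished in sorted(last_full_backup.items()):
        if finished is None or now - finished > max_age:
            stale.append(name)
    return stale


if __name__ == "__main__":
    # Hypothetical sample data for a Monday morning check.
    now = datetime(2024, 1, 8, 7, 30)
    backups = {
        "SharePoint_Config": datetime(2024, 1, 5, 23, 0),  # last good backup on Friday
        "WSS_Content": datetime(2024, 1, 7, 23, 0),
        "NewDb": None,  # never backed up
    }
    for name in find_stale_backups(backups, now):
        print(f"WARNING: no recent full backup for {name}")
```

The point of checking the age of the last good backup, rather than waiting for a failure alert, is that a backup tool that stays silent (as ours did) still shows up in the report.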
I hope this provides some food for thought about how you manage your SQL servers day to day.
Until next post…
Disclaimer: I have no relationship with or interest in any of the tools mentioned above, but I have used some of the free ones for quite a few years.