It’s the first time when I’m participating in T-SQL Tuesday event. I’m so excited I can share my thoughts with you. This month’s topic is:
In the <N> years I have been using SQL Server, we’re still dealing with <some problem>
That’s very interesting topic and when I read Andy Mallon (b|t) post I already knew what I will write about. My version of this topic is:
In the 8 years I have been using SQL Server, we’re still dealing with a lack of cooperation.
During my career I’ve already had plenty of situations when a different teams which supposed to work on the same side, doesn’t do that in proper way.
My favorite example is when one team, like first level of Product Support Team or Customer Care Team works on database related issue. It doesn’t matter what kind of issue is that. It can be database slowness, query timeout, transactional log growth, etc. You can put here everything you would like or what you worked recently on. After all, when issue is somehow fixed, the case is handover to the second team, like third level of Database Administrators support, which should do a root cause analysis and close the case with a detailed explanation what exactly happened.
There is a catch!
In my case the first team doesn’t provide any input for a second team. They never collect a data for analysis during incident. I understand it in some stage. Customer calls them and wants to have an issue fixed as soon as possible. But on the other side, they are later obligated to provide an explanation what happened. Due to lack of detailed SQL Server knowledge they need to reach out to the more experienced and specialized team of Database Administrators. But even the most experienced Database Administrator is not able to provide issue root cause without analyzing data captured during this incident. Right? You’re probably wondering how such communication between teams may look like. Here you go:
“Dear DBA Team,
Today customer reported database slowness in XYZ environment. Issue was solved by first level support by failing over instance to the second cluster node. Please take a look on this and provide explanation what caused this issue.”
“Did you collect any data during incident?”
“Because of very limited time no data was collected. Please check server and provide explanation for customer. It’s very urgent.”
Oh come on, I already saw that too many times. It’s very frustrating when you have to explain again and again that you’re not able to provide any explanation because the first team doesn’t collect any valuable data. Unfortunately in this case you’re always the bad guy and they are good guys. They solved issue! Even if they had to restart SQL Server to solve long running query issue. You’re the database specialist which is not able to explain what happened.
What can be done to fix this situation?
Well… You can put a monitoring tool in place which will trace and capture everything what happens on SQL Servers you’re responsible on. Or… you can try to improve work culture and teach other teams they just need to collect data. Due to the environment limitations I had no choice and my team works on this second option. It’s up to you what will work better for you.