How we almost killed our Azure Deployment by diagnosing it!

We recently learned a lot about Azure Websites deployments and diagnostics, and we plan to write a series of posts that will hopefully help some fellow programmers. This one lesson, though, is the most important to remember before you even start diagnosing!

Recently, while checking the health of one of our production deployments, we saw a red pie chart!

filesystemfull

There was no warning from Azure, and frankly you do not see this File System storage figure when checking the health of your deployments individually. You see it only by clicking on your App Service Plan (which shows you the health of the VM hosting your Azure websites).

We were storing system diagnostics logs on the file system, but just a few hours earlier the file system had been empty, so it made no sense that the logs were suddenly 10GB. Also, when we downloaded the logs using Kudu, they were just a few MB.

image

The site was still running beautifully. The only visible side effects were:

  • When we attempted to run a diagnosis on our instance using the scm site, it would say it can’t find the diagnostics settings
  • We also saw a lot of errors when trying to log into the system.

On further reading we realized that a full file system is pretty dangerous, since the system can no longer write, only read. It will NOT overwrite what is already stored and will not remove data (logs do get removed if you have set a retention period, which you can’t set from the new portal). We think that if we had tried to publish the site again, that would have failed too, since there was no space left!

On retracing our steps during the day we realized what had gone wrong! We were trying to find a bug causing the CPU to spike intermittently and were following the instructions at https://azure.microsoft.com/en-us/documentation/articles/app-service-web-troubleshoot-performance-degradation/, specifically the ones regarding the support portal at http://<your app name>.scm.azurewebsites.net/Support and the Kudu portal at https://<Your app name>.scm.azurewebsites.net/.

  • We had repeatedly diagnosed our installation using the support portal during the day to find the CPU spike

image

  • Every time you diagnose the app, apart from collecting logs, the tool also creates a pretty heavy memory dump and stores it on the file system
  • Apart from the portal, you can also see the file system storage in the Kudu Environment Settings. Our settings prior to a diagnostic session showed

image

  • After the diagnostic session the numbers were

image

So do this a couple more times and you are going to fill up your file storage!
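
If you would rather watch this from a script than from the portal, here is a minimal Python sketch of the idea. It is only an illustration under assumptions: the function names and the 80% threshold are made up for this example, and the numbers that `shutil.disk_usage` reports for a volume may not match the File System storage quota shown on your App Service Plan.

```python
import os
import shutil


def dir_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (hypothetical helper)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def percent_used(used: int, total: int) -> float:
    """Return used/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def is_nearly_full(path: str, threshold_percent: float = 80.0) -> bool:
    """True if the volume holding path is at or above the threshold.

    Assumption: the volume-level figures are a good enough proxy for the
    plan quota; verify against the portal before relying on this.
    """
    usage = shutil.disk_usage(path)
    return percent_used(usage.used, usage.total) >= threshold_percent
```

Calling `dir_size_bytes` on the diagnostics data folder before and after a diagnostic session would show how much each session adds.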

After a little more tinkering we realized we can delete these logs and dumps (once analyzed) using Kudu itself:

  • Open the Kudu debug console and move to D:\home\data\DaaS
    • DaaS is a webjob that the diagnostic tool creates on your website to collect the analysis data.
  • You can pretty much delete all the folders once you are done analyzing your logs, but the most important one to delete is the Logs folder. Ideally, you don’t want to delete your settings.
  • Note that the Heartbeats folder will get recreated since the job is still running!
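
The same cleanup could be scripted. The sketch below is a hedged example, not part of Kudu or the DaaS tooling: `remove_subfolders` and `folder_size_bytes` are hypothetical helpers, and the code assumes it runs somewhere the folder is reachable as a normal path. By default it removes only the Logs folder and leaves everything else (including any settings) alone.

```python
import os
import shutil


def folder_size_bytes(folder: str) -> int:
    """Sum the sizes of all regular files under folder."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return total


def remove_subfolders(root: str, names=("Logs",)) -> int:
    """Delete the named subfolders of root and return the bytes freed.

    Subfolders that do not exist are skipped. Anything not listed in
    names is left untouched.
    """
    freed = 0
    for name in names:
        target = os.path.join(root, name)
        if not os.path.isdir(target):
            continue
        size = folder_size_bytes(target)
        # Raises if a file cannot be deleted, e.g. because it is locked.
        shutil.rmtree(target)
        freed += size
    return freed
```

For the folder in this post the call would be `remove_subfolders(r"D:\home\data\DaaS")`. Only run it once you have analyzed (or downloaded) the dumps, since the deletion is not reversible.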

image

  • Now check the File System storage in Environment Settings again. We’ve suddenly got a lot of space to play with!

image

  • Ideally, once you are done analyzing your system, you should also stop the DaaS webjob so that it stops collecting heartbeat logs and slowly filling up your file system. It will automatically restart the next time you diagnose!

image

The Azure support portal and Kudu are awesome tools, and we could not have resolved our CPU spike without them (another blog post about that one, hopefully!). They should definitely be used, but remember to clean up once you are done!!

Till next time (hopefully soon),

Team Cennest!
