Blog Post: Deep Thread: Tracing HPC Deployments on Windows Azure

One of the things we’ve worked hard to do here in the Windows HPC team is make it easy for the HPC cluster admin to deploy and manage Windows Azure nodes.  By and large, admins who are familiar with the HPC paradigms and processes should be comfortable deploying and managing their Windows Azure nodes.

 

Most of the time things work well.  Occasionally, however, the cluster admin has to troubleshoot a deployment that doesn’t work quite as expected.  The Diagnostics suite that ships with Windows HPC Pack 2008 R2 (and which has been beefed up in SP2 for Windows Azure deployments) should be the first point of troubleshooting.

 

For the admin who wants to dig a bit deeper, or is working in tandem with Microsoft Support to diagnose issues, we ship a command-line tool with Windows HPC Pack 2008 R2 that lets you look under the covers at what is happening inside your HPC deployment on Windows Azure.  This tool is hpcazurecmd, and below are some details of how to use it to get traces from Windows Azure.

Warning: hpcazurecmd is a powerful tool intended for advanced users.  Using it without knowing what you are doing could have unintended consequences for your Azure deployments.  Please exercise appropriate care when using this tool.

One of the coolest things about Windows Azure is the powerful and dynamic tracing toolset that it provides.  For those not familiar with it, you can see some details here:  http://msdn.microsoft.com/en-us/magazine/ff714589.aspx

 

Windows HPC takes full advantage of this dynamic tracing.  By default, Windows Azure tracing is turned off.  It can be turned on, on the fly, with hpcazurecmd.  Once tracing is on, the traces for a given deployment are copied to a table in the Windows Azure storage that was previously configured for use with the HPC Node Template.  hpcazurecmd also provides a way to retrieve the traces from that table right to the desktop, so the entire process of turning on and retrieving traces is very streamlined.

 

The steps below assume you have already created a Node Template for your Windows Azure deployment (let’s say, named ‘hpcazuretemplate’), have added some nodes based on this template, and started the deployment.

 

Start by opening up an elevated command prompt on the Head Node, navigating to the HPC bin folder, and setting an environment variable to point to the template name.

C:\Program Files\Microsoft HPC Pack 2008 R2\Bin>set TemplateName=hpcazuretemplate
 

Now, issue a command to get the deployment information:

C:\Program Files\Microsoft HPC Pack 2008 R2\Bin>hpcazurecmd /GetDeployments
 

This should give some output such as the below.  Note the Name field, as you will need to copy and paste it into the next command as a parameter.

Get Windows Azure configuration information from template "hpcazuretemplate"
1 Windows Azure deployment found
Deployment: 4865f261-af99-4bc5-9a68-9320d1234567
Name: hpcdeploymentmachinenamehnservicename75c5e023db8346758531fd6123456
Url:
label: RGVwbG95bWVudCBmb3IgTWljcm9zb2Z0IFdpbmRvd3MgSFBDIDIwMDggUjIgQ2x1c3Rlcjogc2FsaW1hLWRldi1obg==
Status: Deploying
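If you would rather not copy the Name by hand, the output can be parsed with a few lines of script. The sketch below is only an illustration: it assumes the field layout shown in the sample output above (other versions of the tool may print something different), and the function name `parse_deployments` is made up for this example. It also assumes the label is Base64-encoded text, which holds for the sample above, where it decodes to a short description naming the cluster.

```python
import base64
import re

# Field order assumed from the sample output in this post; this is not a
# documented format and may differ between hpcazurecmd versions.
DEPLOYMENT_RE = re.compile(
    r"Deployment:\s*(?P<id>\S+)\s+"
    r"Name:\s*(?P<name>\S+)\s+"
    r"Url:\s*(?P<url>.*?)\s*"
    r"label:\s*(?P<label>.*?)\s*"
    r"Status:\s*(?P<status>\S+)",
    re.DOTALL,
)


def parse_deployments(output: str) -> list[dict]:
    """Return one dict per deployment found in '/GetDeployments' output."""
    deployments = []
    for match in DEPLOYMENT_RE.finditer(output):
        # The console may wrap a long label; drop the whitespace before decoding.
        label_b64 = "".join(match.group("label").split())
        deployments.append({
            "id": match.group("id"),
            "name": match.group("name"),
            "status": match.group("status"),
            # Assumption: the label is Base64-encoded UTF-8 text.
            "label": base64.b64decode(label_b64).decode("utf-8"),
        })
    return deployments
```

Feed it the captured text of the command and read `name` from the first entry. An empty list means no deployment has started yet.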

 If the deployment has not yet started, you will see something like:

Get Windows Azure configuration information from template "hpcazuretemplate"
0 Windows Azure deployments found

 Wait until the deployment has started, and you can see the deployment information as above.  Now issue the command to set the logging level that you desire – e.g.:

C:\Program Files\Microsoft HPC Pack 2008 R2\Bin>hpcazurecmd /settracinglevel /name:hpcdeploymentmachinenamehnservicename75c5e023db8346758531fd6123456 /Level:verbose

 Note that the ‘name’ parameter in the command should be the same as the ‘Name’ from the output of the ‘hpcazurecmd /getdeployments’ command – this is what identifies the specific deployment you will get the traces for.  You can copy and paste it from within the command line window.  The trace level here is set to Verbose, which produces the most detailed trace data.
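If you script this step, it helps to build the command from the deployment name rather than retyping it. The following is a minimal sketch, not part of HPC Pack: the helper names are hypothetical, it uses only the switches shown in this post, and it assumes the name follows `/name:` directly, the same way the level follows `/Level:`. It also assumes hpcazurecmd can be found from the current directory or PATH.

```python
import os
import subprocess


def set_tracing_level_args(deployment_name: str, level: str = "verbose") -> list[str]:
    """Build the argument list for 'hpcazurecmd /settracinglevel'."""
    name = deployment_name.strip()
    # A stray space or line break from copy and paste would split the argument.
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("deployment name must be a single non-empty token")
    return ["hpcazurecmd", "/settracinglevel", f"/name:{name}", f"/Level:{level}"]


def run_hpcazurecmd(args: list[str], template_name: str) -> subprocess.CompletedProcess:
    """Run the tool with TemplateName set, mirroring 'set TemplateName=...'."""
    env = dict(os.environ, TemplateName=template_name)
    return subprocess.run(args, env=env, capture_output=True, text=True)
```

Calling `run_hpcazurecmd(set_tracing_level_args(name), "hpcazuretemplate")` on the Head Node would then issue the same command as above.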

 

Next, you need to wait for your deployment to move through the provisioning stage.  This can generally take a few minutes.  You can then type in the following command to retrieve the current trace log (with the correct name, of course):

C:\Program Files\Microsoft HPC Pack 2008 R2\Bin>hpcazurecmd /gettracinglog /name:hpcdeploymentmachinenamehnservicename75c5e023db8346758531fd6123456 /output:log.txt

 The output will tell you the number of trace entries that were retrieved, e.g.:

Get Windows Azure configuration information from template "hpcazuretemplate"
Deployment Id is 4865f261af994bc59a689320d1234567
....................................
Total 36 entries

Now you can open up ‘log.txt’ in your favorite editor and examine the contents of the trace.

 

Note that if you get the following error output, the trace table is not yet available and you need to wait for the deployment to move beyond the initial provisioning stage.  You can check the Node State in HPC Cluster Manager: if the nodes are still in the “Provisioning” stage, they may not yet have had any opportunity to log anything.

Get Windows Azure configuration information from template "hpcazuretemplate"
Deployment Id is 4865f261af994bc59a689320d1234567
System.Data.Services.Client.DataServiceQueryException: An error occurred while processing this request. ---> System.Data.Services.Client.DataServiceClientException: <?xml version="1.0" encoding="utf-8" standalone="yes"?>
<error xmlns="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata">
  <code>TableNotFound</code>
  <message xml:lang="en-US">The table specified does not exist.
RequestId:1a158cad-0441-446e-95dd-8a5421234567
Time:2011-07-14T19:48:02.0425613Z</message>
</error>
   at System.Data.Services.Client.QueryAsyncResult.Execute(MemoryStream requestContent)
   at System.Data.Services.Client.DataServiceRequest.Execute[TElement](DataServiceContext context, Uri requestUri)
   --- End of inner exception stack trace ---
   at System.Data.Services.Client.DataServiceRequest.Execute[TElement](DataServiceContext context, Uri requestUri)
   at System.Data.Services.Client.DataServiceQuery`1.Execute()
   at System.Data.Services.Client.DataServiceQuery`1.GetEnumerator()
   at Microsoft.Hpc.AzureCmd.GetTracingLog.RunCommand(IDictionary`2 propertyMap, IAzureManagementBroker broker)
   at Microsoft.Hpc.AzureCmd.HpcAzureCmd.Main(String[] args)
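Rather than re-running the command by hand until the table exists, you could wrap it in a small retry loop. The sketch below is one way to do that under stated assumptions: the helper names are hypothetical, the check simply looks for the `TableNotFound` code from the error above in the tool's output, and it reads both stdout and stderr because this post does not say which stream the exception is written to. The attempt count and delay are arbitrary defaults.

```python
import subprocess
import time
from typing import Callable


def wait_for_trace_log(fetch: Callable[[], str],
                       attempts: int = 10,
                       delay_seconds: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch() until its output no longer reports TableNotFound."""
    for attempt in range(1, attempts + 1):
        output = fetch()
        if "TableNotFound" not in output:
            return output
        if attempt < attempts:
            sleep(delay_seconds)  # give the nodes time to leave Provisioning
    raise TimeoutError(f"trace table still missing after {attempts} attempts")


def make_fetch(deployment_name: str, output_path: str = "log.txt") -> Callable[[], str]:
    """Return a function that runs 'hpcazurecmd /gettracinglog' once."""
    args = ["hpcazurecmd", "/gettracinglog",
            f"/name:{deployment_name}", f"/output:{output_path}"]

    def fetch() -> str:
        # Assumes TemplateName is already set in the environment.
        result = subprocess.run(args, capture_output=True, text=True)
        return result.stdout + result.stderr

    return fetch
```

Calling `wait_for_trace_log(make_fetch(name))` polls once a minute and returns the tool's output as soon as the trace table is available.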

A note on trace levels: a number of trace levels can be specified, such as Error, Warning, etc. (you can get the full list and usage info by just typing ‘hpcazurecmd’).  Generally, the trace output gets larger as you go from Error up toward Verbose.  Verbose traces can produce very large amounts of data, so be extremely careful when turning on Verbose, especially for deployments with many nodes, as they can generate multiple gigabytes of data in your Azure storage (you may want to delete the WADLogsTable from your Azure storage once you are done with your deployment).

 

Hopefully this post helped you gain more understanding of Windows HPC and Windows Azure under the covers.  Good luck in your investigative journeys!

 

Salim Alam, Windows HPC Development Lead
