Debugging on SharePoint is much the same as debugging any other ASP.Net application...you can use the same techniques. Here are some of my tips for diagnosing & solving your problems...
Debugging starts with your first line of code
As you start writing some code you need to bear in mind that at some point you will need to debug it. This may mean attaching a debugger (easy) or looking at a crash dump (hard). This means debugging should always be at the back of your mind when writing any code...ask yourself "when this is live how am I going to find out what has gone wrong?"
In practice this means making full use of the features of .Net ... System.Diagnostics, try...catch, exception logging, etc. One thing in particular to remember is that if you do ever have to look at a crash dump your variable & method names will be in there too. The more descriptive they are the easier it is going to be to analyse the dump. Also bear in mind that the less code in each method the easier it will be to pin-point the location of the error. Les Smith has some good points to make about refactoring...I would add that refactoring makes debugging easier!
Use Trace.Write & Debug.Write
This is the first thing I would recommend. Adding these method calls at suitable places in your code can really help to solve problems. Used in conjunction with DebugView these methods can give you great insight into what is happening within your code. Personally I use Trace whenever something unexpected or 'out of the ordinary' occurs, normally exceptions, but it could be anything that may help solve a problem. Calls to the Trace methods remain in release code and so will add an overhead to the code execution. I generally use Debug to enable me to see code execution paths during development, and as these calls do nothing in a release build they can be used much more liberally.
Using the Trace statements can really help when you cannot attach a debugger (on a live server) to give you more information about what has gone wrong. If you ensure you use try...catch...finally blocks throughout your code, using Trace to log the exception makes it easy to see where the error has occurred (including the call stack) simply by running DebugView on the server.
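As a rough sketch of how the Trace and Debug calls described above might be wrapped up, here is a small helper. The class and method names (TraceHelper, Describe, LogUnexpected, LogPath) are made up for illustration, and it assumes the TRACE constant is defined for your build, which is the Visual Studio default for both Debug and Release configurations.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of SharePoint or the .Net framework.
public static class TraceHelper
{
    // Builds one line of text from a context label and the exception.
    // Exception.ToString() includes the type, the message, any inner
    // exceptions and (once the exception has been thrown) the stack trace.
    public static string Describe(string context, Exception ex)
    {
        return string.Format("[{0}] {1}", context, ex);
    }

    // For the unexpected: compiled in whenever TRACE is defined,
    // so it is still there in a release build.
    public static void LogUnexpected(string context, Exception ex)
    {
        Trace.WriteLine(Describe(context, ex));
    }

    // For following execution paths during development: the call is
    // removed by the compiler unless DEBUG is defined.
    public static void LogPath(string message)
    {
        Debug.WriteLine(message);
    }
}
```

With the default trace listener in place, the output of both methods should show up in DebugView running on the same machine.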
Use try...catch...finally
When developing ASP.Net controls and WebParts I like to add try...catch blocks when overriding methods like OnInit, OnLoad, CreateChildControls and Render. I do this because I don't want the whole page to fail to render simply because one control has a problem...If the search box control throws an exception, why shouldn't the rest of the page display? Therefore my Render method would look like this...
protected override void Render(HtmlTextWriter writer)
{
    try
    {
        // Do something to render the control, which may cause an exception
    }
    catch (Exception ex)
    {
        // Log the exception and the URL of the request that caused it
        Trace.Write(ex);
        Trace.WriteLine(HttpContext.Current.Request.Url.ToString());
        // Maybe render an error message
    }
}
This way if an exception is thrown the page will still render...the user may not get the full functionality of the page, but at least they get something. Essentially wrapping your code in a try...catch block has a negligible hit in terms of performance and so you should look to use them wherever something may throw, even if you just Trace and re-throw the exception. Also you should bear in mind that throwing an exception is a big overhead and should only be used when a real exception has occurred...something you really did not expect and cannot recover from.
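To illustrate the two points above, here is a minimal sketch of "Trace and re-throw" next to a non-throwing alternative for input you expect to be bad sometimes. The class and method names (QuantityParser, ParseQuantity, TryParseQuantity) are invented for the example; only int.Parse, int.TryParse and Trace.WriteLine are real framework calls.

```csharp
using System;
using System.Diagnostics;

// Hypothetical example class, for illustration only.
public static class QuantityParser
{
    // Trace and re-throw: the caller still sees the exception,
    // but the details have been written to the trace output first.
    public static int ParseQuantity(string text)
    {
        try
        {
            return int.Parse(text);
        }
        catch (FormatException ex)
        {
            Trace.WriteLine("ParseQuantity failed for '" + text + "': " + ex);
            // A bare 'throw' keeps the original stack trace;
            // 'throw ex' would reset it to this line.
            throw;
        }
    }

    // When bad input is expected rather than exceptional, avoid the
    // cost of throwing altogether and report failure via the return value.
    public static bool TryParseQuantity(string text, out int quantity)
    {
        return int.TryParse(text, out quantity);
    }
}
```

The first method suits situations where a failure really is unexpected; the second suits things like user input, where a failure is just another code path.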
Check the SharePoint log files
These will be located in the /12/LOGS folder. The log files contain errors & messages generated by SharePoint. I have to say that generally they don't give you as much information as you would like (sometimes nothing), but occasionally they point you straight to the problem. Either way you should always check them just in case.
You should also know that you can increase or decrease the amount of logging SharePoint does in Central Administration. Here you can change the verbosity of the logging within different parts of the application. It's not the easiest interface to use, but you can increase the number of messages logged...doing this has solved problems for me in the past.
You change the settings under 'Logging and Reporting' in the operations tab...
Clicking on diagnostic logging shows the following page...
Under 'Event Throttling' you can change the verbosity. I have to say that I normally just set all of the categories to verbose when I have a problem as it can be difficult to identify the categories which you may need to change.
Here you also see the 'Trace Log' settings with the default settings...generally I like to make these smaller, say 30 files and 5 minutes. This makes the log files smaller and easier to view.
Use all the tools available to you
There are a lot of tools out there to help you resolve your problems, get to know them, know what they can do and when they can help. This is a list of tools I use on a regular basis...
There are more tools available; what you use depends on the problem you are trying to solve. In particular, tools like U2U's CAML Builder can help you to replicate problems.
Don't try to guess
Don't try guessing what the problem might be and randomly changing your code to try and fix it (although this may be a last resort!!!). Attach a debugger, look at the code, trace through the code and find the cause. If you need to, you can even break the debugger in SharePoint code; it may only be disassembly, but it is better than nothing. Reflector can really help in this situation as it lets you decompile the same method and relate what you see in the debugger back to C#.
I had an example of this recently, the authoring console was giving me the following error...
No errors were logged anywhere and I couldn't see what could be causing the problem, but it was preventing us from authoring. To solve the problem I attached the VS.Net debugger to the w3wp process, disabled 'Just My Code' and set the debugger to break on all exceptions...
I then refreshed the page and broke into the SharePoint code when the exception occurred...
Once in the debugger we can see that we have a NullReferenceException thrown in a method called ConfigurationXml() in the ConsoleXmlUtilities class. None of these details were logged, but now I know exactly where the problem is manifesting itself. Now we have the class and the method we can use Reflector to see what is going on...
Looking at the ConfigurationXml method we can see where a null reference 'may' occur. Reading through the code shows that the most likely suspect is that file2 is being used without checking to see if it is null (digging into the configFile method shows us this is a valid return value). Using Reflector shows us that this class is dealing with the configuration files for the consoles located in the Master Pages gallery. A quick check showed that none of the XML files in the Editing Menu folder had published versions (I still don't know how that happened). Publishing those files solved the problem!
This shows how simple it can be to resolve a problem, even when it is occurring within the SharePoint code base.
Re-produce & isolate the problem
Most of the time your problem will occur within the complex sequence of events of a webpage request. You can try to narrow down the possibilities by executing the same code in a different context. SharePoint actually gives you a really quick and easy way of doing this without having to deploy binaries...in the form of the _layouts folder.
You can easily create a quick .aspx page containing the code which is causing problems. This you can place in the _layouts folder to test. In the .aspx page you can add whatever Trace, Debug or Response.Write calls you like to see what is going on...you can see exactly what is going on without having to attach a debugger. This technique can work really well on live servers where you haven't got VS.Net and Notepad is the only tool available.
If the .aspx doesn't give you what you want, a console application or Snippet Compiler can also work well. The general idea here is to re-produce the problem in a different environment which should help to narrow down the cause.
Debug on the live server
Sometimes, no matter how hard you try, you cannot reproduce an error within a test environment...it only happens on the live server. If your Trace statements are not giving you the details you require then you are going to have to debug the problem on the server causing the problem. Fortunately there are a number of solutions to this problem.
If you have a connection for which the firewall allows remote debugging you can copy some files across to the live server and attach the debugger from your development machine...this will obviously work best with a debug build deployed. Debugging remotely actually works very well, but it will prevent the server from processing requests while you are stopped in the debugger. This means the server will be unavailable whilst you perform your debugging...which might be early Sunday morning...the only time you can make the server unavailable!
If you can't get through the firewall then you can use WinDbg to debug on the server. This will allow you to attach to processes and step through code. It is actually more powerful than VS.Net, but is harder to use as, even though it has a UI, it relies on a cryptic set of commands to get it to do what you want. Even so, it is well worth using as it can give you access to valuable information.
To give you a comparison, the following is the same debugging process in WinDbg as the previous example in VS.Net.
Firstly attach to the w3wp process...
Once attached you will need to load the SOS.DLL to give you access to the debugging functions you'll need. You can use the following command to achieve this...
.load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\sos.dll
You now need to tell the debugger to stop on .Net exceptions. Do this with the following command...
sxe CLR
You can now continue the debugging by entering go (or click on the icon) and wait for the exception. When the exception is thrown the debugger will break and you will see the following in the command window...
As you can see we have hit an exception on the same line of code as we did in VS.Net. We can now look at the call stack using the command !clrstack, which produces the following results...
This again shows us the WSS class and method in which the error occurred. Here you could also enter !clrstack -p, which would show you the parameters and the memory addresses. If you want you can look at the method parameters using the command !dumpobj...
0:015> !clrstack -p
OS Thread Id: 0x50d4 (15)
ESP       EIP
01c7ebe4 0bd81a4b Microsoft.SharePoint.Publishing.WebControls.ConsoleXmlUtilities.ConfigurationXml(System.String, Boolean)
   PARAMETERS:
       configProvider = 0x0e843368
       isBuiltInConfigFile = 0x00000000
0:015> !dumpobj 0x0e843368
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 52(0x34) bytes
 (C:\WINDOWS\assembly\GAC_32\mscorlib\2.0.0.0__b77a5c561934e089\mscorlib.dll)
String: CustomEditingMenu
Fields:
     MT   Field  Offset                Type VT    Attr   Value Name
790fed1c 4000096       4        System.Int32 0 instance      18 m_arrayLength
790fed1c 4000097       8        System.Int32 0 instance      17 m_stringLength
790fbefc 4000098       c         System.Char 0 instance      43 m_firstChar
790fa3e0 4000099      10       System.String 0  shared  static Empty
   >> Domain:Value 000db050:790d6584 000fe470:790d6584 <<
79124670 400009a      14       System.Char[] 0  shared  static WhitespaceChars
   >> Domain:Value 000db050:01d61438 000fe470:01d65140 <<
By using the !dumpobj (or !do for short) you can look at all the objects in memory...providing you can find their memory address. Other useful commands include...
!dumpstackobjects (shows all objects currently on the stack)
!printexception (!pe)
!dumpallexceptions (!dae)
Now we know the class & method we can look at the code using the command !u...
!u 0bd81a4b
0bd81a0e e9b2000000     jmp    0bd81ac5
0bd81a13 8b8d54ffffff   mov    ecx,dword ptr [ebp-0ACh]
0bd81a19 ba02000000     mov    edx,2
0bd81a1e ff150c7ee20b   call   dword ptr ds:[0BE27E0Ch]
0bd81a24 8bf0           mov    esi,eax
0bd81a26 89b540ffffff   mov    dword ptr [ebp-0C0h],esi
0bd81a2c 8b8d54ffffff   mov    ecx,dword ptr [ebp-0ACh]
0bd81a32 ba01000000     mov    edx,1
0bd81a37 ff150c7ee20b   call   dword ptr ds:[0BE27E0Ch]
0bd81a3d 8bf0           mov    esi,eax
0bd81a3f 89b53cffffff   mov    dword ptr [ebp-0C4h],esi
0bd81a45 8b8d3cffffff   mov    ecx,dword ptr [ebp-0C4h]
>>> 0bd81a4b 3909           cmp    dword ptr [ecx],ecx
0bd81a4d ff15a891b00a   call   dword ptr ds:[0AB091A8h]
0bd81a53 8bf0           mov    esi,eax
0bd81a55 8bce           mov    ecx,esi
0bd81a57 3909           cmp    dword ptr [ecx],ecx
0bd81a59 e8a24e2900     call   0c016900 (Microsoft.SharePoint.SPListItem.get_ListItems(), mdToken: 060035ef)
This is an abbreviated version, but you get the same disassembly as you do in VS.Net (JIT-compiled native code rather than IL); actually it's better, as you get some method names. We can now use Reflector as before to solve the problem.
Debug using a crash dump
Most of the time you're not really going to be able to attach a debugger to a live server, but that still doesn't mean you can't debug exceptions in SharePoint. Microsoft provide a utility called ADPlus, which will create mini dumps of the exceptions within your SharePoint application. These dumps can then be opened in WinDbg and examined in exactly the same way as you would using WinDbg live.
ADPlus is a console application which attaches to the process and waits for exceptions to occur, taking a dump when they do. Once captured the dumps can be transferred back to your development machine and diagnosed for as long as you want without tying up the live server. This is particularly useful when your SharePoint site is being managed by a hosting company and you do not have RDP access to the live server. You can easily script the commands so a support engineer at the hosting provider can create the dumps and email them to you.
Useful commands include
adplus -hang -pn w3wp.exe
and
adplus -crash -pn w3wp.exe
When debugging SharePoint I have only ever got -crash to give me anything useful, but I am sure -hang will be useful one day.
Note: By default you do not get a memory dump for first chance exceptions (because they can occur frequently), however adplus can be configured to do this (see http://support.microsoft.com/kb/q286350/). Not having a memory dump only means you can't see the contents of parameters & objects...you may not always need them.
More on ADPlus can be found within these articles...
http://blogs.msdn.com/tess/archive/2006/01/11/511773.aspx
http://support.microsoft.com/kb/q286350/
http://blogs.msdn.com/johan/archive/2007/01/11/how-to-install-windbg-and-get-your-first-memory-dump.aspx
Further Reading
The links below give you some things to do which can make your SharePoint debugging life easier.
A solution to "An unexpected error has occurred" in WSS v3
Using IISAPP to get the process ID of a SharePoint application
Test your SPSiteDataQuery parameters using Snippet Compiler
Anonymous SharePoint Publishing site forcing login
Make your SharePoint debugging experience a little less painful
Personally I have found "If broken it is, fix it you should" by Tess Ferrandez to be one of the most interesting blogs around debugging code and .Net in particular. I would recommend heading over there if you ever need to track down a difficult problem; you will get some good ideas and maybe a solution.
These are some links which I find useful when using WinDbg...
http://blogs.msdn.com/tess/archive/2006/05/18/601002.aspx
http://dotnetdebug.blogspot.com/2005/12/new-commands-in-net-20-sos-windbg.html
http://blogs.msdn.com/tess/archive/2006/10/13/asp-net-2-0-investigating-net-exceptions-with-windbg-compilation-and-load-exceptions.aspx
Hopefully this has given you some ideas as to how to approach debugging your problem. I'm sure there are more, but these should give you a good start.
-Vince