Improving App Performance with Timeline Profiling. Memory Traffic

It is well known that memory traffic heavily impacts application performance: the higher the traffic, the slower your app. The problem is not how frequently your app allocates memory (allocation itself is almost free in terms of performance), but how your app collects memory that is no longer needed. The convenient garbage collection (GC) mechanism that does this, unfortunately, comes at a cost.

*By default, the so-called background GC is enabled for desktop apps. Background GC is performed by a separate GC thread and does not suspend managed threads during the "heaviest" Gen2 collections. Nevertheless, during Gen0 and Gen1 collections managed threads have to be suspended. Therefore, background GC involves short "blocking GC" intervals.

First, GC itself demands some CPU time. For example, one of its stages is the detection of unused objects, a complex operation that involves building object reference graphs. Second, to perform the Gen0 and Gen1 GCs*, the garbage collector must obtain exclusive access to a part of the managed heap. This, in turn, suspends all managed threads except the one that triggered the "blocking GC." As the user interface thread is also suspended, app users may experience UI freezes at these moments.

That is why you should always try to optimize your app to reduce memory traffic and minimize the impact of GCs on app responsiveness.
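
As a quick illustration (a minimal standalone sketch, not part of the sample app), the documented GC.CollectionCount method can show how an allocation-heavy loop translates into garbage collections:

    using System;

    class GcCounterDemo
    {
        static void Main()
        {
            // Remember the per-generation collection counts before the workload.
            int gen0 = GC.CollectionCount(0);
            int gen1 = GC.CollectionCount(1);
            int gen2 = GC.CollectionCount(2);

            // Allocation-heavy workload: every += allocates a brand-new string.
            string s = string.Empty;
            for (int i = 0; i < 20000; i++)
                s += 'a';

            // The differences show how many collections the loop triggered.
            Console.WriteLine("Gen0 collections: " + (GC.CollectionCount(0) - gen0));
            Console.WriteLine("Gen1 collections: " + (GC.CollectionCount(1) - gen1));
            Console.WriteLine("Gen2 collections: " + (GC.CollectionCount(2) - gen2));
        }
    }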

In this tutorial, we will learn how to use timeline profiling to detect excessive GCs and their causes.

Contents

Sample App
Step 1. Running the Profiler and Getting a Snapshot
Step 2. Starting Analysis
Step 3a. Where Does Memory Traffic Come From? Analyzing Garbage Collections
Step 3b. Where Does Memory Traffic Come From? Analyzing Memory Allocations
Step 4. Improving the Code
Step 5. Verifying the Fix

Sample App


*If you’re new to timeline profiling, consider reading Getting Started before proceeding with this tutorial.

We use the same basic sample app as in the Getting Started* tutorial. This app is used to reverse lines in text files, e.g. “ABC” > “CBA”.

The source code for the app is available here.

With the Select Files button, a user chooses the text files to be processed. The Process Files button runs a separate BackgroundWorker thread (named FileProc) which reverses lines in the files. A label in the left-hand corner of the main window displays the progress. After the processing is finished, the label shows All files were successfully processed.
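
The app's actual source code isn't reproduced in this tutorial, but a simplified, hypothetical sketch of how such a worker might be wired up could look like the following. Only the FileProc thread name and the per-line reversal come from the description above; everything else, including the FileProcessor class and its ProcessFiles method, is made up for illustration.

    using System;
    using System.ComponentModel;
    using System.IO;
    using System.Threading;

    class FileProcessor
    {
        public void ProcessFiles(string[] files)
        {
            var worker = new BackgroundWorker { WorkerReportsProgress = true };

            worker.DoWork += (sender, e) =>
            {
                // Name the worker thread so it is easy to find in the profiler.
                if (Thread.CurrentThread.Name == null)
                    Thread.CurrentThread.Name = "FileProc";

                for (int i = 0; i < files.Length; i++)
                {
                    string[] lines = File.ReadAllLines(files[i]);
                    for (int j = 0; j < lines.Length; j++)
                    {
                        // Naive per-character reversal; Step 4 examines why this
                        // kind of pattern generates a lot of memory traffic.
                        string reversed = string.Empty;
                        foreach (char c in lines[j])
                            reversed = c + reversed;
                        lines[j] = reversed;
                    }
                    File.WriteAllLines(files[i], lines);
                    worker.ReportProgress((i + 1) * 100 / files.Length);
                }
            };

            // In the real app, progress is shown in a label on the main window.
            worker.ProgressChanged += (sender, e) =>
                Console.WriteLine("Processed " + e.ProgressPercentage + "%");

            worker.RunWorkerAsync();
        }
    }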

Imagine the following scenario: When testing the app, you find that text files are not processed as fast as you expected. Moreover, in some cases you experience minor lags during file processing.

Let’s use timeline profiling to analyze these performance drawbacks!

Step 1. Running the Profiler and Getting a Snapshot


  1. Open the MassFileProcessing.sln solution in Visual Studio.
  2. If you have dotTrace integrated with Visual Studio, run the profiler by choosing ReSharper | Profile | Profile Startup Configuration Performance Profiling.
    Otherwise, run dotTrace from the Windows Start menu. In dotTrace, select New Session | Local, Profile Application | Standalone.
  3. In Profiling type, select Timeline. If you ran dotTrace as a standalone app, you should also specify the path to the release executable of our sample app in Application.
  4. After you click Run, dotTrace runs our app and gives us a special controller window used to control the profiling process.

    Let's reproduce a performance issue in our app.
  5. Click the Select Files button and choose five text files that come with the app in the Text Files folder. 
  6. Click the Process Files button to start file processing.
  7. Collect a timeline profiling snapshot by clicking Get Snapshot’n’Wait in the controller window. The snapshot will be opened in Timeline Viewer.
  8. Close the app. This will also close the controller window.

Step 2. Starting Analysis


  1. *Named threads are highlighted in bold on the diagram.

    Looking at the Threads diagram, you can see that dotTrace has detected several managed threads. These are the Main app thread, the Garbage Collection thread used to perform background GC, and the FileProc thread* used to process files. In addition, there are two threads that do no work: the Finalizer thread and an auxiliary Thread Pool (ID 12688) thread. Let’s hide these two threads, as they are irrelevant to our analysis.

  2. Select the Finalizer and Thread Pool (ID 12688) threads in the Threads diagram.
  3. Right-click and select Hide Selected Threads.
  4. Look at the Threads diagram and the status bar. The following filter is currently applied to the snapshot data: Select lifetime intervals of all threads excepting hidden.

    Note how the data in other filters were affected. For example, state times in Thread States are now calculated for all threads except the hidden ones. Top Methods and Call Tree have changed too, showing calls only from the filtered threads. 
  5. As we are concerned with slow file processing, let’s zoom into the period where the FileProc thread processes files. To do this, use Ctrl+Mouse Wheel on the Threads diagram.
    This automatically adds the Visible 3445 ms filter. Note how this filter affects others: all values are recalculated for the visible time range. The filter now applied to the snapshot data is Select all time intervals within the visible time range for all threads excepting hidden.
    Let’s remember the 3445 ms value as the approximate time required for processing files.
  6. On the Process Overview diagram, look at the bar called Blocking Garbage Collection. It appears that many GCs are performed during file processing. This indicates heavy memory traffic which, without any doubt, impacts app performance.

    *Note that both ways are indirect, though quite reliable. The problem is that, in theory, there could be cases when a thread running during a GC didn’t trigger it. Also, GCs can be triggered for a number of reasons, and resizing the heap is only one of them.

    Now, let’s find out what's really behind this traffic. There are two ways* this can be done in Timeline Viewer:

    • Identify the threads and methods that are running during the GCs. These must be the threads and methods that triggered the collections.
    • Identify the methods that allocate the major part of memory. The logic is simple: the main reason for triggering GCs is resizing a heap due to memory allocation. Thus, if a method allocates a lot of memory, it also triggers a lot of GCs.

Looking ahead, we can tell that the second way is much easier and slightly more reliable. Nevertheless, for educational purposes, let’s try both.

Step 3a. Where Does Memory Traffic Come From? Analyzing Garbage Collections


  1. In the Filters window, select the Blocking GC value in the Blocking GC filter. The resulting filter will be Select all time intervals within the visible time range for all shown threads where blocking GCs take place. Therefore, all other filters now show what was happening in the Main, FileProc, and Garbage Collection threads during blocking GCs. 
  2. *In a small app like ours, the answer lies on the surface. In large solutions, it won’t be so obvious.

    Now, let’s identify the thread where the GCs happen.* For example, we can suppose that it was the Main thread.
    Select the Main thread on the Threads diagram. The resulting filter is now Select all time intervals within the visible time range for the Main thread where blocking GCs take place.

  3. Look at the Thread States filter in the Filters window.
    Now, it shows what the main thread did during all those GCs. Most of the time (98.5%) it was Waiting. This means that GC took place on some other thread (obviously on the FileProc thread) and the main thread had to wait until the GCs were finished.

    Now, let’s find out what method in the FileProc thread is to blame for all those GCs.
  4. Remove the Threads #15744 (Main) filter by clicking the corresponding button in the list of filters.
  5. Select FileProc in the Threads diagram. The resulting filter is now Select all time intervals within the visible time range for the FileProc thread where blocking GCs take place.
  6. Look at the Thread State filter in the Filters window. It shows that 99.6% of the GC time the FileProc thread was Running. This confirms that the FileProc thread is responsible for these GCs.
  7. *By execution time. The execution time of a user method is calculated as the sum of the method’s own time and the own time of all system methods it calls (down to the next user method in the stack). For example, if a user method spends 10 ms in its own code and 40 ms in the system methods it calls, its execution time is 50 ms.

    Select the Running state in Filters | Thread State. Taking into account all applied filters, Top Methods now shows the top methods* that triggered the GCs.

    As you can see, StringReverser.Reverse is the method that probably generates most of the memory traffic in the app.

Step 3b. Where Does Memory Traffic Come From? Analyzing Memory Allocations


Now let’s try a simpler way of analyzing memory traffic. As we mentioned earlier, the idea is to identify methods that allocate large amounts of memory.

  1. In the list of filters, remove all filters except Visible 3445 ms.
  2. *In addition to event tracing data (such as states and times of operations, call stack data, and so on), dotTrace also collects data about memory allocation. This data can be selected as a separate subject for analysis.

    In the Analysis Subject filter, select Memory Allocation*.

  3. Look at the Threads diagram. The FileProc thread allocates enormous amounts of memory – 6090 MB. There is no doubt it’s to blame for high memory traffic in our app.
  4. Look at Top Methods. Now, it sorts methods by the amount of allocated memory.
    The StringReverser.Reverse method leads by a wide margin with 5448 MB.

Step 4. Improving the Code


Let’s look at this method in the code and try to find out what is wrong with it.

  1. Select the StringReverser.Reverse method by clicking on it in Top Methods.
  2. The Source View window shows you the method’s source code.
    It appears that this method is used to reverse lines from text files. It takes a string as input and returns the reversed string.

    Apparently, the problem is in the way it reverses a string. The thing is that string is an immutable type – its contents cannot be changed after it is created. This means that each time we add a character to stringResult with the += operation, a new string is allocated in memory. It would be much more efficient to employ the StringBuilder class, or to handle the string as an array of char and use the Array.Reverse method. Let’s try the second option.

  3. Switch to Visual Studio and correct the StringReverser.Reverse method as shown below.
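
    Here is a sketch of what the change might look like. The first method reconstructs the allocation-heavy += pattern described above (the app’s actual code may differ slightly); the second is the char-array variant that uses Array.Reverse. Both are shown side by side for comparison; in the app, only the improved Reverse would remain.

    using System;

    public static class StringReverser
    {
        // Before (reconstruction of the problematic pattern): every += allocates
        // a new string, so reversing an N-character line creates roughly N
        // temporary strings that soon become garbage.
        public static string ReverseSlow(string input)
        {
            string stringResult = string.Empty;
            foreach (char c in input)
                stringResult = c + stringResult;
            return stringResult;
        }

        // After: treat the string as an array of char and reverse it in place.
        // Only two objects are allocated per call: the char array and the
        // resulting string.
        public static string Reverse(string input)
        {
            char[] chars = input.ToCharArray();
            Array.Reverse(chars);
            return new string(chars);
        }
    }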

Step 5. Verifying the Fix


  1. Rebuild the solution and perform profiling one more time as described in Step 1.
  2. After the snapshot is open in Timeline Viewer, switch Analysis Subject to Memory Allocation.
  3. Look at the Threads diagram. Our improvement works! Memory traffic was reduced from 6 GB to 164 MB in FileProc.

    If you zoom into the time interval where FileProc processed files, you’ll see that file processing has sped up significantly. Now, it takes only 370 ms to process all files, compared to 3445 ms before the fix.
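
Outside the profiler, a quick standalone sanity check can confirm the same trend. The sketch below (which reuses the StringReverser class from the Step 4 sketch above) simply times both versions with a Stopwatch:

    using System;
    using System.Diagnostics;

    class ReverseCheck
    {
        static void Main()
        {
            string line = new string('x', 1000);
            const int iterations = 10000;

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                StringReverser.ReverseSlow(line);   // += based reversal
            Console.WriteLine("Old version: " + sw.ElapsedMilliseconds + " ms");

            sw.Restart();
            for (int i = 0; i < iterations; i++)
                StringReverser.Reverse(line);       // Array.Reverse based reversal
            Console.WriteLine("New version: " + sw.ElapsedMilliseconds + " ms");
        }
    }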