Must Have Tooling for .NET Core Development

Here’s a great set of tools for smoothing your transition to developing in .NET Core.

IDE

  • VSCode – cross-platform IDE; great for coding .NET Core

Portability

Porting

Perils of Async: Locking Out Performance

In a previous post, Perils of Async: Data Corruption, we saw the consequences of inadequate concurrency control in asynchronous code.  The first implementation using Parallel.ForEach did not protect shared data, and its results were wrong.  The corrected implementation used C#’s lock for the necessary protection from concurrent access.

Parallel.ForEach(input, kvp =>
{
    if (0 == kvp.Value % 2)
    {
        lock (mrr)
        {
            ++mrr.Evens;
        }
    }
    else
    {
        lock (mrr)
        {
            ++mrr.Odds;
        }
    }
    if (true == AMT.Math.IsPrime.TrialDivisionMethod(kvp.Value))
    {
        lock (mrr)
        {
            ++mrr.Primes;
        }
    }
});

Some may ask, “Why lock so many times? Can’t the code just lock once inside the loop?”

Parallel.ForEach(input, kvp =>
{
    lock (mrr)
    {
        if (0 == kvp.Value % 2)
        {
            ++mrr.Evens;
        }
        else
        {
            ++mrr.Odds;
        }
        if (true == AMT.Math.IsPrime.TrialDivisionMethod(kvp.Value))
        {
            ++mrr.Primes;
        }
    }
});

Moving the lock just above the first if clause seems to have some benefits – it simplifies the code, and shared data access is still synchronized.  But it also kills performance – making it slower than even the non-parallel SerialMapReduceWorker.

9999999 of 9999999 input values are unique
[SerialMapReduceWorker] Evens: 5,000,533; Odds: 4,999,466; Primes: 244,703; Elapsed: 00:00:51.6998025
[ParallelMapReduceWorker] Evens: 5,000,533; Odds: 4,999,466; Primes: 244,703; Elapsed: 00:00:30.6871152
[ParallelMapReduceWorker_SingleLock] Evens: 5,000,533; Odds: 4,999,466; Primes: 244,703; Elapsed: 00:01:35.0778434

This situation highlights the common rule of thumb, “lock late.”  Locking late (or “low” in the code) means that code should acquire the lock just before accessing shared data and release it just afterwards.  This approach reduces the amount of code that executes while the lock is held, so it gives the contending threads more opportunities to acquire the lock.
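As an aside (this option is not from the original post), counters this simple don’t strictly need a lock at all: .NET’s Interlocked class performs each increment as a single atomic operation, which typically contends less than lock.  Here is a minimal sketch, assuming the counts can be accumulated in local variables and copied into mrr after the loop, and that MapReduceResult’s counters are writable ints:

// Sketch: atomic increments via System.Threading.Interlocked instead of lock.
// The local counters are captured by the lambda and copied to mrr once at the end.
int evens = 0, odds = 0, primes = 0;

Parallel.ForEach(input, kvp =>
{
    if (0 == kvp.Value % 2)
    {
        Interlocked.Increment(ref evens);
    }
    else
    {
        Interlocked.Increment(ref odds);
    }
    if (AMT.Math.IsPrime.TrialDivisionMethod(kvp.Value))
    {
        Interlocked.Increment(ref primes);
    }
});

// Assumes MapReduceResult exposes writable counters.
mrr.Evens = evens;
mrr.Odds = odds;
mrr.Primes = primes;

Whether this beats the lock-late version depends on how hot the counters are, but it follows the same principle: keep the protected region as small as possible.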

 

Perils of Async: Data Corruption

One of the most common bugs in any multi-threaded or multi-process code is corruption of shared data due to poor (or absent) concurrency control.  Concurrency is one term used to describe code interactions that are not sequential in nature (equivalent or companion terms include parallel, multi-threaded and multi-process).  Within this context, concurrency control refers to the tactics used to ensure the integrity of shared data.

To demonstrate this problem, we’ll use a fairly simple example: counting even, odd and prime numbers in a large set.  We’ll use different strategies – serial and parallel processing – over the same data set.  The serial implementation, SerialMapReduceWorker, is straightforward since it involves no concurrency issues.

foreach (var kvp in input)
{
    if (0 == kvp.Value % 2)
    {
        ++mrr.Evens;
    }
    else
    {
        ++mrr.Odds;
    }
    if (true == AMT.Math.IsPrime.TrialDivisionMethod(kvp.Value))
    {
        ++mrr.Primes;
    }
}

SerialMapReduceWorker iterates over the input set of integers; for each key-value pair, kvp, it determines whether the value is even, odd or prime, and increments the appropriate counter in the MapReduceResult instance, mrr.  (Although no map-reduce is involved, SerialMapReduceWorker is named for consistency with the concurrent workers.)

.NET’s Task Parallel Library (TPL) makes it very easy (too easy?) to convert this code to run concurrently.  All a developer has to do is change foreach to Parallel.ForEach, include some lambda syntax, and voilà! – the code magically runs much faster!

Parallel.ForEach(input, kvp =>
{
    if (0 == kvp.Value % 2)
    {
        ++mrr.Evens;
    }
    else
    {
        ++mrr.Odds;
    }
    if (true == AMT.Math.IsPrime.TrialDivisionMethod(kvp.Value))
    {
        ++mrr.Primes;
    }
});

Just look at these results – the parallel version executed almost twice as fast!

9999999 of 9999999 input values are unique
[SerialMapReduceWorker] Evens: 5,000,533; Odds: 4,999,466; Primes: 244,703; Elapsed: 00:00:51.6998025
[ParallelMapReduceWorker_Unprotected] Evens: 4,996,020; Odds: 4,994,704; Primes: 244,662; Elapsed: 00:00:30.5742845

Unfortunately, this conversion is also a dangerously naive implementation of concurrent code.  Did you notice the problems?  The parallel code found a different number of even, odd and prime numbers within the same set of integers.  How is that possible?  Answer: data corruption due to a lack of concurrency control.

The implementation in ParallelMapReduceWorker_Unprotected does nothing to protect the MapReduceResult instance, mrr.  Each thread involved increments mrr.Evens, mrr.Odds and mrr.Primes.  In effect, two threads might both increment mrr.Evens from 4 to 5 simultaneously when the expectation is that one will increment from 4 to 5 and the other from 5 to 6.  As you can see in the results above, this unexpected data corruption causes ParallelMapReduceWorker_Unprotected’s count of even integers to be wrong by about 4,500.

In this case, correcting the error is fairly simple.  The code just needs to protect access to MapReduceResult to ensure that only one thread can increment at a time. The corrected ParallelMapReduceWorker:

Parallel.ForEach(input, kvp =>
{
    if (0 == kvp.Value % 2)
    {
        lock (mrr)
        {
            ++mrr.Evens;
        }
    }
    else
    {
        lock (mrr)
        {
            ++mrr.Odds;
        }
    }
    if (true == AMT.Math.IsPrime.TrialDivisionMethod(kvp.Value))
    {
        lock (mrr)
        {
            ++mrr.Primes;
        }
    }
});

Each time this code determines it needs to increment one of the counters, it uses concurrency control by:

  1. Locking the MapReduceResult instance
  2. Incrementing the appropriate counter
  3. Unlocking the MapReduceResult instance

Since only one thread at a time can hold the lock on mrr, other threads must wait until it is unlocked to proceed.  This locking now guarantees that, continuing our previous case, mrr.Evens is correctly incremented from 4 to 5 only once.  ParallelMapReduceWorker correctly calculates the counts (as compared to SerialMapReduceWorker), and does so with almost the same performance as the unprotected version.

9999999 of 9999999 input values are unique
[SerialMapReduceWorker] Evens: 5,000,533; Odds: 4,999,466; Primes: 244,703; Elapsed: 00:00:51.6998025
[ParallelMapReduceWorker_Unprotected] Evens: 4,996,020; Odds: 4,994,704; Primes: 244,662; Elapsed: 00:00:30.5742845
[ParallelMapReduceWorker] Evens: 5,000,533; Odds: 4,999,466; Primes: 244,703; Elapsed: 00:00:30.6871152
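It’s also worth noting (this variation is not from the original post) that the TPL offers a Parallel.ForEach overload with per-thread local state, which sidesteps most of the locking: each thread accumulates its own partial counts, and the lock is taken only once per thread when the partial results are merged.  A rough sketch, assuming MapReduceResult has a parameterless constructor and writable counters:

// Sketch: per-thread partial counts, merged under a single lock per thread.
Parallel.ForEach(
    input,
    () => new MapReduceResult(),                 // localInit: one private result per thread
    (kvp, loopState, local) =>
    {
        if (0 == kvp.Value % 2) { ++local.Evens; } else { ++local.Odds; }
        if (AMT.Math.IsPrime.TrialDivisionMethod(kvp.Value)) { ++local.Primes; }
        return local;
    },
    local =>                                     // localFinally: merge once per thread
    {
        lock (mrr)
        {
            mrr.Evens += local.Evens;
            mrr.Odds += local.Odds;
            mrr.Primes += local.Primes;
        }
    });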

 

NOTE: The count of primes appears to be incorrect.  The IsPrime.TrialDivisionMethod implementation is intentionally slow to ensure multiple threads contend for access to the same data; unfortunately, and unintentionally, it also appears to be incorrect.  (cf. Count of Primes)

Perils of Async: Introduction

As application communications over lossy networks and “in the cloud” have grown, the necessity of performing these communications asynchronously has risen with them. Why this change has been occurring may be an interesting topic for another post, but a few simple cases demonstrate the point:

  • Web browsers make multiple, asynchronous HTTP calls per page requested. Fetching a page’s images, for example, has been an asynchronous (“out-of-band”) operation for at least a decade.
  • Many dynamic websites depend on various technologies’ (AJAX, JavaScript, jQuery, etc.) asynchronous capabilities – that’s what makes the site “dynamic.”
  • Similarly, most desktop and mobile applications use technologies to communicate asynchronously.

Previously, developing asynchronous software – whether inter-process, multi-threaded, etc. – required very talented software developers. (As you’ll see soon enough, it still does.) Many companies and other groups have put forward tools, languages, methodologies, etc. to make asynchronous development more approachable (i.e., easier for less sophisticated developers).

Everyone involved in software development – developers, managers, business leaders, quality assurance, and so on – needs to be aware, however, that these “tools” have a downside. Keep this maxim in mind: things that make asynchronous software development easier also make bad results (bugs!) easier. For example, all software involving some form of asynchronicity:

  • Not only has bugs (as all software does), but the bugs are much, much more difficult to track down and fix
  • Exhibits higher degrees of hardware-based flux. Consider, for example, a new mobile app that is stable and runs well on a device using a Qualcomm Snapdragon S1 or S2 (single-core) processor. Will the same app run just as well on a similar device using (dual-core) Snapdragon S3 or above? Don’t count on it – certainly don’t bet your business on it!

This series of posts, Perils of Async, aims to discuss many of the powerful .NET capabilities for asynchronous and parallel programming, and to help you avoid their perilous side!

Debugging .NET-based Windows Services

We haven’t needed to implement a Windows Service in a while, but certainly have experienced the pain of debugging them when run by Service Control Manager (SCM).  Here are a couple of links to good tools and discussion on the topic:

Run Windows Service as a Console Program by Einar Egilsson provides a good example of how to debug your service easily as a console app.  It also includes some good discussion of single- and multi-service hosts, various ways to end or break out of the service process when running as a console app, etc.
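The core idea of that approach (sketched below with illustrative names rather than Egilsson’s exact code) is to check Environment.UserInteractive at startup: when the process is launched from Visual Studio or a command prompt, run the service logic directly as a console app; when launched by the SCM, hand control to ServiceBase.Run as usual.

using System;
using System.ServiceProcess;

public class MyService : ServiceBase
{
    protected override void OnStart(string[] args)
    {
        // Start worker threads, timers, listeners, etc.
    }

    protected override void OnStop()
    {
        // Stop and clean up.
    }

    // Exposes the service lifecycle for interactive (console) runs.
    public void RunAsConsole(string[] args)
    {
        OnStart(args);
        Console.WriteLine("Service running as a console app.  Press Enter to stop...");
        Console.ReadLine();
        OnStop();
    }
}

static class Program
{
    static void Main(string[] args)
    {
        var service = new MyService();
        if (Environment.UserInteractive)
        {
            service.RunAsConsole(args);   // F5-friendly debugging path
        }
        else
        {
            ServiceBase.Run(service);     // normal SCM-hosted path
        }
    }
}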

Windows Service Helper on CodePlex is an SCM replacement which provides for F5 debugging from Visual Studio.  The UI it provides gives you the SCM-esque start, stop and pause capabilities to facilitate debugging the associated functionality in your service.

Hopefully these may be useful to you, but if nothing else, this will be a good reminder for the next time we do a Windows Service.

Avoid ACTK’s ModalPopupExtender Inside UpdatePanel

We were having some really strange behavior with the AjaxControlToolkit’s ModalPopupExtender – everything was fine on one page, but not on another.  One major purpose of using MPE was to wrap a common control. So, as you can imagine, we got pretty frustrated.  If two pages can’t achieve consistent results with shared components, development and maintenance cost projections are less predictable (although “lower cost” goes right out the window).

After quite a bit of painstakingly slow walk-throughs of the .aspx pages, we finally discovered that one of the pages had the MPE declaration within an UpdatePanel.  Since this was the misbehaving page, we simply moved the MPE outside the UpdatePanel and voilà! – it misbehaved no more.

We didn’t really dig into why our MPE inside an UpdatePanel behaved as it did – we had already burned A LOT of time on the problem and needed to push on to our deadline.  One of the devs found that this post seemed to be on track, so we’ll re-visit it if we find ourselves needing to make an MPE work within an UpdatePanel.

It may be worthwhile to mention that using the debugger didn’t help a bit in this case.  We spent untold hours making a little change here, checking the behavior, adding some extra debug code there, checking the behavior, etc.  Painfully frustrating.  But in the end it came down to hawkish eyeballing of the .aspx files.

 

Azure 1.4 SDK and Tools

Yesterday (3/9/11) Microsoft announced updates for the Windows Azure SDK and Tools (VSCloudService.exe).  As the download page indicates, this SDK release’s primary purpose is to address stability issues.  Other improvements include:

  • Azure Management Portal – Improved responsiveness
  • Azure Connect – Multi-admin support and installation for non-English Windows
  • Azure Content Delivery Network (CDN) – Provides for delivery of secure content via HTTPS

Be aware also that the 1.4 SDK installer automatically uninstalls previous versions.  So, if you still need 1.3 for a while, keep a copy of its installer handy.

Download the SDK & tools here.

Better Management Certificates for Azure

As I wrote earlier this week, the Silverlight-based Management Portal may be the best feature of Azure’s 1.3 release.  It is a handsome UI, Azure components are far easier to access, and (most importantly) the amount of time required for common tasks is sharply reduced.

This post is only partly about the Management Portal, however.  The certificate generating code in the Azure Tools for VS 2010 is also involved.

The Problem

Although it’s nice to be able to create the necessary management certificate from within Visual Studio, the resulting certificate naming is confusing.  The following dialog shows an existing certificate and the selection for creating a new certificate.

AzVSTools - Certificate Selection

When this tool creates a certificate, it sets the Friendly Name you entered and sets Issued By to Windows Azure Tools.  No big deal, right?  Right – until you add the certificate via the Azure Portal….

Azure Mgmt - Mgmt Certs with Comments

This view of the portal shows the Management Certificates, but you can’t really tell which is which.  For example, which of the two certificates corresponds to the one with Friendly Name: Deployment Credentials in the Azure Tools dialog?  You really can’t tell unless you are able to distinguish them by their thumbprints or validity dates.  Why doesn’t Deployment Credentials  appear in one of the fields?  Well, let’s take a quick look at the certificate in Certificate Manager (certmgr.msc).

CertMgr - Personal Certs - Windows Azure Tools-Deployment Credentials

When Azure Tools created the certificate, it set Windows Azure Tools in the Issued To and Issued By fields.  The name I provided the tool appears in the Friendly Name field.  I’m glad that I can distinguish the certificate in my local store with the friendly name, but it’s only known in my local store.  That’s the problem: Friendly Name is not part of the certificate; it’s metadata associated with the certificate, and only locally.

What’s A Better Way?

Instead of using the Azure Tools to create a certificate, use the MakeCert tool.  Azure only accepts certain certificates (X.509, 2048-bit key, SHA1, etc.), so you have to provide a few specific parameters.  Here’s a sample command line:

makecert -sky exchange -r -n "CN=<CertName>" -pe -a sha1 -len 2048 -ss My "<CertFileName>"

where CertName specifies the name you want to appear in the Name field under Management Certificates in the management portal, and CertFileName specifies where to store the certificate file on your local drive.
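For example (the names here are placeholders purely for illustration), the following creates a certificate that will show up as “Contoso Deployment” in the portal and writes it to ContosoDeployment.cer:

makecert -sky exchange -r -n "CN=Contoso Deployment" -pe -a sha1 -len 2048 -ss My "ContosoDeployment.cer"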

Now, when you upload the certificate to the management portal, you can easily distinguish the certificates.

Azure Mgmt - Mgmt Certs - Better Names2

Then, when you Publish from Visual Studio, simply choose the appropriate certificate from the list.

AzVSTools - Certificate Selection2

Admittedly, the Friendly Name isn’t set, but you have no trouble distinguishing between certificates in either Visual Studio or Azure’s Management Portal.

BIG Release of Azure Components This Week!

Windows Azure SDK 1.3 (a.k.a., November release) has just been released.  You can download just the SDK, but if you’re using Visual Studio, use the Windows Azure Tools for Visual Studio instead.  This package, VSCloudService.exe, includes the SDK package.

The major features / benefits of this release include:

  • Management Portal – the new Silverlight-based portal may be the most significant improvement of this release.  Managing Roles, Storage, Service Bus, Access Control, etc. are so much easier to access, and the portal’s performance improvements make a substantial impact on management tasks.
  • Full IIS – Finally! Each Web Role can host multiple sites – web apps, services.  Additionally, developers can now install IIS modules as well (some apps haven’t been migrated due to dependence on 3rd party or custom modules)
  • Remote Desktop – I’ve been looking forward to this for a while!  Being able to connect to Azure Roles and VMs via RDP is going to make a huge difference in so many ways – configuration, deployment, debugging, etc.
  • Windows Server 2008 R2 – Azure Roles and VMs can now be based on R2 which brings in IIS 7.5, ability to restrict what apps can run via AppLocker, PowerShell 2.0 for better administration and automation.
  • Elevated Role Privileges – I’m not so sure this is a really good idea, but it’s in now.  Azure Roles allow running with administrator privileges (sounds like “running with scissors”).  I can imagine some scenarios in which a Worker Role does a bit of admin-level work, or a Web Role hosts a custom administrative portal.  But, in general, devs need to be very careful with this “feature.”
  • Multiple Admins – Multiple Live IDs can be assigned admin privileges in an Azure account.  This provides better traceability when you’re doing around-the-clock administration.  But it may also introduce risk of “stepping on each other’s toes” problems.

Also in this round of updates are a couple of betas and a CTP.

  • Extra Small Instance – in BETA – at just 5 cents per compute hour, the Extra Small Instance is less than half the cost of the Small Instance (12 cents per compute hour). At the time of this writing, the Extra Small Instance is comprised of 1.0 GHz CPU, 768 MB RAM, 20 GB local storage and “low” I/O Performance.
  • Virtual Machine Role – in BETA – Now you can define and manage your own virtual machine. Based on the (very) little info I have right now, the VM is based on a differencing disk over a Windows Server 2008 R2 VM.  That limits the options of what to run in the VM.  IMO, this is the last check-box for Azure qualifying as Infrastructure as a Service (IaaS).
  • Azure Connect – in CTP – Connect provides the ability to create a virtual network between multiple devices. For example, if companies A & B want two of their systems to communicate with each other, those systems connect to Azure, establish the private network, and then communicate directly between A & B.  I really want to test this one out!

Clemens Vasters’ Scheduler-Agent-Supervisor Pattern

Microsoft’s Clemens Vasters published a pattern the Azure team is using which he calls the Scheduler-Agent-Supervisor pattern (http://blogs.msdn.com/b/clemensv/archive/2010/09/27/cloud-architecture-the-scheduler-agent-supervisor-pattern.aspx)

This pattern is reminiscent of the solution we architected and implemented a few years ago (at a much smaller scale, obviously).  We used SQL Server for persistent storage of pipelines of step-wise tasks, .NET-based agents, and a good ol’ Windows Service to execute the pipelines.  The service polled for pipelines waiting for work, instantiated the appropriate agents for that work, and provided context and threads for agent execution.  After quite a bit of rigorous stress and negative testing, the system proved to be very efficient.  It was also very adaptable to other purposes since the agents implemented the business logic — new agents, or existing agents used in different ways, could provide different functionality.

Ok, enough of what we did before.  Clemens’ post builds on the Scheduler-Agent pattern by adding a Supervisor.  This Supervisor appears to be a sort of über-agent which is scheduled to check for problems in the pipelines (i.e., lists of tasks) and attempt to correct them.  Clemens writes,

“If the agent keels over and dies as it is processing the step [pipeline task] (or right before or right after), it is obviously no longer in a position to let the scheduler know about its fate.”

In these cases, the pipeline is likely in an indeterminate state.  The scheduler has handed off responsibility to the agent, but the agent hasn’t reported that it has finished its work (or even that it encountered an error).  Enter the Supervisor.  Just look for incomplete work and start it over, right?  Well, it’s not as simple as it seems at first.  What if the agent had calculated and added tax to a total amount and then died just before reporting that it was done?  The Supervisor shouldn’t trigger this agent to execute again because the tax would be added a second time.  This particular case is very simple, but you get the point.  The entire Scheduler-Agent-Supervisor pattern requires a level of atomicity in various areas to reduce these risks.
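One way to get that atomicity (this sketch is mine, not from Clemens’ post, and the types are hypothetical) is to make each step idempotent: the agent persists the step’s effect and a “step complete” marker in the same transaction, so a supervisor-triggered retry can detect that the work already happened and skip it.

// Hypothetical illustration: the effect and its completion marker commit together,
// so re-dispatching the step after a crash cannot apply the tax twice.
public void ApplyTax(Order order, IPipelineStore store)
{
    using (var tx = store.BeginTransaction())
    {
        if (!store.IsStepComplete(order.Id, "ApplyTax", tx))
        {
            order.Total += order.Subtotal * order.TaxRate;
            store.SaveOrder(order, tx);
            store.MarkStepComplete(order.Id, "ApplyTax", tx);
        }
        tx.Commit();
    }
}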

In case anyone is curious to know how we dealt with this in our solution a few years ago, we used a “supervisor,” too:  a developer who found the broken agent in the database and reset it!  How’s that for sophisticated? 😉