Wednesday, September 5, 2012

A non-empty list type for .NET

A simple immutable list like the fundamental list type in OCaml, F# or Haskell can be expressed as:

type 'a Alist = 
    | Nil 
    | Cons of 'a * 'a Alist 

That is, it's either empty, or it's non-empty. We could refactor the non-empty part to a record type:

type 'a Alist = 
    | Nil 
    | Cons of 'a NonEmptyList 
and 'a NonEmptyList = { Head: 'a; Tail: 'a Alist }

This NonEmptyList type is clearly not a new data structure (it's still an immutable list). It may seem silly at first to use this as a separate list type, but it actually goes a long way towards making illegal states unrepresentable.

Because it is guaranteed not to be empty by the type system, it has certain interesting properties. For one, obviously getting the head of a non-empty list will always work, for any instance of the type, while List.head throws an exception for an empty list (here's the proof of why it can't have any other behavior).

The F# List module has many such partial functions that are undefined for empty lists: head, tail, reduce, average, min, max... and of course the respective functions in the Seq module and System.Linq.Enumerable. If you want to use one of these functions on a regular list/IEnumerable, you either have to check first whether the input is empty and return a 'default' value (many people wrap this in a function, e.g. MaxOrDefault(defaultValue), effectively making it a total function), or catch the possible exception every time. Otherwise you risk a runtime failure. Another way to make these functions total is simply to remove the empty list from the domain, that is, to operate on non-empty lists.
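To make this concrete, here's a minimal sketch of a few such total functions, written against the Alist/NonEmptyList encoding above (the names mirror the List module; FSharpx's actual definitions may differ):

let head (l: 'a NonEmptyList) = l.Head          // total: every non-empty list has a head

let rec toList (l: 'a NonEmptyList) =
    l.Head :: (match l.Tail with
               | Nil -> []
               | Cons t -> toList t)

let reduce f l = List.reduce f (toList l)       // safe: toList never returns an empty list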

The Haskell community generally recommends avoiding such trivially avoidable partial functions [1] [2] [3], and this advice applies equally to most (all?) languages, especially typed languages. At the very least, it's useful to be aware of where and why you're using a partial function.

Applicative validation makes for a good example of NonEmptyLists. I've written about applicative functor validation before, in F# and in C#, with examples. The type I used to represent a validation was Choice<'a, string list>. This means: either the value (when it's correct), or a list of errors. But strictly speaking, this type is too "loose", since it allows the value Choice2Of2 [], which intuitively means "The input was invalid, but there is no error". This simply doesn't make any sense. If the input is invalid, there must be at least one error. Thus, the correct type to use here is Choice<'a, string NonEmptyList>.
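As a minimal sketch, using the record encoding from above (the Validation alias and validateAge are hypothetical names for illustration):

type 'a Validation = Choice<'a, string NonEmptyList>

// On failure there is, by construction, at least one error:
let validateAge (age: int) : int Validation =
    if age >= 0 then Choice1Of2 age
    else Choice2Of2 { Head = "age must be non-negative"; Tail = Nil }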

Another occurrence of NonEmptyList recently popped up while I was writing a library to bind the Urchin Data API. In this API there's a parameter that is mandatory, but admits more than one value: a perfect match for a NonEmptyList.

In general, when you find yourself not knowing what to do with the empty list case, or when you think "this list can't possibly be empty here", it may be an indication that you need a NonEmptyList.

The code is currently in my fork of FSharpx; it includes the usual functions: cons, map, append, toList, rev, collect, etc.

It's also usable from C#; here are some tests showing this.

I briefly touched on the subject of totality here, which has deep connections to Turing completeness.

Tuesday, August 7, 2012

Optional parameters interop: C# - F#

Both C# and F# support optional parameters, but they implement them differently. How well do they interop?

Here I'll analyze both scenarios: consuming C# optional parameters from F#, and consuming F# optional parameters from C#.

For reference, I'm using VS2012 RC (F# 3.0, C# 5.0).

Calling C# optional parameters in F#

Let's start with some C# code that has optional parameters and see how it behaves in F#:

public class CSharp {
    public static string Something(string a, int b = 1, double c = 2.0) {
        return string.Format("{0}: {1} {2}", a, b, c);
    }
}

Here are some example uses of this function:

var a = CSharp.Something("hello");
var b = CSharp.Something("hello", 2);
var c = CSharp.Something("hello", 2, 3.4);

Now we try to call this method from F# and we see:

[Screenshot: csharp-optional]

Uh-oh, those parameters sure don't look very optional. However, it all works fine and we can write:

let a = CSharp.Something("hello")
let b = CSharp.Something("hello", 2)
let c = CSharp.Something("hello", 2, 3.4)

which compiles and works as expected.

Calling F# optional parameters in C#

Now the other way around, a method defined in F#, using the F# flavor of optional parameters:

type FSharp = 
    static member Something(a, ?b, ?c) = 
        let b = defaultArg b 0 
        let c = defaultArg c 0.0 
        sprintf "%s: %d %f" a b c 

We can happily use it like this in F#:

let a = FSharp.Something("hello")
let b = FSharp.Something("hello", 2)
let c = FSharp.Something("hello", 2, 3.4)

But here's what this method looks like in C#:

[Screenshot: fsharp-to-csharp]

Yeah, there's nothing optional about those parameters.

What we need to do is to implement the C# flavor of optional parameters "manually". Fortunately that's pretty easy, just mark those parameters with the Optional and DefaultParameterValue attributes:

open System.Runtime.InteropServices

type FSharp = 
    static member Something(a, [<Optional;DefaultParameterValue(null)>] ?b, [<Optional;DefaultParameterValue(null)>] ?c) = 
        let b = defaultArg b 0 
        let c = defaultArg c 0.0 
        sprintf "%s: %d %f" a b c 

Why "null" you ask? The default value should have been None, but that's not a compile-time constant so it can't be used as an attribute argument. Null is interpreted as None.

These attributes don't affect F# callers, but now in C# we can write:

Console.WriteLine(FSharp.Something("hello"));
Console.WriteLine(FSharp.Something("hello", FSharpOption<int>.Some(5)));

So we have optional parameters, but we still have to deal with option types when we want to use them. If you find that annoying or ugly, you could use FSharpx, in which case FSharpOption<int>.Some(5) turns into 5.Some().

The astute reader will suggest an overload just to handle the C# compatibility case. Alas, that doesn't work in the general case. Let's try and see what happens:

type FSharp = 
    static member private theActualFunction (a, b, c) =
        sprintf "%s: %d %f" a b c

    static member Something(a, ?b, ?c) =
        let b = defaultArg b 0
        let c = defaultArg c 0.0
        FSharp.theActualFunction(a,b,c)

    static member Something(a, [<Optional;DefaultParameterValue(0)>] b, [<Optional;DefaultParameterValue(0.0)>] c) =
        FSharp.theActualFunction(a,b,c)

Note that I moved the "actual working function" to a separate method, otherwise the second overload would just recurse. But now we have duplication in the definitions of the default values. Still, the real problem shows up when we try to use this in F#:

let d = FSharp.Something("hello", 2, 3.4)

This doesn't compile as F# can't figure out which one of the overloads to use.

Conclusion

F# has no issues consuming optional parameters defined in C#.

When writing methods with optional parameters in F# to be called from C#, either add the corresponding attributes and deal with the option types, or add a separate non-overloaded method. Or forget the optional parameters altogether and add overloads, just as we all did in C# before it supported optional parameters.
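For reference, here's a sketch of the "separate non-overloaded method" option (SomethingOpt is a hypothetical name for the C#-facing entry point):

open System.Runtime.InteropServices

type FSharp = 
    // F# callers use the idiomatic optional parameters:
    static member Something(a, ?b, ?c) = 
        let b = defaultArg b 0 
        let c = defaultArg c 0.0 
        sprintf "%s: %d %f" a b c 

    // C# callers use this differently-named method, so F# overload
    // resolution never gets confused:
    static member SomethingOpt(a, [<Optional;DefaultParameterValue(0)>] b, [<Optional;DefaultParameterValue(0.0)>] c) = 
        FSharp.Something(a, b, c)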

Monday, July 30, 2012

A tale of equational reasoning in F#

My implementation of formlets, based on the original paper, composes quite a few applicative functors. They're all standard applicatives. For example, the one that looks up values from the submitted form is just a specialized Reader (i.e. a Reader with one of the type parameters fixed to the form type). The applicative responsible for generating form element names is a State. Another two of the applicatives are actually the same applicative, only specialized with different type parameters.

Since many of these are already implemented in FSharpx, I decided to use those implementations instead... After making the necessary changes and getting it to compile, I ran the tests and got a lot of failures. Many of the outputs involving lists were exactly in inverse order! I traced it down to the composition of applicatives, but I couldn't figure out what was wrong.

I'll illustrate with a simple but concrete example. We'll use the Writer applicative. Essentially, the effect of this applicative is appending values with a monoid. Here we'll just accumulate on a list.

This is much simpler to see in code:

let puree x = [],x 
let ap (x1,x2) (f1,f2) = f1 @ x1, f2 x2 

An example using it:

puree (-) |> ap (["a";"b"],3) |> ap (["c";"d"],2)

This evaluates to (["a"; "b"; "c"; "d"], 1), i.e. it concatenates the lists (the effect) and applies the function to the second value in the tuple.

So far so good. Now let's try to compose this applicative with itself. Composing applicatives is easy: as I explained in a previous article, just lift ap and apply pure to pure:

let lift2 f a b = puree f |> ap a |> ap b

let composedPure x = x |> puree |> puree 
let composedAp x f = lift2 ap x f

Let's see how this works:

composedPure (-) 
|> composedAp (["a"; "b"], ([1; 2], 2)) 
|> composedAp (["c"; "d"], ([3; 4], 3))

which gives us (["c"; "d"; "a"; "b"], ([1; 2; 3; 4], -1))

Uh oh, the outer applicative has its effect flipped! The result should have been (["a"; "b"; "c"; "d"], ([1; 2; 3; 4], -1)). What went wrong here?

One difference between this code and the Haskell definition of applicative functors is that I flipped the parameters of ap. This allows us to apply ap with a pipe, as is usual in F#. (You could also use a forward and a backward pipe to "infixify" a function, but that just doesn't look right to me.) Even though it compiles and apparently looks correct, this difference broke our applicative composition.

In order to fix the applicative composition and still keep the convenient flipped parameters, we have to change composedAp to:

let flip f a b = f b a

let composedAp x f = flip (lift2 (flip ap)) x f
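Re-running the earlier example against this definition now produces the expected result:

composedPure (-) 
|> composedAp (["a"; "b"], ([1; 2], 2)) 
|> composedAp (["c"; "d"], ([3; 4], 3))
// evaluates to (["a"; "b"; "c"; "d"], ([1; 2; 3; 4], -1))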

The question now is: do you really understand why this composedAp is correct, just by looking at its definition, while the previous one would flip one of the applicatives and not the other?

To be honest, I don't. But simple equational reasoning can tell us what went wrong. Let's start with the original (incorrect) definition of composedAp:

  lift2 ap x f 
= puree ap |> ap x |> ap f              // lift2 definition
= ap f (ap x (puree ap))                // |> definition
= ap (f1,f2) (ap (x1,x2) (pap1, pap2))  // expand tuples, apply puree
= ap (f1,f2) (pap1 @ x1, pap2 x2)       // apply inner ap
= pap1 @ x1 @ f1, pap2 x2 f2            // apply outer ap
= [] @ x1 @ f1, ap x2 f2                // apply puree
= x1 @ f1, ap (x21, x22) (f21, f22)     // simplify empty list, expand tuples
= x1 @ f1, (f21 @ x21, f22 x22)         // apply ap

Now the correct composedAp, for comparison:

flip (lift2 (flip ap)) x f 
= flip (fun a b -> ap b (ap a (puree (flip ap)))) x f       // lift2 definition
= (fun a b -> ap b (ap a (puree (flip ap)))) f x            // apply flip
= ap x (ap f (puree (flip ap)))                             // apply lambda
= ap (x1,x2) (ap (f1,f2) (pfap1, pfap2))                    // expand tuples, apply puree
= ap (x1,x2) (pfap1 @ f1, pfap2 f2)                         // apply ap
= pfap1 @ f1 @ x1, pfap2 f2 x2                              // apply ap
= [] @ f1 @ x1, (flip ap) f2 x2                             // puree
= f1 @ x1, ap x2 f2                                         // simplify empty list, apply flip
= f1 @ x1, ap (x21, x22) (f21, f22)                         // expand tuples
= f1 @ x1, (f21 @ x21, f22 x22)                             // apply ap

By comparing both you can get a better understanding of why two flips are necessary.

Reasoning like this is a simple but powerful tool. We kinda do it continuously, informally, while writing code, which is usually called "running the program in your head". The absence of side-effects (i.e. pure functional programming) makes it easier to do it formally as well as informally, since you typically need to juggle less stuff in your head.

Monday, June 11, 2012

Fuchu: a functional test library for .NET

In my last two posts I showed how MbUnit supports first-class tests, and how you could use that to build a DSL in F# around it.

I explained how many concepts in typical xUnit frameworks can be more simply expressed when tests are first-class values, which is not the case for most .NET and Java test frameworks.

More concretely, test setup/teardown is a function over a test, and parameterized tests are... just data manipulation.

Since first-class tests greatly simplify things, why not dispense with the typical class-based, attribute-driven approach and build a test library around first-class tests? Well, Haskellers have been doing this for at least 10 years now, with HUnit.

HUnit organizes tests using this tree:

-- | The basic structure used to create an annotated tree of test cases. 
data Test 
    -- | A single, independent test case composed. 
    = TestCase Assertion 
    -- | A set of @Test@s sharing the same level in the hierarchy. 
    | TestList [Test] 
    -- | A name or description for a subtree of the @Test@s. 
    | TestLabel String Test 

Where Assertion is simply an alias for IO (). This is all you need to organize tests in suites and give them names.

We can trivially translate this to F#:

type TestCode = unit -> unit

type Test = 
    | TestCase of TestCode
    | TestList of Test seq
    | TestLabel of string * Test

Let's see an example:

let testA = 
    TestLabel ("testsuite A", TestList 
                         [
                            TestLabel ("test A", TestCase(fun _ -> Assert.AreEqual(4, 2+2)))
                            TestLabel ("test B", TestCase(fun _ -> Assert.AreEqual(8, 4+4)))
                         ])

It's quite verbose, but we can define the same DSL as I defined earlier for MbUnit tests, so this becomes:

let testA = 
    "testsuite A " =>> [
        "test A" => 
            fun _ -> Assert.AreEqual(4, 2+2)
        "test B" =>
            fun _ -> Assert.AreEqual(8, 4+4)
    ]

Actually, I first ported HUnit (including this DSL), then discovered that MbUnit has first-class tests and later wrote the DSL around MbUnit. Everything I described in those posts (setup/teardown as higher-order functions, parameterized tests as simple data manipulation, arbitrary nesting of test suites) applies here in the exact same way.

In fact, MbUnit's class hierarchy of Test/TestSuite/TestCase can be read as the following algebraic data type:

type Test =
| TestSuite of string * Test list
| TestCase of string * Action

which turns out to be very similar to the tree we translated from HUnit, only the names are embedded instead of being a separate case.

I called this HUnit port Fuchu (it doesn't mean anything); it's on github.

Assertions

Fuchu doesn't include any assertion functions, or at least not yet. (EDIT: assertions were added in 0.2.0) It only gives you tools to organize and run tests, but you're free to use NUnit, MbUnit, xUnit, NHamcrest, etc, or more F#-ish solutions like Unquote or FsUnit or NaturalSpec for assertions.

Tighter integration with FsCheck is planned. (EDIT: it was added in the first release of Fuchu)

Runner/Tooling

As with HUnit, the test assembly is the runner itself. That is, as opposed to having an external test runner as with most test frameworks, your test assembly is an executable (a console application). This is because it's more of a library than a framework. As a consequence, there is no need to install any external tool to run tests (just hit CTRL-F5 in Visual Studio) or debug tests (just set your breakpoints and hit F5 in Visual Studio).

So how do you run tests with Fuchu? Given a test suite testA like the one defined above, you can run it like this:

[<EntryPoint>]
let main _ = run testA // or runParallel

But this is quite inconvenient, as it's common to split tests among different modules/files and this would mean having to list all tests somewhere, to feed them to the run function. HUnit works around this using Template Haskell, and OUnit (OCaml's port of HUnit) users generate the boilerplate code by parsing the tests' source code.

In .NET we can just decorate the tests with an attribute and then use reflection to fetch them:

[<Tests>]
let testA = 
    "2+2=4" => 
        fun _ -> Assert.AreEqual(4, 2+2)

[<Tests>]
let testB = 
    "2*3=6" => 
        fun _ -> Assert.AreEqual(7, 2*3)
[<EntryPoint>]
let main args = defaultMainThisAssembly args

This function defaultMainThisAssembly does exactly what it says on the tin. Notice that it also takes the command-line args, so if you call it with "/m" it will run the tests in parallel. (Curiously, you can't say let main = defaultMainThisAssembly, it won't be recognized as the entry point).

By the way, this is just an example, you wouldn't normally annotate every single test with the Tests attribute, only the top-level test group per module.
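For example, a module might expose a single annotated group built with the DSL shown earlier (a sketch; the names are arbitrary):

[<Tests>]
let arithmeticTests = 
    "arithmetic" =>> [
        "2+2=4" => fun _ -> Assert.AreEqual(4, 2+2)
        "2*3=6" => fun _ -> Assert.AreEqual(6, 2*3)
    ]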

Run the testA/testB example above and you get an output like this:

2*3=6: Failed: 
  Expected: 7
  But was:  6

  G:\prg\Test.fs(15,1): SomeModule.testB@15.Invoke(Unit _arg1)
 (00:00:00.0017681)

2 tests run: 1 passed, 0 ignored, 1 failed, 0 errored (00:00:00.0058780)

If you run this within Visual Studio with F5 you can navigate to the failing assertion by clicking on the line that looks like a mini stack trace.

REPL it!

Since running tests is as easy as saying "run test", it's also convenient sometimes to do so from the F# REPL.

Pros:

  • You can directly load the source code under tests in the REPL, which cuts down compilation times.
  • Easy to cherry-pick one or a few tests to run instead of running all of them, with the provided Test.filter function (see the sketch below).

Cons:

  • Having to manually load all the dependencies of the tests. It may be possible to work around this using a variant of this script by Gustavo Guerra.
  • If you reference the assembly under test in the REPL, fsi.exe blocks the DLLs, so you have to reset the REPL session to recompile. But if you're testing F# code, you can work around this by loading source code instead of referencing the assembly.

You can see an example of running tests from the REPL here.
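For instance, cherry-picking by test name in fsi might look like this (a sketch: Tests.fs and MyModule.tests are hypothetical names, and it assumes Test.filter takes a predicate on the test name):

#load "Tests.fs"
open Fuchu

MyModule.tests                                      // the module's top-level test group
|> Test.filter (fun name -> name.Contains "parse")  // keep only the matching tests
|> run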

Other tools

Integrating with other tools is not simple. Most tools out there seem to assume that tests are organized in classes, and each test corresponds to a method or function. This also happens with MbUnit's StaticTestFactory: for example in ReSharper or TestDriven.Net you can't single out tests. Still, they can be made to run let-bound tests (which may be a test suite), so it should be possible to have some support within this limitation.

Also, no immediate support for any continuous test runner. I checked with Greg Young, he tells me that MightyMoose/AutoTest.NET can be configured to use an arbitrary executable (with limitations). Remco Mulder, of NCrunch, suggested wrapping the test runner in a test from a known test framework as a workaround. Maybe executing the tests after compilation (with a simple AfterBuild msbuild target) is enough. I haven't looked into this yet.

Coverage tools should have no problem, it makes no difference where the executable comes from.

Build tools should have no issues either; obviously FAKE is the more direct option, but I see no problems integrating this with other build tools.

C# support

I threw in a few wrapper functions to make this library usable in C# / VB.NET. Of course, it will never be as concise as F#, but still usable. I'm not going to fully explain this (it's just boring sugar) but you can see an example here.

NUnit/MbUnit test loading

Even though it may seem very different, Fuchu is still built on xUnit concepts. And since tests are first-class values, it's very easy to map tests written with other xUnit frameworks to Fuchu tests. For example, building Fuchu tests from NUnit test classes takes less than 100 LoC (it's already built into Fuchu).

This lets you use Fuchu as a runner for existing tests, and to write new tests. I'm planning to use this soon in SolrNet to replace Gallio (for example, Gallio doesn't work on Mono).

There is a limitation here: Fuchu can't express TestFixtureTearDowns. It can do TestFixtureSetups (and obviously SetUp/TearDown, as explained in previous posts), but not TestFixtureTearDowns, or at least not unless you treat that test suite separately. Give it a try and see for yourself :). Is it a real downside? I don't think so (for example, TestFixtureTearDowns make parallelization harder), but it's something to be aware of. Also, I haven't looked into test inheritance yet, but it should be pretty easy to support.

Conclusions

Does .NET really need yet another test framework? Absolutely not. The current test frameworks are "good enough" and hugely popular. But since they don't treat tests as first-class values, extending them results in more and more complexity. Consider the lifecycle of a test in a typical unit testing framework: inheritance and multiple marker attributes make it so complex that it reminds me of the ASP.NET page lifecycle.

What I propose with Fuchu is a hopefully simpler, no-magic model. Remember KISS?

Code is here. Binaries on NuGet.

Friday, May 11, 2012

An F# DSL for MbUnit

In F# we typically organize tests much like in C# or VB.NET: writing functions marked with a [<Test>] attribute or similar. Actually there's a slight advantage in F#: you don't need to write a class marked as test fixture, you can directly write the tests as let-bound functions. Still, it's fundamentally the same model. (If you're into BDD there's also TickSpec as an alternative model).

Since it's the same model, you get the same issues I described in my last post, and then some: for example as Kurt explains, attributes in F# sometimes aren't treated exactly as in C#.

Also in my last post, I wrote about how MbUnit supports first-class tests as an alternative to attribute-defined tests. In F# we can take advantage of this and custom operators to build a very concise DSL to define tests.

First let's see a small test suite with setup/teardown, written with the classic attributes:

[<TestFixture>]
type ``MemoryStream tests``() =
    let mutable ms : MemoryStream = null

    [<SetUp>]
    member x.Setup() =
        ms <- new MemoryStream()

    [<TearDown>]
    member x.Teardown() =
        ms.Dispose()

    [<Test>]
    member x.``Can read``() =
        Assert.IsTrue ms.CanRead

    [<Test>]
    member x.``Can write``() =
        Assert.IsTrue ms.CanWrite

Looks simple enough, right? And yet, the mutable field is a smell, or at least an indicator that this isn't functional. Let's try to get rid of that mutable.

As a first step we'll rewrite this as first-class tests, that is, using [<StaticTestFactory>] as shown in my last post:

[<StaticTestFactory>]
let testFactory() =
    let suite = TestSuite("MemoryStream tests")
    let ms : MemoryStream ref = ref null
    suite.SetUp <- fun () -> ms := new MemoryStream()
    suite.TearDown <- fun () -> (!ms).Dispose()
    let tests = [
        TestCase("Can read", 
            fun () -> Assert.IsTrue (!ms).CanRead)
        TestCase("Can write", 
            fun () -> Assert.IsTrue (!ms).CanWrite)
    ]
    Seq.iter suite.Children.Add tests
    [suite]

Oh great, that's even uglier than what we started with! And we have replaced the mutable field with a ref cell, not much of an improvement. 
But bear with me: we have first-class tests now, so there's a lot of room for improvement.

In order to keep refactoring this, we need to realize that the problem is that our test cases should be functions MemoryStream -> unit instead of unit -> unit. That way, they wouldn't have to depend on an external MemoryStream instance; instead the instance would be pushed somehow to the test. Let's write that:

let tests = [
    "Can read", (fun (ms: MemoryStream) -> Assert.IsTrue ms.CanRead)
    "Can write", (fun ms -> Assert.IsTrue ms.CanWrite)
]

Now we have this list of strings and MemoryStream -> unit functions. What we need now is to turn these functions into unit -> unit so we can ultimately build TestCases.

In other words, we need a function (MemoryStream -> unit) -> (unit -> unit). This function should create the MemoryStream, pass it to our test function, then dispose the MemoryStream. Hey, what do you know, turns out that's just what SetUp and TearDown do!

Still with me? It's much easier to see this in code:

let withMemoryStream f () = 
    use ms = new MemoryStream()
    f ms

Now we apply this to our list, building the TestCases and then the TestSuite:

[<StaticTestFactory>]
let testFactory() =
    let suite = TestSuite("MemoryStream tests")
    tests 
    |> Seq.map (fun (n,t) -> TestCase(n, Gallio.Common.Action(withMemoryStream t)))
    |> Seq.iter suite.Children.Add
    [suite]

We've eliminated all mutable references, and also replaced SetUp/TearDown with a simple higher-order function.

But we can do still better, in terms of readability. We can define a few custom operators to hide the TestSuite and TestCase constructors:

let inline (=>>) name tests =
    let suite = TestSuite(name)
    Seq.iter suite.Children.Add tests
    suite :> Test

let inline (=>) name (test: unit -> unit) =
    TestCase(name, Gallio.Common.Action test) :> Test

[<StaticTestFactory>]
let testFactory() =
    [
        "MemoryStream tests" =>> [
            "Can read" => 
                withMemoryStream(fun ms -> Assert.IsTrue ms.CanRead)
            "Can write" => 
                withMemoryStream(fun ms -> Assert.IsTrue ms.CanWrite)
        ]
    ]

And with a couple more operators we get rid of the duplicate call to withMemoryStream:

let inline (+>) f =
     Seq.map (fun (name, partialTest) ->
                    name => f partialTest)

let inline (==>) (name: string) test = name,test

[<StaticTestFactory>]
let testFactory() =
    [
        "MemoryStream tests" =>>
            withMemoryStream +> [
                "Can read" ==> 
                    fun ms -> Assert.IsTrue ms.CanRead
                "Can write" ==>
                    fun ms -> Assert.IsTrue ms.CanWrite
            ]
    ]

Conclusions

Confused about all those kinds of arrows? The good thing about first-class tests is that you can build them any way you want, no need to use these operators if you don't like them. That's also precisely one of its downsides: as there is no fixed idiom, it can get harder to read compared to attribute-based test definitions, where there is a single, well-defined way to do things.

In my last post I showed how first-class tests practically eliminate the concept of parameterized tests. In this post I showed how they eliminate the concept of setup/teardown, replacing them with a higher-order function, a more generic concept.

More generally, I'd say that whatever domain you're modeling (in this case, tests), there is much to gain if the core concepts are representable as first-class values. It should also be noted that different languages have very different notions of what language objects are first-class values. Some are more flexible than others, but that doesn't imply any superiority by itself. However it does mean that if you're not aware of this you'll probably misuse your language and end up with ever more complex workarounds to manipulate your domain objects as values. Nice APIs, conventions, configuration, etc, are all secondary and can be built much more easily on top of composable, first-class building blocks.

But I digress. In the next post I'll show a simple testing library built around tests as first-class values and more pros/cons about this approach.

Tuesday, May 8, 2012

First-class tests in MbUnit

Originally, xUnit style testing frameworks used inheritance to define tests. SUnit, the original xUnit framework, builds test cases by inheriting the TestCase class. NUnit 1.0 and JUnit derived from SUnit and also used inheritance. Fast-forward to today, unit testing frameworks in .NET and Java typically organize tests using attributes/annotations instead.

For a few years now, MbUnit has been able to define tests programmatically as an alternative, though it seems this feature isn't used much. Let's compare attributes vs programmatic tests with a simple example in C#:

Attributes

[TestFixture]
public class TestFixture {
    [Test]
    public void Test() {
        Assert.AreEqual(4, 2 + 2);
    }

    [Test]
    public void AnotherTest() {
        Assert.AreEqual(8, 4 + 4);
    }
}

Programmatic

public class TestFixture {
    [StaticTestFactory]
    public static IEnumerable<Test> Tests() {
        yield return new TestCase("Test", () => {
            Assert.AreEqual(4, 2 + 2);
        });

        yield return new TestCase("Another test", () => {
            Assert.AreEqual(8, 4 + 4);
        });
    }
}

At first blush, declaring tests programmatically is more verbose and complex. However, the real difference is that these tests are first-class values. It becomes clearer why this matters with an example of parameterized tests:

Attributes

[TestFixture]
public class TestFixture {
    [Test]
    [Factory("Parameters")]
    public void Parse(string input, DateTime expectedOutput) {
        var r = DateTime.ParseExact(input, "yyyy-MM-dd'T'HH:mm:ss.FFF'Z'", CultureInfo.InvariantCulture);
        Assert.AreEqual(expectedOutput, r);
    }

    IEnumerable<object[]> Parameters() {
        yield return new object[] { "1-01-01T00:00:00Z", new DateTime(1, 1, 1) };
        yield return new object[] { "2004-11-02T04:05:20Z", new DateTime(2004, 11, 2, 4, 5, 20) };
    }
}

Programmatic

public class TestFixture {
    [StaticTestFactory]
    public static IEnumerable<Test> Tests() {
        var parameters = new[] {
            new { input = "1-01-01T00:00:00Z", expectedOutput = new DateTime(1, 1, 1) },
            new { input = "2004-11-02T04:05:20Z", expectedOutput = new DateTime(2004, 11, 2, 4, 5, 20) },
        };

        return parameters.Select(p => new TestCase("Parse " + p.input, () => {
            var r = DateTime.ParseExact(p.input, "yyyy-MM-dd'T'HH:mm:ss.FFF'Z'", CultureInfo.InvariantCulture);
            Assert.AreEqual(p.expectedOutput, r);
        }));
    }
}

Programmatically, we just wrote the parameters and tests in a direct style. With attributes, not only did we lose the types, but it's also more complicated: you have to know (or look up in the documentation) that you need a [Factory] attribute, that its string parameter indicates the name of the method that contains the test parameters, and the format of the parameters (e.g. can they be represented as a property? As a field? Can it be private? Static? Can it be a non-generic IEnumerable? An ArrayList[]?). Fortunately, MbUnit is quite flexible about it. Yet it doesn't handle an ArrayList[].

Something similar happens with JUnit and TestNG. Actually JUnit did have something close to first-class tests with its inheritance API.

With programmatic tests, you simply return a list of tests; there's no magic about it. It doesn't matter how they're built, and they can be parameterized or not; all you have to know is [StaticTestFactory] public static IEnumerable<Test> Tests(). If they're parameterized, it doesn't matter what kind of parameters they are. Actually, the very concept of "parameterized tests" simply disappears.

With attributes, you may have tried to use [Row] first, only to have the compiler remind you that attribute parameter types are very limited and you can't have a DateTime. Or a function. Or even a decimal. The testing framework gets in the way. Attributes are just not the right tool to model this.

With programmatic tests, you are in control, not the testing framework. It becomes more of a library rather than a framework. Things are conceptually simpler.

What about SetUp and TearDown? Don't worry, MbUnit supports them directly as properties of TestSuite. However, as we'll see in the next post, they're not really necessary. We'll also see a few other pros/cons first-class tests have.

Wednesday, May 2, 2012

Moroco: a minimal mocking library for C# / VB.NET

The more I learn about functional programming, the more I come to question many widely used and accepted practices in mainstream programming.

This time it's the turn of mocks and mocking libraries.

First, since there are so many different definitions for stubs, mocks, fakes, etc, here's my own definition of a mock: an entity (in an object-oriented language, usually an object) used to test an interaction between the entity under test and an external entity (again, in OO languages, these entities are objects).

So mocks are used for interaction-based testing which means testing for side-effects. The original paper on mock objects says this explicitly: "Test code should communicate its intent as simply and clearly as possible. This can be difficult if a test has to set up domain state or the domain code causes side effects."

Side effects are code smells, or more precisely, they should be few and isolated. Even the creators of mock objects say that side effects make testing difficult! We should minimize the need for mocking. Code without side-effects doesn't need mocks. You may still need stubs or fakes, but those should be trivial to build.

Quoting Daniel Cazzulino (author of Moq): "The sole presence of a 'Verify' method on the mock is a smell to me, one that will slowly get you into testing the interactions as opposed to testing the observable state changes caused by a particular behavior." Make those states immutable (i.e. a state change creates a new state) and you're half-way to side-effect-free code.

Remember that discussion a few years ago where people said that Typemock was too powerful? The argument was that Typemock "doesn't force you to write testable code". What this really means is "it doesn't make you isolate side-effects". I'm thinking that all current mocking libraries are actually too powerful: instead of making you think about how to write pure (side-effect-free) code, they encourage you to just use a mock to replace your impure code with pure code in tests.

There's also the matter of library complexity. .NET mocking libraries are big and complex: Moq: 17000 LoC, NSubstitute: 12000 LoC, Rhino Mocks: 87000 LoC, FakeItEasy: 17000 LoC. Not counting the embedded runtime proxy library (usually Castle DynamicProxy).

Many have issues running mocks in parallel (1, 2, 3, 4). It's 2012 and most developers have at least a 4-core workstation; using a mock library that limits my ability to run tests in parallel is getting ridiculous.

In summary, I think mock libraries support an undesirable practice, are not worth their code and have to go, except for very specific scenarios. But I still have a lot of existing side-effecting code I have to test; I can't just wish it away. Refactoring to pure code is not trivial. So I decided to write my own minimal mocking library.

By the way I'm not the first or the only one that thinks that mocking libraries aren't worth it. Uncle Bob also prefers manual mocking (even in Java, which is much more verbose than any .NET language), though he mostly stresses the argument of simplicity.

In F# it's easy to do manual mocking thanks to object expressions (1, 2), so you don't have to actually create a new class for each mock. In C#/VB.NET we're not so lucky but we can get 80% there with a little boilerplate, making a "semi-manual", reusable mock class with settable Funcs to define behavior. Example:

interface ISomething {
    void DoSomething(int a);
}

class MockSomething: ISomething {
    public Action<int> doSomething;

    public void DoSomething(int a) {
        doSomething(a);
    }
}

class Test {
    void test() {
        var r = new List<int>();
        var s = new MockSomething {
            doSomething = r.Add
        };
        // etc
    }
}

This isn't anything new, people have been doing this for years. Downsides: doesn't play well with overloaded and generic methods, but still works.
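For comparison, here's a sketch of the F# object-expression version mentioned above, mocking the same ISomething interface without declaring any mock class (assuming the interface is referenced from the C# project):

let calls = ResizeArray<int>()
let mock = 
    { new ISomething with
        member x.DoSomething a = calls.Add a }

mock.DoSomething 42
// calls now holds [42], recording the interaction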

Also, as Uncle Bob explains, manual mocks are prone to break whenever you change the mocked interface, but if you hit this often it could be revealing you that perhaps you should have used an abstract base class instead.

I added some code to track the call count of a Func and named the resulting library Moroco. So I wanted to get away from mocking libraries and ended up writing one, talk about hypocrisy! The difference between Moroco and other mocking libraries is that it's really minimal: less than 400 lines of pretty trivial code, fitting in a single file, with no dependencies. And I'm still against mocks: I get to count mocks to measure mock smell. And run tests in parallel.

I'm already using it in SolrNet, where I simply dropped Moroco's source code in the test project and replaced Rhino.Mocks in all tests. Here's the 'before vs after' of one of the tests:

Rhino.Mocks

[Test]
public void Extract() {
    var mocks = new MockRepository();
    var connection = mocks.StrictMock<ISolrConnection>();
    var extractResponseParser = mocks.StrictMock<ISolrExtractResponseParser>();
    var docSerializer = new SolrDocumentSerializer<TestDocumentWithoutUniqueKey>(new AttributesMappingManager(), new DefaultFieldSerializer());
    var parameters = new ExtractParameters(null, "1", "test.doc");
    With.Mocks(mocks)
        .Expecting(() => {
            Expect.On(connection)
                .Call(connection.PostStream("/update/extract", null, parameters.Content, new List<KeyValuePair<string, string>> {
                    new KeyValuePair<string, string>("literal.id", parameters.Id),
                    new KeyValuePair<string, string>("resource.name", parameters.ResourceName),
                }))
                .Repeat.Once()
                .Return(EmbeddedResource.GetEmbeddedString(GetType(), "Resources.responseWithExtractContent.xml"));
            Expect.On(extractResponseParser)
                .Call(extractResponseParser.Parse(null))
                .IgnoreArguments()
                .Return(new ExtractResponse(null));
        })
        .Verify(() => {
            var ops = new SolrBasicServer<TestDocumentWithoutUniqueKey>(connection, null, docSerializer, null, null, null, null, extractResponseParser);
            ops.Extract(parameters);
        });
}


Moroco

[Test]
public void Extract() {
    var parameters = new ExtractParameters(null, "1", "test.doc");
    var connection = new MSolrConnection();
    connection.postStream += (url, contentType, content, param) => {
        Assert.AreEqual("/update/extract", url);
        Assert.AreEqual(parameters.Content, content);
        var expectedParams = new[] {
            KV.Create("literal.id", parameters.Id),
            KV.Create("resource.name", parameters.ResourceName),
        };
        Assert.AreElementsEqualIgnoringOrder(expectedParams, param);
        return EmbeddedResource.GetEmbeddedString(GetType(), "Resources.responseWithExtractContent.xml");
    };
    var docSerializer = new SolrDocumentSerializer<TestDocumentWithoutUniqueKey>(new AttributesMappingManager(), new DefaultFieldSerializer());
    var extractResponseParser = new MSolrExtractResponseParser {
        parse = _ => new ExtractResponse(null)
    };
    var ops = new SolrBasicServer<TestDocumentWithoutUniqueKey>(connection, null, docSerializer, null, null, null, null, extractResponseParser);
    ops.Extract(parameters);
    Assert.AreEqual(1, connection.postStream.Calls);
}

Yes, I do realize this uses an old Rhino.Mocks API, but it's necessary to get thread-safety.

Conclusion

No, I don't expect anyone to use Moroco instead of Moq, Rhino.Mocks, etc. Yes, I know this means more code (though it's not as much as you might think), and I agree that .NET needs another mock library like I need a hole in my head. But I think we should think twice before using a mock and see if we can find a side-effect-free alternative for some piece of code.
Even when you do use mocks, don't just blindly reach for a mocking library. Consider the trade-offs.

Thursday, March 29, 2012

An example of applicative validation in FSharpx

I recently found a nice example of applicative functor validation in Scala (using Scalaz) by Chris Marshall, and decided to port it to F# and C# using FSharpx.

I blogged about applicative functor validation before, in F# and in C#.

When trying to port the Scala code to F# I found there were a few missing general functions in FSharpx, notably sequence and mapM. These are one- or two-liners; I ported them from Haskell, as it's syntactically closer to F# than Scala is. Hoogle is always a big help for this.

Here is the original code in Scala; here's the F# port and here's the C# port.
I'm not going to copy it here: it's 160 lines of F# and 250 lines of C#.

This example also makes for a nice comparison of these three languages (or four, if you count the implicit presence of Haskell). There are a few little differences in the ports, it's not a literal translation, but you can still see how Scala, being semantically closer to Haskell than either F# or C#, achieves more generality. As for type inference, the F# version requires almost no type annotations, while C# needs the most type annotations, and Scala is somewhere in the middle. This actually depends on what you consider a type annotation.

I chose to make Person immutable in C# to reflect more accurately the equivalent F# and Scala code, but it's not really instrumental to this example. Still, it shows how verbose it is to create a truly immutable class in C#. The C# dev team at Microsoft seems to highly value immutability, so I still have hopes that a future version of C# will improve this situation.

The ability to define custom operators in Scala and F#, like <!> or *> (an ability that C# lacks) also makes it easier to work with different ways of composing functions. FSharpx also offers 'named' versions for many of these operators, for example <!> is simply 'map' and <*> is 'ap'. Despite what some people say, I think custom operators enable better readability once you know the concepts behind them. Remember that at some point you also learned what '=', '%' and '+' mean.

In particular, the F# port shows the Kleisli composition operator >=> which I haven't seen mentioned in F# before. This operator is like the regular function composition operator >> except it works for monadic functions a -> m b. Compare the signatures for >> and >=> for Option:

(>>)  : ('a -> 'b) -> ('b -> 'c) -> 'a -> 'c
(>=>) : ('a -> 'b option) -> ('b -> 'c option) -> 'a -> 'c option
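For Option, >=> can be written directly. A minimal sketch, with a couple of hypothetical validators to show the composition:

// Run f; if it yields a value, feed it to g, otherwise short-circuit with None.
let (>=>) f g x =
    match f x with
    | Some y -> g y
    | None -> None

let tryPositive n = if n > 0 then Some n else None
let tryEven n = if n % 2 = 0 then Some n else None
let tryPositiveEven = tryPositive >=> tryEven   // int -> int option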

I'm quite pleased with the results of this port, even if I do say so myself. This example shows again that many higher concepts in functional programming commonly applied in Haskell are applicable, useful and usable in F# and even in C#. The lack of typeclasses and type constructor abstraction in .NET means some code duplication (mapM for example has to be defined for each monad), but this duplication is on the side of library code in many cases, and so client code isn't that badly affected.

Homework: port this example to Gustavo's fork of FSharpx.

Friday, March 2, 2012

Wrapping visitors with active patterns in F#

In the last post I showed how to interop F# and C# algebraic data types (ADT).

In C# you'll typically have ADTs or ADT-like structures expressed as a hierarchy of classes and the Visitor pattern to traverse/deconstruct them.

So the question is: what's the best way to use C#/VB.NET visitors in F#?

As an example, let's borrow a C# - F# comparison from Stackoverflow by Juliet Rosenthal, which models boolean operations:

public interface IExprVisitor<out T> {
    T Visit(TrueExpr expr);
    T Visit(And expr);
    T Visit(Nand expr);
    T Visit(Or expr);
    T Visit(Xor expr);
    T Visit(Not expr);
}

public abstract class Expr {
    public abstract t Accept<t>(IExprVisitor<t> visitor);
}

public abstract class UnaryOp : Expr {
    public Expr First { get; private set; }

    public UnaryOp(Expr first) {
        First = first;
    }
}

public abstract class BinExpr : Expr {
    public Expr First { get; private set; }
    public Expr Second { get; private set; }

    public BinExpr(Expr first, Expr second) {
        First = first;
        Second = second;
    }
}

public class TrueExpr : Expr {
    public override t Accept<t>(IExprVisitor<t> visitor) {
        return visitor.Visit(this);
    }
}

public class And : BinExpr {
    public And(Expr first, Expr second) : base(first, second) {}

    public override t Accept<t>(IExprVisitor<t> visitor) {
        return visitor.Visit(this);
    }
}

public class Nand : BinExpr {
    public Nand(Expr first, Expr second) : base(first, second) {}

    public override t Accept<t>(IExprVisitor<t> visitor) {
        return visitor.Visit(this);
    }
}

public class Or : BinExpr {
    public Or(Expr first, Expr second) : base(first, second) {}

    public override t Accept<t>(IExprVisitor<t> visitor) {
        return visitor.Visit(this);
    }
}

public class Xor : BinExpr {
    public Xor(Expr first, Expr second) : base(first, second) {}

    public override t Accept<t>(IExprVisitor<t> visitor) {
        return visitor.Visit(this);
    }
}

public class Not : UnaryOp {
    public Not(Expr first) : base(first) {}

    public override t Accept<t>(IExprVisitor<t> visitor) {
        return visitor.Visit(this);
    }
}

Let's say we want to write a function in F# to evaluate a boolean expression using these classes. We could use an object expression to create an inline visitor:

let rec eval (e: Expr) =
    e.Accept { new IExprVisitor<bool> with
                member x.Visit(e: TrueExpr) = true
                member x.Visit(e: And)      = eval(e.First) && eval(e.Second)
                member x.Visit(e: Nand)     = not(eval(e.First) && eval(e.Second))
                member x.Visit(e: Or)       = eval(e.First) || eval(e.Second)
                member x.Visit(e: Xor)      = eval(e.First) <> eval(e.Second)
                member x.Visit(e: Not)      = not(eval(e.First)) }

This is already more concise than the equivalent C# code, but we can do better. Once again, active patterns are the key. We only need one visitor to do the required plumbing:

module Expr = 
    open DiscUnionInteropCS

    type ExprChoice = Choice<unit, Expr * Expr, Expr * Expr, Expr * Expr, Expr * Expr, Expr>

    let private visitor = 
        { new IExprVisitor<ExprChoice> with
            member x.Visit(e: TrueExpr): ExprChoice = Choice1Of6 ()
            member x.Visit(e: And):      ExprChoice = Choice2Of6 (e.First, e.Second)
            member x.Visit(e: Nand):     ExprChoice = Choice3Of6 (e.First, e.Second)
            member x.Visit(e: Or):       ExprChoice = Choice4Of6 (e.First, e.Second)
            member x.Visit(e: Xor):      ExprChoice = Choice5Of6 (e.First, e.Second)
            member x.Visit(e: Not):      ExprChoice = Choice6Of6 e.First }

    let (|True|And|Nand|Or|Xor|Not|) (e: Expr) =
        e.Accept visitor

And now we can write eval more idiomatically and concisely:

let rec eval = 
    function
    | True          -> true
    | And(e1, e2)   -> eval(e1) && eval(e2)
    | Nand(e1, e2)  -> not(eval(e1) && eval(e2))
    | Or(e1, e2)    -> eval(e1) || eval(e2)
    | Xor(e1, e2)   -> eval(e1) <> eval(e2)
    | Not(e1)       -> not(eval(e1))
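A quick sanity check, building an expression with the C# constructors and evaluating it through the active pattern:

// true AND (NOT true) should evaluate to false:
let e = And(TrueExpr(), Not(TrueExpr())) :> Expr
printfn "%b" (eval e)  // prints false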

This also opens the doors for more complex pattern matching. See Juliet's post for an example.

Thursday, March 1, 2012

Algebraic data type interop: F# - C#

In a previous post I wrote about encoding algebraic data types in C#. Now let's explore the interoperability issues that arise when defining and consuming algebraic data types (ADTs) cross-language in C# and F#. More concretely, let's analyze construction and deconstruction of an ADT and how to keep operations as idiomatic as possible while also retaining type safety.

Defining an ADT in F# and consuming it in C#

In F#, ADTs are called discriminated unions. The first thing I should mention is that the F# component design guidelines recommend hiding discriminated unions as part of a general .NET API. I prefer to interpret it like this: if you can hide it with minor consequences, or you have stringent binary backwards compatibility requirements, or you foresee it changing a lot, hide it. Otherwise I wouldn't worry much.

Let's use this simple discriminated union as example:

type Shape =
| Circle of float
| Rectangle of float * float

Construction in C# is pretty straightforward: F# exposes static methods NewCircle and NewRectangle:

var circle = Shape.NewCircle(23.77);
var rectangle = Shape.NewRectangle(1.5, 2.2);

No, you can't use constructors directly to instantiate Circle or Rectangle, F# compiles these constructors as internal. No big deal really.

Deconstruction, however, is a problem here. C# doesn't have pattern matching, but as I showed in the previous article you can simulate this with a Match() method like this:

static class ShapeExtensions {
    public static T Match<T>(this Shape shape, Func<double, T> circle, Func<double, double, T> rectangle) {
        if (shape is Shape.Circle) {
            var x = (Shape.Circle)shape;
            return circle(x.Item);
        }
        var y = (Shape.Rectangle)shape;
        return rectangle(y.Item1, y.Item2);
    }
}

Here we did it as an extension method in the consumer side of things (C#). The problem with this is, if we add another case to Shape (say, Triangle), this will still compile successfully without even a warning, but fail at runtime, instead of failing at compile-time as it should!

It's best to define this in F# where we can take advantage of exhaustively-checked pattern matching, either as a regular instance member of Shape or as an extension member:

[<Extension>]
type Shape with
    [<Extension>]
    static member Match(shape, circle: Func<_,_>, rectangle: Func<_,_,_>) =
        match shape with
        | Circle x -> circle.Invoke x
        | Rectangle (x,y) -> rectangle.Invoke(x,y)

This is how we do it in FSharpx to work with Option and Choice in C#.

Defining an ADT in C# and consuming it in F#

Defining an ADT in C# is already explained in my previous post. But how does this encoding behave when used in F#?

To recap, the C# code we used is:

namespace DiscUnionInteropCS {
    public abstract class Shape {
        private Shape() {}

        public sealed class Circle : Shape {
            public readonly double Radius;

            public Circle(double radius) {
                Radius = radius;
            }
        }

        public sealed class Rectangle : Shape {
            public readonly double Height;
            public readonly double Width;

            public Rectangle(double height, double width) {
                Height = height;
                Width = width;
            }
        }

        public T Match<T>(Func<double, T> circle, Func<double, double, T> rectangle) {
            if (this is Circle) {
                var x = (Circle) this;
                return circle(x.Radius);
            }
            var y = (Rectangle) this;
            return rectangle(y.Height, y.Width);
        }
    }
}

Just as before, let's analyze construction first. We could use constructors:

let shape = Shape.Circle 2.0

which looks like a regular F# discriminated union construction with required qualified access. There are however two problems with this:

  1. Object constructors in F# are not first-class functions. Try to use function composition (>>) or piping (|>) with an object constructor. It doesn't compile. On the other hand, discriminated union constructors in F# are first-class functions.
  2. Concrete case types lead to unnecessary upcasts. shape here is of type Circle, not Shape. This isn't much of a problem in C# because it upcasts automatically, but F# doesn't, and so a function that returns Shape would require an upcast.

Because of this, it's best to wrap constructors:

let inline Circle x = Shape.Circle x :> Shape
let inline Rectangle (a,b) = Shape.Rectangle(a,b) :> Shape
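A quick sketch of what this buys us: the wrappers are ordinary functions, so they pipe and compose just like discriminated union case constructors:

let circles = [1.0; 2.5] |> List.map Circle   // Circle is a first-class float -> Shape function
let doubleCircle = (*) 2.0 >> Circle          // float -> Shape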

Let's see deconstruction now. In F# this obviously means pattern matching. We want to be able to write this:

let area =
    match shape with
    | Circle radius -> System.Math.PI * radius * radius
    | Rectangle (h, w) -> h * w

We can achieve this with a simple active pattern that wraps the Match method:

let inline (|Circle|Rectangle|) (s: Shape) =
    s.Match(circle = (fun x -> Choice1Of2 x),
            rectangle = (fun x y -> Choice2Of2 (x,y)))

For convenience, put this all in a module:

module Shape =
    open DiscUnionInteropCS

    let inline Circle x = Shape.Circle x :> Shape
    let inline Rectangle (a,b) = Shape.Rectangle(a,b) :> Shape
    let inline (|Circle|Rectangle|) (s: Shape) =
        s.Match(circle = (fun x -> Choice1Of2 x),
                rectangle = (fun x y -> Choice2Of2 (x,y)))

So with a little boilerplate you can have ADTs defined in C# behaving just like in F# (modulo pretty-printing, comparison, etc., but that's up to the C# implementation if needed). No need to define a separate, isomorphic ADT.

Note that pattern matching on the concrete type of a Shape would easily break, just like when we defined the ADT in F# with Match in C#. By using the original Match, if the original definition is modified, Match() will change and so the active pattern will break accordingly at compile-time. If you need binary backwards compatibility however, it's going to be more complex than this.

In the next post I'll show an example of a common variant of this.

By the way it would be interesting to see how ADTs in Boo and Nemerle interop with F# and C#.

Thursday, February 23, 2012

Static upcast in C#

I was rather surprised to realize only recently, after using C# for so many years, that it doesn't have a proper static upcast operator. By "static upcast operator" I mean a built-in language operator or a function that upcasts with a static (i.e. compile-time) check.

C# actually does implicit upcasting and most people probably don't even realize it. Consider this simple example:

Stream Fun() {
    return new MemoryStream();
}

Whereas in F# we have to do this upcast explicitly, or we get a compile-time error:

let Fun () : Stream = 
    upcast new MemoryStream()

The reason being that type inference is problematic in the face of subtyping [1].

Now how does this interact with parametric polymorphism (generics)?

C# 4.0 introduced variant interfaces, so we can write:

IEnumerable<IEnumerable<Stream>> Fun() {
    return new List<List<MemoryStream>>();
}

Note that covariance is not implicit upcasting: List<List<MemoryStream>> is not a subtype of IEnumerable<IEnumerable<Stream>>.

But this doesn't compile in C# 3.0, requiring conversions instead. When the supertypes are invariant we have to start converting. Even in C# 4.0 if you target .NET 3.5 the above snippet does not compile because System.Collections.Generic.IEnumerable<T> isn't covariant in T. And even in C# 4.0 targeting .NET 4.0 this doesn't compile:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>();
} 

because ICollection<T> isn't covariant in T. It's not covariant for good reason: it contains mutators (i.e. methods that mutate the object implementing the interface), so making it covariant would make the type system unsound (actually, this already happens in C# and Java) [2][3].

A programmer new to C# might try the following to appease the compiler (ReSharper suggests this so it must be ok? UPDATE: I submitted this bug and ReSharper fixed it.):

ICollection<ICollection<Stream>> Fun() {
    return (ICollection<ICollection<Stream>>)new List<List<MemoryStream>>();
}

(attempt #1)

It compiles! But upon running the program, our C# learner is greeted with an InvalidCastException.

The second suggestion on ReSharper says "safely cast as...":

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>() as ICollection<ICollection<Stream>>;
}

(attempt #2)

And sure enough, it's safe since it doesn't throw, but all he gets is a null.

Our hypothetical developer googles a bit, learns about Enumerable.Cast<T>(), and tries:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>>()
        .Cast<ICollection<Stream>>().ToList();
}

(attempt #3)

Yay, no errors! Ok, let's add elements to this list:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
        .Cast<ICollection<Stream>>().ToList();
}

(attempt #4)

Oh my, InvalidCastException is back...

Determined to make this work, he learns a bit more about LINQ and gets this to compile:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
    .Select(x => (ICollection<Stream>)x).ToList();
}

(attempt #5)

But gets another InvalidCastException. He forgot to convert the inner list! He tries again:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
        .Select(x => (ICollection<Stream>)x.Select(y => (Stream)y).ToList()).ToList();
}

(attempt #6)

This (finally!) works as expected.

Experienced C# programmers are probably laughing now at these obvious mistakes, but there are two non-trivial lessons to learn here:

  1. Avoid applying Enumerable.Cast<T>() to IEnumerable<U> (for T,U != object). Indeed, Enumerable.Cast<T>() is the source of many confusions, even unrelated to subtyping [4] [5] [6] [7] [8], and yet often poorly advised [9] [10] [11] [12] [13] [14] since it's essentially not type-safe. Cast<T>() will happily try to cast any type into any other type without any compiler check.
    Other than bringing a non-generic IEnumerable into an IEnumerable<T>, I don't think there's any reason to use Cast<T>() on an IEnumerable<U>.
    The same argument can be applied to OfType<T>().
  2. It's easy to get casting wrong (not as easy as in C, but still), particularly when working with complex types (where the definition of 'complex' depends on each programmer), when the compiler checks aren't strict enough (here's a scenario that justifies why C# allows seemingly 'wrong' casts as in attempt #5).

Note how in attempt #6 the conversion involves three upcasts:

  • MemoryStream -> Stream (explicit through casting)
  • List<Stream> -> ICollection<Stream> (explicit through casting)
  • List<ICollection<Stream>> -> ICollection<ICollection<Stream>> (implicit)

What we could use here is a static upcast operator, a function that only does upcasts and no other kind of potentially unsafe casts, that doesn't let us screw things up no matter what types we feed it. It should catch any invalid upcast at compile-time. But as I said at the beginning of the post, this doesn't exist in C#. It's easily doable though:

static U Upcast<T, U>(this T o) where T : U {
    return o;
}

With this we can write:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> { 
        new List<MemoryStream> { 
            new MemoryStream(), 
        } 
    }
    .Select(x => x.Select(y => y.Upcast<MemoryStream, Stream>()).ToList().Upcast<List<Stream>, ICollection<Stream>>()).ToList();
}

You may object that this is awfully verbose. Maybe so, but you can't screw this up no matter what types you change. The verbosity stems from the lack of type inference in C#. You may also want to lift this to operate on IEnumerables to make it a bit shorter, e.g:

static IEnumerable<U> SelectUpcast<T, U>(this IEnumerable<T> o) where T : U {
    return o.Select(x => x.Upcast<T, U>());
}
ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> {
        new List<MemoryStream> {
            new MemoryStream(),
        }
    }
    .Select(x => x.SelectUpcast<Stream, Stream>().ToList().Upcast<List<Stream>, ICollection<Stream>>()).ToList();
}

Alternatively, we could have used explicitly typed variables to avoid casts:

ICollection<ICollection<Stream>> Fun() {
    return new List<List<MemoryStream>> {
        new List<MemoryStream> {
            new MemoryStream(),
        }
    }
    .Select(x => {
        ICollection<Stream> l = x.Select((Stream s) => s).ToList();
        return l;
    }).ToList();
}

I mentioned before that F# has a static upcast operator (actually two, one explicit/coercing and one inferencing operator). Here's what the same Fun() looks like in F#:

let Fun(): ICollection<ICollection<Stream>> = 
    List [ List [ new MemoryStream() ]]
    |> Seq.map (fun x -> List (Seq.map (fun s -> s :> Stream) x) :> ICollection<_>)
    |> Enumerable.ToList
    |> fun x -> upcast x

Now if you excuse me, I have to go replace a bunch of casts... ;-)

References