Wednesday, December 29, 2010

Zipping with applicative functors in F#

In my last post I briefly described functors from an F# perspective. Now it's the turn of applicative functors. Since there were few to no concrete examples in my previous post, this time I'll start with an example. Hopefully everyone remembers Seq.zip. Remember Seq.zip3 too? Just to recap, they combine sequences into sequences of pairs and triples respectively. The signatures are:

Seq.zip : seq<'a> -> seq<'b> -> seq<'a * 'b>
Seq.zip3 : seq<'a> -> seq<'b> -> seq<'c> -> seq<'a * 'b * 'c>

Now let's say we want zip5:

Seq.zip5 : seq<'a> -> seq<'b> -> seq<'c> -> seq<'d> -> seq<'e> -> seq<'a * 'b * 'c * 'd * 'e>

We could implement it with a nested zip3 and a flattening map:

let zip5 a b c d e = 
    Seq.zip3 a b (Seq.zip3 c d e) |> Seq.map (fun (n,m,(o,p,q)) -> n,m,o,p,q)

Or we could write it using an applicative functor, like this:

let zip5 a b c d e =
    puree (fun n m o p q -> n,m,o,p,q) <*> a <*> b <*> c <*> d <*> e

I think you'll agree that the latter looks much cleaner, so let's see how we get there.

Applicative functors are defined by two operations: pure and <*> (actually ⊛, but we can't write that in ASCII!). Again, we can't express these generically in .NET's type system due to the lack of type constructor abstraction, but this is what their signatures would look like:

pure : 'a -> 't<'a> 
(<*>) : 't<'a -> 'b> -> 't<'a> -> 't<'b>

I prefer the bracket-less ML way of expressing types, whenever possible:

pure : 'a -> 'a 't 
(<*>) : ('a -> 'b) 't -> 'a 't -> 'b 't

Intuitively, pure lifts a value (usually a function) inside the domain of the applicative functor, and <*> applies the function inside the applicative to an applicative value, yielding another applicative value.

At this point, it might not be immediately clear that <*> is an application, but compare its signature with the regular function application operator (<|) and you'll see the similarity:

(<|) : ('a -> 'b) -> 'a -> 'b
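
With that analogy in mind, here's how the types line up when we lift a two-argument function and apply it step by step (same bracket-less notation; 't stands for whatever applicative we're working with):

puree (fun n m -> n, m)             : ('a -> 'b -> 'a * 'b) 't
puree (fun n m -> n, m) <*> a       : ('b -> 'a * 'b) 't
puree (fun n m -> n, m) <*> a <*> b : ('a * 'b) 't

Each <*> consumes one argument of the lifted function; after the last one we're left with a plain lifted value.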

Since pure is a reserved keyword in F# (it doesn't do anything yet, though), we'll use puree in our code instead.

The applicative functor that enables the zip5 implementation above is known as ZipList:

module ZipList = 
    let puree v = Seq.initInfinite (fun _ -> v)
    let (<*>) f a = Seq.zip f a |> Seq.map (fun (k,v) -> k v)

The signatures for ZipList:

puree : 'a -> 'a seq 
(<*>) : ('a -> 'b) seq -> 'a seq -> 'b seq

In plain English, <*> in a ZipList applies a sequence of functions to a sequence of values, pairwise; while puree creates an infinite sequence of values (usually functions) ready to be applied to another sequence of values using <*>.
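
For example, with the ZipList module open, applying (+) pairwise over two small sequences:

open ZipList
let sums = puree (+) <*> Seq.ofList [1; 2; 3] <*> Seq.ofList [10; 20; 30]
// sums = seq [11; 22; 33]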

Functors and applicatives

Since we're always using this ZipList in the form "puree function <*> value" we might as well wrap that in an operator:

let (<!>) f a = puree f <*> a

So now we can write zip5 a bit shorter:

let zip5 a b c d e =  
    (fun n m o p q -> n,m,o,p,q) <!> a <*> b <*> c <*> d <*> e

Let's take a moment to see the signature of <!>

(<!>) : ('a -> 'b) -> 'a seq -> 'b seq

This signature should look familiar to you. Yes, it's Seq.map. Applicative functors are also functors, which is quite obvious from the name, but we hadn't seen it until now. So we can write a map (a.k.a. lift) function for every applicative functor exactly as the <!> operator above.
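
For instance, with the same ZipList operators in scope:

let doubled = (fun x -> x * 2) <!> Seq.ofList [1; 2; 3]
// doubled = seq [2; 4; 6], the same result Seq.map (fun x -> x * 2) would give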

Observing the law

Applicative functors, just like regular functors, need to satisfy some laws. Just like last time, we'll use FsCheck, and again we'll have to cheat a little to get comparable types (we can't compare infinite sequences).

open System.Linq
// limited comparison for infinite sequences
let toListFrom r a = Enumerable.Take(a, List.length r) |> Seq.toList

type ApplicativeLaws<'a,'b,'c when 'a:equality and 'b:equality>() =
    static member identity (v: 'a list) = 
        let x = puree id <*> v
        toListFrom v x = v
    static member composition (u: ('a -> 'b) list) (v: ('c -> 'a) list) (w: 'c list) = 
        let toList = toListFrom u
        let a = puree (<<) <*> u <*> v <*> w
        let b = u <*> (v <*> w)
        toList a = toList b
    static member homomorphism (f: 'a -> 'b) x = 
        let toList a = Enumerable.Take(a, 10000) |> Seq.toList
        let a = puree f <*> puree x
        let b = puree (f x)
        toList a = toList b
    static member interchange (u: ('a -> 'b) list) (y: 'a) = 
        let toList = toListFrom u
        let a = u <*> puree y
        let b = puree ((|>) y) <*> u
        toList a = toList b

FsCheck.Check.QuickAll<ApplicativeLaws<_,_,_>>()

In the next post, we'll see how applicative functors relate to monads, with a parsing example.

Friday, December 24, 2010

Notes on Haskell functors and F#

I've been learning a bit of Haskell lately, and I wanted to share some of what I have learned so far, from an F# perspective. I hope you find these notes useful, and if you find a mistake please let me know. I'll start with a brief explanation of functors and how they translate (or not) to F#.

Functors are basically things that can be mapped over. In F# we have lots of them:

Set.map : ('a -> 'b) -> 'a Set -> 'b Set 
Seq.map : ('a -> 'b) -> 'a seq -> 'b seq 
List.map : ('a -> 'b) -> 'a list -> 'b list 
Array.map : ('a -> 'b) -> 'a [] -> 'b []

See the pattern? If we could generalize this, the signature would look something like this:

map : ('a -> 'b) -> 'a 'T -> 'b 'T

or:

map : ('a -> 'b) -> 'T<'a> -> 'T<'b>

This function is called fmap in Haskell, and it defines the Functor typeclass, one of the most basic typeclasses in Haskell. (I won't talk about typeclasses here; that would go way out of scope.) Unfortunately, .NET's type system isn't flexible enough to express this. Joe Duffy has a great article explaining it for us .NET developers. Actually, it seems that OCaml's functors can be encoded in .NET, so it would be possible in principle, but quite awkwardly so.

At this point you might be thinking that the concept of functors only applies to collections, but it's actually more general than that. For example, Options are also functors:

Option.map : ('a -> 'b) -> 'a option -> 'b option

Even functions are functors. The map function in this case is the composition operator (<<)

(<<) : ('a -> 'b) -> ('c -> 'a) -> ('c -> 'b)

To make it easier to recognize this as a functor, you might interpret it as:

(<<) : ('a -> 'b) -> "function that takes 'c and returns"<'a> -> "function that takes 'c and returns"<'b>

Haskell's type system is powerful enough to be able to abstract that.

Also, all monads are functors. You can define map in terms of Bind and Return. For example, for async:

let asyncMap f m = 
    async { 
        let! x = m 
        return f x 
    }

asyncMap : ('a -> 'b) -> Async<'a> -> Async<'b>

In the context of monads, map is usually called lift. In fact, in Haskell they're pretty much equivalent, save for class constraints. The name "lift" comes from the notion of lifting a function to operate in the domain of the monad.

The same code as above, desugared:

let asyncLift f m = async.Bind(m, fun x -> async.Return(f x))

For a FParsec monad:

let parserLift f m = parser.Bind(m, fun x -> parser.Return(f x))

Now, F# can't generalize over monads (for the same reasons I mentioned above), but can we still write a generic lift for all monads? Sort of, thanks to inlining and member constraint invocation expressions:

let inline lift builder f m = 
    let inline ret x = (^a: (member Return: 'b -> 'c) (builder,f x)) 
    (^a: (member Bind: 'd * ('e -> 'c) -> 'c) (builder, m, ret))

You can find more generic monad stuff like this in the M<'a> Lib project. Using this generic lift function we can create monad-specific lifts like this:

let asyncLift x = lift async x
let parserLift x = lift parser x

Actually, FParsec already includes lift, although slightly disguised:

(|>>) : Parser<'a,'s> -> ('a -> 'b) -> Parser<'b,'s>

This is lift, only with flipped parameters.
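
So, with FParsec open, the parserLift from before could just as well be written as (a one-line sketch):

// equivalent to parserLift above, expressed with FParsec's built-in operator
let parserLift' f m = m |>> f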

One thing to note about functors is that not everything that satisfies the signature of fmap automatically becomes a functor. There are some laws that functors must obey to have the behavior one would expect. These laws actually come from the definition of a functor in category theory. I'm not going to explain these laws in detail here; you can find good explanations in this article on basic category theory from a Haskell perspective or this one on functors from a Scala perspective.

Haskell developers can test these laws using QuickCheck, so in F# we could try to test them with FsCheck:

open FsCheck
open FsCheck.Prop

let runAsyncRet f a = Async.RunSynchronously (f (async.Return a)) 
let liftId x = runAsyncRet id x = id x 
let liftDist x (f,g) = runAsyncRet (asyncLift (f << g)) x = runAsyncRet (asyncLift f << asyncLift g) x 
Check.Quick liftId 
Check.Quick liftDist

I haven't shown any real-world examples of functors in F#. The lack of ad-hoc polymorphism takes away much of the power of the abstraction, but it's still often simpler to use lift than computation expression sugar for trivial expressions. Moreover, it encourages writing simpler, shorter, composable functions, which fits the functional style better. There's a good article on the Haskell wiki against "do notation" (i.e. syntactic sugar for monadic code).

In the next article I'll talk about applicative functors, in my opinion a more interesting abstraction, and I'll show some more concrete real-world examples of how they can be used in F#.

Monday, December 13, 2010

Customizing SolrNet

One of the most voted enhancement requests for SolrNet (an Apache Solr client for .NET) right now is to add support for POSTing when querying.

Let me explain: queries are serialized by SolrNet and sent to Solr via HTTP. Normally, queries are issued with a GET request and the query itself goes in the query string part of the URL. A simple query URL might look like this: http://localhost:8983/solr/select?q=id:123 .

The problem arises when the query is too long to fit in the query string. Even though the HTTP protocol does not place any a priori limit on the length of a URI, most (all?) servers do, for performance and security reasons.

Here's a little program that reproduces this issue:

internal class Program {
    private const string serverURL = "http://localhost:8983/solr";

    private static void Main(string[] args) {
        Startup.Init<Dictionary<string, object>>(serverURL);
        var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
        solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
    }
}

This creates the query "id:0 OR id:1 OR ... OR id:999", which is about 10KB after encoding, more than enough for our tests. Running this against Solr on Jetty 6 makes Jetty throw:

2010-12-13 17:52:33.362::WARN:  handle failed 
java.io.IOException: FULL 
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274) 
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) 
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) 
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) 
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Not very graceful... it should probably respond with 414 Request-URI Too Long instead of throwing like this, but clients shouldn't send such long URIs anyway.

Steven Livingston has a good blog post describing a patch modifying some classes in SolrNet to deal with this issue. However, even though I never foresaw this problem when writing SolrNet, solving it does not really require any changes to the existing codebase.

In this particular case, what we need to do concretely is override the Get() method of the ISolrConnection service and make it issue POST requests instead of GET. We can write a decorator to achieve this:

public class PostSolrConnection : ISolrConnection {
    private readonly ISolrConnection conn;
    private readonly string serverUrl;

    public PostSolrConnection(ISolrConnection conn, string serverUrl) {
        this.conn = conn;
        this.serverUrl = serverUrl;
    }

    public string Post(string relativeUrl, string s) {
        return conn.Post(relativeUrl, s);
    }

    public string Get(string relativeUrl, IEnumerable<KeyValuePair<string, string>> parameters) {
        var u = new UriBuilder(serverUrl);
        u.Path += relativeUrl;
        var request = (HttpWebRequest) WebRequest.Create(u.Uri);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        var qs = string.Join("&", parameters
            .Select(kv => string.Format("{0}={1}", HttpUtility.UrlEncode(kv.Key), HttpUtility.UrlEncode(kv.Value)))
            .ToArray());
        request.ContentLength = Encoding.UTF8.GetByteCount(qs);
        request.ProtocolVersion = HttpVersion.Version11;
        request.KeepAlive = true;
        try {
            using (var postParams = request.GetRequestStream())
            using (var sw = new StreamWriter(postParams))
                sw.Write(qs);
            using (var response = request.GetResponse())
            using (var responseStream = response.GetResponseStream())
            using (var sr = new StreamReader(responseStream, Encoding.UTF8, true))
                return sr.ReadToEnd();
        } catch (WebException e) {
            throw new SolrConnectionException(e);
        }
    }
}

Now we have to apply this decorator:

private static void Main(string[] args) {
    Startup.Init<Dictionary<string, object>>(new PostSolrConnection(new SolrConnection(serverURL), serverURL));
    var solr = Startup.Container.GetInstance<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

That's it! If you're using Windsor, applying the decorator looks like this:

private static void Main(string[] args) {
    var container = new WindsorContainer();
    container.Register(Component.For<ISolrConnection>()
        .ImplementedBy<PostSolrConnection>()
        .Parameters(Parameter.ForKey("serverUrl").Eq(serverURL)));
    container.AddFacility("solr", new SolrNetFacility(serverURL));
    var solr = container.Resolve<ISolrOperations<Dictionary<string, object>>>();
    solr.Query(Query.Field("id").In(Enumerable.Range(0, 1000).Select(x => x.ToString()).ToArray()));
}

This is the real benefit of writing decoupled code. Not testability, but flexibility. Testability is nice of course, but not the primary purpose.
When your code is decoupled, you can even implement entire features mostly by rearranging the object graph. This is pretty much how I implemented multicore support in SolrNet.

The PostSolrConnection implementation above works with SolrNet 0.3.0 and probably also 0.2.3. PostSolrConnection is not the default because: a) it needs to be tested thoroughly, and b) Solr doesn't emit cache headers when responding to POST requests, which precludes caching.

Monday, December 6, 2010

SolrNet 0.3.0 released

SolrNet is a Solr client for .NET. I just released SolrNet 0.3.0.

Longest. Beta. Ever. I know. But I really wanted to document everything and change many things in the release package, and in order to do that I had to get rid of MSBuild first.

There aren't many changes in the library itself:

  • Upgraded Ninject module to Ninject 2.0 RTM
  • Upgraded StructureMap registry to StructureMap 2.6.1
  • Upgraded Windsor facility to Windsor 2.5.2
  • Added support for multi-core for StructureMap
  • Improved response parsing performance
  • Fixed a couple of minor bugs

If you're upgrading from 0.2.3 please read about the breaking changes.

Also, this will be the last release to support .NET 2.0.

As for the package, it now contains:

  • Merged and unmerged assemblies (only the merged assembly with all IoC integrations was included before).
  • PDBs for all assemblies.
  • All assemblies are now signed with a strong name.
  • Doxygen-compiled documentation (replacing Sandcastle which was too bloated).
  • Note explaining exactly what assemblies you need depending on how you integrate the library. Hopefully this will help clear up the confusion.
I'd like to thank the following people who contributed to this release:

Binaries, docs and sample app downloads here. You can also get it via NuGet packages.

Sunday, November 21, 2010

Shallow submodules in Git

A git submodule is really little more than an embedded repository with some configuration in the parent repository to track its HEAD. When you first clone the parent repository, you have to manually fetch the submodules by running:

git submodule init
git submodule update

This actually runs a clone (initially) for each submodule. Each submodule is a fully-fledged repository. You can even commit and push from a submodule repository (if you have permissions, of course).

But sometimes you only want submodules for read-only purposes, for example if you have a "superproject" whose purpose is to integrate several projects through submodules. When that's the case, it becomes annoying to have to download the entire history of each submodule. It wastes time, bandwidth and disk for everyone who wants to work with the superproject. Shallow clones are great for this, but how can we apply them to submodules?

After I asked on Stack Overflow, Ryan Graham gave me the hint that "submodule update" can handle previously cloned submodules, so it was just a matter of running a "manual" shallow clone for each submodule between "submodule init" and "submodule update". This script does just that:

Keep in mind that the savings in disk space are usually not quite what you'd expect.

Wednesday, November 17, 2010

Windsor-managed MembershipProviders

I see many questions on Stack Overflow that are basically variants of this: "How can I integrate my custom MembershipProvider into my IoC container?"

Integrating your custom MembershipProvider into an IoC container has many advantages, since it lets you treat it just like any other service: you can manage its lifetime, dependencies, configuration, even proxy it if you want.

The problem is, MembershipProviders are one of those things that are managed by the ASP.NET runtime: you just configure them in your web.config and the runtime instantiates them when needed. You don't really get much control over their creation.

A cheap solution is to use the container as a service locator directly in your membership provider (using something like CommonServiceLocator), e.g.:

public class MyMembershipProvider : MembershipProvider {
    private IUserRepository repo {
        get { return ServiceLocator.Current.GetInstance<IUserRepository>(); }
    }

    public override string GetUserNameByEmail(string email) {
        return repo.First(u => u.Email == email).UserName; // assuming the user entity exposes a UserName property
    }

    ...
}

Using a service locator like this should be avoided as much as possible. Mark Seemann explains it thoroughly in this article. In a nutshell, you want to limit the usage of the service locator pattern to glue code (i.e. very low-level infrastructure), use it as little as possible even there, and never use it in application-level code.

As usual with this kind of problem, the solution is to write a wrapper/adapter/bridge/whatever-you-want-to-call-it that isolates the issue so that client code doesn't have to suffer it. It's similar in concept to the implementation of Windsor-managed HttpModules. It's actually simpler than that: we don't need a custom lifestyle manager here.

In fact, Spring.NET has had such an adapter for quite some time. The only problem with that implementation is that you can't change the lifetime of the custom provider: it's always a singleton. My implementation doesn't have this limitation: your provider can be transient, singleton, per web request, whatever. The price for this is that you can't use Initialize() (more precisely, it won't do anything), but since the provider is managed by the container, you can use the container to provide any configuration, which is much more flexible. The implementation is about 200 lines of boring, simple code so I'm not going to post it here. It does use Windsor as a service locator, but this is low-level infrastructure/glue code. The goal here is to keep your code clean.

The code is here, and here's how to use it:

  1. Write your custom MembershipProvider as a regular component, using constructor or property injection as you see fit.
  2. Implement IContainerAccessor in your global HttpApplication class. Use this article as reference.
  3. Register your custom provider in Windsor and assign a name to the component. E.g.:

    container.Register(Component.For<MyMembershipProvider>()
        .LifeStyle.Transient
        .Named("myProvider"));
  4. Register your custom provider in your web.config using the adapter and referencing the name of the corresponding Windsor component in a "providerId" attribute. E.g.:

        <membership defaultProvider="customProvider">
          <providers>
            <clear/>
            <add name="customProvider" type="ProviderInjection.WebWindsorMembershipProvider, ProviderInjection" providerId="myProvider"/>
          </providers>
        </membership>

That's it. Here's a sample app that you can use as reference. This can be easily ported to any IoC container, and to any provider like RoleProvider, ProfileProvider, etc. I haven't used this in anger, so let me know if you have any problems with it.

Tuesday, November 16, 2010

Migrating to FAKE

I finally finished migrating the SolrNet build scripts from MSBuild to FAKE. I did not do this on a whim or just because I was bored, but because the MSBuild script was getting out of hand. Even at only 246 lines, it had become unmaintainable. I admit I'm not an MSBuild expert, but a build script shouldn't be that hard. Just visually parsing the script was a daunting task.

This screenshot compares the FAKE and the equivalent MSBuild scripts side by side:

[Screenshot: fake-vs-msbuild – the FAKE and MSBuild build scripts side by side]

Even if you can't read it, I bet you can tell which is which.

XML-based DSLs like MSBuild and NAnt have only one advantage: being easily parsable by tools. But if you're going to do any sort of manual maintenance of the script, an embedded DSL in a real language will beat XML programming every time.

I wrote the original build script for SolrNet in MSBuild for two reasons: no external dependencies, so it wouldn't place any burden on potential contributors; and being a NAnt user, I wanted to learn MSBuild.

But I should have known better: most of my NAnt scripts have evolved over the years to be mostly calls to MSBuild to compile solutions and embedded Boo to handle any logic.

Speaking of Boo, Phantom looks very nice. I picked FAKE over Phantom kind of arbitrarily, mostly because I'm really digging F# right now. But I'm keeping an eye on Phantom. Boo's macros and compiler extensibility are used in Phantom, and Boo's type inference and optional duck-typing make it an excellent language for a build system. And Boo is so small that you can fit the runtime and the compiler in under 2MB, so it's not an issue to distribute it.

Another option is Rake. Albacore and rake-dotnet implement the common .net-building tasks in Rake. But I really don't want to introduce a dependency on Ruby just to build the project.

Then there's PSake. PowerShell is quite powerful and I really like the concept of piping objects instead of lines of text as in unix shells. In the context of a build script, one of the main advantages of using a shell is that it's dead easy to call external programs, so "integrating" with ILMerge, Zip, etc is just a matter of calling the executable in question with the parameters you need, just as you would do it on the command line. No wrappers needed. And PowerShell is ubiquitous enough as to not consider it a dependency. But... I still can't get myself to like PowerShell's syntax. I know this is totally subjective, but I feel just like Jim here. Scripts involving .net are particularly ugly. And my first experience with PowerShell was bumpy to say the least.

Another (lesser for me) issue with PowerShell is that it currently doesn't run on Mono. It's not like I'm building SolrNet on Mono right now, but I certainly don't want to knowingly work against it.

So why did I pick FAKE and F# instead of all the others I mentioned?

  • F# is statically typed and has global type inference. If there is a type error in my script, it won't run. At the same time, type inference gives the code a scripty feel. I didn't have to declare a single type in the build script (you can see it here).
  • I can write my build script on VS2010, complete with intellisense, automatic error checking and debugging.
  • FAKE defines a simple, terse, functional EDSL to manage common build tasks.
  • FAKE is trivially extensible: just write a regular F# function (or .net method in any language) and invoke it. No particular structure or attributes needed. In just a few lines I defined some helper functions to manipulate XML and start/stop Solr to run integration tests.
  • F# works on Mono.
  • No external dependencies: F# is included in any default VS2010 install. And now that F# is Apache-licensed, it will soon be included in Mono (therefore also Ubuntu) and MonoDevelop.

I've also been contributing a few things to FAKE lately:

  • Enhancements to the MSBuild integration.
  • Enhancements to the ILMerge integration.
  • Gallio integration.
  • Simpler shell exec functions and some shell-like functions similar to Ruby's FileUtils.
  • A couple of bugfixes.

Bottom line: a few years ago all there was to build .net projects was MSBuild and NAnt. Nowadays there's a lot of choice. Make sure you do your research and pick the build system that's right for your project and for you.

Thursday, October 28, 2010

GDataDB 0.2 released

Just released GDataDB 0.2. GDataDB is a database-like interface to Google Spreadsheets for .Net.

This is a maintenance release. Here's the short changelog:

  • Andrew Yau implemented deleting tables and databases (which means deleting worksheets and spreadsheets respectively).
  • Ryan Farley fixed a bug with read-only and write-only properties in the mapped class.
  • Updated to use the latest GData library.

Binaries are on GitHub.

Also, Ryan recently wrote a nice article about GDataDB, check it out.

UPDATE 11/3/2010: Ryan Farley added GDataDB and GDataDB.Linq to the NuGet package repository. Thanks Ryan!

Monday, October 11, 2010

A functional wrapper over ADO.NET (part 2)

In my last post I introduced the basics of FsSql, a functional wrapper over ADO.NET for F#. Any code samples in this post will use what was defined in the previous one. To recap, here are the important definitions again:

let openConn() = 
    let conn = new System.Data.SQLite.SQLiteConnection("Data Source=test.db;Version=3;New=True;") 
    conn.Open() 
    conn :> IDbConnection 

let connMgr = Sql.withNewConnection openConn 
let execScalar sql = Sql.execScalar connMgr sql 
let execReader sql = Sql.execReader connMgr sql 
let execReaderf sql = Sql.execReaderF connMgr sql 
let execNonQueryf sql = Sql.execNonQueryF connMgr sql 
let execNonQuery sql p = Sql.execNonQuery connMgr sql p |> ignore 
let exec sql = execNonQuery sql [] 

Transactions

Ambient transactions à la TransactionScope / J2EE are implemented... through functions, of course!

Let's start with a simple function that inserts a record.

let insertUser connMgr = 
    Sql.execNonQueryF connMgr "insert into user (id,name) values (%d,%s)"

Note that it's parameterized by the connection manager. Now we make it require a transaction or throw if there is no current transaction:

let txInsertUser = Tx.mandatory insertUser

And we insert 50 users:

let insert50 connMgr = 
    for i in 1..50 do 
        txInsertUser connMgr i "John" |> ignore

If we run insert50, we'll get an exception "Transaction required!" since we haven't started any transaction. We need to wrap insert50:

let txInsert50 = Tx.required insert50

Tx.required will create the transaction if there isn't a previous one. Now we can run txInsert50 and each txInsertUser will run within this transaction.

Other transactional functions are Tx.never (the opposite of Tx.mandatory), Tx.supports (doesn't care if there's a transaction or not) and Tx.transactional (always starts a new transaction).

These are "ambient" or "implicit" transactions in the sense that they're transparent, i.e. txInsert50 has the same signature as insert50. There is no explicit commit or rollback: if the function ends successfully, it commits (or not, depending on transaction semantics); if there is an exception, it rolls back. Transaction semantics are defined outside the function definition. The actual transaction is carried over in the connection manager.

The library also includes a transaction computation expression. Here's an example:

let tx = Tx.TransactionBuilder()
let tran1() = tx {
    do! Tx.execNonQueryi
            "insert into user (id,name) values (@id,@name)"
            [P("@id", 99); P("@name", "John Doe")]
}
let tran() = tx {
    do! tran1()
    do! Tx.execNonQueryi "insert into blabla" [] // invalid SQL
    return 0
}

match tran() connMgr with // run transaction
| Tx.Commit a -> printfn "Transaction successful, return value %d" a
| Tx.Rollback a -> printfn "Transaction rolled back, return value %A" a
| Tx.Failed e -> printfn "Transaction failed with exception:\n %s" e.Message

This transaction will of course fail. Transaction expressions are currently composed with Tx.required semantics (this might be user-definable in the future). So the exception actually rolls back the whole thing, including the record inserted in tran1.

Mapping

FsSql is not a real ORM and doesn't pretend to be one. Still, I included a few mapping functions, but since there is no real schema definition in code, they're quite verbose to use. But this also makes things more flexible.

In my previous post I defined this function:

let selectById = execReaderf "select * from user where id = %d";;

val selectById : (int -> IDataReader)

This gives us an IDataReader... not the easiest thing to handle; we'd better map it to something more usable, at least a sequence of name*value pairs:

let selectByIdAsNameValue = selectById >> (Sql.mapFirst Sql.asNameValue);;

val selectByIdAsNameValue : (int -> seq<string * obj> option)

selectByIdAsNameValue 20;;

val it : seq<string * obj> option = 
  Some (seq [("id", 20); ("name", "John"); ("address", )])

Or since there are only three fields we could map it as a tuple:

let selectByIdAsTuple = selectById >> (Sql.mapFirst Sql.asTuple3<int,string,string option>);;

val selectByIdAsTuple : (int -> (int * string * string option) option)

selectByIdAsTuple 20;;

val it : (int * string * string option) option = Some (20, "John", null)

Or map it to a record:

type User = {
    id: int
    name: string
    address: string option
}
let asUser (r: #IDataRecord) =
    {id = (r?id).Value; name = (r?name).Value; address = r?address}
let selectByIdAsRecord = selectById >> (Sql.mapFirst asUser);;

val selectByIdAsRecord : (int -> User option)

If your database field names happen to coincide with the record field names, you can use this convenience function as your mapper:

let asUser r = Sql.asRecord<User> "" r

So far we've only seen how to map a single record from the result set (using Sql.mapFirst). Let's see now how we would map something more complex, like a joined query. First we create another table:

exec "create table animal (id int primary key not null, name varchar not null, owner int null)"

Where the owner field will be a foreign key to the USER table. Now the corresponding record type:

type Animal = {
    id: int
    name: string
    owner: int option
}

Let's insert some records:

let insertAnimal (animal: Animal) = 
    let toNull = function Some x -> x.ToString() | _ -> "null" 
    execNonQueryf 
        "insert into animal (id, name, owner) values (%d, %s, %s)" 
        animal.id animal.name (toNull animal.owner) |> ignore

// inserting sample data 
insertAnimal {id = 1; name = "Seymour"; owner = Some 1} 
insertAnimal {id = 2; name = "Nibbler"; owner = Some 1} 
insertAnimal {id = 3; name = "Tramp"; owner = None} 

Now we'd like to list people with pets. First we create the SQL:

let innerJoinSql = sprintf "select %s,%s from user u join animal a on a.owner = u.id" 
                      (Sql.recordFieldsAlias typeof<User> "u")
                      (Sql.recordFieldsAlias typeof<Animal> "a")

This generates the following SQL:

select u.id u_id,u.name u_name,u.address u_address,a.id a_id,a.name a_name,a.owner a_owner 
from user u join animal a on a.owner = u.id

Here's the mapping function we'll use:

let asUserWithAnimal (r: #IDataRecord) =
    Sql.asRecord<User> "u" r, Sql.asRecord<Animal> "a" r
val asUserWithAnimal : (IDataRecord -> User * Animal)

We'll also use a helper function (included in FsSql):

Seq.groupByFst : seq<'a * 'b> -> seq<'a * seq<'b>>

This does exactly what the name and signature suggest: group a sequence of tuples by the first element of the tuple.
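
It ships with FsSql, but just to make it concrete, a minimal implementation could look like this (a sketch, not necessarily the library's actual code):

// group a sequence of pairs by the first element, keeping only the second elements in each group
let groupByFst (pairs: seq<'a * 'b>) =
    pairs
    |> Seq.groupBy fst
    |> Seq.map (fun (key, group) -> key, Seq.map snd group)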

Now we have everything we need to run and map the query:

execReader innerJoinSql []
|> Sql.map asUserWithAnimal
|> Seq.groupByFst
|> Seq.iter (fun (person, animals) ->
                printfn "%s has pets %s" person.name (String.Join(", ", animals |> Seq.map (fun a -> a.name))))

Which will print something like "John has pets Seymour, Nibbler"

Conclusions

FsSql aims to wrap ADO.NET to make it more idiomatic for F# consumers, providing several fine-grained functions meant to be reused or combined as necessary, as is usual in functional programming.

It's not an ORM by any means, it operates at roughly the same level as ADO.NET, so you don't get typical ORM features like type safety, automatic SQL generation and automatic mapping of query results. Maybe a proper ORM could be built on top of this library.

Other relational data access projects specific to F# include:

Full source code is here.

UPDATE 3/30/2011: I recently released FsSql 0.1, binaries available on GitHub and NuGet.

A functional wrapper over ADO.NET

ADO.NET is the de-facto basic library for data access in .NET, and like everything in the BCL, it's object-oriented, which forces you to write object-oriented code when you use ADO.NET from F#.

Nothing wrong with that, and in fact F# is a great language to write object-oriented code. But I believe I speak for many F# coders when I say we prefer functional programming over OOP whenever possible.

So it boils down to this: you either use ADO.NET's objects directly (like this or this), or you wrap it to give it a more functional style (like this, this, or this).

So here's an attempt at creating a generic functional wrapper over ADO.NET; I called this library FsSql. UPDATE 3/30/2011: I recently released FsSql 0.1, binaries available on GitHub and NuGet.

Let's start with some examples...

The connection manager

A simple function to open a connection:

let openConn() = 
    let conn = new System.Data.SQLite.SQLiteConnection("Data Source=test.db;Version=3;New=True;") 
    conn.Open() 
    conn :> IDbConnection 

Let's create a table:

let ddl = "create table user (id int primary key not null, name varchar not null, address varchar null)"

Sql.execNonQuery (Sql.withNewConnection openConn) ddl [] |> ignore

That was quite verbose! The "Sql.withNewConnection openConn" piece is the "connection manager"; it basically encapsulates how to create and dispose the connection. In general we'll always use the same connection manager, so we can use partial application around it for all operations:

let connMgr = Sql.withNewConnection openConn 
let execScalar sql = Sql.execScalar connMgr sql 
let execReader sql = Sql.execReader connMgr sql 
let execReaderf sql = Sql.execReaderF connMgr sql 
let execNonQueryf sql = Sql.execNonQueryF connMgr sql 
let execNonQuery sql p = Sql.execNonQuery connMgr sql p |> ignore 
let exec sql = execNonQuery sql []

Non-queries and Parameters

Using the previous definitions now we can write:

execNonQuery 
    "insert into user (id, name, address) values (@id, @name, @address)"  
    (Sql.parameters ["@id",box 1; "@name",box "John"; "@address",box None])

All that parameter boxing gets boring fast, so we can define parameters in other ways:

let P = Sql.Parameter.make

execNonQuery 
    "insert into user (id, name, address) values (@id, @name, @address)"  
    [P("@id", 2); P("@name", "George"); P("@address", None)]

Note that I used None for the address parameter. None parameters are automatically mapped to DBNull.
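
A quick way to check (using the execScalar helper defined above; it returns an option, which is explained in the next section):

// count the users whose address ended up as NULL in the database
let nullAddressCount: int64 =
    execScalar "select count(*) from user where address is null" [] |> Option.get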

Queries

Let's count the records in our table:

let countUsers(): int64 = 
    execScalar "select count(*) from user" [] |> Option.get 
    
printfn "%d users" (countUsers())

When reading a field from a row in a resultset (or a scalar), you get it as an Option (None if the field is DBNull, otherwise Some x), so it forces you to deal with nullness (database nullness, in this case) as is usual in F#.

Here's an example of querying and iterating over the results:

execReader "select * from user" [] 
|> Seq.ofDataReader 
|> Seq.iter (fun dr -> 
    let id = (dr?id).Value 
    let name = (dr?name).Value 
    let address = 
        match dr?address with 
        | None -> "No registered address" 
        | Some x -> x 
    printfn "Id: %d; Name: %s; Address: %s" id name address)

Here Seq.ofDataReader converts the IDataReader into a sequence of IDataRecords. The dynamic operator is used to get the data out of the fields with option types, again forcing you to deal with nullness.

Also note how connection management is implicit. The connection is automatically closed when the datareader is disposed, which happens at the end of the iteration.

Stored procedures


You can also call stored procedures instead of inline SQL. Here's an example for the AdventureWorks sample database:

let managers = Sql.execSPReader connMgr 
                "uspGetEmployeeManagers" 
                (Sql.parameters ["@EmployeeID", box 1]) 
                |> List.ofDataReader 

Formatted SQL

Let's say we want to create a function to retrieve a record by id. It would look like this:

let selectById (id: int) =
     execReader "select * from user where id = @id" [P("@id", id)]

We can do better than this, using Sql.execReaderF instead:

let selectById = execReaderf "select * from user where id = %d"

The SQL here is interpreted as a printf-formatted string using the printf manipulation I described a couple of months ago. Even though this has its limitations, it's a nifty alternative for little queries like this one.

Async

An often overlooked capability of some ADO.NET providers is being able to run commands/queries asynchronously. Maybe that's because (as far as I know) only SqlClient actually implements this properly. Anyway, you can use async database calls with FsSql:

async { 
    use! reader = Sql.asyncExecReader connMgr "select * from user" [] 
    let r = reader |> List.ofDataReader 
    return r.Length 
}

Keep in mind that async database calls do not imply better overall scalability by themselves. As usual, make sure by measuring for your specific scenario.

In the second part of this post we'll see transactions and mapping.

Thursday, September 9, 2010

Nullable in F#

In F#, unlike VB.NET and C#, Nullable<T> is not a language-supported construct. That is not to say that you can't use nullable types in F#. But in F#, Nullable<T> is just another type in the BCL; it doesn't get any special treatment from the compiler. Concretely, in C# you get:

  • Special type syntax: C# has some syntax sugar for nullable types, e.g. int? x = 4 is shorthand for Nullable<int> x = 4
  • Implicit conversions: int? a = null and int? a = 4 are both implicit conversions. In the first case because Nullable<int> is a value type so it can't really be null. In the second case there's an implicit conversion from int to Nullable<int>.
  • Overloaded operators: arithmetic, comparison, boolean operators are overloaded for nullable types. The null coalescing (??) operator is also overloaded.

Not having this in F# is not that much of a problem actually, since you usually use an Option type instead of Nullable. Option is widely supported in F# and it has the advantage of working with any underlying type, not just value types.

However, this lack of support can become annoying when interoperating with code that makes extensive use of Nullables. So let's see what we can do in F# to improve this situation.

Special type syntax

Not much we can do here... then again, it's not so bad either. The type parameter of a Nullable constructor is inferred by the F# compiler so instead of:

let x = Nullable<int>(4)

we can just say:

let x = Nullable 4

If we want a null Nullable, we do have to specify the type:

let x = Nullable<int>()

except for those cases where the compiler knows the type a priori, e.g.:

let x: Nullable<int> = Nullable()

we'll see some more of these later.

Also, we could create a shorter type alias:

type N<'a when 'a: (new: unit -> 'a) and 'a: struct and 'a :> ValueType> = Nullable<'a>
let x = N<int>()

but the few bytes saved are probably not worth the loss of readability.

Implicit conversions

F# does not allow implicit conversions. Just as in OCaml, you need to be explicit about the types you want (barring type inference). Sometimes this can be annoying if you're working with an API that assumes implicit conversions are part of the language, such as LINQ to XML. For these particular cases you can define an operator or shorthand to avoid calling op_Implicit constantly.
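
For reference, the usual trick for those cases is an inline operator that resolves op_Implicit for you (shown here only because it comes up a lot; we won't need it for Nullable):

// converts x to whatever type ^b the context requires, using the types' op_Implicit
let inline (!>) (x: ^a) : ^b = ((^a or ^b) : (static member op_Implicit: ^a -> ^b) x)

// e.g. with LINQ to XML: let name: System.Xml.Linq.XName = !> "item"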

However, for Nullable types I'd avoid this. Being explicit about types (modulo type inference) is in the nature of F#.

Mapping to Option

So far we haven't done anything but complaining. So let's start writing some code for a change!

I mentioned before that Nullable is similar to Option. Indeed, mapping one to another is quite easy:

module Option =
   let fromNullable (n: _ Nullable) =
       if n.HasValue
           then Some n.Value
           else None
   let toNullable =
       function
       | None -> Nullable()
       | Some x -> Nullable(x)

This is not really an isomorphism though; as I said, the domain of Nullable is smaller than the domain of Option.
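
A quick round trip:

Nullable 3 |> Option.fromNullable         // Some 3
Some 3 |> Option.toNullable               // Nullable containing 3
(None: int option) |> Option.toNullable   // empty Nullable<int>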

Pattern matching

A very useful feature of Options is their ability to be pattern-matched. We can define partial active patterns over Nullable to achieve the same effect:

let (|Null|_|) (x: _ Nullable) =
   if x.HasValue
       then None
       else Some()

let (|Value|_|) = Option.fromNullable

The only problem is that the compiler can't statically assert that these partial active patterns cover all possible cases, so every time you use them you get a warning: "Incomplete pattern matches on this expression". You can turn this off with #nowarn "25" at the beginning of the file. EDIT: you can define the active pattern as a choice instead, to make it exhaustive. See kvb's comment below.

Comparison operators

Next, we define the comparison operators. Equality already works as expected so we don't need to do anything about it. For the other operators, we'll use a convention of appending '?' as a suffix for all operators. For example, '>' becomes '>?'. We'll also use a little helper function to allow us to express Nullable comparisons in terms of their underlying type's comparison functions:

let mapBoolOp op (a: _ Nullable) (b: _ Nullable) =
   if a.HasValue && b.HasValue
       then op a.Value b.Value
       else false

We can also define this using pattern matching, which allows us to take advantage of type inference and makes the code a bit more concise:

let mapBoolOp op a b =
   match a,b with
   | Value x, Value y -> op x y
   | _ -> false

Now the definition of the operators themselves:

let inline (>?) a b = (mapBoolOp (>)) a b
let inline (<?) a b = (mapBoolOp (<)) a b
let inline (>=?) a b = a >? b || a = b
let inline (<=?) a b = a <? b || a = b
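
A few examples of how these behave:

Nullable 3 >? Nullable 2     // true
Nullable 2 >? Nullable 3     // false
Nullable() >? Nullable 3     // false: any comparison involving null is false
Nullable 3 <=? Nullable 3    // true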

And that's it for comparison operators.

Boolean operators

These apply only to Nullable<bool>:

Negation:

let inline notn (a: bool Nullable) =
   if a.HasValue
       then Nullable(not a.Value)
       else Nullable()

C# doesn't have a && (short-circuit and) operator for Nullable<bool>, but it does have a & (non-short-circuit and) operator. This is probably because the right part of the expression has to be evaluated anyway if the left part is null, so it's not much of a short-circuit evaluation. VB.NET has AndAlso and OrElse (short-circuit) for Nullable<bool>, but the documentation warns about this.

let inline (&?) a b =
   let rec and' a b =
       match a,b with
       | Null, Value y when not y -> Nullable(false)
       | Null, Value y when y -> Nullable()
       | Null, Null -> Nullable()
       | Value x, Value y -> Nullable(x && y)
       | _ -> and' b a
   and' a b

Or operator:

let inline (|?) a b = notn ((notn a) &? (notn b))

Arithmetic operators

To define arithmetic operators, we'll use another helper function similar to mapBoolOp:

let liftNullable op (a: _ Nullable) (b: _ Nullable) =
   if a.HasValue && b.HasValue
       then Nullable(op a.Value b.Value)
       else Nullable()
let inline (+?) a b = (liftNullable (+)) a b
let inline (-?) a b = (liftNullable (-)) a b
let inline ( *?) a b = (liftNullable ( *)) a b
let inline (/?) a b = (liftNullable (/)) a b
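
These behave like C#'s lifted arithmetic: if either operand is null, the result is null:

Nullable 2 +? Nullable 3     // Nullable containing 5
Nullable 2 +? Nullable()     // empty (null) Nullable<int>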

I stole the idea of lifting operators from this hubfs thread.

By the way, you might wonder why I didn't call mapBoolOp a lift. Well, at first I did, but then I read this article and found out I was using the term "lifting" the wrong way.

Null-coalescing operator

Let's see a basic example of the null coalescing operator applied to a nullable type in C#:

int? c = null;
int d = c ?? -1;

This is just like F#'s defaultArg, except it's for Nullable. If we have to express this in F#:

let c = Nullable<int>()
let d = if c.HasValue then c.Value else -1

If you find that too verbose you can hide it behind an infix operator:

let inline (|??) (a: 'a Nullable) (b: 'a) = if a.HasValue then a.Value else b
let d = c |?? -1

However this same operator can't be applied when chaining multiple '??', e.g.:

int? e = null;
int? f = null;
int g = e ?? f ?? -1;

It's possible to define another operator for this, and combine it with a Lazy to achieve the same effect of laziness and composability (see this article, which does it for Option types), but the end result looks weird in my opinion and isn't worth the added complexity. Instead, we can use pattern matching:

let e = Nullable<int>()
let f = Nullable<int>()
let g = match e,f with Value x,_ -> x | _,Value y -> y | _ -> -1

It's not as concise as C#, but at least it's pretty clear.

Functional behavior

Since Nullable is so similar to Option, we can also define some composable functions for Nullable just like the ones in the Option module:

module Nullable =
   let create x = Nullable x
   let getOrDefault n v = match n with Value x -> x | _ -> v
   let getOrElse (n: 'a Nullable) (v: 'a Lazy) = match n with Value x -> x | _ -> v.Force()
   let get (x: _ Nullable) = x.Value
   let fromOption = Option.toNullable
   let toOption = Option.fromNullable
   let bind f x =
       match x with
       | Null -> Nullable()
       | Value v -> f v
   let hasValue (x: _ Nullable) = x.HasValue
   let isNull (x: _ Nullable) = not x.HasValue
   let count (x: _ Nullable) = if x.HasValue then 1 else 0
   ...

You might wonder what value Nullable.create or Nullable.get could have. The reason behind them is that constructors and properties are not really first-class functions: you can't compose or pipe them. For example, to create a nullable you have to write let x = Nullable 5; you can't write let x = 5 |> Nullable.
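
With the wrappers, piping and composition work as you'd expect:

let x = 5 |> Nullable.create                                   // Nullable containing 5
let y = Nullable 5 |> Nullable.bind (fun v -> Nullable(v * 2)) // Nullable containing 10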

Conclusions

Even though nullable types are not as nice in F# as in other .NET languages, they're fully usable, and the helper functions and operators defined here make them easier to work with. Sometimes, though, the best bet is to map them to Option types, do your thing, and then map back to Nullables.

Full source code is here.

Wednesday, August 25, 2010

Enumerable.Skip vs Seq.skip

If you're an F# newbie like me(*) you'll eventually try to use Seq.skip and Seq.take to implement pagination, just like you used Enumerable.Skip and Enumerable.Take (or an IQueryable implementation of them) in C#.

And sooner rather than later you'll find out that they don't behave quite the same. If you haven't realized this yet, read on.

Load up fsi.

> let a = seq { 1..100 };;

An F# seq<'a> is just a convenient alias for System.Collections.Generic.IEnumerable<'a>. In C# this would be expressed as:

var a = Enumerable.Range(1, 100);

Now let's paginate that. Assuming a page size of 40 elements, in C# we would do something like this:

var firstPage = a.Skip(0).Take(40).ToArray();
var lastPage = a.Skip(80).Take(40).ToArray();

Now in F# :

> let firstPage = a |> Seq.skip 0 |> Seq.take 40 |> Seq.toArray;;
> let lastPage = a |> Seq.skip 80 |> Seq.take 40 |> Seq.toArray;;

Uh-oh. The last expression throws a System.InvalidOperationException: The input sequence has an insufficient number of elements.

The thing is, Seq.skip and Seq.take are more strict and actually do bounds checking, whereas Enumerable.Skip and Enumerable.Take are more "tolerant" and may process and return fewer items than requested.

So how can we get Enumerable.Skip's behavior? The simplest option would be using it as in C#, e.g.:

> open System.Linq;;
> let lastPage = a.Skip(80).Take(40).ToArray();;

This works, but extension methods are generally not idiomatic in F#. We prefer function piping, currying and composition. So let's wrap them in curried forms, which is trivial:

> let eSkip n sequence = Enumerable.Skip(sequence, n);;
> let eTake n sequence = Enumerable.Take(sequence, n);;

And now we can operate as usual with pipes:

> let lastPage = a |> eSkip 80 |> eTake 40 |> Seq.toArray;;

By the way, F# already defines Seq.truncate which works like Enumerable.Take, so we can drop eTake and just write:

> let lastPage = a |> eSkip 80 |> Seq.truncate 40 |> Seq.toArray;;
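
Alternatively, if you'd rather not touch Linq at all, a tolerant skip is easy to write directly over seq (a small sketch, not something in the standard library):

// like Enumerable.Skip: skips up to n elements, yields whatever remains (possibly nothing)
let skipSafe n source =
    source
    |> Seq.mapi (fun i x -> i, x)
    |> Seq.filter (fun (i, _) -> i >= n)
    |> Seq.map snd

let lastPage = a |> skipSafe 80 |> Seq.truncate 40 |> Seq.toArray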

(*) I've been learning F# and functional programming for about two years now, yet I still consider myself somewhat of a newbie about it... at least until I learn Haskell :-)

Tuesday, August 3, 2010

Figment: a web DSL for F#

As part of my master's thesis project, I'm writing Figment, an embedded DSL for web development in F#. In the spirit of similar web DSLs like Sinatra and Compojure, it aims to be simple, flexible and idiomatic.

It's still very experimental and likely to change, but I'd like to show you what I have so far. So here's "Hello World" in Figment:

First a very basic Global.asax to set the entry point:

<%@ Application Inherits="BasicSampleApp.App" %>

and now the code itself:

namespace BasicSampleApp

open System.Web
open Figment.Routing
open Figment.Actions

type App() =
   inherit HttpApplication()
   member this.Application_Start() =
       get "hi" (content "<h1>Hello World!</h1>")

Run it, visit /hi and you get a big Hello World. Of course, everything but the last line is boring boilerplate, so let's focus on that last line. This is basically how it works: first, we have the action type:

type FAction = ControllerContext -> ActionResult

Yes, those are ASP.NET MVC2 classes. Figment is built on top of ASP.NET MVC2. Now, the get function takes a route and an action, and maps GET requests for that route to the action:

get : string -> FAction -> unit

and content is an action generator (or parameterized action): it creates an action that outputs a string as the response.

content : string -> FAction
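
There's nothing magical about action generators; a sketch of what content might look like internally (not necessarily Figment's actual code) is:

open System.Web.Mvc

// close over the string and return an action that writes it as the response body
let content' (text: string) : FAction =
    fun (ctx: ControllerContext) -> ContentResult(Content = text) :> ActionResult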

Similarly, there's a redirect action generator, so we can redirect /hello to /hi by saying:

get "hello" (redirect "hi")

Actions and Results

So far we've only seen action generators; now let's see proper actions, with a variant of Hello World. We start with a simple function:

let greet firstName lastName age =   
 sprintf "Hello %s %s, you're %d years old" firstName lastName age

and now we bind it to the request and map it to a route:

let greet' (ctx: ControllerContext) =
   let req = ctx.HttpContext.Request
   greet req.["firstname"] req.["lastname"] (int req.["age"])
   |> sprintf "<p>%s</p>" |> Result.content
get "greetme" greet'

Visit /greetme?firstname=John&lastname=Doe&age=50 to see this in action.

Did you notice Result.content? It maps directly to ContentResult. Normally you don't have both Figment.Actions and Figment.Result open in the same file so usually you can skip writing "Result.".

We could have used Result.view (ViewResult) to send the information to a regular ASP.NET MVC view:

let greet2 (p: NameValueCollection) =
   greet p.["firstname"] p.["lastname"] (int p.["age"])
get "greetme2" (bindQuerystring greet2 >> Result.view "someview")

Note also how function composition makes it easy to work at any level of abstraction (bindQuerystring is in Figment.Binding).

Filters

Filters are just functions with this signature:

type Filter = FAction -> FAction

With this simple abstraction we can implement authorization, caching, etc. For example, here's how to apply the equivalent of a RequireHttpsAttribute:

get "securegreet" (requireHttps greet')

requireHttps and others live in Figment.Filters.
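
Writing your own filter is just wrapping one function in another. For example, a (hypothetical, not part of Figment) filter that logs the requested URL before running the wrapped action:

let logged (action: FAction) : FAction =
    fun ctx ->
        // log the URL, then run the original action
        printfn "handling %s" ctx.HttpContext.Request.RawUrl
        action ctx

get "loggedgreet" (logged greet')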

Routing DSL

Sometimes you need flexibility when defining a route, for example to use a regular expression or to check for a mobile browser. Enter Figment.RoutingConstraints. A routing constraint is a function like this:

type RouteConstraint = HttpContextBase * RouteData -> bool

It returns true if it's a match, false if it's not. It's applied with the action router:

action : RouteConstraint -> FAction -> unit

A trivial route constraint:

let unconstrained (ctx: HttpContextBase, route: RouteData) = true
action unconstrained (content "Hello World")

would map that content to every URL/method. You might think that taking a single constraint is useless, but they can be combined with a few operators to create a small DSL:

let ifGetDsl = ifUrlMatches "^/dsl" &&. ifMethodIsGet

action
   (ifGetDsl &&. !. (ifUserAgentMatches "MSIE"))
   (content "You're NOT using Internet Explorer")

action ifGetDsl (content "You're using Internet Explorer")

Hopefully this last sample was self-explanatory!

Strongly-typed routing

A couple of blog posts ago I briefly mentioned using PrintfFormat manipulation to define strongly-typed routes. This is what I meant:

let nameAndAge firstname lastname age =
   sprintf "Hello %s %s, %d years old" firstname lastname age
   |> Result.content
getS "route/{firstname:%s}/{lastname:%s}/{age:%d}" nameAndAge

This actually routes and binds at the same time.

Conclusions

As I said, this is very much a work in progress, and there's still a lot to do. I intend to make it fully open source when I finish writing my thesis. I'll have to analyze tons of web frameworks, in particular functional web frameworks, so hopefully I'll pick up some interesting stuff from Happstack, Snap, Haskell on a Horse, etc. In particular, I'm interested in implementing formlets, IMHO one of the coolest features of WebSharper.

Source code is here.

Wednesday, July 28, 2010

Compiling LaTeX without a local LaTeX compiler

 

I recently started working on my (long, long overdue) master's thesis, and naturally I came to LaTeX for typesetting. As a LaTeX newbie*, I was pleased to find out that compiling LaTeX is just like compiling any program: there's a bunch of source code files (.tex) and resources (.bib, .sty, images) which you feed to preprocessors and compilers to yield a dvi/ps/pdf document. TeX is actually a Turing-complete language, and LaTeX is a collection of macros on top of TeX.

There's even an online package repository!

And since writing LaTeX has so many similarities with writing code, even if you're just writing text and using existing LaTeX macros and not developing your own, some of the development practices can be applied here as well, like a build system, refactoring to keep the document manageable, version control, and continuous integration.

Continuous integration is nice not only because it asserts that the document is valid (yes, you can screw up TeX documents), but also because you automatically get the resulting PDF as a build artifact. You could even hook a diff step into the process to make reviews easier.

But finding a public continuous integration server is hard, not to mention one with a LaTeX distribution installed. Luckily, LaTeX compilation can be delegated to a remote server, thanks to CLSI (Common LaTeX Service Interface). So all the integration server has to do is forward a compilation request to the CLSI server when it detects changes in the VCS.

[Diagram: the VCS, the continuous integration server and the CLSI server in the compilation workflow]

The arrow directions in the graphic indicate information flow, and the numbers order the actions.

How you implement this depends on what the integration server supports. For example, if it's Windows you can send a manually assembled compilation request (it's a simple XML) with a JScript script.

You can also use a CLSI server as a replacement for a local LaTeX installation. In that case you have to send the actual .tex file contents instead of just a URL, wrapped in a CLSI XML compilation request. I wrote a simple CLSI client in F# to do this, so a compilation script looks like this:

I'm using git and GitHub for version control and ScribTeX's CLSI server, which is the only public CLSI server I have found so far. Kudos to ScribTeX for that! It's very fast (even faster than compiling locally!) and open source.

If you just want an online editor with integrated revision control and preview, check out ScribTeX's editor.

Also worth checking out is LaTeX Lab, a very cool open-source, online LaTeX editor for Google Docs, by the same guy who developed CLSI (Bobby Soares). It's currently a preview release but it's very usable.

* Actually, I did use LaTeX several years ago, but I did so through LyX, which combines LaTeX with WYSIWYG, so I didn't really learn LaTeX.