Archive for January 2008

Distributed Functional Programming with F# MPI Tools for .NET

Introduction

For many years, parallel computing has been an important research area in high performance computing. Supercomputers long dominated the industry. However, as fast computers and fast networks became cheaper to obtain, cluster computing came to be seen as a good alternative. The High Performance Computing market grew rapidly, mainly because of the growing adoption of clusters. According to one study, clusters represented 50% of High Performance Computing system revenue at the end of 2005.

The idea of cluster computing is to connect many machines over a high-speed network, all running the same program. With the recent adoption of multi-core CPUs in desktops, it has become even more important. MPI makes it even easier to build supercomputers out of powerful computers, high-speed networks and powerful libraries.

Message Passing Interface (MPI) is the standard for message passing in a distributed computing environment. Its benefit to researchers is invaluable.

MPICH is an open source, portable implementation of the Message Passing Interface (MPI) for developing distributed-memory applications.

The goal of MPI Tools is to make it easy to write programs that run on a cluster of machines, and to make the transition and portability easy for existing cluster programs. Using MPI Tools, it is possible to create distributed functional applications with F#. Although it is primarily developed for the .NET Framework, it can run on any CLI implementation.

Implementation

The first step was to make MPICH available to F#. A wrapper library was implemented for MPICH, exposing the most commonly used MPICH functions to F# with effective use of types: MPI_Init, MPI_Comm_size, MPI_Comm_rank, MPI_Finalize, MPI_Send, MPI_Recv, MPI_Abort, MPI_Barrier, MPI_Bcast, MPI_Gather, MPI_Scatter and MPI_Reduce. In practice, you can write distributed programs with just the first six of those functions, as I will show in the samples. When using the library you don’t have to worry about types and data sizes as you usually do in C programming. The only thing that matters is the order of communication, just as in socket programming.

Because MPICH is an unmanaged library, it is important to make the data types compatible using the interoperability libraries in the .NET Framework. All of the exposed data types and functions are defined in the “mpi.h” file in the MPICH distribution. If you want to use a different MPI implementation, you need to change those function definitions accordingly, based on its documentation.

[<DllImport(@"mpich2.dll",EntryPoint="MPI_Send")>]
    extern int MPI_Send( void *buf, int count, int MPI_Datatype,
                                          int dest, int tag, int MPI_Comm)

First the value types such as int, char, byte, double and float were implemented, which map closely to their C counterparts. The next step was to make the reference types of the virtual machine available. Unfortunately, not every reference type can be sent over the wire, because of the state or impurity of the type. A type has to be serializable in order to be sent or received. To make that possible, binary serialization is used and the result is passed to MPI as a byte array. Supporting reference types also made it possible to pass functions and lambda functions through the channel.
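To illustrate the idea of turning a value into a byte array before handing it to the transport, here is a minimal sketch in current F# syntax. It is not the MPI Tools implementation: the post does not say which serializer the library uses, so this example writes the fields by hand with BinaryWriter and BinaryReader, and the Reading type and the toBytes/ofBytes helpers are hypothetical names invented for the illustration.

```fsharp
open System.IO

// Hypothetical message type, used only for this illustration.
type Reading = { Sensor : string; Value : float }

// Write the fields into an in-memory stream and return the raw bytes.
let toBytes (r : Reading) : byte[] =
    use stream = new MemoryStream()
    use writer = new BinaryWriter(stream)
    writer.Write(r.Sensor)
    writer.Write(r.Value)
    writer.Flush()
    stream.ToArray()

// Read the fields back in exactly the order they were written.
let ofBytes (bytes : byte[]) : Reading =
    use stream = new MemoryStream(bytes)
    use reader = new BinaryReader(stream)
    let sensor = reader.ReadString()
    let value = reader.ReadDouble()
    { Sensor = sensor; Value = value }
```

The receiving side only works if it reads the fields in the same order and with the same types as the sender wrote them, which is the same agreement MPI requires of its communicating parties.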

For MPI development, the key factor is the data types. The parties have to agree on the data types, and the sizes of those types must be fixed in order to communicate. The library handles the types properly using the capabilities of the F# type system, so in your programs it is the order of communication that really matters. To give a look at the internals of MPI Tools, here is an implementation of the standard MPICH type definitions and a type converter for them (shortened for simplicity).

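 type MPI_Datatype = 
        | MPI_CHAR           =  0x4c000101
        | MPI_SIGNED_CHAR    =  0x4c000118
        // ... remaining entries (MPI_BYTE and others) omitted for brevity
 
 // Maps the runtime type of a value to the matching MPICH datatype code.
 let private TypeConvert (t) =                  
         let res =
          match (box t) with         
            | :? byte    -> MPI_Datatype.MPI_BYTE         
            | :? char    -> MPI_Datatype.MPI_CHAR
            | _       ->    failwith "not implemented data type"
         Enum.to_int res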

The complicated, many-parameter function calls of the unmanaged MPICH library become powerful functions with only a few arguments in the .NET library. For instance, the six-argument MPI_Send function defined earlier becomes a three-argument polymorphic function. To keep things easy, most of the communication functions, send included, come in different flavours for different types. The first version below is used for single values; the other two flavours are for arrays and for matrix types.

let send(data,destination, tag) 
      'a * int * int -> unit 
 
let sendArray(data : 'a array,destination,tag) 
      'a array * int * int -> unit 
 
let sendMatrix (data : matrix,destination,tag) 
      matrix * int * int -> unit

The same pattern applies to the receive function. This time, however, the return type has to be specified as a generic argument of the function.

 
let receive<'a>(source,tag)
      int*int -> 'a 
 
let receiveArray<'a> (source, tag) 
    int * int -> 'a array 
 
let receiveMatrix(source, tag) 
   int * int -> matrix

The other MPI functions are implemented in a similar manner. You can also use the project itself as a tutorial. The library makes effective use of active patterns, discriminated unions, interoperability and other functional constructs.

Usage

First of all, MPICH needs to be installed. The library is then used just like any other .NET library in your programs. However, execution is different from usual: the programs have to be launched with the MPI launcher called “mpiexec”. You can find out more about configuring a cluster in the MPICH documentation. To run the program on n processors or processes, pass the “-n” switch with the number of processes as a command line argument, followed by the name of the compiled program.

mpiexec -n 2 test.exe

Here is a very simple ping-pong application using MPI Tools. You can find more samples in the MPI Tools source code.

#light
#I @"..\MPITools.Bindings\"
#r @"MPITools.Bindings.dll"
open MPITools
 
MPI.initialize()
 
let procSize = MPI.size()
let curProcess =  MPI.rank()
 
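let pingpong() =
    if curProcess = 0 then
        // process 0 sends a number to process 1 and waits for the reply
        let i = 0
        MPI.send(i,1,0)
        let b = MPI.receive<int>(1,0)
        ()
    elif curProcess = 1 then
        // process 1 receives the number and sends it back incremented
        let b = MPI.receive<int>(0,0)
        MPI.send(b+1,0,0)
pingpong()
MPI.finalize()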

Conclusion

You can download MPI Tools from CodePlex. Using MPI Tools, distributed programs become short, expressive and well typed with the help of the powerful type system of F#.

MPI Tools is built with the F# 1.9.3.7 compiler for the .NET Framework 2.0. However, it will probably work with any CLI implementation. In the future, more MPI functions will be implemented, including some helper functions that hide the imperative programming style and the side effects.

I hope it will help you solve your computation-heavy problems effectively. Please feel free to ask questions or to contribute to the project.

Have fun!

Power of Functional Programming, its Features and its Future

Power of Functional Programming

Functional programming is one of the oldest of the major programming paradigms. Functional languages have been with us for a while. Languages like Lisp, Scheme, ML, OCaml, Haskell, Erlang and F# have well-built compilers and tools, and large user and developer communities. However, the early success of imperative programming languages made procedural languages extremely popular for more than three decades. This led to the rise of the object-oriented paradigm, which formed the basis of commercial software development. Object-oriented programming is still the most popular paradigm today.

Functional programming is a programming paradigm that uses functions in their true mathematical sense. This means that functions are pure computations, with no mutable data or state information, which brings programs closer to mathematical expressions. In contrast to imperative programming, which depends heavily on the state of objects, functional programming views every program as a collection of functions that accept arguments and return values.
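The contrast can be sketched in a few lines of F# (current syntax). The names counter, nextId and square are made up for the illustration: the first function relies on hidden mutable state, while the second behaves like a mathematical function.

```fsharp
// Imperative style: the result depends on hidden, mutable state,
// so two calls with the same argument give different answers.
let mutable counter = 0
let nextId () =
    counter <- counter + 1
    counter

// Functional style: the result depends only on the argument,
// so the same input always gives the same output.
let square (x : int) = x * x
```

Because square has no state, a call such as square 4 can be replaced by its value anywhere in the program, which is not true of nextId ().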

Modern functional programming languages have a number of techniques to help program development. These include, but are not limited to:

  • Powerful type systems: achieved through support for polymorphism and type inference.
  • Higher-order functions: these rest on the concept of functions as first-class values, meaning that functions can be treated like data.
  • Implicit recursion: supported in functional languages through powerful functional data structures and through higher-order functions such as fold, map and filter.
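The three techniques in the list can each be shown in a line or two of F#. This is only a small sketch in current F# syntax; prepend, twice and total are names chosen for the example.

```fsharp
// Type inference with polymorphism: no annotations are written, yet the
// compiler infers the generic type  'a -> 'a list -> 'a list.
let prepend x xs = x :: xs

// Higher-order function: 'twice' receives another function as an argument
// and applies it two times.
let twice f x = f (f x)

// Implicit recursion: List.fold walks the list, so no loop and no explicit
// recursive call appears in the program text.
let total = List.fold (+) 0 [1; 2; 3; 4]
```

The same prepend works for a list of integers and for a list of strings, and the compiler rejects a call that mixes the two.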

In programming languages, static typing means that the type of an expression is determined at compile time by a technique called static program analysis. Dynamically typed languages, on the other hand, determine types at runtime with the aid of a runtime type checker. Personally, I like everything to be typed at compile time with extensive support for polymorphic types, because it exposes errors at compile time and also gives better performance, since there is no need for a runtime type checker. Luckily, many functional languages benefit from polymorphic static type systems.

Purity is another concept, enforced by some functional languages like Haskell. It can also be adopted as a style of programming even if it is not enforced by the compiler. In pure functional programming, side effects are not allowed: values are immutable and there are no loops. Immutability means that once a value is bound to an identifier (not a variable), it cannot change any more. Loops are achieved with recursive functions. This style is a bit more difficult to write and to read, but the benefits can be massive. For instance, it might be possible to optimise the compiled code for multiple cores, because of the compositionality of the functions that form the program.
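As a small sketch of this style in current F# syntax, here is a loop written as a recursive function with no mutable variable. The names limit, sumTo and loop are chosen for the example only.

```fsharp
// An identifier is bound once; there is no assignment that could change it later.
let limit = 5

// A loop expressed as a tail-recursive function: the running total is passed
// along as an argument instead of being stored in a mutable variable.
let sumTo n =
    let rec loop i acc =
        if i > n then acc
        else loop (i + 1) (acc + i)
    loop 1 0

// Adds up 1 + 2 + 3 + 4 + 5.
let result = sumTo limit
```

Each call to loop works on its own arguments, so nothing is ever updated in place.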

The Features of Functional Programming

Functional programming is not all about functions treated as first-class values and immutable state. It also provides powerful features that every programmer can benefit from.

Lambda calculus is a formal system designed to investigate function definition, function application and recursion. It can be used to define what a computable function is, and it was originally developed as part of a formal foundation for mathematics. In it, functions can be passed as arguments to other functions. Functions are created by lambda abstraction; higher-order functions and currying provide further ways of building functions. A curried function takes its arguments one at a time: applied to one argument, it returns another function that expects the next. Supplying only some of the arguments of a curried function is called partial application.

The calculus has only functions of one argument. With currying, a function of multiple arguments is expressed as a function whose result is another function. Each application consumes one argument and returns a function awaiting the rest.
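In F#, functions are curried by default, so the idea can be sketched directly (current F# syntax; add, addTen and addExplicit are illustrative names):

```fsharp
// 'add' looks like a two-argument function, but its type is int -> int -> int:
// a function that takes one int and returns another function.
let add x y = x + y

// Partial application: supplying only the first argument yields a new function
// that still waits for the second one.
let addTen = add 10

// The same function written as explicitly nested one-argument lambdas,
// which is how the lambda calculus would express it.
let addExplicit = fun x -> (fun y -> x + y)
```

Here addTen 5 and add 10 5 compute the same value; the first simply fixes the argument 10 ahead of time.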

Pattern matching is another powerful concept that functional and logic languages sport. It is used for binding values to variables and for controlling the execution flow of a program. A pattern is matched against a term: if the pattern and the term have the same shape, the match succeeds and any variables occurring in the pattern are bound to the data structures that occur in the corresponding positions in the term.

Recursion is a mechanism for repeating a computation, in which a function is defined in terms of itself. With recursion it is possible to write compact programs that generalise well. It is the main means of iteration in functional programming.
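A short F# sketch shows both roles of pattern matching, choosing a branch by shape and binding variables to the parts of the value. The Shape type and the area and describe functions are hypothetical, written in current F# syntax for illustration only.

```fsharp
// Hypothetical shape type, used only for this illustration.
type Shape =
    | Circle of float
    | Rectangle of float * float

// Each pattern has the shape of one case; on a match, r, w and h are bound
// to the values stored in the corresponding positions.
let area shape =
    match shape with
    | Circle r -> System.Math.PI * r * r
    | Rectangle (w, h) -> w * h

// Patterns also work on built-in structures such as lists.
let describe xs =
    match xs with
    | [] -> "empty"
    | [x] -> sprintf "one element: %d" x
    | x :: _ -> sprintf "starts with %d" x
```

The compiler also warns when a match does not cover every case, which is one of the practical benefits of combining patterns with a static type system.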

Nowadays collections are among the most important data structures in any programming language, because of the large datasets programs deal with. Most functional languages provide special functions that make the programmer’s job easier. Combined with first-class functions, they offer a good way to handle collections. Some of those higher-order functions are map, fold and filter.

  • Map: applies the given function to each element of the collection and collects the results. It resembles a for-each loop over a collection in procedural languages.
  • Fold: applies the function to each element while threading an accumulated result through the calls. One common use is to sum the members. It is similar to SQL aggregate functions, except that we pass in the function to apply.
  • Filter: applies the predicate function to each member of the collection and returns a list of the elements that satisfy it.

Some pure languages implement those higher-order functions in continuation-passing style, but they can be implemented with simple loops in impure languages as well.
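The three functions above look like this in F#, using the standard List module (the value names are chosen for the example):

```fsharp
let numbers = [1; 2; 3; 4; 5]

// Map: apply a function to every element and collect the results.
let squares = List.map (fun n -> n * n) numbers

// Fold: thread an accumulator through the list, here to compute a sum.
let sum = List.fold (fun acc n -> acc + n) 0 numbers

// Filter: keep only the elements for which the predicate returns true.
let evens = List.filter (fun n -> n % 2 = 0) numbers
```

None of the three calls changes numbers; each one returns a new value.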

Functional languages help rapid prototyping with the aid of powerful type systems and, usually, an interactive window that allows expressions to be executed one or more at a time. They make it possible to code complex algorithms while staying close to the mathematical representation. They make it much easier to create domain-specific languages (DSLs). All of this improves the productivity of the developer.

DSLs are programming languages made for use in a specific application domain. Programs written in a DSL are often clearer and more readable than those written in general-purpose languages. DSLs are more declarative than imperative languages, and they focus on the problem rather than on the rules of the language. Metaprogramming is writing a program that manipulates the syntactic structures of other programs. DSLs and metaprogramming are closely related: it is possible to build a domain-specific language using the metaprogramming features of a language.
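As a minimal sketch of why functional languages suit DSLs, here is a tiny embedded arithmetic language in current F# syntax. The Expr type, the eval function and the sample value are invented for the illustration; a real DSL would have many more constructs.

```fsharp
// Hypothetical expression language: the syntax of the DSL is an ordinary data type.
type Expr =
    | Num of float
    | Add of Expr * Expr
    | Mul of Expr * Expr

// The interpreter is a recursive function with one pattern per construct.
let rec eval expr =
    match expr with
    | Num n -> n
    | Add (a, b) -> eval a + eval b
    | Mul (a, b) -> eval a * eval b

// The DSL program (1 + 2) * 4, built as a value.
let sample = Mul (Add (Num 1.0, Num 2.0), Num 4.0)
```

Because the program is just data, other functions could inspect or transform it before it is evaluated, which is where DSLs and metaprogramming meet.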

The Future of Functional Programming

The object-oriented paradigm is crucial in programming today’s industrial applications. Functional languages have usually been considered academic, but I believe this is going to change in the near future, because functional programming has great potential in this demanding industry. The expressiveness, the powerful concepts such as laziness, immutability, pattern matching and continuations, and the elegant style of programming make some tools easier to implement than in other paradigms. Static analysis, high-level modelling, compilation, interpretation and verification tools are among the target areas of functional programming. Moreover, today’s hot topics, namely domain-specific languages, metaprogramming and the need for a “single uniform language” for all solutions, make functional programming very important. Those tasks can be implemented in other languages as well, but functional languages fit better. Distributing tasks across multiple cores or multiple machines also becomes feasible with functional programming if the program is written in an immutable style.

Given today’s realities (running systems, interoperability with those systems, a non-functional world), using functional programming shouldn’t mean staying functional all the time. Because of the difficulty of some problems and the previous experience of developers, it is inevitable that procedural and object-oriented paradigms will be used too. That is how mixed functional and object-oriented programming emerged and influenced programming languages.

Not surprisingly, some companies already use functional programming in industry. For the moment their main areas of development are design, modelling, specification and compiler tools. But this is set to change and grow.

Automatic iTunes Folder Playlist in Javascript and F#

The other day, I was struggling with my playlists in iTunes. As you may know, iTunes has no notion of the folder structure of your mp3s. So if you organise your music in folders, you won’t see them in iTunes. I decided to write a script using the iTunes COM SDK to fix that.

Here is the scenario: all the music is stored in d:\Music. The script explores everything inside that folder and creates an entry in iTunes for each folder, but only one level deep. First I did it in JavaScript, with a lot of pain. It was painful because of the ugly interface and the unclear documentation. Luckily there were samples in the SDK to play with.

var	iTunesApp = WScript.CreateObject("iTunes.Application");
var	mainLibrary = iTunesApp.LibraryPlaylist;
var	mainLibrarySource = iTunesApp.LibrarySource;
var	tracks = mainLibrary.Tracks;
var	numTracks = tracks.Count;
var numPlaylistsCreated = 0;
var ITTrackKindFile	= 1;
var	albumArray = new Array();
var	playlists = mainLibrarySource.Playlists;
 
for (var i = 1; i <= numTracks; i++)
{
	var	currTrack = tracks.Item(i);
	if (currTrack.Kind == ITTrackKindFile)
	{
		if (currTrack.Location != "")
		{
            var p =currTrack.Location
 
            var location = p  
	        if ((location != undefined) && (location != ""))
	        {
		        if (albumArray[location] == undefined)
		        {		    
			        albumArray[location] = new Array();
		        }
 
		        albumArray[location].push(currTrack);
	        }
         }
	}
}
WScript.Echo("   Tracks Read " + numTracks);
for (var albumNameKey in albumArray)
{
    var trackArray = albumArray[albumNameKey];
    var p = albumNameKey;
    // skip past the "music\" root, then collect each folder name in the path
    var ignoreFirst = p.indexOf("music\\", 0);   // e.g. 3 for d:\music\...
    var firstChar = ignoreFirst + 5;             // index of the backslash after "music"
    var secondChar = 0;
    var myArr = new Array();
    while (true)
    {
        firstChar = p.indexOf("\\", firstChar);
        secondChar = p.indexOf("\\", firstChar + 1);
 
        if (secondChar == -1)
            break;
        var newStr = p.substring(firstChar + 1, secondChar);
        myArr.push(newStr);
        firstChar = secondChar;
    }
    var plist = null;
    if (myArr.length == 0)
    {
        if (playlists.ItemByName("_") != null)
        {
          plist = playlists.ItemByName("_")   ;
        }
        else
        { 
            plist = iTunesApp.CreatePlaylist("_");
            numPlaylistsCreated++;
        }
    }
   else if ( playlists.ItemByName(myArr[0]) != null)
    {
        plist = playlists.ItemByName(myArr[0])
    }
    else 
    {
        plist = iTunesApp.CreatePlaylist( myArr[0]);
        numPlaylistsCreated++;
    }
    for (var trackIndex in trackArray)
    {
        var		currTrack = trackArray[trackIndex];
 
        plist.AddTrack(currTrack);
    }
}
// report how many playlists were created
if (numPlaylistsCreated > 0)
{
	WScript.Echo(numPlaylistsCreated + " playlist(s) created.");
}
else
{
	WScript.Echo("No playlists created.");
}


(It was written quickly, without caring about beauty :) )

It did the job I wanted, but I also wanted to have my folder structure in iTunes rather than a single entry per folder. This time I implemented it in F#. I chose F# mainly because of type inference and IntelliSense; in JavaScript you couldn’t do anything without the documentation. I also chose F# because of its interoperability with COM and friends. Still, I have to admit that I didn’t like the COM interface of iTunes. Best of all, the F# implementation, with more functionality, took 40% less code.

#light
#r "Interop.iTunesLib.dll"
open iTunesLib
open System.IO
open System.Text.RegularExpressions
 
let itunes = new iTunesLib.iTunesAppClass()
let playLists = itunes.LibrarySource.Playlists  
let tracks = itunes.LibraryPlaylist.Tracks
 
let track = tracks.Item(2)
for track in tracks do
    if track.Kind = ITTrackKind.ITTrackKindFile &&
       (track :?> IITFileOrCDTrack).Location <> null then
        let track = track :?> IITFileOrCDTrack        
        // <directory> captures each intermediate folder, <playlist> the last folder
        // and <filename> the mp3 file; [^\\]+ keeps each capture to one folder name
        let regex = new System.Text.RegularExpressions.Regex(
                        @"^(?<Drive>([a-zA-Z])):\\_music\\((?<directory>[^\\]+)\\)*((?<playlist>[^\\]+)\\)(?<filename>([^\\]+\.mp3))",
                        RegexOptions.IgnoreCase)
 
        printf "%d\n" track.Index
 
        let resMatch = regex.Match(track.Location)       
        if resMatch.Length = 0 then
            ()
        else                       
            // walk the captured folders, finding or creating an iTunes folder for each
            let lastFolder = 
             resMatch.Groups.Item("directory").Captures |> Seq.untyped_fold
                (fun (acc:IITPlaylist) (a: (System.Text.RegularExpressions.Capture)) ->                 
                    match (playLists.ItemByName(a.Value))  with
                    | null -> 
                        if acc.Index = 1 then
                            itunes.CreateFolder(a.Value)     
                        else
                            (acc :?> IITUserPlaylist).CreateFolder(a.Value)
 
                    | a1 -> 
                        if (a1 :?> IITUserPlaylist).SpecialKind = ITUserPlaylistSpecialKind.ITUserPlaylistSpecialKindFolder then
                            a1
                        else 
                          itunes.CreateFolder(a.Value)                          
                )
                (playLists.Item(1) )
 
            let pListName = resMatch.Groups.Item("playlist").Captures.Item(0).Value
 
            // find or create the playlist for the innermost folder
            let lastPlayList= 
                match (playLists.ItemByName(pListName))  with
                | null ->
                    if lastFolder.Index = 1 then
                        itunes.CreatePlaylist(pListName) :?> IITUserPlaylist    
                    else 
                        let fol = lastFolder :?> IITUserPlaylist                
                        fol.CreatePlaylist(pListName) :?> IITUserPlaylist
                | a2 ->
                     if (a2 :?> IITUserPlaylist).SpecialKind = ITUserPlaylistSpecialKind.ITUserPlaylistSpecialKindFolder then                    
                        // the name is already taken by a folder, so use "_" + name inside it
                        match (playLists.ItemByName("_" + pListName))      with
                        | null -> (a2 :?> IITUserPlaylist).CreatePlaylist("_" + pListName) :?> IITUserPlaylist
                        | a3 -> a3 :?> IITUserPlaylist
                     else
                        a2 :?> IITUserPlaylist
 
 
            lastPlayList.AddTrack(ref (track :> obj)) |> ignore
done
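
The path decomposition that the script relies on (intermediate folders, then the playlist folder, then the file name) can also be sketched without a regular expression or the iTunes COM library. This is only an illustration in current F# syntax, not part of the script: splitLocation and its root parameter are hypothetical names.

```fsharp
// Splits a track location into (folders, playlist, file name), relative to a
// root folder. Returns None when the location is outside the root or when the
// file sits directly in the root with no playlist folder.
let splitLocation (root : string) (location : string) =
    if not (location.StartsWith(root, System.StringComparison.OrdinalIgnoreCase)) then
        None
    else
        let relative = location.Substring(root.Length)
        let parts =
            relative.Split([| '\\' |], System.StringSplitOptions.RemoveEmptyEntries)
            |> List.ofArray
        // Reverse the parts so that the file and its parent folder come first.
        match List.rev parts with
        | file :: playlist :: foldersReversed ->
            Some (List.rev foldersReversed, playlist, file)
        | _ -> None
```

For a root of d:\_music\ and a track at d:\_music\Rock\Live\Album\song.mp3, the folders would be Rock and Live, the playlist Album and the file song.mp3.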


Folder playlists are not implemented on the iPod. Although they make sense in iTunes, there is no point in creating them for the iPod. I was a bit frustrated, since that was my sole purpose. Anyway, it was nice to play with…

So I hope it will be helpful in some way. The folder playlist didn’t help me; maybe in the next firmware update… On the other hand, I still use the first implementation for my automatic one-level folder playlists.

Happy New Year

Happy new year to everyone! 2007 was a great year for me. 2008 hasn’t started badly either: I hope to start a new project in a week or so, which I’m really excited about.

I wish you a great new year with lots of fun and happiness. I hope it is a much more positive year for all.

Recently I received a copy of the book “Expert F#”. I will write a full review when I finish it. For the moment it really keeps me busy (in a good way). Although I have read more than half of the book, I am still enjoying it and finding a lot of interesting topics. So I would definitely recommend having it on your bookshelf if you want an advanced look at functional programming in .NET.

Anyway, it’s time to work again. And yes! Revitalised for more coding this year!