Archive for February 2007

Parallel Web with Yahoo Pipes

Yahoo! Pipes is a great idea, implemented by Yahoo! with an amazing user interface. When I first played with it, I was shocked and suspected Flash was behind it. However, it is built on JavaScript alone, with the help of Yahoo’s Yahoo! User Interface Library in the browser.

The idea is simple to explain if you know pipes from operating systems: you link the output of one thing to the input of another. The "things" in this case are RSS feeds or web services provided by different sources.

Pipes are one of the main concepts in parallel computing: they let you pipeline a process into stages. Since the web is enormously distributed, the possible combinations are practically endless. Operators such as for each, count, sort, and filter let you apply various operations to the feeds. So with this tool, you can parallelise work across the web and get the processed result.
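As a rough analogy only (this is not how Yahoo! Pipes is implemented), the operators above map onto ordinary collection operations. The sketch below chains filter, sort, for each, and count with LINQ, which assumes .NET Framework 3.5 or later. The FeedItem type, the PipeOperators class, and the sample titles are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an RSS item; not a Yahoo! Pipes type.
class FeedItem
{
    public string Title;
    public DateTime Published;
}

static class PipeOperators
{
    // Filter by keyword (case-sensitive), sort newest first, return the titles.
    public static List<string> Run(IEnumerable<FeedItem> items, string keyword)
    {
        return items
            .Where(i => i.Title.Contains(keyword))   // filter
            .OrderByDescending(i => i.Published)     // sort
            .Select(i => i.Title)
            .ToList();
    }

    static void Main()
    {
        // Made-up sample items standing in for a fetched feed.
        var items = new List<FeedItem>
        {
            new FeedItem { Title = "Unix pipes explained", Published = new DateTime(2007, 2, 1) },
            new FeedItem { Title = "Cooking at home", Published = new DateTime(2007, 2, 3) },
            new FeedItem { Title = "Named pipes on Windows", Published = new DateTime(2007, 2, 10) }
        };

        List<string> titles = Run(items, "pipes");
        foreach (string title in titles)             // for each
        {
            Console.WriteLine(title);
        }
        Console.WriteLine(titles.Count);             // count
    }
}
```

Each stage consumes the output of the previous one, which is the same idea as wiring modules together on the design surface.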

The pipe creation interface is very similar to an IDE; it even has a debugger like the ones we are used to. On the left you have the toolbox, and next to it the design surface where you drag and drop the components. Besides creating your own pipes, you can also use other people’s pipes.

Emulated Managed Windows Pipes with Standard Input and Output

Pipes have been around since early Unix and are also available on Linux and Windows. A pipe is a way of communicating between two programs: a virtual channel to the other process. On Windows, pipes are normally created through system functions exported by kernel32.dll.

A pipe is a one-way channel: one end is written to and the other end is read from. So for a two-way conversation we need two channels, one for reading and one for writing.

In the managed world, we don’t have a spawn (Windows) or fork (Linux) function to create a copy of the process. So the first trick is to start the same executable again with different arguments, then redirect the child’s standard input and output to our current process. These redirected streams behave like pipes. You may know the pipe operator (|) from the shell, which connects the standard output of one program to the standard input of the next. The difference here is that the parent also reads back: it writes to the child process’s standard input and reads from its standard output. So we program it in such a way that the child process waits for commands from the parent process, executes each command, and writes the results back to the parent process.

In this sample there are two processes, talking to each other.

using System;
 
namespace WinPipes
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length == 0)
            {
                //parent
                Console.WriteLine("Parent");
                System.Diagnostics.Process proc = new System.Diagnostics.Process();
                proc.StartInfo.FileName = System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName;
 
                proc.StartInfo.Arguments = "child";
                proc.StartInfo.UseShellExecute = false;
                proc.StartInfo.RedirectStandardOutput = true;
                proc.StartInfo.RedirectStandardInput = true;
 
                proc.Start();
                System.IO.StreamReader str = proc.StandardOutput;
 
                Console.WriteLine("How many messages ?");
                int messageCount;
                // TryParse leaves messageCount at 0 if the input is not a number.
                int.TryParse(Console.ReadLine(), out messageCount);
 
                proc.StandardInput.WriteLine(messageCount);
 
                for (int i = 0; i < messageCount; i++)
                {
                    Console.WriteLine("Sending Hello" + (i + 1));
                    proc.StandardInput.WriteLine("hello" + (i + 1));
                    Console.WriteLine(str.ReadLine());
                }
                // Let the child finish, then keep the console window open.
                proc.WaitForExit();
                Console.ReadLine();
            }
            else
            {
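                // child: standard input and output are redirected to the parent,
                // so the first line read is the message count sent by the parent
                int count = Convert.ToInt32(Console.ReadLine());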
                for (int i = 0; i < count; i++)
                {
                    Console.WriteLine("Re - " + Console.ReadLine());
                }
 
            }
        }
    }
}

Parent
How many messages ?
3
Sending Hello1
Re - hello1
Sending Hello2
Re - hello2
Sending Hello3
Re - hello3

Managed classes for pipes are not available in .NET Framework 2.0 and 3.0. However, .NET Framework 3.5 will include them. Until then, the way to use pipes at the system level is to invoke the system calls from managed code through P/Invoke. In kernel32.dll the functions are CreatePipe and CreateNamedPipe. There is also a function in the Microsoft C runtime library called _pipe. Pinvoke.net is the best reference for calling kernel functions from the managed environment.
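For the managed pipe classes mentioned above, here is a minimal sketch assuming .NET Framework 3.5 or later and its System.IO.Pipes namespace. To stay self-contained it keeps both ends of each anonymous pipe in one process; in a real parent/child setup the read end would be handed to the child process. The PipeSketch class and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Pipes;

static class PipeSketch
{
    // Sends one short line through a one-way anonymous pipe and reads it back.
    public static string SendThroughPipe(string message)
    {
        using (var writeEnd = new AnonymousPipeServerStream(PipeDirection.Out))
        using (var readEnd = new AnonymousPipeClientStream(
            PipeDirection.In, writeEnd.ClientSafePipeHandle))
        {
            // The message is small enough to fit in the pipe buffer,
            // so writing before reading does not block here.
            var writer = new StreamWriter(writeEnd);
            writer.WriteLine(message);
            writer.Flush();

            var reader = new StreamReader(readEnd);
            return reader.ReadLine();
        }
    }

    // A pipe is one-way, so a request and its reply use two separate pipes.
    public static string RoundTrip(string message)
    {
        string received = SendThroughPipe(message);   // "parent to child"
        return SendThroughPipe("Re - " + received);   // "child to parent"
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip("hello1"));
    }
}
```

The reply string mirrors the "Re - " prefix used by the child in the sample above.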