Friday, April 07, 2006

C# futures

It's been interesting to see the path static languages like C# and Java have taken over the past year with their 2.0 and 5.0 releases respectively. There has been a drive towards more 'staticness' with generics, i.e. more type specification at compile time. This has generally been seen as a good thing... more descriptive, more type-safe and (in the case of C# ;-) more performant code. But then you have dynamic languages like Python and Ruby, which are essentially the complete opposite, with no static type specification at all. Variables have no declared types; the types belong to the values and are only checked at run time.

What's even more interesting is that, in spite of this core difference, C# has borrowed features from dynamic languages for its 2.0 release, with more on the way for 3.0.

Iterators are a well-known design pattern for traversing collections. They are a good way to loosely couple collections from the actual iterating process, so one can have multiple iterators traversing a collection in different ways. Iterators are sprinkled throughout the Java Collections Framework, and C# similarly has Enumerators with pretty much the same interface. Creating these iterators/enumerators involves creating classes which keep track of state. That's pretty much all they do... some logic to know where you are and to provide the next element. C# 2.0 introduced the yield keyword, which has the compiler generate an iterator that manages the state automatically. Ruby and Python both have it. Here's a simple example...

class MyCollection {
  private int[] myElements;

  public IEnumerator GetEnumerator() {
    foreach ( int i in this.myElements ) {
      // yield return hands back one element and remembers where we are
      yield return i;
    }
  }
}

That's it. No creating a class which implements IEnumerator. The compiler generates it all automatically. Huge productivity booster. Complements the foreach functionality nicely.
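For comparison, here's a rough sketch of the kind of class you'd otherwise write by hand. The names ManualEnumerator and ManualCollection are made up for this example, and the code the compiler actually generates for yield will look different; this only shows the state tracking involved.

```csharp
using System;
using System.Collections;

// Hand-written enumerator: the bookkeeping that yield return saves you from.
// ManualEnumerator and ManualCollection are illustrative names only.
class ManualEnumerator : IEnumerator {
  private readonly int[] elements;
  private int position = -1; // starts before the first element

  public ManualEnumerator( int[] elements ) {
    this.elements = elements;
  }

  // Advance to the next element; false once we run off the end
  public bool MoveNext() {
    position++;
    return position < elements.Length;
  }

  public object Current {
    get { return elements[position]; }
  }

  public void Reset() {
    position = -1;
  }
}

class ManualCollection {
  private int[] myElements = { 1, 2, 3 };

  public IEnumerator GetEnumerator() {
    return new ManualEnumerator( this.myElements );
  }
}

class Program {
  static void Main() {
    // foreach only needs a public GetEnumerator() method
    foreach ( int i in new ManualCollection() ) {
      Console.WriteLine( i ); // prints 1, 2, 3 on separate lines
    }
  }
}
```

Every new traversal order means another class like ManualEnumerator, which is exactly the boilerplate yield removes.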

C# 3.0, which is a ways off from being released, has a lot more in store. Probably the biggest announcement was LINQ (Language Integrated Query), which introduces new syntax within the language to work with datasets: collections, relational databases (DLINQ) and XML (XLINQ). This major feature brings with it many smaller ones which again seem to borrow a lot from dynamic languages...

Probably, the most surprising one is Implicit Typing. You can do things like

var i = 1;
var s = "string";
var d = 1.11;
var numbers = new int[] { 0, 1, 2 };

Surprising since it goes against what C#/Java type languages have been known for. The variables are still statically typed, though: the compiler infers the type from the initializer. And this is a feature needed to make LINQ work, since you often can't name the final type that results from a query.
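A small sketch to show that var is still compile-time typing and not a dynamic variable. The commented-out line is an assumption about what you might try; it is there to show what the compiler would reject.

```csharp
using System;

class Program {
  static void Main() {
    var i = 1;         // compiler infers int
    var s = "string";  // compiler infers string

    Console.WriteLine( i.GetType() );  // System.Int32
    Console.WriteLine( s.GetType() );  // System.String

    // i = "oops";  // would not compile: a string can't be assigned to an int
  }
}
```

So unlike a Python or Ruby variable, i can never hold anything but an int once it has been declared.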

Another interesting one is Extension Methods. Ruby has this feature. You can add new methods to existing types without being part of any type hierarchy. You can even add methods to sealed classes like String. As an example, think about a method that checks if a string is a palindrome. Normally, one would create something like a Palindrome class with a static IsPalindrome method which takes a string... Palindrome.IsPalindrome( text ). Pretty inelegant. With extension methods, you can define a method like this

static bool IsPalindrome( this string text ) { ... }

And call it like this

string text = "civic";
bool palindrome = text.IsPalindrome();

It's a pretty cool feature which, again, is needed to add some LINQ functionality. But I think there is potential for abuse here, and without proper documentation it might cause some confusion.
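Here's one way the palindrome example could be filled in. The class name StringExtensions and the method body are my own assumptions for illustration; the one hard requirement is that an extension method lives in a static class.

```csharp
using System;

// Extension methods must be declared in a static, non-nested class.
// StringExtensions is a name picked for this sketch.
static class StringExtensions {
  public static bool IsPalindrome( this string text ) {
    // Compare characters from both ends, moving inward
    for ( int left = 0, right = text.Length - 1; left < right; left++, right-- ) {
      if ( text[left] != text[right] ) {
        return false;
      }
    }
    return true;
  }
}

class Program {
  static void Main() {
    Console.WriteLine( "civic".IsPalindrome() );   // True
    Console.WriteLine( "csharp".IsPalindrome() );  // False
  }
}
```

Under the hood the call is still just StringExtensions.IsPalindrome( text ), so nothing is actually added to the String class itself.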

Lambda Expressions have been available in many languages for a while. C# is finally getting this feature in 3.0. These expressions are popular when filtering datasets and as you can imagine would be an integral part of LINQ. Here's a simple example...

List<int> numbers = new List<int>();
numbers.Add( 0 );
numbers.Add( 1 );
numbers.Add( 2 );
numbers.Add( 3 );
numbers.Add( 4 );

List<int> evenNumbers = numbers.FindAll( i => ( i % 2 ) == 0 );

So FindAll() will filter the list based on the lambda expression. Syntax seems a bit strange.
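The syntax may look less strange next to what C# 2.0 already allows. A sketch comparing the two; the variable names evens2 and evens3 are just for this example.

```csharp
using System;
using System.Collections.Generic;

class Program {
  static void Main() {
    List<int> numbers = new List<int>();
    for ( int n = 0; n < 5; n++ ) {
      numbers.Add( n );
    }

    // C# 2.0 style: anonymous method passed as the predicate
    List<int> evens2 = numbers.FindAll( delegate( int i ) { return ( i % 2 ) == 0; } );

    // C# 3.0 style: the same predicate written as a lambda expression
    List<int> evens3 = numbers.FindAll( i => ( i % 2 ) == 0 );

    Console.WriteLine( string.Join( ", ", evens2 ) );  // 0, 2, 4
    Console.WriteLine( string.Join( ", ", evens3 ) );  // 0, 2, 4
  }
}
```

Both forms produce the same predicate; the lambda just drops the delegate keyword, the parameter type and the return statement.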

A final interesting feature is Anonymous Types. Languages like Python and Ruby have this concept of a tuple which can hold multiple values, so you can have methods returning multiple values. In C# or Java this isn't possible. What many end up doing is returning an array with 2 or more values. Pretty inelegant. Or you have to actually define a type which just holds those values and return that. It's a hassle. With anonymous types you can create types on the fly without (as the name would suggest) giving them a name...

var person = new { Name = "C Sharp", Age = 4 };
Console.WriteLine( "Name: {0}, Age: {1}", person.Name, person.Age );

Again, as you can guess this is another needed feature for LINQ.
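To see how these pieces fit together, here's a minimal sketch that assumes the Where and Select query operators from the System.Linq namespace. The sample data is made up. The query uses extension methods, lambdas, an anonymous type and var all at once.

```csharp
using System;
using System.Linq;

class Program {
  static void Main() {
    string[] names = { "Ada", "Grace", "Linus" };

    // Where and Select are extension methods taking lambdas;
    // the result elements are of an anonymous type, so only var can hold them
    var people = names
      .Where( n => n.Length > 3 )
      .Select( n => new { Name = n, Length = n.Length } );

    foreach ( var p in people ) {
      // prints "Name: Grace, Length: 5" then "Name: Linus, Length: 5"
      Console.WriteLine( "Name: {0}, Length: {1}", p.Name, p.Length );
    }
  }
}
```

Without var there would be no way to declare people or p, since the anonymous type has no name you can write down.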

I haven't mentioned much about LINQ itself since I've only read a little about it and seen a video by the man. So dunno a lot of details myself. What would be interesting is to see all the IL that is generated to make all these abstractions work.

Anyway, it's something to look out for. Maybe Java will also be including some data related features for their Dolphin release.
