Thursday, December 25, 2003

Assemblies and Metadata

Assemblies are just an abstraction... as in there is no .asm file extension. Physically, assemblies exist as DLLs and EXEs. DLLs and EXEs are Portable Executable (PE) files, a format that builds on the Common Object File Format (COFF). This is the format Windows uses for executables and libraries (other operating systems mostly use their own formats, such as ELF on Linux). When you run an EXE, the OS loader opens the PE file, processes the headers in there, and runs it starting from the program entry point (main). A DLL can't be run directly like that; it has no program entry point of its own and has to be loaded into a process that some EXE started.

Now, assemblies are just PE files, with a .DLL or .EXE extension, but slightly modified. They carry some additional data (headers) in the PE file that tells the OS loader the code needs to be managed and executed by the CLR. So, when the OS loads an assembly, it recognizes that there is a CLR header and transfers control to the CLR.
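If you want to check from code whether some DLL or EXE is actually an assembly, here is a minimal sketch. It leans on AssemblyName.GetAssemblyName, which reads the assembly identity from a file without loading its code and throws BadImageFormatException when the file is not a valid assembly. The class name ManagedCheck and the method IsAssembly are just names I made up for this example.

```csharp
using System;
using System.Reflection;

public static class ManagedCheck
{
    // Hypothetical helper: true if the file at 'path' can be read as a .NET assembly.
    public static bool IsAssembly(string path)
    {
        try
        {
            // Reads the assembly identity from the file without executing anything in it.
            AssemblyName.GetAssemblyName(path);
            return true;
        }
        catch (BadImageFormatException)
        {
            // Thrown for native DLLs/EXEs and other files that are not valid assemblies.
            return false;
        }
    }

    public static void Main(string[] args)
    {
        // Usage: ManagedCheck.exe <file> [<file> ...]
        foreach (string path in args)
        {
            Console.WriteLine("{0}: {1}", path,
                IsAssembly(path) ? "assembly" : "not an assembly");
        }
    }
}
```

A missing file still throws FileNotFoundException here; the sketch only separates "valid assembly" from "some other kind of file".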

Assemblies contain a manifest, one or more modules and any resources like images. An assembly can be either single-file or multi-file. A single-file assembly contains everything in one file (gotta be a genius to figure that out?). With a multi-file assembly, the modules live in separate files and the assembly manifest just holds a reference to them. I suppose you can think of the two as by value and by reference.
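To make the manifest a bit less abstract, here is a rough sketch that prints what reflection can tell you about an assembly: its identity, its modules, the other assemblies it references and its embedded resources. The class name ManifestDump is made up for this example; the reflection calls (GetModules, GetReferencedAssemblies, GetManifestResourceNames) are standard ones. I'm assuming the .NET Framework here, since that is where multi-file assemblies are supported.

```csharp
using System;
using System.Reflection;

public static class ManifestDump
{
    // Hypothetical helper: prints the manifest-level information of one assembly.
    public static void Print(Assembly assembly)
    {
        // The assembly identity: name, version, culture and public key token.
        Console.WriteLine("Assembly: {0}", assembly.FullName);

        // The modules that make up the assembly (one for a single-file assembly).
        foreach (Module module in assembly.GetModules())
            Console.WriteLine("  module:     {0}", module.Name);

        // The other assemblies this one depends on.
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
            Console.WriteLine("  references: {0}", reference.FullName);

        // Resources (images, strings, ...) embedded in or linked to the assembly.
        foreach (string resource in assembly.GetManifestResourceNames())
            Console.WriteLine("  resource:   {0}", resource);
    }

    public static void Main(string[] args)
    {
        // Usage: ManifestDump.exe [<assembly file>]; with no argument it inspects itself.
        Assembly assembly = args.Length > 0
            ? Assembly.LoadFrom(args[0])
            : Assembly.GetExecutingAssembly();
        Print(assembly);
    }
}
```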

The main difference between a module and an assembly is that ONLY assemblies have a manifest, and only assemblies can be loaded and executed by the CLR. A module has metadata about the types it exposes, but since there is no manifest, the CLR doesn't have the information to load/verify/execute it on its own.

Modules contain metadata and IL code. What exactly is the metadata? Basically, it is binary information that describes every type and member defined in your code. It includes the name and visibility (public/private) of each type, its base class and the interfaces it implements, and the members (methods/fields/properties/etc...) it defines. These are stored in tables. For example, there is a method table (MethodDef) with one row for every method defined in the module. Each row in a table is identified by a unique number (a token). The IL code, which is also part of the module, uses these tokens when it has to call methods... sort of like a pointer.
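You can peek at this metadata through reflection. The sketch below lists every type in an assembly with its declared members and their metadata tokens. The top byte of a token identifies the table (0x02 for type definitions, 0x04 for fields, 0x06 for methods) and the remaining three bytes are the row number. The class name MetadataDump is made up for this example, and the MetadataToken property needs .NET Framework 2.0 or later, so this assumes a newer runtime than the one this post was written against.

```csharp
using System;
using System.Reflection;

public static class MetadataDump
{
    // Hypothetical helper: prints each type, its declared members and their tokens.
    public static void Dump(Assembly assembly)
    {
        // Only members declared on the type itself, public or not, instance or static.
        const BindingFlags declared =
            BindingFlags.Public | BindingFlags.NonPublic |
            BindingFlags.Instance | BindingFlags.Static |
            BindingFlags.DeclaredOnly;

        foreach (Type type in assembly.GetTypes())
        {
            // One row in the type definition table.
            Console.WriteLine("{0} (token 0x{1:X8})", type.FullName, type.MetadataToken);

            foreach (MemberInfo member in type.GetMembers(declared))
            {
                // One row in the field, method, property, ... table.
                Console.WriteLine("   {0,-12} {1} (token 0x{2:X8})",
                    member.MemberType, member.Name, member.MetadataToken);
            }
        }
    }

    public static void Main(string[] args)
    {
        // Usage: MetadataDump.exe [<assembly file>]; with no argument it inspects itself.
        Assembly assembly = args.Length > 0
            ? Assembly.LoadFrom(args[0])
            : Assembly.GetExecutingAssembly();
        Dump(assembly);
    }
}
```

Property accessors show up twice in the listing, once as the property and once as the get_/set_ method behind it, because they occupy rows in both tables.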

So I guess this will be much easier to understand if you see an example. In the process I'm going to take my obsession with Oasis to new heights...


// Album.cs

namespace Albums
{
   public interface IAlbum
   {
      // Properties

      string Name
      {
            get; // Read only
      }

      int NumberOfSongs
      {
            get; // Read only
      }



      // Methods

      string BestSong();
   }
}

This is just an interface, "IAlbum", within the namespace "Albums". I'm going to compile this to a library (DLL) assembly...

csc /target:library Album.cs

csc is the C# compiler. /target:library tells the compiler to compile it to a DLL.

I get "Album.dll". This is an assembly with a manifest and metadata. Nothing great here, since it only defines an interface and there are no method bodies to turn into IL.

Now we define a couple of albums that implement this interface...

// DM.cs

namespace Albums
{
   public class DM : IAlbum
   {
      // Fields

      private string name;

      private int numberOfSongs;



      // Properties

      public string Name
      {
         get
         {
            return this.name;
         }
      }

      public int NumberOfSongs
      {
         get
         {
            return this.numberOfSongs;
         }
      }



      // Methods

      public DM()
      {
         this.name = "Definitely Maybe";

         this.numberOfSongs = 11;
      }

      public string BestSong()
      {
         return "Live Forever";
      }
   }
}


// WTSMG.cs

namespace Albums
{
   public class WTSMG : IAlbum
   {
      // Fields

      private string name;

      private int numberOfSongs;



      // Properties

      public string Name
      {
         get
         {
            return this.name;
         }
      }

      public int NumberOfSongs
      {
         get
         {
            return this.numberOfSongs;
         }
      }



      // Methods

      public WTSMG()
      {
         this.name = "(What's The Story) Morning Glory?";

         this.numberOfSongs = 12;
      }

      public string BestSong()
      {
         return "Wonderwall";
      }
   }
}

I'll compile both of these as modules...

csc /target:module /r:Album.dll DM.cs

csc /target:module /r:Album.dll WTSMG.cs

/target:module tells the compiler to produce a module, which gets the extension .netmodule.
/r:Album.dll tells it that this module uses some types from "Album.dll" (IAlbum).

I get "DM.netmodule" and "WTSMG.netmodule". They contain ONLY metadata about the types they define (class DM and class WTSMG) and their members, plus the IL code for every method. I can't use these two modules on their own. The CLR can't load them since there is NO manifest to query. To be able to use them you have to add them to an assembly...

csc /target:library /addmodule:DM.netmodule;WTSMG.netmodule /out:Albums.dll

This tells the compiler to create a dll called "Albums.dll" and to add the DM and WTSMG modules to it.

Now "Albums.dll" has a manifest and two modules. We can now use "Albums.dll" from other assemblies...

// Oasis.cs

using Albums;

namespace Bands
{
   public class Oasis
   {
      // Fields

      private IAlbum firstAlbum;

      private IAlbum secondAlbum;


      // Properties

      public bool IsBestBandEver
      {
         get
         {
            return true;
         }
      }

      public IAlbum FirstAlbum
      {
         get
         {
            return this.firstAlbum;
         }
      }

      public IAlbum SecondAlbum
      {
         get
         {
            return this.secondAlbum;
         }
      }


      // Methods

      public Oasis()
      {
         this.firstAlbum = new DM();

         this.secondAlbum = new WTSMG();
      }
   }
}

And again compile this to a dll assembly... "Oasis.dll"

csc /target:library /r:Album.dll;Albums.dll Oasis.cs

Here we tell the compiler that we refer to types from both "Album.dll" (IAlbum) and from "Albums.dll" (DM and WTSMG).

So what we have so far is two single-file assemblies, Oasis.dll and Album.dll, and one multi-file assembly, Albums.dll, which references DM.netmodule and WTSMG.netmodule.

Finally we have the App exe.

// App.cs

using System;
using Bands;
using Albums;

namespace Assemblies
{
   public class App
   {
      public static void Main()
      {
         Oasis o = new Oasis();

         Console.WriteLine();
         Console.WriteLine( "Is Best Band Ever? {0}", o.IsBestBandEver );
         Console.WriteLine();

         IAlbum album;

         album = o.FirstAlbum;

         Console.WriteLine( "First Album:" );
         Console.WriteLine( " Name: {0}", album.Name );
         Console.WriteLine( " Number of Songs: {0}", album.NumberOfSongs );
         Console.WriteLine( " Best Song: {0}", album.BestSong() );
         Console.WriteLine();

         album = o.SecondAlbum;

         Console.WriteLine( "Second Album:" );
         Console.WriteLine( " Name: {0}", album.Name );
         Console.WriteLine( " Number of Songs: {0}", album.NumberOfSongs );
         Console.WriteLine( " Best Song: {0}", album.BestSong() );
      }
   }
}

Now we create an exe we can execute...

csc /target:exe /r:Album.dll;Oasis.dll App.cs

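Here we tell it to create an exe using /target:exe and tell it we refer to types from "Album.dll" (IAlbum) and from "Oasis.dll" (Oasis). We get App.exe. To run it, keep App.exe in the same folder as Album.dll, Albums.dll, Oasis.dll and the two .netmodule files, since the CLR goes looking for them when their types are first used.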

When we run this we get...

Is Best Band Ever? True

First Album:
    Name: Definitely Maybe
    Number of Songs: 11
    Best Song: Live Forever

Second Album:
    Name: (What's The Story) Morning Glory?
    Number of Songs: 12
    Best Song: Wonderwall


So, hope this has explained a bit (better) about assemblies and metadata.

As of now the need for config files has not arisen. I'll write about that in the next blog.
