.NET Assembly Language

There’s a .NET version of pretty much every programming language.

  • C++ = Visual C++ (C++/CLI)
  • BASIC = Visual Basic .NET
  • C++ & Java = C#
  • Java = J#
  • Python = IronPython
  • Ruby = IronRuby
  • Scheme = IronScheme
  • Lisp = L#
  • Ada = A#
  • Prolog = P#
  • Cobol = Visual Cobol
  • Delphi = Delphi.NET
  • Smalltalk = #Smalltalk

But all of these languages are worthless without CIL. CIL stands for Common Intermediate Language; it is the lowest-level human-readable language used by the .NET Framework. Think of it as the .NET assembly language that every .NET language compiles to. Unlike real assembly, CIL doesn’t use registers and is entirely stack based. If you run a .NET program through an ordinary native disassembler, you get nothing but irrelevant native asm code. But the .NET SDK comes with a special tool for disassembling .NET programs called ILDasm, and there’s also ILAsm for assembling CIL source into .NET bytecode.

There aren’t many tutorials on CIL (formerly called MSIL, Microsoft Intermediate Language). I learned by creating simple .NET programs and then disassembling them to their CIL code with ILDasm. The time and effort was well worth it. If any n00b sends you a trojan and/or keylogger that they made in VB.NET, you can disassemble it, find the email/FTP address and password the logs get sent to, and hack them back. (There’s also a tool in the .NET SDK for code obfuscation to prevent this from happening to you.)

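Before we get into this garbage, we need to find ilasm.exe, the assembler that builds the examples. You can find it in C:\Windows\Microsoft.NET\Framework\VERSION (ildasm.exe typically lives in the SDK’s bin folder instead). Now here’s a hello world (with pause):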

.assembly extern mscorlib {}

.assembly Test {
    .ver 1:0:1:0
}

.method void main() {
    .maxstack 1
    .entrypoint
    ldstr "Hello world!"
    call void [mscorlib]System.Console::WriteLine (string)
    call int32 [mscorlib]System.Console::Read ()
    pop
    ret
}

  • “.assembly extern mscorlib {}” Any statement beginning with a dot is an assembler directive (like preprocessor directives in C). .assembly defines part of the assembly manifest (look it up). Used with the keyword extern, it references an external library instead. mscorlib.dll lives in the same Framework directory as ilasm.exe. This is the core library that .NET programs get their basic types and function calls from. It contains the core of the System and Microsoft.Win32 namespaces.
  • “.assembly Test {.ver 1:0:1:0}” Here we define the assembly manifest for our own program (you can call it whatever you want). I only specified a version number, but there’s more you can put here. Open up a .NET program in ildasm and double click on MANIFEST to view the assembly manifest of that file. All programs must have an assembly manifest. So if you don’t want to fill out manifest info, you can at least define the manifest with no member declarations, like this: “.assembly whatever {}”
  • “.method void main() {” Defines a method named main with return type void and no arguments (empty parentheses).
  • “.maxstack 1” Declares the maximum number of items the method will ever have on the evaluation stack at one time. Here that is 1 item, whatever its type.
  • “.entrypoint” Used within the declaration of a method, this marks the current method as the program’s entry point.
  • “ldstr “Hello world!”” ldstr is short for Load String. It pushes a reference to the specified string onto the top of the stack.
  • “call void [mscorlib]System.Console::WriteLine (string)” When calling a function, you specify it in this order: return type, library name enclosed in brackets, namespace, class, method, and the parameter types in parentheses (this is how an overload is picked).
  • “pop” Removes a value from the top of the stack. This is necessary because the call to System.Console::Read() leaves an int32 return value on top of the stack. Our method is declared as void, that is, with no return value, so the program would be invalid and crash if we tried to return with that value still on the stack.
  • “ret” means return. It returns from the current method to its caller. If the method has a return value, it must be on top of the stack before this instruction.
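
To see a return value being used instead of popped, here is a minimal sketch of an echo program. The assembly name Echo is made up for illustration, and static is spelled out on the method because a global method is static anyway. The string that ReadLine leaves on the stack becomes the argument to WriteLine, so nothing is left over before ret:

```cil
.assembly extern mscorlib {}
.assembly Echo {}

.method static void main() {
    .entrypoint
    .maxstack 1
    // ReadLine pushes a reference to the string it read
    call string [mscorlib]System.Console::ReadLine()
    // WriteLine pops that reference as its argument and returns nothing
    call void [mscorlib]System.Console::WriteLine(string)
    // The stack is empty again, so a void method can return here
    ret
}
```

Assuming you save it as echo.il (any file name works), running ilasm on that file should produce an executable you can start from the command line.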

So let’s look at a simple C# snippet and get the CIL disassembly of that:

static void Main() {
    int i = 42;
    Console.WriteLine(i);
}

.method void main() {
    .entrypoint
    .maxstack 1
    .locals init ([0] int32)
    ldc.i4.s 42
    stloc.0
    ldloc.0
    call void [mscorlib]System.Console::WriteLine (int32)
    ret
}

  • “.locals init ([0] int32)” Declares the method’s local variables, which live in their own slots, separate from the evaluation stack. Within the parentheses, you specify each slot’s index and the data type it’s supposed to hold. Declared variables are separated by commas. The init keyword makes the runtime zero the locals when the method is entered.
  • “ldc.i4.s 42” ldc means Load Constant, i4 means 32-bit integer. The s marks the short form of the instruction (no, it does not stand for signed or unsigned): the constant is encoded as a single signed byte, so it only works for values from -128 to 127, while plain ldc.i4 takes a full 4-byte operand. Either way, this instruction loads the 4-byte int value 42 on top of the stack.
  • “stloc.0” stands for store local. It takes the value off the top of the stack and stores it in the local variable specified by its index number.
  • “ldloc.0” stands for load local. It takes the value from the local variable specified by its index number and loads it on top of the stack.
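
Here is a small sketch of what init buys you, assuming a made-up assembly name LocalsDemo. Local 1 is loaded without ever being stored to, and because the locals were declared with init it holds zero:

```cil
.assembly extern mscorlib {}
.assembly LocalsDemo {}

.method static void main() {
    .entrypoint
    .maxstack 1
    .locals init ([0] int32, [1] int32)
    ldloc.1            // Never stored to; init guarantees it holds 0
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 0
    ldc.i4.s 42        // Load int 42
    stloc.0            // Store it in local 0
    ldloc.0            // Load it back
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 42
    ret
}
```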

Now let’s add two variables together:

int i = 42;
int j = 7;
int k = i + j;
Console.WriteLine(k);

.locals init ([0] int32, [1] int32, [2] int32)
ldc.i4.s 42        //Load int 42
stloc.0            //Store it in i
ldc.i4.s 7         //Load int 7
stloc.1            //Store it in j
ldloc.0            //Load i
ldloc.1            //Load j
add                //Add them
stloc.2            //Store result in k
ldloc.2            //Load k
call void [mscorlib]System.Console::WriteLine(int32)   //Print k

The previous example introduced a new instruction. The add instruction pops the top 2 values from the stack, adds them, and then pushes the result back onto the stack. You may have also noticed I used comments. Yes, comments are the same as in C: // for single-line and /* */ for multi-line comments.

Basic math instructions include:

  • add = Pop 2 off, add them, push result
  • sub = Pop 2 off, subtract the top one from the one beneath it, push result
  • mul = Pop 2 off, multiply them, push result
  • div = Pop 2 off, divide the one beneath by the top one, push result
  • rem = Pop 2 off, divide the same way, push remainder
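
Operand order matters for sub, div and rem: the value pushed first is the left-hand side. A minimal sketch, assuming a made-up assembly name MathDemo:

```cil
.assembly extern mscorlib {}
.assembly MathDemo {}

.method static void main() {
    .entrypoint
    .maxstack 2
    ldc.i4.s 10        // Pushed first: left-hand side
    ldc.i4.s 3         // Pushed second: right-hand side
    sub                // 10 - 3
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 7
    ldc.i4.s 10
    ldc.i4.s 3
    rem                // Remainder of 10 / 3
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 1
    ret
}
```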

Strings work the same way, except that concatenation is a method call instead of an instruction:

string a = "hello ";
string b = "world!";
Console.WriteLine(a + b);

.locals init ([0] string, [1] string)
ldstr "hello "       //Load string "hello "
stloc.0              //Store it in a
ldstr "world!"       //Load string "world!"
stloc.1              //Store it in b
ldloc.0              //Load string a
ldloc.1              //Load string b
//Concatenate them to one string
call string [mscorlib]System.String::Concat(string,string)
//Print returned value of concatenated string
call void [mscorlib]System.Console::WriteLine(string)

Next, an if/else:

static void Main() {
    int i = Convert.ToInt32(Console.ReadLine());
    if (i > 100) {
        Console.WriteLine("i is greater than 100");
    } else {
        Console.WriteLine("i is less than or equal to 100");
    }
    Console.Read();
}

.method void main() {
    .entrypoint
    .maxstack 2
    .locals init ([0] int32 i)
    //Read string Input
    call string [mscorlib]System.Console::ReadLine()
    //Convert to Integer
    call int32 [mscorlib]System.Convert::ToInt32(string)
    stloc.0             //Store it in i
    ldloc.0             //Load i
    ldc.i4.s 100        //Load int 100
    cgt                 //Test if i > 100
    brfalse notgreater  //If not, branch to the notgreater label
    ldstr "i is greater than 100"
    call void [mscorlib]System.Console::WriteLine(string)
    br endif            //Skip over the else part
notgreater:
    ldstr "i is less than or equal to 100"
    call void [mscorlib]System.Console::WriteLine(string)
endif:
    call int32 [mscorlib]System.Console::Read ()
    pop                 //Remove returning value
    ret
}

New instructions:

  • cgt, clt, ceq: pop 2 values off the stack, compare them (gt = greater than, lt = less than, eq = equal) and push 1 if true or 0 if false. The comparison uses the order in which you pushed the arguments. So if you push 5, then push 7, then execute clt, the comparison is 5 < 7; that is true, so 1 is pushed as the result.
  • brtrue, brfalse: pop a value off the stack and jump to the specified label if the value is true (nonzero) or false (zero). brtrue = “go to this label if the value is true”, brfalse = “go to this label if the value is false”. Plain br always jumps.
  • Labels are declared like in batch files, except the colon goes right after the label name instead of before it.
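
The 5 and 7 comparison can be checked by printing what the compare instructions leave on the stack. This is only a sketch, with a made-up assembly name CompareDemo:

```cil
.assembly extern mscorlib {}
.assembly CompareDemo {}

.method static void main() {
    .entrypoint
    .maxstack 2
    ldc.i4.s 5         // Pushed first: left-hand side
    ldc.i4.s 7         // Pushed second: right-hand side
    clt                // 5 < 7 is true, pushes 1
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 1
    ldc.i4.s 5
    ldc.i4.s 7
    cgt                // 5 > 7 is false, pushes 0
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 0
    ret
}
```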

Well, that’s it for my tutorial. Now that we know how to make variables, do math with them, call .NET functions, and branch to labels, you can also implement for loops and switches. Have fun reverse engineering .NET programs!
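
As a starting point, here is one way a for loop could be put together from the instructions covered here. It is a sketch rather than what a compiler would emit; the assembly name LoopDemo and the label names are made up. It corresponds to for (int i = 0; i < 5; i++) Console.WriteLine(i);

```cil
.assembly extern mscorlib {}
.assembly LoopDemo {}

.method static void main() {
    .entrypoint
    .maxstack 2
    .locals init ([0] int32 i)
    ldc.i4.s 0         // i = 0
    stloc.0
    br loop_test       // Check the condition before the first pass
loop_body:
    ldloc.0            // Load i
    call void [mscorlib]System.Console::WriteLine(int32)   // Print i
    ldloc.0            // Load i
    ldc.i4.s 1         // Load int 1
    add                // i + 1
    stloc.0            // Store it back in i
loop_test:
    ldloc.0            // Load i
    ldc.i4.s 5         // Load int 5
    clt                // Test if i < 5
    brtrue loop_body   // If so, run the body again
    ret                // Otherwise fall out of the loop
}
```

Run as is, this prints the numbers 0 through 4, one per line.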
