# Chapter 4 Type Fundamentals

# 4.1 All Types Are Derived from System.Object

The runtime requires every type to ultimately be derived from the System.Object type. This means that the following two type definitions are identical.

```csharp
// Implicitly derived from Object
class Employee {
    ...
}
// Explicitly derived from Object
class Employee : System.Object {
    ...
}
```
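You can confirm the equivalence by asking reflection for each type's base type. In the sketch below, the classes are renamed to the hypothetical `ImplicitEmployee` and `ExplicitEmployee` so that both can be compiled in one file.

```csharp
using System;

// Hypothetical names so both variants can coexist in one compilation.
internal class ImplicitEmployee { }                 // implicitly derived from Object
internal class ExplicitEmployee : System.Object { } // explicitly derived from Object

public static class Program {
    public static void Main() {
        // Both lines print "System.Object": the compiler emits the same base type.
        Console.WriteLine(typeof(ImplicitEmployee).BaseType);
        Console.WriteLine(typeof(ExplicitEmployee).BaseType);
    }
}
```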

Because all types are ultimately derived from System.Object, you are guaranteed that every object of every type has a minimum set of methods. Specifically, the System.Object class offers the public instance methods listed in Table 4-1.

[Table 4-1: Public methods of System.Object]

In addition, types that derive from System.Object have access to the protected methods listed in Table 4-2.

[Table 4-2: Protected methods of System.Object]
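Reflection can list the instance methods that System.Object declares and sort them into public and protected groups. Treat the following as a sketch. The helper names `PublicNames` and `ProtectedNames` are hypothetical. Depending on the runtime, System.Object may also declare internal members, which is why the code filters with `IsPublic` and `IsFamily` (the reflection flag for `protected`) rather than trusting the raw list.

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class Program {
    // Instance methods declared directly on System.Object (not inherited).
    static MethodInfo[] ObjectInstanceMethods() =>
        typeof(Object).GetMethods(BindingFlags.Instance | BindingFlags.Public |
                                  BindingFlags.NonPublic | BindingFlags.DeclaredOnly);

    public static String[] PublicNames() =>
        ObjectInstanceMethods().Where(m => m.IsPublic)
                               .Select(m => m.Name).Distinct().OrderBy(n => n).ToArray();

    // IsFamily is true for methods declared 'protected'.
    public static String[] ProtectedNames() =>
        ObjectInstanceMethods().Where(m => m.IsFamily)
                               .Select(m => m.Name).Distinct().OrderBy(n => n).ToArray();

    public static void Main() {
        Console.WriteLine("Public:    " + String.Join(", ", PublicNames()));
        Console.WriteLine("Protected: " + String.Join(", ", ProtectedNames()));
    }
}
```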

The CLR requires all objects to be created using the new operator. The following line shows how to create an Employee object.

```csharp
Employee e = new Employee("ConstructorParam1");
```

Here’s what the new operator does:

  1. It calculates the number of bytes required by all instance fields defined in the type and all of its base types up to and including System.Object (which defines no instance fields of its own). Every object on the heap requires some additional members—called the type object pointer and the sync block index—used by the CLR to manage the object. The bytes for these additional members are added to the size of the object.
  2. It allocates memory for the object by allocating the number of bytes required for the specified type from the managed heap; all of these bytes are then set to zero (0).
  3. It initializes the object’s type object pointer and sync block index members.
  4. The type’s instance constructor is called, passing it any arguments (the string "ConstructorParam1" in the preceding example) specified in the call to new. Most compilers automatically emit code in a constructor to call a base class’s constructor. Each constructor is responsible for initializing the instance fields defined by the type whose constructor is being called. Eventually, System.Object’s constructor is called, and this constructor method does nothing but return.
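Steps 2 and 4 can be observed from C#. This sketch uses a hypothetical `Person` base class whose constructor makes a virtual call into `Employee` before `Employee`'s constructor body has run. That call sees the zeroed fields from step 2. The `Log` list and the value 42 exist only for illustration. The fields deliberately have no initializers, because C# runs field initializers before calling the base constructor.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class used only to show the order of construction.
internal class Person {
    public static readonly List<String> Log = new List<String>();

    protected Person() {
        Log.Add("Person ctor");
        // A virtual call from the base constructor runs the derived override
        // before the derived constructor body has assigned any fields.
        Log.Add("Fields seen from Person ctor: " + DescribeFields());
    }

    protected virtual String DescribeFields() { return "none"; }
}

internal class Employee : Person {
    // No field initializers, so these hold the zeroed memory from step 2
    // until the constructor body assigns them.
    private Int32 _id;
    private String _name;

    public Employee(String name) : base() {  // C# emits this base() call even if omitted
        Log.Add("Employee ctor");
        _name = name;
        _id = 42;
    }

    protected override String DescribeFields() {
        return _id + "/" + (_name ?? "null");
    }

    public override String ToString() { return DescribeFields(); }
}

public static class Program {
    public static void Main() {
        Person.Log.Clear();
        Employee e = new Employee("ConstructorParam1");
        foreach (String line in Person.Log) Console.WriteLine(line);
        Console.WriteLine("After new: " + e);
    }
}
```

The second log line reads `0/null`, which shows that the memory was zeroed before any constructor code in `Employee` ran.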

After new has performed all of these operations, it returns a reference (or pointer) to the newly created object. In the preceding code example, this reference is saved in the variable e, which is of type Employee.

By the way, the new operator has no complementary delete operator; that is, there is no way to explicitly free the memory allocated for an object. The CLR uses a garbage-collected environment (described in Chapter 21) that automatically detects when objects are no longer being used or accessed and frees the object’s memory automatically.

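💡 Summary: The runtime requires every type to ultimately be derived from System.Object. When you create an instance of a type with the new operator, the following happens: 1. The number of bytes required by all instance fields defined in the type and all of its base types is calculated. Every object on the heap also requires some additional members, the type object pointer and the sync block index, which the CLR uses to manage the object; the bytes for these members are counted in the object's size. 2. The required number of bytes is allocated from the managed heap, and all of them are set to 0. 3. The object's type object pointer and sync block index members are initialized. 4. The type's instance constructor is called and passed the arguments specified in the call to new. Most compilers automatically emit code in a constructor to call the base class's constructor. Each type's constructor is responsible for initializing the instance fields defined by that type. Eventually, System.Object's constructor is called, and it does nothing but return. C# has no delete operator to complement new because the CLR uses garbage collection, which automatically detects when an object is no longer being used or accessed and frees its memory.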

# 4.2 Casting Between Types

One of the most important features of the CLR is type safety. At run time, the CLR always knows what type an object is. You can always discover an object’s exact type by calling the GetType method. Because this method is nonvirtual, it is impossible for a type to spoof another type. For example, the Employee type can’t override the GetType method and have it return a type of SuperHero.
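Here is a minimal illustration, with a hypothetical `ActualType` helper. Even when the static type of a variable or parameter is Object, GetType reports the object's actual type.

```csharp
using System;

internal class Employee {
    // public override Type GetType() { ... }  // would not compile: GetType is
    //                                          // not virtual (error CS0506)
}
internal sealed class Manager : Employee { }

public static class Program {
    // The parameter's static type is Object, yet GetType reports the
    // object's actual runtime type.
    public static Type ActualType(Object o) { return o.GetType(); }

    public static void Main() {
        Object o = new Manager();
        Console.WriteLine(ActualType(o));  // prints "Manager"
    }
}
```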

Developers frequently find it necessary to cast an object to various types. The CLR allows you to cast an object to its type or to any of its base types. Your choice of programming language dictates how to expose casting operations to the developer. For example, C# doesn’t require any special syntax to cast an object to any of its base types, because casts to base types are considered safe implicit conversions. However, C# does require the developer to explicitly cast an object to any of its derived types because such a cast could fail at run time. The following code demonstrates casting to base and derived types.

```csharp
// This type is implicitly derived from System.Object.
internal class Employee {
    ...
}
public sealed class Program {
    public static void Main() {
        // No cast needed since new returns an Employee object
        // and Object is a base type of Employee.
        Object o = new Employee();
        // Cast required since Employee is derived from Object.
        // Other languages (such as Visual Basic) might not require
        // this cast to compile.
        Employee e = (Employee) o;
    }
}
```

This example shows what is necessary for your compiler to compile your code. Now I’ll explain what happens at run time. At run time, the CLR checks casting operations to ensure that casts are always to the object’s actual type or any of its base types. For example, the following code will compile, but at run time, an InvalidCastException will be thrown.

```csharp
internal class Employee {
    ...
}
internal class Manager : Employee {
    ...
}
public sealed class Program {
    public static void Main() {
        // Construct a Manager object and pass it to PromoteEmployee.
        // A Manager IS-A Object: PromoteEmployee runs OK.
        Manager m = new Manager();
        PromoteEmployee(m);
        // Construct a DateTime object and pass it to PromoteEmployee.
        // A DateTime is NOT derived from Employee. PromoteEmployee
        // throws a System.InvalidCastException exception.
        DateTime newYears = new DateTime(2013, 1, 1);
        PromoteEmployee(newYears);
    }
    public static void PromoteEmployee(Object o) {
        // At this point, the compiler doesn't know exactly what
        // type of object o refers to. So the compiler allows the
        // code to compile. However, at run time, the CLR does know
        // what type o refers to (each time the cast is performed) and
        // it checks whether the object's type is Employee or any type
        // that is derived from Employee.
        Employee e = (Employee) o;
        ...
    }
}
```
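The following sketch lets you watch the run-time check without crashing the program. It uses a hypothetical `TryPromote` helper that catches the InvalidCastException. Catching the exception is only for demonstration; the is and as operators discussed next are the usual way to test compatibility.

```csharp
using System;

internal class Employee { }
internal class Manager : Employee { }

public static class Program {
    // Returns true if the CLR accepted the cast, false if it threw.
    public static Boolean TryPromote(Object o) {
        try {
            Employee e = (Employee) o;   // checked by the CLR at run time
            return true;
        } catch (InvalidCastException) {
            return false;
        }
    }

    public static void Main() {
        Console.WriteLine(TryPromote(new Manager()));             // True
        Console.WriteLine(TryPromote(new DateTime(2013, 1, 1)));  // False (boxed DateTime)
        Console.WriteLine(TryPromote(null));                      // True: null casts to any reference type
    }
}
```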

Casting with the C# is and as Operators

Another way to cast in the C# language is to use the is operator. The is operator checks whether an object is compatible with a given type, and the result of the evaluation is a Boolean: true or false. The is operator will never throw an exception. The following code demonstrates.

```csharp
Object o = new Object();
Boolean b1 = (o is Object);   // b1 is true.
Boolean b2 = (o is Employee); // b2 is false.
```

If the object reference is null, the is operator always returns false because there is no object available to check its type.

The is operator is typically used as follows.

```csharp
if (o is Employee) {
    Employee e = (Employee) o;
    // Use e within the remainder of the 'if' statement.
}
```

In this code, the CLR is actually checking the object’s type twice: the is operator first checks to see if o is compatible with the Employee type. If it is, inside the if statement, the CLR again verifies that o refers to an Employee when performing the cast. The CLR’s type checking improves security, but it certainly comes at a performance cost, because the CLR must determine the actual type of the object referred to by the variable (o), and then the CLR must walk the inheritance hierarchy, checking each base type against the specified type (Employee). Because this programming paradigm is quite common, C# offers a way to simplify this code and improve its performance by providing an as operator.

```csharp
Employee e = o as Employee;
if (e != null) {
    // Use e within the 'if' statement.
}
```

In this code, the CLR checks if o is compatible with the Employee type, and if it is, as returns a non-null reference to the same object. If o is not compatible with the Employee type, the as operator returns null. Notice that the as operator causes the CLR to verify an object’s type just once. The if statement simply checks whether e is null; this check can be performed faster than verifying an object’s type.
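The two idioms can be compared side by side. In this sketch, the hypothetical helpers `WithIs` and `WithAs` wrap the patterns shown above. For every input, including null and an incompatible object, they return the same reference or null. The difference lies only in how many type checks the CLR performs.

```csharp
using System;

internal class Employee { }
internal sealed class Manager : Employee { }

public static class Program {
    // is + cast: the object's type is checked twice.
    public static Employee WithIs(Object o) {
        if (o is Employee) {
            Employee e = (Employee) o;
            return e;
        }
        return null;
    }

    // as: the object's type is checked once; then only a null test remains.
    public static Employee WithAs(Object o) {
        Employee e = o as Employee;
        if (e != null) {
            return e;
        }
        return null;
    }

    public static void Main() {
        Object[] inputs = { new Employee(), new Manager(), new Object(), null };
        foreach (Object o in inputs) {
            // Prints True for every input: both helpers agree.
            Console.WriteLine(Object.ReferenceEquals(WithIs(o), WithAs(o)));
        }
    }
}
```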

The as operator works just as casting does except that the as operator will never throw an exception. Instead, if the object can’t be cast, the result is null. You’ll want to check to see whether the resulting reference is null, or attempting to use the resulting reference will cause a System.NullReferenceException to be thrown. The following code demonstrates.

```csharp
Object o = new Object();      // Creates a new Object object
Employee e = o as Employee;   // Casts o to an Employee
// The cast above fails: no exception is thrown, but e is set to null.
e.ToString();                 // Accessing e throws a NullReferenceException.
```

To make sure you understand everything just presented, take the following quiz. Assume that these two class definitions exist.

```csharp
internal class B {       // Base class
}
internal class D : B {   // Derived class
}
```

Now examine the lines of C# code in Table 4-3. For each line, decide whether the line would compile and execute successfully (marked OK in Table 4-3), cause a compile-time error (CTE), or cause a run-time error (RTE).

[Table 4-3: Type-safety quiz]
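Here is a runnable way to check answers of this kind. It is not a reproduction of Table 4-3: the lines below are a small selection of my own, built on the B and D classes above. A hypothetical `Outcome` helper reports OK or RTE. The compile-time errors are left commented out because the compiler rejects them.

```csharp
using System;

internal class B { }      // Base class
internal class D : B { }  // Derived class

public static class Program {
    // Runs an action and reports "OK" or "RTE" (run-time error).
    public static String Outcome(Action action) {
        try { action(); return "OK"; }
        catch (InvalidCastException) { return "RTE"; }
    }

    public static void Main() {
        Object o1 = new Object();
        B b1 = new B();
        B b2 = new D();          // OK: implicit conversion to a base type
        D d1 = new D();

        Console.WriteLine(Outcome(() => { D d = (D) b2; }));  // OK: b2 refers to a D
        Console.WriteLine(Outcome(() => { D d = (D) b1; }));  // RTE: b1 refers to a B
        Console.WriteLine(Outcome(() => { B b = (B) o1; }));  // RTE: o1 refers to an Object
        Console.WriteLine(Outcome(() => { B b = d1; }));      // OK: no cast needed

        // Compile-time errors (CTE); uncomment to see the compiler reject them:
        // B b3 = new Object();  // error CS0266: an explicit cast is required
        // D d3 = b2;            // error CS0266: an explicit cast is required
    }
}
```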

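Note: C# allows a type to define conversion operator methods; see Section 8.5, "Conversion Operator Methods," for details. These methods are invoked only when you use a cast expression; they are never invoked when you use C#'s as or is operators.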
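The following sketch shows this behavior with two hypothetical types, `Celsius` and `Fahrenheit`, where Celsius defines an explicit conversion operator. The cast expression calls the operator, but is and as never consult it. The compiler selects the operator from the static type, so a cast from an Object-typed variable falls back to an ordinary run-time type check.

```csharp
using System;

// Hypothetical types used only to illustrate a user-defined conversion.
internal sealed class Celsius {
    public Double Degrees { get; }
    public Celsius(Double degrees) { Degrees = degrees; }

    // Explicit conversion operator: only a cast expression invokes it.
    public static explicit operator Fahrenheit(Celsius c) =>
        new Fahrenheit(c.Degrees * 9 / 5 + 32);
}

internal sealed class Fahrenheit {
    public Double Degrees { get; }
    public Fahrenheit(Double degrees) { Degrees = degrees; }
}

public static class Program {
    public static void Main() {
        Celsius c = new Celsius(100);
        Fahrenheit f = (Fahrenheit) c;          // cast expression: calls the operator
        Console.WriteLine(f.Degrees);           // 212

        Object o = c;
        Console.WriteLine(o is Fahrenheit);           // False: operator not consulted
        Console.WriteLine((o as Fahrenheit) == null); // True: as never calls it
        // (Fahrenheit) o compiles but throws InvalidCastException, because the
        // static type of o is Object and no operator is selected for it.
        // Fahrenheit g = c as Fahrenheit;     // error CS0039
    }
}
```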

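💡 Summary: One of the most important features of the CLR is type safety. At run time, the CLR always knows what type an object is. Calling GetType tells you an object's exact type, and because GetType is nonvirtual, one type can't pretend to be another. C# doesn't require any special syntax to cast an object to any of its base types, because casts to base types are considered safe implicit conversions. However, C# requires an explicit cast to convert an object to one of its derived types, because such a cast could fail at run time. You can also use C#'s is and as operators to cast. The is operator checks whether an object is compatible with a given type and returns a Boolean, true or false; it never throws an exception. The as operator works like a cast except that it never throws an exception either: if the object can't be cast, the result is null; otherwise, the cast proceeds normally. Table 4-3 shows the results of the type-safety checks for specific cases.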

# 4.3 Namespaces and Assemblies

Namespaces allow for the logical grouping of related types, and developers typically use them to make it easier to locate a particular type. For example, the System.Text namespace defines a bunch of types for performing string manipulations, and the System.IO namespace defines a bunch of types for performing I/O operations. Here’s some code that constructs a System.IO.FileStream object and a System.Text.StringBuilder object.

```csharp
public sealed class Program {
    public static void Main() {
        System.IO.FileStream fs = new System.IO.FileStream(...);
        System.Text.StringBuilder sb = new System.Text.StringBuilder();
    }
}
```

As you can see, the code is pretty verbose; it would be nice if there were some shorthand way to refer to the FileStream and StringBuilder types to reduce typing. Fortunately, many compilers do offer mechanisms to reduce programmer typing. The C# compiler provides this mechanism via the using directive. The following code is identical to the previous example.

```csharp
using System.IO;    // Try prepending "System.IO."
using System.Text;  // Try prepending "System.Text."
public sealed class Program {
    public static void Main() {
        FileStream fs = new FileStream(...);
        StringBuilder sb = new StringBuilder();
    }
}
```

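💡 Important: The CLR doesn't know anything about namespaces. When you access a type, the CLR needs to know the full name of the type (which can be a really long name containing periods) and which assembly contains the definition of the type, so that the runtime can load the proper assembly, find the type, and manipulate it.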

In the previous code example, the compiler needs to ensure that every type referenced exists and that my code is using that type in the correct way: calling methods that exist, passing the right number of arguments to these methods, ensuring that the arguments are the right type, using the method’s return value correctly, and so on. If the compiler can’t find a type with the specified name in the source files or in any referenced assemblies, it prepends System.IO. to the type name and checks if the generated name matches an existing type. If the compiler still can’t find a match, it prepends System.Text. to the type’s name. The two using directives shown earlier allow me to simply type FileStream and StringBuilder in my code—the compiler automatically expands the references to System.IO.FileStream and System.Text.StringBuilder. I’m sure you can easily imagine how much typing this saves, as well as how much cleaner your code is to read.

When checking for a type’s definition, the compiler must be told which assemblies to examine by using the /reference compiler switch as discussed in Chapter 2, “Building, Packaging, Deploying, and Administering Applications and Types,” and Chapter 3, “Shared Assemblies and Strongly Named Assemblies.” The compiler will scan all of the referenced assemblies looking for the type’s definition. After the compiler finds the proper assembly, the assembly information and the type information are emitted into the resulting managed module’s metadata. To get the assembly information, you must pass the assembly that defines any referenced types to the compiler. The C# compiler, by default, automatically looks in the MSCorLib.dll assembly even if you don’t explicitly tell it to. The MSCorLib.dll assembly contains the definitions of all of the core Framework Class Library (FCL) types, such as Object, Int32, String, and so on.

As you might imagine, there are some potential problems with the way that compilers treat namespaces: it’s possible to have two (or more) types with the same name in different namespaces. Microsoft strongly recommends that you define unique names for types. However, in some cases, it’s simply not possible. The runtime encourages the reuse of components. Your application might take advantage of a component that Microsoft created and another component that Wintellect created. These two companies might both offer a type called Widget—Microsoft’s Widget does one thing, and Wintellect’s Widget does something entirely different. In this scenario, you had no control over the naming of the types, so you can differentiate between the two widgets by using their fully qualified names when referencing them. To reference Microsoft’s Widget, you would use Microsoft.Widget, and to reference Wintellect’s Widget, you would use Wintellect.Widget. In the following code, the reference to Widget is ambiguous, so the C# compiler generates the following message: error CS0104: 'Widget' is an ambiguous reference between 'Microsoft.Widget' and 'Wintellect.Widget'.

```csharp
using Microsoft;   // Try prepending "Microsoft."
using Wintellect;  // Try prepending "Wintellect."
public sealed class Program {
    public static void Main() {
        Widget w = new Widget();  // An ambiguous reference
    }
}
```

To remove the ambiguity, you must explicitly tell the compiler which Widget you want to create.

```csharp
using Microsoft;   // Try prepending "Microsoft."
using Wintellect;  // Try prepending "Wintellect."
public sealed class Program {
    public static void Main() {
        Wintellect.Widget w = new Wintellect.Widget();  // Not ambiguous
    }
}
```
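C# also offers a second disambiguation technique: a using directive can define an alias for one of the types. In this sketch, the two Widget classes are hypothetical stand-ins declared in the same file, and `WintellectWidget` is an alias name chosen for illustration.

```csharp
using System;
using Microsoft;
using Wintellect;
// Alias directive: within this file, WintellectWidget means Wintellect.Widget.
using WintellectWidget = Wintellect.Widget;

namespace Microsoft  { public sealed class Widget { public String Vendor => "Microsoft"; } }
namespace Wintellect { public sealed class Widget { public String Vendor => "Wintellect"; } }

public static class Program {
    public static void Main() {
        // Widget w = new Widget();       // error CS0104: ambiguous reference
        var a = new Microsoft.Widget();   // fully qualified name
        var b = new WintellectWidget();   // alias
        Console.WriteLine(a.Vendor + " " + b.Vendor);
    }
}
```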

These methods of disambiguating a type are useful, but in some scenarios, you need to go further. Imagine that the Australian Boomerang Company (ABC) and the Alaskan Boat Corporation (ABC) are each creating a type, called BuyProduct, which they intend to ship in their respective assemblies. It’s likely that both companies would create a namespace called ABC that contains a type called BuyProduct. Anyone who tries to develop an application that needs to buy both boomerangs and boats would be in for some trouble unless the programming language provides a way to programmatically distinguish between the assemblies, not just between the namespaces. Fortunately, the C# compiler offers a feature called extern aliases that gives you a way to work around this rarely occurring problem. Extern aliases also give you a way to access a single type from two (or more) different versions of the same assembly. For more information about extern aliases, see the C# Language Specification.

In your library, when you’re designing types that you expect third parties to use, you should define these types in a namespace so that compilers can easily disambiguate them. In fact, to reduce the likelihood of conflict, you should use your full company name (not an acronym or abbreviation) to be your top-level namespace name. Referring to the Microsoft .NET Framework SDK documentation, you can see that Microsoft uses a namespace of “Microsoft” for Microsoft-specific types. (See the Microsoft.CSharp, Microsoft.VisualBasic, and Microsoft.Win32 namespaces as examples.)

Creating a namespace is simply a matter of writing a namespace declaration into your code as follows (in C#).

```csharp
namespace CompanyName {
    public sealed class A {                // TypeDef: CompanyName.A
    }
    namespace X {
        public sealed class B { ... }      // TypeDef: CompanyName.X.B
    }
}
```

The comment on the right of the preceding class definitions indicates the real name of the type the compiler will emit into the type definition metadata table; this is the real name of the type from the CLR’s perspective.
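The name the CLR sees is available through reflection. In this sketch, the classes from the preceding example are declared with empty bodies, and `Type.FullName` returns the complete, period-separated name that was emitted into metadata.

```csharp
using System;

namespace CompanyName {
    public sealed class A { }
    namespace X {
        public sealed class B { }
    }
}

public static class Program {
    public static void Main() {
        Console.WriteLine(typeof(CompanyName.A).FullName);     // CompanyName.A
        Console.WriteLine(typeof(CompanyName.X.B).FullName);   // CompanyName.X.B
        Console.WriteLine(typeof(CompanyName.X.B).Namespace);  // CompanyName.X
        Console.WriteLine(typeof(CompanyName.X.B).Name);       // B
    }
}
```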

Some compilers don’t support namespaces at all, and other compilers are free to define what “namespace” means to a particular language. In C#, the namespace directive simply tells the compiler to prefix each type name that appears in source code with the namespace name so that programmers can do less typing.

How Namespaces and Assemblies Relate

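Be aware that a namespace and an assembly (the file that implements a type) aren't necessarily related. In particular, the types belonging to a single namespace might be implemented in multiple assemblies. For example, the System.IO.FileStream type is implemented in the MSCorLib.dll assembly, and the System.IO.FileSystemWatcher type is implemented in the System.dll assembly. Conversely, a single assembly can contain types in different namespaces. For example, the System.Int32 and System.Text.StringBuilder types are both in the MSCorLib.dll assembly.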
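Reflection can show both relationships. `Type.Namespace` and `Type.Assembly` are independent properties. The exact assembly names the sketch prints depend on the runtime: MSCorLib.dll and System.dll on the .NET Framework, and differently named assemblies (for example, System.Private.CoreLib) on .NET Core and later. The helper `Show` is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class Program {
    public static void Main() {
        // Same namespace (System.IO), different assemblies.
        Show(typeof(FileStream));
        Show(typeof(FileSystemWatcher));
        // Different namespaces, same assembly.
        Show(typeof(Int32));
        Show(typeof(StringBuilder));
    }

    // Prints the namespace and the simple name of the defining assembly.
    static void Show(Type t) {
        Console.WriteLine("{0,-20} namespace={1,-12} assembly={2}",
            t.Name, t.Namespace, t.Assembly.GetName().Name);
    }
}
```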

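When you look up a type in the documentation, the documentation clearly indicates the namespace that the type belongs to and the assembly that implements it. In Figure 4-1, you can see (just above the Syntax section) that the ResXFileRef type is part of the System.Resources namespace and is implemented in the System.Windows.Forms.dll assembly. To compile code that references the ResXFileRef type, you'd add a using System.Resources; directive to your source code, and you'd use the /r:System.Windows.Forms.dll compiler switch.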
[Figure 4-1: Documentation showing the namespace and assembly information for a type]

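💡 Summary: Namespaces logically group related types, and developers use them to locate types easily. To the compiler, a namespace is simply a way of making a type's name longer, and therefore more likely to be unique, by prefixing it with period-separated symbols. When the CLR accesses a type, it needs to know the type's full name and which assembly contains its definition, so that the runtime can load the correct assembly, find the type, and manipulate it. When checking for a type's definition, the compiler must be told which assemblies to examine; this is done with the /reference compiler switch. Once the compiler finds the correct assembly, the assembly information and the type information are emitted into the resulting managed module's metadata. The way compilers treat namespaces has a potential problem: two (or more) types with the same name can exist in different namespaces. When this happens, either refer to the type by its fully qualified name or use a using directive to define an alias for the type or namespace. If even the fully qualified names collide, consider using an extern alias. Extern aliases also let you access a single type from two (or more) different versions of the same assembly.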

# 4.4 How Things Relate at Run Time

In this section, I’m going to explain the relationship at run time between types, objects, a thread’s stack, and the managed heap. Furthermore, I will also explain the difference between calling static methods, instance methods, and virtual methods. Let’s start off with some fundamentals of computers. What I’m about to describe is not specific to the CLR at all, but I’m going to describe it so that we have a working foundation, and then I’ll modify the discussion to incorporate CLR-specific information.

Figure 4-2 shows a single Windows process that has the CLR loaded into it. In this process there may be many threads. When a thread is created, it is allocated a 1-MB stack. This stack space is used for passing arguments to a method and for local variables defined within a method. In Figure 4-2, the memory for one thread’s stack is shown (on the right). Stacks build from high-memory addresses to low-memory addresses. In the figure, this thread has been executing some code, and its stack has some data on it already (shown as the shaded area at the top of the stack). Now, imagine that the thread has executed some code that calls the M1 method.

[Figure 4-2: A thread's stack with the M1 method about to be called]

All but the simplest of methods contain some prologue code, which initializes a method before it can start doing its work. These methods also contain epilogue code, which cleans up a method after it has performed its work so that it can return to its caller. When the M1 method starts to execute, its prologue code allocates memory for the local name variable from the thread’s stack (see Figure 4-3).

[Figure 4-3: Allocating M1's local variable on the thread's stack]

Then, M1 calls the M2 method, passing in the name local variable as an argument. This causes the address in the name local variable to be pushed on the stack (see Figure 4-4). Inside the M2 method, the stack location will be identified using the parameter variable named s. (Note that some architectures pass arguments via registers to improve performance, but this distinction is not important for this discussion.) Also, when a method is called, the address indicating where the called method should return to in the calling method is pushed on the stack (also shown in Figure 4-4).

[Figure 4-4: M1 pushes the argument and the return address on the thread's stack when calling M2]

When the M2 method starts to execute, its prologue code allocates memory for the local length and tally variables from the thread’s stack (see Figure 4-5). Then the code inside method M2 executes. Eventually, M2 gets to its return statement, which causes the CPU’s instruction pointer to be set to the return address in the stack, and M2’s stack frame is unwound so that it looks the way it did in Figure 4-3. At this point, M1 is continuing to execute its code that immediately follows the call to M2, and its stack frame accurately reflects the state needed by M1.

Eventually, M1 will return to its caller by setting the CPU’s instruction pointer to the return address (not shown on the figures, but it would be just above the name local variable on the stack), and M1’s stack frame is unwound so that it looks the way it did in Figure 4-2. At this point, the method that called M1 continues to execute its code that immediately follows the call to M1, and its stack frame accurately reflects the state needed by that method.

[Figure 4-5: Allocating M2's local variables on the thread's stack]
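The figures show M1 and M2 only as stack diagrams. The following sketch is a hypothetical pair of methods that matches the walkthrough. M1 has one local variable, name, which it passes to M2. M2 receives it as parameter s and declares the locals length and tally. The work M2 does with tally is invented for illustration, and the comments describe the conceptual stack behavior (the JIT compiler may pass arguments in registers instead).

```csharp
using System;

public static class Program {
    // Hypothetical M1: one local variable, name, which it passes to M2.
    public static Int32 M1() {
        String name = "Joe";
        return M2(name);   // conceptually: push the argument and the return address, then call
    }

    // Hypothetical M2: parameter s plus the locals length and tally.
    public static Int32 M2(String s) {
        Int32 length = s.Length;
        Int32 tally = 0;
        for (Int32 i = 0; i < length; i++) {
            tally++;       // placeholder work; the text doesn't say what M2 computes
        }
        return tally;      // M2's frame is unwound and M1 resumes after the call
    }

    public static void Main() {
        Console.WriteLine(M1());  // 3
    }
}
```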

Now, let’s start gearing the discussion toward the CLR. Let’s say that we have these two class definitions.

```csharp
internal class Employee {
    public Int32 GetYearsEmployed() { ... }
    public virtual String GetProgressReport() { ... }
    public static Employee Lookup(String name) { ... }
}
internal sealed class Manager : Employee {
    public override String GetProgressReport() { ... }
}
```

Our Windows process has started, the CLR is loaded into it, the managed heap is initialized, and a thread has been created (along with its 1 MB of stack space). This thread has already executed some code, and this code has decided to call the M3 method. All of this is shown in Figure 4-6. The M3 method contains code that demonstrates how the CLR works; this is not code that you would normally write, because it doesn’t actually do anything useful.

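[Figure 4-6: The CLR loaded in a process, its heap initialized, and a thread's stack with the M3 method about to be called]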
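Figure 4-6 contains M3's source code. The sketch below reconstructs M3 from the walkthrough that follows: two local variables (e and year), construction of a Manager, a call to the static Lookup method, a call to the nonvirtual GetYearsEmployed, and a call to the virtual GetProgressReport. The method bodies are hypothetical. Lookup treats "Joe" as a manager and GetYearsEmployed returns 5, as the walkthrough assumes. Unlike the original, this M3 returns a string so that its result can be checked.

```csharp
using System;

internal class Employee {
    // Hypothetical bodies; the text elides them.
    public Int32 GetYearsEmployed() { return 5; }
    public virtual String GetProgressReport() { return "Employee progress report"; }
    public static Employee Lookup(String name) {
        // Stand-in for the database query: "Joe" is a manager.
        return name == "Joe" ? new Manager() : new Employee();
    }
}

internal sealed class Manager : Employee {
    public override String GetProgressReport() { return "Manager progress report"; }
}

public static class Program {
    public static String M3() {
        Employee e;
        Int32 year;
        e = new Manager();                      // a Manager object on the heap (Figure 4-9)
        e = Employee.Lookup("Joe");             // static call; first Manager is now unreachable (Figure 4-10)
        year = e.GetYearsEmployed();            // nonvirtual instance call (Figure 4-11)
        String report = e.GetProgressReport();  // virtual call dispatches to Manager (Figure 4-12)
        return report + ", " + year + " years";
    }

    public static void Main() {
        Console.WriteLine(M3());  // Manager progress report, 5 years
    }
}
```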

As the just-in-time (JIT) compiler converts M3’s Intermediate Language (IL) code into native CPU instructions, it notices all of the types that are referred to inside M3: Employee, Int32, Manager, and String (because of "Joe"). At this time, the CLR ensures that the assemblies that define these types are loaded. Then, using the assembly’s metadata, the CLR extracts information about these types and creates some data structures to represent the types themselves. The data structures for the Employee and Manager type objects are shown in Figure 4-7. Because this thread already executed some code prior to calling M3, let’s assume that the Int32 and String type objects have already been created (which is likely because these are commonly used types), and so I won’t show them in the figure.

[Figure 4-7: The Employee and Manager type objects are created just as M3 is being called]

Let’s take a moment to discuss these type objects. As discussed earlier in this chapter, all objects on the heap contain two overhead members: the type object pointer and the sync block index. As you can see, the Employee and Manager type objects have both of these members. When you define a type, you can define static data fields within it. The bytes that back these static data fields are allocated within the type objects themselves. Finally, inside each type object is a method table with one entry per method defined within the type. This is the method table that was discussed in Chapter 1, “The CLR’s Execution Model.” Because the Employee type defines three methods (GetYearsEmployed, GetProgressReport, and Lookup), there are three entries in Employee’s method table. Because the Manager type defines one method (an override of GetProgressReport), there is just one entry in Manager’s method table.

Now, after the CLR has ensured that all of the type objects required by the method are created and the code for M3 has been compiled, the CLR allows the thread to execute M3’s native code. When M3’s prologue code executes, memory for the local variables must be allocated from the thread’s stack, as shown in Figure 4-8. By the way, the CLR automatically initializes all local variables to null or 0 (zero) as part of the method’s prologue code. However, the C# compiler issues a Use of unassigned local variable error message if you write code that attempts to read from a local variable that you have not explicitly initialized in your source code.

[Figure 4-8: Allocating M3's local variables on the thread's stack]

Then, M3 executes its code to construct a Manager object. This causes an instance of the Manager type, a Manager object, to be created in the managed heap, as shown in Figure 4-9. As you can see, the Manager object—as do all objects—has a type object pointer and sync block index. This object also contains the bytes necessary to hold all of the instance data fields defined by the Manager type, as well as any instance fields defined by any base classes of the Manager type (in this case, Employee and Object). Whenever a new object is created on the heap, the CLR automatically initializes the internal type object pointer member to refer to the object’s corresponding type object (in this case, the Manager type object). Furthermore, the CLR initializes the sync block index and sets all of the object’s instance fields to null or 0 (zero) prior to calling the type’s constructor, a method that will likely modify some of the instance data fields. The new operator returns the memory address of the Manager object, which is saved in the variable e (on the thread’s stack).

The next line of code in M3 calls Employee’s static Lookup method. When calling a static method, the JIT compiler locates the type object that corresponds to the type that defines the static method. Then, the JIT compiler locates the entry in the type object’s method table that refers to the method being called, JITs the method (if necessary), and calls the JITted code. For our discussion, let’s say that Employee’s Lookup method queries a database to find Joe. Let’s also say that the database indicates that Joe is a manager at the company, and therefore, internally, the Lookup method constructs a new Manager object on the heap, initializes it for Joe, and returns the address of this object. The address is saved in the local variable e. The result of this operation is shown in Figure 4-10.

[Figure 4-9: Allocating and initializing a Manager object]

[Figure 4-10: Employee's static Lookup method allocates and initializes a Manager object for Joe]

Note that e no longer refers to the first Manager object that was created. In fact, because no variable refers to this object, it is a prime candidate for being garbage collected in the future, which will reclaim (free) the memory used by this object.

The next line of code in M3 calls Employee’s nonvirtual instance GetYearsEmployed method. When calling a nonvirtual instance method, the JIT compiler locates the type object that corresponds to the type of the variable being used to make the call. In this case, the variable e is defined as an Employee. (If the Employee type didn’t define the method being called, the JIT compiler walks down the class hierarchy toward Object looking for this method. It can do this because each type object has a field in it that refers to its base type; this information is not shown in the figures.) Then, the JIT compiler locates the entry in the type object’s method table that refers to the method being called, JITs the method (if necessary), and then calls the JITted code. For our discussion, let’s say that Employee’s GetYearsEmployed method returns 5 because Joe has been employed at the company for five years. The integer is saved in the local variable year. The result of this operation is shown in Figure 4-11.

[Figure 4-11: Employee's nonvirtual instance GetYearsEmployed method is called, returning 5]

The next line of code in M3 calls Employee’s virtual instance GetProgressReport method. When calling a virtual instance method, the JIT compiler produces some additional code in the method, which will be executed each time the method is invoked. This code will first look in the variable being used to make the call and then follow the address to the calling object. In this case, the variable e points to the Manager object representing “Joe.” Then, the code will examine the object’s internal type object pointer member; this member refers to the actual type of the object. The code then locates the entry in the type object’s method table that refers to the method being called, JITs the method (if necessary), and calls the JITted code. For our discussion, Manager’s GetProgressReport implementation is called because e refers to a Manager object. The result of this operation is shown in Figure 4-12.

Note that if Employee’s Lookup method had discovered that Joe was just an Employee and not a Manager, Lookup would have internally constructed an Employee object whose type object pointer member would have referred to the Employee type object, causing Employee’s implementation of GetProgressReport to execute instead of Manager’s implementation.

[Figure 4-12: Employee's virtual instance GetProgressReport method is called, causing Manager's override to execute]
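The difference between nonvirtual and virtual calls can be observed from C#. In this sketch, Manager hides a hypothetical nonvirtual Describe method with `new` (hiding, not overriding) and overrides GetProgressReport. Both methods are then called through an Employee variable that refers to a Manager object. The nonvirtual call is bound by the variable's declared type. The virtual call follows the object's type object pointer to Manager.

```csharp
using System;

internal class Employee {
    public String Describe() { return "Employee.Describe"; }                  // nonvirtual
    public virtual String GetProgressReport() { return "Employee.GetProgressReport"; }
}

internal sealed class Manager : Employee {
    public new String Describe() { return "Manager.Describe"; }               // hides, doesn't override
    public override String GetProgressReport() { return "Manager.GetProgressReport"; }
}

public static class Program {
    public static void Main() {
        Employee e = new Manager();
        // Nonvirtual: bound to the declared type of e (Employee).
        Console.WriteLine(e.Describe());             // Employee.Describe
        // Virtual: resolved at run time through the object's actual type.
        Console.WriteLine(e.GetProgressReport());    // Manager.GetProgressReport
        // Through a Manager-typed expression, the hiding method is chosen.
        Console.WriteLine(((Manager) e).Describe()); // Manager.Describe
    }
}
```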

At this point, we have discussed the relationship between source code, IL, and JITted code. We have also discussed the thread’s stack, arguments, local variables, and how these arguments and variables refer to objects on the managed heap. You also saw how objects contain a pointer to their type object (containing the static fields and method table). We have also discussed how the JIT compiler determines how to call static methods, nonvirtual instance methods, and virtual instance methods. All of this should give you great insight into how the CLR works, and this insight should help you when architecting and implementing your types, components, and applications. Before ending this chapter, I’d like to give you just a little more insight as to what is going on inside the CLR.

You’ll notice that the Employee and Manager type objects both contain type object pointer members. This is because type objects are actually objects themselves. When the CLR creates type objects, the CLR must initialize these members. “To what?” you might ask. Well, when the CLR starts running in a process, it immediately creates a special type object for the System.Type type (defined in MSCorLib.dll). The Employee and Manager type objects are “instances” of this type, and therefore, their type object pointer members are initialized to refer to the System.Type type object, as shown in Figure 4-13.

[Figure 4-13: The Employee and Manager type objects are instances of the System.Type type]

Of course, the System.Type type object is an object itself and therefore also has a type object pointer member in it, and it is logical to ask what this member refers to. It refers to itself because the System.Type type object is itself an “instance” of a type object. And now you should understand the CLR’s complete type system and how it works. By the way, System.Object’s GetType method simply returns the address stored in the specified object’s type object pointer member. In other words, the GetType method returns a pointer to an object’s type object, and this is how you can determine the true type of any object in the system (including type objects).
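Managed code can observe these relationships. In this sketch, GetType returns the same type object instance that typeof produces. That type object's own type derives from System.Type; on the runtimes I'm aware of, it is an internal class named System.RuntimeType. Asking that type object for its type yields the same instance again, which mirrors the self-reference described above. The empty Employee and Manager classes are stand-ins.

```csharp
using System;

internal class Employee { }
internal sealed class Manager : Employee { }

public static class Program {
    public static void Main() {
        Employee e = new Manager();
        Type t = e.GetType();

        // GetType returns the object's type object: the same instance typeof yields.
        Console.WriteLine(Object.ReferenceEquals(t, typeof(Manager)));                 // True

        // A type object is itself an object, and its type derives from System.Type.
        Type typeOfTypeObject = t.GetType();
        Console.WriteLine(typeof(Type).IsAssignableFrom(typeOfTypeObject));            // True

        // That type object's own type refers back to itself.
        Console.WriteLine(Object.ReferenceEquals(typeOfTypeObject.GetType(), typeOfTypeObject)); // True
    }
}
```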

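💡 Summary: A process can have many threads. When a thread is created, it is allocated a 1-MB stack. The stack is used to pass arguments to methods, and the local variables defined inside a method also live on it. Stacks build from high-memory addresses to low-memory addresses. A stack frame represents one method call in the current thread's call stack: each method call made while the thread executes creates a stack frame and pushes it onto the call stack. When a method returns, the CPU's instruction pointer is set to the return address stored on the stack, and the method's stack frame is unwound. Incidentally, pushing a frame when a method is called is known as winding, and popping it when the method finishes is known as unwinding; picturing the thread's stack as a coil makes these terms easy to remember.

All objects on the heap contain two extra members: the type object pointer and the sync block index. When you define a type, you can define static data fields within it; the bytes that back these static fields are allocated within the type object itself. Finally, each type object contains a method table with one entry for every method the type defines. Whenever a new object is created on the heap, the CLR automatically initializes the object's internal type object pointer member to refer to the object's corresponding type object. In addition, before calling the type's constructor (essentially a method that may modify some of the instance data fields), the CLR initializes the sync block index and sets all of the object's instance fields to null or 0.

When calling a static method, the JIT compiler locates the type object that corresponds to the type that defines the static method. It then locates the entry in the type object's method table that refers to the method being called, JIT-compiles the method (if necessary), and calls the JITted code. When calling a nonvirtual instance method, the JIT compiler locates the type object that corresponds to the type of the variable being used to make the call. If that type doesn't define the method being called, the JIT compiler walks the class hierarchy toward Object, looking for the method in each type along the way; it can do this because each type object has a field that refers to its base type. When calling a virtual instance method, the JIT compiler produces some additional code in the method, which is executed each time the method is invoked. This code first looks in the variable being used to make the call and follows the address to the calling object. It then examines the object's internal type object pointer member, which refers to the object's actual type. Finally, it locates the entry in that type object's method table that refers to the method being called, JIT-compiles the method (if necessary), and calls the JITted code.

Note that type objects also contain a type object pointer member, because type objects are themselves objects. When the CLR starts running in a process, it immediately creates a special type object for the System.Type type (defined in MSCorLib.dll). The type object pointer members of other type objects are initialized to refer to this System.Type type object, and the System.Type type object's own type object pointer refers to itself. By the way, System.Object's GetType method returns the address stored in the specified object's type object pointer member. In other words, GetType returns a pointer to an object's type object, which is how you can determine the true type of any object in the system (including type objects).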