# Chapter 5 Primitive, Reference, and Value Types

# Programming Language Primitive Types

Certain data types are so commonly used that many compilers allow code to manipulate them using simplified syntax. For example, you could allocate an integer by using the following syntax.

System.Int32 a = new System.Int32();

But I’m sure you’d agree that declaring and initializing an integer by using this syntax is rather cumbersome. Fortunately, many compilers (including C#) allow you to use syntax similar to the following instead.

int a = 0;

This syntax certainly makes the code more readable and generates identical Intermediate Language (IL) to that which is generated when System.Int32 is used. Any data types the compiler directly supports are called primitive types. Primitive types map directly to types existing in the Framework Class Library (FCL). For example, in C#, an int maps directly to the System.Int32 type. Because of this, the following four lines of code all compile correctly and produce exactly the same IL.

int a = 0; // Most convenient syntax 
System.Int32 a = 0; // Convenient syntax 
int a = new int(); // Inconvenient syntax 
System.Int32 a = new System.Int32(); // Most inconvenient syntax
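The equivalence of the keyword and the FCL name can also be observed at run time. The following is a minimal sketch; the class name PrimitiveAliasDemo and the method SameType are made up for this illustration.

```csharp
using System;

public static class PrimitiveAliasDemo {
    // Returns true when each C# keyword and its FCL name denote the same type.
    public static Boolean SameType() {
        return typeof(int) == typeof(Int32)
            && typeof(long) == typeof(Int64)
            && typeof(float) == typeof(Single)
            && typeof(string) == typeof(String);
    }

    public static void Main() {
        int a = 0;                                // keyword form
        Int32 b = new Int32();                    // FCL name; new Int32() also yields 0
        Console.WriteLine(a == b);                // True
        Console.WriteLine(a.GetType().FullName);  // System.Int32
        Console.WriteLine(SameType());            // True
    }
}
```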

Table 5-1 shows the FCL types that have corresponding primitives in C#. For the types that are compliant with the Common Language Specification (CLS), other languages will offer similar primitive types. However, languages aren’t required to offer any support for the non–CLS-compliant types.

[Table 5-1: C# primitive types and their corresponding FCL types]

Another way to think of this is that the C# compiler automatically assumes that you have the following using directives (as discussed in Chapter 4, “Type Fundamentals”) in all of your source code files.

using sbyte = System.SByte; 
using byte = System.Byte; 
using short = System.Int16; 
using ushort = System.UInt16; 
using int = System.Int32; 
using uint = System.UInt32; 
...

The C# language specification states, “As a matter of style, use of the keyword is favored over use of the complete system type name.” I disagree with the language specification; I prefer to use the FCL type names and completely avoid the primitive type names. In fact, I wish that compilers didn’t even offer the primitive type names and forced developers to use the FCL type names instead. Here are my reasons:

■ I’ve seen a number of developers confused, not knowing whether to use string or String in their code. Because in C# string (a keyword) maps exactly to System.String (an FCL type), there is no difference and either can be used. Similarly, I’ve heard some developers say that int represents a 32-bit integer when the application is running on a 32-bit operating system and that it represents a 64-bit integer when the application is running on a 64-bit operating system. This statement is absolutely false: in C#, an int always maps to System.Int32, and therefore it represents a 32-bit integer regardless of the operating system the code is running on. If programmers used Int32 in their code, this potential confusion would also be eliminated.

■ In C#, long maps to System.Int64, but in a different programming language, long could map to an Int16 or Int32. In fact, C++/CLI does treat long as an Int32. Someone reading source code in one language could easily misinterpret the code’s intention if he or she were used to programming in a different programming language. In fact, most languages won’t even treat long as a keyword and won’t compile code that uses it.

■ The FCL has many methods that have type names as part of their method names. For example, the BinaryReader type offers methods such as ReadBoolean, ReadInt32, ReadSingle, and so on, and the System.Convert type offers methods such as ToBoolean, ToInt32, ToSingle, and so on. Although it’s legal to write the following code, the line with float feels very unnatural to me, and it’s not obvious that the line is correct.

BinaryReader br = new BinaryReader(...); 
float val = br.ReadSingle(); // OK, but feels unnatural 
Single val = br.ReadSingle(); // OK and feels good

■ Many programmers who use C# exclusively tend to forget that other programming languages can be used against the CLR, and because of this, C#-isms creep into the class library code. For example, Microsoft’s FCL is almost exclusively written in C#, and developers on the FCL team have introduced methods into the library such as Array’s GetLongLength, which returns an Int64 value that is a long in C# but not in other languages (like C++/CLI). Another example is System.Linq.Enumerable’s LongCount method.
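The first reason in the list above is easy to check. The sketch below (IntegerWidthDemo and its helper methods are names invented for this example) shows that int and long have fixed widths, while only pointer-sized values follow the bitness of the process.

```csharp
using System;

public static class IntegerWidthDemo {
    // sizeof on a predefined type is a compile-time constant in C#.
    public static Int32 IntBytes()  { return sizeof(int); }
    public static Int32 LongBytes() { return sizeof(long); }

    public static void Main() {
        Console.WriteLine(IntBytes());    // 4, in 32-bit and 64-bit processes alike
        Console.WriteLine(LongBytes());   // 8, in 32-bit and 64-bit processes alike

        // IntPtr is the type whose size depends on the process:
        // 4 in a 32-bit process, 8 in a 64-bit process.
        Console.WriteLine(IntPtr.Size);
    }
}
```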

In many programming languages, you would expect the following code to compile and execute correctly.

Int32 i = 5; // A 32-bit value 
Int64 l = i; // Implicit cast to a 64-bit value

However, based on the casting discussion presented in Chapter 4, you wouldn’t expect this code to compile. After all, System.Int32 and System.Int64 are different types, and neither one is derived from the other. Well, you’ll be happy to know that the C# compiler does compile this code correctly, and it runs as expected. Why? The reason is that the C# compiler has intimate knowledge of primitive types and applies its own special rules when compiling the code. In other words, the compiler recognizes common programming patterns and produces the necessary IL to make the written code work as expected. Specifically, the C# compiler supports patterns related to casting, literals, and operators, as shown in the following examples.

First, the compiler is able to perform implicit or explicit casts between primitive types such as the following.

Int32 i = 5; // Implicit cast from Int32 to Int32 
Int64 l = i; // Implicit cast from Int32 to Int64 
Single s = i; // Implicit cast from Int32 to Single 
Byte b = (Byte) i; // Explicit cast from Int32 to Byte 
Int16 v = (Int16) s; // Explicit cast from Single to Int16

C# allows implicit casts if the conversion is “safe,” that is, no loss of data is possible, such as converting an Int32 to an Int64. But C# requires explicit casts if the conversion is potentially unsafe. For numeric types, “unsafe” means that you could lose precision or magnitude as a result of the conversion. For example, converting from Int32 to Byte requires an explicit cast because precision might be lost from large Int32 numbers; converting from Single to Int16 requires a cast because Single can represent numbers of a larger magnitude than Int16 can.

Be aware that different compilers can generate different code to handle these cast operations. For example, when casting a Single with a value of 6.8 to an Int32, some compilers could generate code to put a 6 in the Int32, and others could perform the cast by rounding the result up to 7. By the way, C# always truncates the result. For the exact rules that C# follows for casting primitive types, see the “Conversions” section in the C# language specification.
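As a small sketch of the truncation rule, the following compares a C# cast with two library calls that round differently. CastTruncationDemo and Truncate are names chosen for this example only.

```csharp
using System;

public static class CastTruncationDemo {
    // An explicit cast from Single to Int32 discards the fractional part.
    public static Int32 Truncate(Single value) {
        return (Int32) value;
    }

    public static void Main() {
        Console.WriteLine(Truncate(6.8f));             // 6
        Console.WriteLine(Truncate(-6.8f));            // -6 (truncation is toward zero)
        Console.WriteLine(Convert.ToInt32(6.8f));      // 7  (Convert rounds to the nearest integer)
        Console.WriteLine((Int32) Math.Floor(-6.8));   // -7 (Floor rounds toward negative infinity)
    }
}
```

If rounding is what you want, call a method such as Convert.ToInt32 or Math.Round explicitly rather than relying on the cast.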

In addition to casting, primitive types can be written as literals. A literal is considered to be an instance of the type itself, and therefore, you can call instance methods by using the instance as shown here.

Console.WriteLine(123.ToString() + 456.ToString()); // "123456"

Also, if you have an expression consisting of literals, the compiler is able to evaluate the expression at compile time, improving the application’s performance.

Boolean found = false; // Generated code sets found to 0 
Int32 x = 100 + 20 + 3; // Generated code sets x to 123 
String s = "a " + "bc"; // Generated code sets s to "a bc"

Finally, the compiler automatically knows how and in what order to interpret operators (such as +, -, *, /, %, &, ^, |, ==, !=, >, <, >=, <=, <<, >>, ~, !, ++, --, and so on) when used in code.

Int32 x = 100; // Assignment operator 
Int32 y = x + 23; // Addition and assignment operators 
Boolean lessThanFifty = (y < 50); // Less-than and assignment operators

# Checked and Unchecked Primitive Type Operations

Programmers are well aware that many arithmetic operations on primitives could result in an overflow.

Byte b = 100; 
b = (Byte) (b + 200); // b now contains 44 (or 2C in Hex).

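💡 Important: When performing the arithmetic operation above, the first step requires that all operand values be expanded to 32-bit values (or to 64-bit values if any operand requires more than 32 bits). So b and 200 (values that both fit in 32 bits) are first converted to 32-bit values and then added together. The result is a 32-bit value (300 in decimal, or 12C in hexadecimal) that must be cast to a Byte before it can be stored back in the variable b. C# doesn’t perform this cast implicitly, which is why the Byte cast on the second line of the preceding code is required.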
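The widening and the wrap-around can be observed step by step. This is a sketch that assumes the code is compiled with the default setting (overflow checking off); BytePromotionDemo and AddAndWrap are hypothetical names.

```csharp
using System;

public static class BytePromotionDemo {
    // Assumes the default compiler setting (no /checked+), so the cast wraps.
    public static Byte AddAndWrap(Byte b, Int32 delta) {
        return (Byte) (b + delta);    // the sum is an Int32; the cast keeps the low 8 bits
    }

    public static void Main() {
        Byte b = 100;
        Int32 wide = b + 200;         // both operands are widened to Int32 before the add
        Console.WriteLine(wide);      // 300 (0x12C)

        // b = b + 200;               // does not compile: no implicit Int32-to-Byte conversion
        b = AddAndWrap(b, 200);       // 0x12C -> 0x2C
        Console.WriteLine(b);         // 44
        Console.WriteLine(300 % 256); // 44, the same wrap-around computed by hand
    }
}
```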

In most programming scenarios, this silent overflow is undesirable and if not detected causes the application to behave in strange and unusual ways. In some rare programming scenarios (such as calculating a hash value or a checksum), however, this overflow is not only acceptable but is also desired.

Different languages handle overflows in different ways. C and C++ don’t consider overflows to be an error and allow the value to wrap; the application continues running. Microsoft Visual Basic, on the other hand, always considers overflows to be errors and throws an exception when it detects one.

The CLR offers IL instructions that allow the compiler to choose the desired behavior. The CLR has an instruction called add that adds two values together. The add instruction performs no overflow checking. The CLR also has an instruction called add.ovf that also adds two values together. However, add.ovf throws a System.OverflowException if an overflow occurs. In addition to these two IL instructions for the add operation, the CLR also has similar IL instructions for subtraction (sub/sub.ovf), multiplication (mul/mul.ovf), and data conversions (conv/conv.ovf).

C# allows the programmer to decide how overflows should be handled. By default, overflow checking is turned off. This means that the compiler generates IL code by using the versions of the add, subtract, multiply, and conversion instructions that don’t include overflow checking. As a result, the code runs faster—but developers must be assured that overflows won’t occur or that their code is designed to anticipate these overflows.

One way to get the C# compiler to control overflows is to use the /checked+ compiler switch. This switch tells the compiler to generate code that has the overflow-checking versions of the add, subtract, multiply, and conversion IL instructions. The code executes a little slower because the CLR is checking these operations to determine whether an overflow occurred. If an overflow occurs, the CLR throws an OverflowException.

In addition to having overflow checking turned on or off globally, programmers can control overflow checking in specific regions of their code. C# allows this flexibility by offering checked and unchecked operators. Here’s an example that uses the unchecked operator.

UInt32 invalid = unchecked((UInt32) (-1)); // OK

And here is an example that uses the checked operator.

Byte b = 100; 
b = checked((Byte) (b + 200)); // OverflowException is thrown

In this example, b and 200 are first converted to 32-bit values and are then added together; the result is 300. Then 300 is converted to a Byte due to the explicit cast; this generates the OverflowException. If the Byte were cast outside the checked operator, the exception wouldn’t occur.

b = (Byte) checked(b + 200); // b contains 44; no OverflowException
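The difference between the two placements of the cast can be confirmed with a try/catch. The sketch below assumes the default compiler setting for everything outside a checked operator; CheckedOperatorDemo and its methods are names invented for this example.

```csharp
using System;

public static class CheckedOperatorDemo {
    // The conversion to Byte is inside checked, so the overflow-checking
    // conversion instruction is emitted and 300 cannot be stored in a Byte.
    public static Boolean CastInsideThrows(Byte b) {
        try {
            b = checked((Byte) (b + 200));
            return false;
        }
        catch (OverflowException) {
            return true;
        }
    }

    // Only the Int32 addition is checked (300 fits in an Int32); the narrowing
    // cast is outside checked and wraps under the default compiler setting.
    public static Byte CastOutside(Byte b) {
        return (Byte) checked(b + 200);
    }

    public static void Main() {
        Console.WriteLine(CastInsideThrows(100));         // True
        Console.WriteLine(CastOutside(100));              // 44
        Console.WriteLine(unchecked((UInt32) (-1)));      // 4294967295
    }
}
```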

In addition to the checked and unchecked operators, C# also offers checked and unchecked statements. The statements cause all expressions within a block to be checked or unchecked.

In fact, if you use a checked statement block, you can now use the += operator with the Byte, which simplifies the code a bit.

checked { // Start of checked block 
 Byte b = 100; 
 b += 200; // This expression is checked for overflow. 
}

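💡 Important: Because the only effect of the checked operator and the checked statement is to determine which versions of the add, subtract, multiply, and data conversion IL instructions are produced, calling a method within a checked operator or statement has no impact on that method, as the following code demonstrates.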

checked { 
 // Assume SomeMethod tries to load 400 into a Byte. 
 SomeMethod(400); 
 // SomeMethod might or might not throw an OverflowException. 
 // It would if SomeMethod were compiled with checked instructions. 
}
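To make the point concrete, here is a sketch in which SomeMethod is given a hypothetical body. The explicit unchecked inside it stands in for "compiled without checked instructions", so the result does not depend on the caller's checked block.

```csharp
using System;

public static class CheckedScopeDemo {
    // Hypothetical stand-in for SomeMethod. The unchecked operator fixes how
    // this method's own IL is generated, whatever the caller does.
    public static Byte SomeMethod(Int32 value) {
        return unchecked((Byte) value);     // 400 is 0x190; the low byte is 0x90 = 144
    }

    public static void Main() {
        checked {
            // The checked block affects only the IL emitted for expressions
            // written inside it; SomeMethod's body is compiled on its own terms.
            Byte result = SomeMethod(400);
            Console.WriteLine(result);      // 144, and no OverflowException
        }
    }
}
```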

In my experience, I've seen a lot of calculations produce surprising results. Typically, this is due to invalid user input, but it can also be due to values returned from parts of the system that a programmer just doesn't expect. And so, I now recommend that programmers do the following:

■ Use signed data types (such as Int32 and Int64) instead of unsigned numeric types (such as UInt32 and UInt64) wherever possible. This allows the compiler to detect more overflow/underflow errors. In addition, various parts of the class library (such as Array's and String's Length properties) are hard-coded to return signed values, and less casting is required as you move these values around in your code. Fewer casts make source code cleaner and easier to maintain. In addition, unsigned numeric types are not CLS-compliant.

■ As you write your code, explicitly use checked around blocks where an unwanted overflow might occur due to invalid input data, such as processing a request with data supplied from an end user or a client machine. You might want to catch OverflowException as well, so that your application can gracefully recover from these failures.

■ As you write your code, explicitly use unchecked around blocks where an overflow is OK, such as calculating a checksum.

■ For any code that doesn’t use checked or unchecked, the assumption is that you do want an exception to occur on overflow, for example, calculating something (such as prime numbers) where the inputs are known, and overflows are bugs.
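The recommendations above might look like this in practice. This is only a sketch: OverflowPolicyDemo, TryComputeTotal, and Checksum are hypothetical names, and the checksum formula is an arbitrary one chosen for illustration.

```csharp
using System;

public static class OverflowPolicyDemo {
    // Input may come from a user or a client machine, so the multiplication is
    // checked and the failure is turned into a recoverable result.
    public static Boolean TryComputeTotal(Int32 quantity, Int32 unitPrice, out Int32 total) {
        try {
            total = checked(quantity * unitPrice);
            return true;
        }
        catch (OverflowException) {
            total = 0;
            return false;
        }
    }

    // Wrap-around is intended when computing a checksum, so it is marked unchecked.
    public static Int32 Checksum(Byte[] data) {
        Int32 sum = 17;
        unchecked {
            foreach (Byte d in data) sum = sum * 31 + d;
        }
        return sum;
    }

    public static void Main() {
        Int32 total;
        Console.WriteLine(TryComputeTotal(1000, 1000, out total));         // True
        Console.WriteLine(total);                                           // 1000000
        Console.WriteLine(TryComputeTotal(Int32.MaxValue, 2, out total));   // False
        Console.WriteLine(Checksum(new Byte[] { 1, 2, 3 }));                // 507473
    }
}
```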

Now, as you develop your application, turn on the compiler’s /checked+ switch for debug builds. Your application will run more slowly because the system will be checking for overflows on any code that you didn’t explicitly mark as checked or unchecked. If an exception occurs, you’ll easily detect it and be able to fix the bug in your code. For the release build of your application, use the compiler’s /checked- switch so that the code runs faster and overflow exceptions won’t be generated. To change the Checked setting in Microsoft Visual Studio, display the properties for your project, select the Build tab, click Advanced, and then select the Check For Arithmetic Overflow/Underflow option, as shown in Figure 5-1.

[Figure 5-1: The Check For Arithmetic Overflow/Underflow option in Visual Studio's Advanced Build Settings dialog box]

If your application can tolerate the slight performance hit of always doing checked operations, then I recommend that you compile with the /checked command-line option even for a release build because this can prevent your application from continuing to run with corrupted data and possible security holes. For example, you might perform a multiplication to calculate an index into an array; it is much better to get an OverflowException as opposed to accessing an incorrect array element due to the math wrapping around.

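💡 Important: The System.Decimal type is a very special type. Although many programming languages (C# and Visual Basic included) consider Decimal a primitive type, the CLR does not. This means that the CLR doesn’t have IL instructions that know how to manipulate a Decimal value. If you look up the Decimal type in the documentation, you’ll see that it has public static methods called Add, Subtract, Multiply, Divide, and so on. In addition, the Decimal type provides operator overload methods for +, -, *, /, and so on.

When you compile code that uses Decimal values, the compiler generates code to call Decimal’s members to perform the actual operation. This means that manipulating Decimal values is slower than manipulating CLR primitive values. Also, because there are no IL instructions for manipulating Decimal values, the checked and unchecked operators, statements, and compiler switches have no effect. Operations on Decimal values always throw an OverflowException if the operation can’t be performed safely.

Similarly, the System.Numerics.BigInteger type is also special in that it internally uses an array of UInt32 values to represent an arbitrarily large integer whose value has no upper or lower bound. Therefore, operations on a BigInteger never result in an OverflowException. However, a BigInteger operation may throw an OutOfMemoryException if the value gets too large and there is insufficient memory available to resize the array.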
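Both behaviors can be seen in a few lines. The following sketch uses made-up names (DecimalOverflowDemo and its two helpers) and assumes that the System.Numerics types are available to the project.

```csharp
using System;
using System.Numerics;

public static class DecimalOverflowDemo {
    // unchecked has no effect here: d + 1m compiles to a call to Decimal's
    // addition operator method, which does its own range check.
    public static Boolean DecimalThrowsEvenWhenUnchecked() {
        Decimal d = Decimal.MaxValue;
        try {
            d = unchecked(d + 1m);
            return false;
        }
        catch (OverflowException) {
            return true;
        }
    }

    // checked has no effect here either: BigInteger simply grows.
    public static BigInteger PastInt64() {
        BigInteger big = Int64.MaxValue;    // implicit conversion from Int64
        return checked(big + 1);
    }

    public static void Main() {
        Console.WriteLine(DecimalThrowsEvenWhenUnchecked());   // True
        Console.WriteLine(PastInt64());                        // 9223372036854775808
    }
}
```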

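💡 Summary: Primitive types are the data types that the compiler supports directly, and they map directly to types in the Framework Class Library (FCL). The keyword and the FCL type name produce exactly the same IL. For types that comply with the Common Language Specification (CLS), other languages offer similar primitive types; languages are not required to support the non-CLS-compliant types. The book's author, Jeffrey Richter, recommends using the FCL type names rather than the primitive type names, mainly for these reasons: (1) Int32 makes it clear that the value is a 32-bit integer. (2) Keywords invite readers who come from another language to misread the code's intent; for example, C# and C++/CLI define long with different widths. (3) Many FCL methods have a type name as part of the method name; for example, BinaryReader offers ReadInt32, ReadSingle, and so on. The C# compiler has intimate knowledge of primitive types and applies its own special rules when compiling code, so for conversions such as those between Int32 and Int64 it emits the IL needed to make the code work as expected. C# allows an implicit cast only when the conversion is “safe,” meaning that no data can be lost, such as converting an Int32 to an Int64; if the conversion is potentially unsafe, C# requires an explicit cast. When an explicit cast converts a floating-point value to an integer, C# always truncates the result rather than rounding it up. Primitive types can also be written as literals; a literal is considered an instance of the type itself, so instance methods can be called on it. Different languages handle overflow differently: C and C++ don't treat an overflow as an error and allow the value to wrap. The CLR offers IL instructions that let the compiler choose the desired behavior; for addition, subtraction, multiplication, and data conversion it provides a version without overflow checking and a version with it: add/add.ovf, sub/sub.ovf, mul/mul.ovf, and conv/conv.ovf. C# lets the programmer decide how overflows are handled. Overflow checking is off by default, so the compiler emits the versions of the instructions that don't check for overflow. One way to have the C# compiler check for overflow is the /checked+ compiler switch, which simply tells the compiler to emit the overflow-checking versions of the instructions. In addition to turning overflow checking on or off globally, you can control it in specific regions of code by using the checked and unchecked operators. C# also supports checked and unchecked statements, which cause every expression in the block to be checked or unchecked.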

# Reference Types and Value Types

The CLR supports two kinds of types: reference types and value types. Although most types in the FCL are reference types, the types that programmers use most often are value types. Reference types are always allocated from the managed heap, and the C# new operator returns the memory address of the object—the memory address refers to the object’s bits. You need to bear in mind some performance considerations when you’re working with reference types. First, consider these facts:

■ The memory must be allocated from the managed heap.

■ Each object allocated on the heap has some additional overhead members associated with it that must be initialized.

■ The other bytes in the object (for the fields) are always set to zero.

■ Allocating an object from the managed heap could force a garbage collection to occur.

If every type were a reference type, an application’s performance would suffer greatly. Imagine how poor performance would be if every time you used an Int32 value, a memory allocation occurred! To improve performance for simple, frequently used types, the CLR offers lightweight types called value types. Value type instances are usually allocated on a thread’s stack (although they can also be embedded as a field in a reference type object). The variable representing the instance doesn’t contain a pointer to an instance; the variable contains the fields of the instance itself. Because the variable contains the instance’s fields, a pointer doesn’t have to be dereferenced to manipulate the instance’s fields. Value type instances don’t come under the control of the garbage collector, so their use reduces pressure in the managed heap and reduces the number of collections an application requires over its lifetime.

The .NET Framework SDK documentation clearly indicates which types are reference types and which are value types. When looking up a type in the documentation, any type called a class is a reference type. For example, the System.Exception class, the System.IO.FileStream class, and the System.Random class are all reference types. On the other hand, the documentation refers to each value type as a structure or an enumeration. For example, the System.Int32 structure, the System.Boolean structure, the System.Decimal structure, the System.TimeSpan structure, the System.DayOfWeek enumeration, the System.IO.FileAttributes enumeration, and the System.Drawing.FontStyle enumeration are all value types.

All of the structures are immediately derived from the System.ValueType abstract type. System.ValueType is itself immediately derived from the System.Object type. By definition, all value types must be derived from System.ValueType. All enumerations are derived from the System.Enum abstract type, which is itself derived from System.ValueType. The CLR and all programming languages give enumerations special treatment. For more information about enumerated types, refer to Chapter 15, “Enumerated Types and Bit Flags.”

Even though you can’t choose a base type when defining your own value type, a value type can implement one or more interfaces if you choose. In addition, all value types are sealed, which prevents a value type from being used as a base type for any other reference type or value type. So, for example, it’s not possible to define any new types using Boolean, Char, Int32, UInt64, Single, Double, Decimal, and so on as base types.
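The inheritance chain and the sealed nature of value types can be inspected with reflection. In this sketch, Celsius is a hypothetical value type defined only for the example.

```csharp
using System;

// A value type cannot name a base type, but it can implement interfaces.
public struct Celsius : IComparable<Celsius> {
    public readonly Double Degrees;
    public Celsius(Double degrees) { Degrees = degrees; }
    public Int32 CompareTo(Celsius other) { return Degrees.CompareTo(other.Degrees); }
}

// class Hot : Celsius { }   // does not compile: value types are sealed

public static class ValueTypeHierarchyDemo {
    public static void Main() {
        Console.WriteLine(typeof(Celsius).BaseType);     // System.ValueType
        Console.WriteLine(typeof(ValueType).BaseType);   // System.Object
        Console.WriteLine(typeof(DayOfWeek).BaseType);   // System.Enum
        Console.WriteLine(typeof(Enum).BaseType);        // System.ValueType
        Console.WriteLine(typeof(Int32).IsSealed);       // True
        Console.WriteLine(new Celsius(20).CompareTo(new Celsius(25)) < 0);   // True
    }
}
```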

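💡 Important: For many developers (such as unmanaged C/C++ developers), reference types and value types seem strange at first. In unmanaged C/C++, you declare a type, and then the code that uses the type decides whether an instance of the type should be allocated on the thread’s stack or in the application’s heap. In managed code, the developer who defines the type decides where instances of the type are allocated; the developer who uses the type has no control over this.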

The following code and Figure 5-2 demonstrate how reference types and value types differ.

// Reference type (because of 'class') 
class SomeRef { public Int32 x; } 
// Value type (because of 'struct') 
struct SomeVal { public Int32 x; } 
static void ValueTypeDemo() { 
 SomeRef r1 = new SomeRef(); // Allocated in heap 
 SomeVal v1 = new SomeVal(); // Allocated on stack 
 r1.x = 5; // Pointer dereference 
 v1.x = 5; // Changed on stack 
 Console.WriteLine(r1.x); // Displays "5" 
 Console.WriteLine(v1.x); // Also displays "5" 
 // The left side of Figure 5-2 reflects the situation 
 // after the lines above have executed. 
 
 SomeRef r2 = r1; // Copies reference (pointer) only 
 SomeVal v2 = v1; // Allocate on stack & copies members 
 r1.x = 8; // Changes r1.x and r2.x 
 v1.x = 9; // Changes v1.x, not v2.x 
 Console.WriteLine(r1.x); // Displays "8" 
 Console.WriteLine(r2.x); // Displays "8" 
 Console.WriteLine(v1.x); // Displays "9" 
 Console.WriteLine(v2.x); // Displays "5" 
 // The right side of Figure 5-2 reflects the situation 
 // after ALL of the lines above have executed. 
}

In this code, the SomeVal type is declared using struct instead of the more common class. In C#, types declared using struct are value types, and types declared using class are reference types. As you can see, the behavior of reference types and value types differs quite a bit. As you use types in your code, you must be aware of whether the type is a reference type or a value type because it can greatly affect how you express your intentions in the code.

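[Figure 5-2: Memory layout of the reference type and value type variables, before (left) and after (right) the assignments to r2 and v2]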

In the preceding code, you saw this line.

SomeVal v1 = new SomeVal(); // Allocated on stack

The way this line is written makes it look as if a SomeVal instance will be allocated on the managed heap. However, the C# compiler knows that SomeVal is a value type and produces code that allocates the SomeVal instance on the thread’s stack. C# also ensures that all of the fields in the value type instance are zeroed.

The preceding line could have been written like this instead.

SomeVal v1; // Allocated on stack

This line also produces IL that allocates the instance on the thread’s stack and zeroes the fields. The only difference is that C# “thinks” that the instance is initialized if you use the new operator. The following code will make this point clear.

// These two lines compile because C# thinks that 
// v1's fields have been initialized to 0. 
SomeVal v1 = new SomeVal(); 
Int32 a = v1.x; 
// These two lines don't compile because C# doesn't think that 
// v1's fields have been initialized to 0. 
SomeVal v1; 
Int32 a = v1.x; // error CS0170: Use of possibly unassigned field 'x'
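A related consequence of the same rule: without new, the compiler accepts a read once every field has been assigned explicitly. The sketch below reuses the SomeVal type from the earlier listing; DefiniteAssignmentDemo and its methods are names invented here.

```csharp
using System;

struct SomeVal { public Int32 x; }

public static class DefiniteAssignmentDemo {
    public static Int32 ReadAfterNew() {
        SomeVal v1 = new SomeVal();   // C# treats every field as initialized (to 0)
        return v1.x;
    }

    public static Int32 ReadAfterFieldAssignment() {
        SomeVal v1;                   // no new: C# treats the fields as unassigned
        v1.x = 5;                     // assigning every field makes v1 usable
        return v1.x;
    }

    public static void Main() {
        Console.WriteLine(ReadAfterNew());               // 0
        Console.WriteLine(ReadAfterFieldAssignment());   // 5
    }
}
```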

When designing your own types, consider carefully whether to define your types as value types instead of reference types. In some situations, value types can give better performance. In particular, you should declare a type as a value type if all the following statements are true:

■ The type acts as a primitive type. Specifically, this means that it is a fairly simple type that has no members that modify any of its instance fields. When a type offers no members that alter its fields, we say that the type is immutable. In fact, it is recommended that many value types mark all their fields as readonly (discussed in Chapter 7, "Constants and Fields").

■ The type doesn’t need to inherit from any other type.

■ The type won’t have any other types derived from it.

The size of instances of your type is also a condition to take into account because by default, arguments are passed by value, which causes the fields in value type instances to be copied, hurting performance. Again, a method that returns a value type causes the fields in the instance to be copied into the memory allocated by the caller when the method returns, hurting performance. So, in addition to the previous conditions, you should declare a type as a value type if one of the following statements is true:

■ Instances of the type are small (approximately 16 bytes or less).

■ Instances of the type are large (greater than 16 bytes) and are not passed as method parameters or returned from methods.

The main advantage of value types is that they’re not allocated as objects in the managed heap. Of course, value types have several limitations of their own when compared to reference types. Here are some of the ways in which value types and reference types differ:

■ Value type objects have two representations: an unboxed form and a boxed form (discussed in the next section). Reference types are always in a boxed form.

■ Value types are derived from System.ValueType. This type offers the same methods as defined by System.Object. However, System.ValueType overrides the Equals method so that it returns true if the values of the two objects’ fields match. In addition, System.ValueType overrides the GetHashCode method to produce a hash code value by using an algorithm that takes into account the values in the object’s instance fields. Due to performance issues with this default implementation, when defining your own value types, you should override and provide explicit implementations for the Equals and GetHashCode methods. I’ll cover the Equals and GetHashCode methods at the end of this chapter.

■ Because you can’t define a new value type or a new reference type by using a value type as a base class, you shouldn’t introduce any new virtual methods into a value type. No methods can be abstract, and all methods are implicitly sealed (can’t be overridden).

■ Reference type variables contain the memory address of objects in the heap. By default, when a reference type variable is created, it is initialized to null, indicating that the reference type variable doesn’t currently point to a valid object. Attempting to use a null reference type variable causes a NullReferenceException to be thrown. By contrast, value type variables always contain a value of the underlying type, and all members of the value type are initialized to 0. Because a value type variable isn’t a pointer, it’s not possible to generate a NullReferenceException when accessing a value type. The CLR does offer a special feature that adds the notion of nullability to a value type. This feature, called nullable types, is discussed in Chapter 19, “Nullable Value Types.”

■ When you assign a value type variable to another value type variable, a field-by-field copy is made. When you assign a reference type variable to another reference type variable, only the memory address is copied.

■ Because of the previous point, two or more reference type variables can refer to a single object in the heap, allowing operations on one variable to affect the object referenced by the other variable. On the other hand, value type variables are distinct objects, and it’s not possible for operations on one value type variable to affect another.

■ Because unboxed value types aren’t allocated on the heap, the storage allocated for them is freed as soon as the method that defines an instance of the type is no longer active as opposed to waiting for a garbage collection.
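Several of the points above can be combined in one small type. The following is a sketch of a small, immutable value type that supplies its own Equals and GetHashCode; Point2D is a hypothetical type, and the hash formula is just one common choice, not a recommendation from the text.

```csharp
using System;

// Hypothetical small, immutable value type (two Int32 fields, 8 bytes of data).
public struct Point2D : IEquatable<Point2D> {
    public readonly Int32 X;
    public readonly Int32 Y;

    public Point2D(Int32 x, Int32 y) { X = x; Y = y; }

    // Strongly typed comparison; the argument is not boxed.
    public Boolean Equals(Point2D other) { return X == other.X && Y == other.Y; }

    public override Boolean Equals(Object obj) {
        return obj is Point2D && Equals((Point2D) obj);
    }

    public override Int32 GetHashCode() {
        unchecked { return (X * 397) ^ Y; }   // wrap-around is acceptable for a hash
    }
}

public static class ValueTypeEqualityDemo {
    public static void Main() {
        Point2D a = new Point2D(1, 2);
        Point2D b = a;                                            // field-by-field copy
        Console.WriteLine(a.Equals(b));                           // True
        Console.WriteLine(a.GetHashCode() == b.GetHashCode());    // True
        Console.WriteLine(a.Equals(new Point2D(2, 1)));           // False
        Console.WriteLine(default(Point2D).X);                    // 0: the fields start zeroed, never null
    }
}
```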

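How the CLR Controls the Layout of a Type's Fields

To improve performance, the CLR is capable of arranging the fields of a type any way it chooses. For example, the CLR might reorder fields in memory so that object references are grouped together and data fields are properly aligned and packed. However, when you define a type, you can tell the CLR whether it must keep the type's fields in the order you specified or whether it can reorder them as it sees fit.

You tell the CLR what to do by applying the System.Runtime.InteropServices.StructLayoutAttribute attribute to the class or structure you're defining. To this attribute's constructor, you can pass LayoutKind.Auto to have the CLR arrange the fields, LayoutKind.Sequential to have the CLR preserve your field layout, or LayoutKind.Explicit to arrange the fields in memory explicitly by using offsets. If you don't explicitly specify the StructLayoutAttribute on a type that you're defining, your compiler selects whatever layout it determines is best.

Be aware that Microsoft's C# compiler selects LayoutKind.Auto for reference types (classes) and LayoutKind.Sequential for value types (structures). Evidently, the C# compiler team believes that structures are commonly used when interoperating with unmanaged code, and for this to work, the fields must stay in the order defined by the programmer. However, if you're creating a value type that has nothing to do with unmanaged interoperability, you may want to override the C# compiler's default. Here is an example.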

using System;
using System.Runtime.InteropServices;
// Let the CLR arrange the fields to improve performance for this value type.
[StructLayout(LayoutKind.Auto)]
internal struct SomeValType {
  private readonly Byte m_b;
  private readonly Int16 m_x;
  ...
}

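The StructLayoutAttribute also allows you to indicate the offset of each field explicitly by passing LayoutKind.Explicit to its constructor. You then apply an instance of the System.Runtime.InteropServices.FieldOffsetAttribute attribute to each field in the value type, passing to this attribute's constructor an Int32 that indicates the offset (in bytes) of the field's first byte from the beginning of the instance. Explicit layout is typically used to simulate what would be a union in unmanaged C/C++, because multiple fields can start at the same offset in memory. Here is an example.

A union is a special kind of class whose data members overlap in memory. Every data member starts at the same memory address, the storage allocated for the union is the amount needed for its largest data member, and only one member can hold a value at any given time. (Translator's note)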

using System;
using System.Runtime.InteropServices;

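// The developer explicitly arranges the fields of this value type.
[StructLayout(LayoutKind.Explicit)]
internal struct SomeValType {
  [FieldOffset(0)]
  private readonly Byte m_b;     // The m_b and m_x fields overlap each other in instances of this type.

  [FieldOffset(0)]
  private readonly Int16 m_x;    // The m_b and m_x fields overlap each other in instances of this type.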
  ...
}

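Note that it is illegal to define a type in which a reference type and a value type overlap. It is possible to define a type in which multiple reference types overlap at the same starting offset; however, this is unverifiable. It is legal to define a type in which multiple value types overlap; however, for such a type to be verifiable, all of the overlapping bytes must be accessible through public fields.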
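The overlap can be observed by writing through one field and reading through the other. This is a sketch under stated assumptions: ByteAndInt16 is a hypothetical type with public fields, and which byte becomes visible depends on the byte order of the machine, so the example consults BitConverter.IsLittleEndian rather than assuming one.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Explicit)]
public struct ByteAndInt16 {
    [FieldOffset(0)] public Byte  B;    // shares its storage with the first byte of X
    [FieldOffset(0)] public Int16 X;
}

public static class ExplicitLayoutDemo {
    // Writes through X and checks that the change is visible through B.
    public static Boolean FieldsOverlap() {
        ByteAndInt16 u = new ByteAndInt16();
        u.X = 0x1234;
        // On a little-endian machine the first byte of X is its low byte.
        Byte expected = BitConverter.IsLittleEndian ? (Byte) 0x34 : (Byte) 0x12;
        return u.B == expected;
    }

    public static void Main() {
        Console.WriteLine(FieldsOverlap());   // True
    }
}
```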

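💡 Summary: The CLR supports reference types and value types. Reference types are always allocated from the managed heap, and the C# new operator returns the object's memory address (the address that refers to the object's bits). Keep four facts about reference types in mind: (1) The memory is always allocated from the managed heap. (2) Each object allocated on the heap has additional overhead members (the type object pointer and the sync block index) that must be initialized. (3) The other bytes in the object (for the fields) are always set to zero. (4) Allocating an object from the managed heap could force a garbage collection. Value types, the “lightweight” types, are usually allocated on a thread's stack (although they can also be embedded as a field in a reference type object). Because the variable contains the instance's fields, no pointer has to be dereferenced to manipulate them. Value type instances are not under the control of the garbage collector, so using them reduces pressure on the managed heap and reduces the number of collections over the application's lifetime. Any type that the documentation calls a class is a reference type; every value type is called a structure or an enumeration. All structures are immediately derived from the abstract type System.ValueType, which is itself immediately derived from System.Object. All enumerations are derived from the abstract type System.Enum, which is itself derived from System.ValueType. Although you can't choose a base type when defining a value type, a value type can implement one or more interfaces. In addition, all value types are implicitly sealed, which prevents them from being used as a base type for any other reference type or value type. In C#, types declared with struct are value types, and types declared with class are reference types. Even when a value type variable is initialized with the new operator, the instance is still allocated on the thread's stack: the C# compiler knows that the type is a value type and emits the appropriate IL, and C# also ensures that all of the instance's fields are zeroed. Value type objects have two representations, unboxed and boxed, whereas reference types are always in boxed form. When defining your own value type, override the Equals and GetHashCode methods and provide explicit implementations, because the default implementations have performance issues. Finally, no method of a value type can be abstract, and all of its methods are implicitly sealed (can't be overridden).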

# Boxing and Unboxing Value Types

Value types are lighter weight than reference types because they are not allocated as objects in the managed heap, not garbage collected, and not referred to by pointers. However, in many cases, you must get a reference to an instance of a value type. For example, let’s say that you wanted to create an ArrayList object (a type defined in the System.Collections namespace) to hold a set of Point structures. The code might look like this.

using System;
using System.Collections;

// Declare a value type.
struct Point {
    public Int32 x, y;
}

public sealed class Program {
    public static void Main() {
        ArrayList a = new ArrayList();
        Point p;                    // Allocate a Point (not in the heap).
        for (Int32 i = 0; i < 10; i++) {
            p.x = p.y = i;          // Initialize the members in the value type.
            a.Add(p);               // Box the value type and add the
                                    // reference to the ArrayList.
        }
        ...
    }
}

With each iteration of the loop, a Point’s value type fields are initialized. Then the Point is stored in the ArrayList. But let’s think about this for a moment. What is actually being stored in the ArrayList? Is it the Point structure, the address of the Point structure, or something else entirely? To get the answer, you must look up ArrayList’s Add method and see what type its parameter is defined as. In this case, the Add method is prototyped as follows.

public virtual Int32 Add(Object value);

From this, you can plainly see that Add takes an Object as a parameter, indicating that Add requires a reference (or pointer) to an object on the managed heap as a parameter. But in the preceding code, I’m passing p, a Point, which is a value type. For this code to work, the Point value type must be converted into a true heap-managed object, and a reference to this object must be obtained.

It’s possible to convert a value type to a reference type by using a mechanism called boxing. Internally, here’s what happens when an instance of a value type is boxed.

  1. Memory is allocated from the managed heap. The amount of memory allocated is the size required by the value type’s fields plus the two additional overhead members (the type object pointer and the sync block index) required by all objects on the managed heap.
  2. The value type’s fields are copied to the newly allocated heap memory.
  3. The address of the object is returned. This address is now a reference to an object; the value type is now a reference type.

The C# compiler automatically produces the IL code necessary to box a value type instance, but you still need to understand what’s going on internally so that you’re aware of code size and performance issues.

In the preceding code, the C# compiler detected that I was passing a value type to a method that requires a reference type, and it automatically emitted code to box the object. So at run time, the fields currently residing in the Point value type instance p are copied into the newly allocated Point object. The address of the boxed Point object (now a reference type) is returned and is then passed to the Add method. The Point object will remain in the heap until it is garbage collected. The Point value type variable (p) can be reused because the ArrayList never knows anything about it. Note that the lifetime of the boxed value type extends beyond the lifetime of the unboxed value type.
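The following is a minimal sketch of the behavior just described, not a definitive test of the CLR. It assumes the same Point shape as above; the BoxingCopyDemo class and its method names are hypothetical and exist only for illustration. It shows that every call to Add boxes a fresh copy of p, and that changing p afterward leaves the boxed copy untouched.

```csharp
using System;
using System.Collections;

// Same shape as the Point value type used above.
struct Point {
    public Int32 x, y;
}

// Hypothetical demo class (not part of the original listing).
public static class BoxingCopyDemo {
    // Adds the same variable twice and reports whether the two
    // stored references point to the same heap object.
    public static Boolean SameBoxForBothAdds() {
        ArrayList a = new ArrayList();
        Point p;
        p.x = p.y = 1;
        a.Add(p);   // Boxes a copy of p.
        a.Add(p);   // Boxes a second, independent copy of p.
        return Object.ReferenceEquals(a[0], a[1]);
    }

    // Changes p after it was added and returns the x stored in the box.
    public static Int32 BoxedXAfterChangingP() {
        ArrayList a = new ArrayList();
        Point p;
        p.x = p.y = 1;
        a.Add(p);   // The box captures x = 1, y = 1.
        p.x = 99;   // Only the unboxed variable changes.
        return ((Point) a[0]).x;
    }

    public static void Main() {
        Console.WriteLine(SameBoxForBothAdds());    // False
        Console.WriteLine(BoxedXAfterChangingP());  // 1
    }
}
```

Because the ArrayList holds only references to the boxed copies, reusing p in a loop is safe: each element keeps the values that p had at the moment it was added.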

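💡 Note: The FCL now includes a set of generic collection classes that make the non-generic collection classes obsolete. For example, you should use the System.Collections.Generic.List<T> class instead of the System.Collections.ArrayList class. The generic collection classes improve on their non-generic counterparts in many ways. For example, the API has been simplified and enhanced, and the performance of the collection classes has improved significantly. But the biggest improvement is that the generic collection classes let you work with collections of value types without boxing or unboxing the items in the collection. This alone improves performance considerably, because fewer objects have to be created on the managed heap, which in turn reduces the number of garbage collections your application must perform. In addition, you get compile-time type safety, and your source code becomes cleaner because it needs fewer casts. All of this is explained in detail in Chapter 12, “Generics.”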
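As a hedged illustration of the generic alternative, the loop from the start of this section could be rewritten with List<Point>. Because the list's element type is Point itself, Add takes a Point rather than an Object, and reading an element needs no cast. GenericListDemo and SumOfX are hypothetical names introduced for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the Point value type used above.
struct Point {
    public Int32 x, y;
}

// Hypothetical demo class (not part of the original listing).
public static class GenericListDemo {
    // Fills a List<Point> with ten points and sums their x fields.
    public static Int32 SumOfX() {
        List<Point> points = new List<Point>();
        Point p;
        for (Int32 i = 0; i < 10; i++) {
            p.x = p.y = i;
            points.Add(p);   // Add(Point): the value is copied into the list, not boxed.
        }

        Int32 sum = 0;
        foreach (Point q in points) {
            sum += q.x;      // No cast and no unboxing.
        }
        return sum;
    }

    public static void Main() {
        Console.WriteLine(SumOfX());   // 45
    }
}
```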

Now that you know how boxing works, let’s talk about unboxing. Let’s say that you want to grab the first element out of the ArrayList by using the following code.

Point p = (Point) a[0];

Here you’re taking the reference (or pointer) contained in element 0 of the ArrayList and trying to put it into a Point value type instance, p. For this to work, all of the fields contained in the boxed Point object must be copied into the value type variable, p, which is on the thread’s stack. The CLR accomplishes this copying in two steps. First, the address of the Point fields in the boxed Point object is obtained. This process is called unboxing. Then, the values of these fields are copied from the heap to the stack-based value type instance.

Unboxing is not the exact opposite of boxing. The unboxing operation is much less costly than boxing. Unboxing is really just the operation of obtaining a pointer to the raw value type (data fields) contained within an object. In effect, the pointer refers to the unboxed portion in the boxed instance. So, unlike boxing, unboxing doesn’t involve the copying of any bytes in memory. That said, an unboxing operation is typically followed by a copy of the fields.

Obviously, boxing and unboxing/copy operations hurt your application’s performance in terms of both speed and memory, so you should be aware of when the compiler generates code to perform these operations automatically and try to write code that minimizes this code generation.

Internally, here’s exactly what happens when a boxed value type instance is unboxed:

  1. If the variable containing the reference to the boxed value type instance is null, a NullReferenceException is thrown.
  2. If the reference doesn’t refer to an object that is a boxed instance of the desired value type, an InvalidCastException is thrown.

The second item in the preceding list means that the following code will not work as you might expect.

public static void Main() { 
 Int32 x = 5; 
 Object o = x; // Box x; o refers to the boxed object 
 Int16 y = (Int16) o; // Throws an InvalidCastException 
}

Logically, it makes sense to take the boxed Int32 that o refers to and cast it to an Int16. However, when unboxing an object, the cast must be to the exact unboxed value type—Int32 in this case. Here’s the correct way to write this code.

public static void Main() { 
 Int32 x = 5; 
 Object o = x; // Box x; o refers to the boxed object 
 Int16 y = (Int16)(Int32) o; // Unbox to the correct type and cast 
}
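The two checks performed during unboxing can be observed directly. The sketch below is illustrative only; UnboxChecksDemo and TryUnboxToInt16 are hypothetical names. It attempts to unbox three different references to an Int16 and reports which exception, if any, results.

```csharp
using System;

// Hypothetical demo class (not part of the original listing).
public static class UnboxChecksDemo {
    // Returns the name of the exception thrown when unboxing o to Int16,
    // or a short success message if the unboxing works.
    public static String TryUnboxToInt16(Object o) {
        try {
            Int16 y = (Int16) o;   // Unbox and copy.
            return "unboxed " + y.ToString();
        }
        catch (NullReferenceException) { return "NullReferenceException"; }
        catch (InvalidCastException) { return "InvalidCastException"; }
    }

    public static void Main() {
        Int32 x = 5;
        Int16 s = 5;
        Console.WriteLine(TryUnboxToInt16(null)); // NullReferenceException
        Console.WriteLine(TryUnboxToInt16(x));    // InvalidCastException (the box holds an Int32)
        Console.WriteLine(TryUnboxToInt16(s));    // unboxed 5
        // Unbox to the exact type first, then convert.
        Console.WriteLine((Int16)(Int32)(Object) x); // 5
    }
}
```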

I mentioned earlier that an unboxing operation is frequently followed immediately by a field copy. Let’s take a look at some C# code demonstrating that unbox and copy operations work together.

public static void Main() { 
 Point p; 
 p.x = p.y = 1; 
 Object o = p; // Boxes p; o refers to the boxed instance 
 p = (Point) o; // Unboxes o AND copies fields from boxed 
 // instance to stack variable 
}

On the last line, the C# compiler emits an IL instruction to unbox o (get the address of the fields in the boxed instance) and another IL instruction to copy the fields from the heap to the stack-based variable p.

Now look at this code.

public static void Main() { 
 Point p; 
 p.x = p.y = 1; 
 Object o = p; // Boxes p; o refers to the boxed instance 
 // Change Point's x field to 2 
 p = (Point) o; // Unboxes o AND copies fields from boxed 
 // instance to stack variable 
 p.x = 2; // Changes the state of the stack variable 
 o = p; // Boxes p; o refers to a new boxed instance 
}

The code at the bottom of this fragment is intended only to change Point’s x field from 1 to 2. To do this, an unbox operation must be performed, followed by a field copy, followed by changing the field (on the stack), followed by a boxing operation (which creates a whole new boxed instance in the managed heap). Hopefully, you see the impact that boxing and unboxing/copying operations have on your application’s performance.

Some languages, such as C++/CLI, allow you to unbox a boxed value type without copying the fields. Unboxing returns the address of the unboxed portion of a boxed object (ignoring the object’s type object pointer and sync block index overhead). You can now use this pointer to manipulate the unboxed instance’s fields (which happen to be in a boxed object on the heap). For example, the previous code would be much more efficient if written in C++/CLI, because you could change the value of Point’s x field within the already boxed Point instance. This would avoid both allocating a new object on the heap and copying all of the fields twice!

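💡 Important: If you care about your application’s performance, you should know when the compiler generates code to perform these operations. Unfortunately, many compilers emit the boxing code implicitly, so it is not always obvious that your code causes boxing. If you are concerned about the performance of a particular algorithm, you can use a tool such as ILDasm.exe to view the IL code for your methods and see where the box IL instructions appear.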

Let’s look at a few more examples that demonstrate boxing and unboxing.

public static void Main() { 
 Int32 v = 5; // Create an unboxed value type variable. 
 Object o = v; // o refers to a boxed Int32 containing 5. 
 v = 123; // Changes the unboxed value to 123 
 Console.WriteLine(v + ", " + (Int32) o); // Displays "123, 5" 
}

In this code, can you guess how many boxing operations occur? You might be surprised to discover that the answer is three! Let’s analyze the code carefully to really understand what’s going on. To help you understand, I’ve included the IL code generated for the Main method shown in the preceding code. I’ve commented the code so that you can easily see the individual operations.

.method public hidebysig static void Main() cil managed 
{ 
 .entrypoint 
 // Code size 45 (0x2d) 
 .maxstack 3 
 .locals init ([0]int32 v, 
 [1] object o) 
 // Load 5 into v. 
 IL_0000: ldc.i4.5 
 IL_0001: stloc.0 
 // Box v and store the reference pointer in o. 
 IL_0002: ldloc.0 
 IL_0003: box [mscorlib]System.Int32 
 IL_0008: stloc.1 
 // Load 123 into v. 
 IL_0009: ldc.i4.s 123 
 IL_000b: stloc.0 
 // Box v and leave the pointer on the stack for Concat. 
 IL_000c: ldloc.0 
 IL_000d: box [mscorlib]System.Int32 
 // Load the string on the stack for Concat. 
 IL_0012: ldstr ", " 
 // Unbox o: Get the pointer to the Int32's field on the stack. 
 IL_0017: ldloc.1 
 IL_0018: unbox.any [mscorlib]System.Int32 
 // Box the Int32 and leave the pointer on the stack for Concat. 
 IL_001d: box [mscorlib]System.Int32 
 // Call Concat. 
 IL_0022: call string [mscorlib]System.String::Concat(object, 
 object, 
 object) 
 // The string returned from Concat is passed to WriteLine. 
 IL_0027: call void [mscorlib]System.Console::WriteLine(string) 
 // Return from Main terminating this application. 
 IL_002c: ret 
} // end of method App::Main

First, an Int32 unboxed value type instance (v) is created on the stack and initialized to 5. Then a variable (o) typed as Object is created, and is initialized to point to v. But because reference type variables must always point to objects in the heap, C# generated the proper IL code to box and store the address of the boxed copy of v in o. Now the value 123 is placed into the unboxed value type instance v; this has no effect on the boxed Int32 value, which keeps its value of 5.

Next is the call to the WriteLine method. WriteLine wants a String object passed to it, but there is no string object. Instead, these three items are available: an unboxed Int32 value type instance (v), a String (which is a reference type), and a reference to a boxed Int32 value type instance (o) that is being cast to an unboxed Int32. These must somehow be combined to create a String.

To create a String, the C# compiler generates code that calls String’s static Concat method. There are several overloaded versions of the Concat method, all of which perform identically—the only difference is in the number of parameters. Because a string is being created from the concatenation of three items, the compiler chooses the following version of the Concat method.

public static String Concat(Object arg0, Object arg1, Object arg2);

For the first parameter, arg0, v is passed. But v is an unboxed value type instance and arg0 is an Object, so v must be boxed and the address of the boxed v is passed for arg0. For the arg1 parameter, the ", " string is passed as a reference to a String object. Finally, for the arg2 parameter, o (a reference to an Object) is cast to an Int32. This requires an unboxing operation (but no copy operation), which retrieves the address of the unboxed Int32 contained inside the boxed Int32. This unboxed Int32 instance must be boxed again and the new boxed instance’s memory address passed for Concat’s arg2 parameter.

The Concat method calls each of the specified objects’ ToString method and concatenates each object’s string representation. The String object returned from Concat is then passed to WriteLine to show the final result.

I should point out that the generated IL code is more efficient if the call to WriteLine is written as follows.

Console.WriteLine(v + ", " + o); // Displays "123, 5"

This line is identical to the earlier version except that I’ve removed the (Int32) cast that preceded the variable o. This code is more efficient because o is already a reference type to an Object and its address can simply be passed to the Concat method. So, removing the cast saved two operations: an unbox and a box. You can easily see this savings by rebuilding the application and examining the generated IL code, as shown in the following code.

.method public hidebysig static void Main() cil managed 
{ 
 .entrypoint 
 // Code size 35 (0x23) 
 .maxstack 3 
 .locals init ([0] int32 v, 
 [1] object o) 
 // Load 5 into v. 
 IL_0000: ldc.i4.5 
 IL_0001: stloc.0 
 // Box v and store the reference pointer in o. 
 IL_0002: ldloc.0 
 IL_0003: box [mscorlib]System.Int32 
 IL_0008: stloc.1 
 // Load 123 into v. 
 IL_0009: ldc.i4.s 123 
 IL_000b: stloc.0 
 // Box v and leave the pointer on the stack for Concat. 
 IL_000c: ldloc.0 
 IL_000d: box [mscorlib]System.Int32 
 // Load the string on the stack for Concat. 
 IL_0012: ldstr ", " 
 // Load the address of the boxed Int32 on the stack for Concat. 
 IL_0017: ldloc.1 
 // Call Concat. 
 IL_0018: call string [mscorlib]System.String::Concat(object, 
 object, 
 object) 
 // The string returned from Concat is passed to WriteLine. 
 IL_001d: call void [mscorlib]System.Console::WriteLine(string) 
 // Return from Main terminating this application. 
 IL_0022: ret 
} // end of method App::Main

A quick comparison of the IL for these two versions of the Main method shows that the version without the (Int32) cast is 10 bytes smaller than the version with the cast. The extra unbox/box steps in the first version are obviously generating more code. An even bigger concern, however, is that the extra boxing step allocates an additional object from the managed heap that must be garbage collected in the future. Certainly, both versions give identical results, and the difference in speed isn’t noticeable, but extra, unnecessary boxing operations occurring in a loop cause the performance and memory usage of your application to be seriously degraded.

You can improve the previous code even more by calling WriteLine like this.

Console.WriteLine(v.ToString() + ", " + o); // Displays "123, 5"

Now ToString is called on the unboxed value type instance v, and a String is returned. String objects are already reference types and can simply be passed to the Concat method without requiring any boxing.

Let’s look at yet another example that demonstrates boxing and unboxing.

public static void Main() { 
 Int32 v = 5; // Create an unboxed value type variable. 
 Object o = v; // o refers to the boxed version of v. 
 v = 123; // Changes the unboxed value type to 123 
 Console.WriteLine(v); // Displays "123" 
 v = (Int32) o; // Unboxes and copies o into v 
 Console.WriteLine(v); // Displays "5" 
}

How many boxing operations do you count in this code? The answer is one. The reason that there is only one boxing operation is that the System.Console class defines a WriteLine method that accepts an Int32 as a parameter.

public static void WriteLine(Int32 value);

In the two preceding calls to WriteLine, the variable v, an Int32 unboxed value type instance, is passed by value. Now it may be that WriteLine will box this Int32 internally, but you have no control over that. The important thing is that you’ve done the best you could and have eliminated the boxing from your own code.

If you take a close look at the FCL, you’ll notice many overloaded methods that differ based on their value type parameters. For example, the System.Console type offers several overloaded versions of the WriteLine method.

public static void WriteLine(Boolean); 
public static void WriteLine(Char); 
public static void WriteLine(Char[]); 
public static void WriteLine(Int32); 
public static void WriteLine(UInt32); 
public static void WriteLine(Int64); 
public static void WriteLine(UInt64); 
public static void WriteLine(Single); 
public static void WriteLine(Double); 
public static void WriteLine(Decimal); 
public static void WriteLine(Object); 
public static void WriteLine(String);

You’ll also find a similar set of overloaded methods for System.Console’s Write method, System.IO.BinaryWriter’s Write method, System.IO.TextWriter’s Write and WriteLine methods, System.Runtime.Serialization.SerializationInfo’s AddValue method, System.Text.StringBuilder’s Append and Insert methods, and so on. Most of these methods offer overloaded versions for the sole purpose of reducing the number of boxing operations for the common value types.

If you define your own value type, these FCL classes will not have overloads of these methods that accept your value type. Furthermore, there are a bunch of value types already defined in the FCL for which overloads of these methods do not exist. If you call a method that does not have an overload for the specific value type that you are passing to it, you will always end up calling the overload that takes an Object. Passing a value type instance as an Object will cause boxing to occur, which will adversely affect performance. If you are defining your own class, you can define the methods in the class to be generic (possibly constraining the type parameters to be value types). Generics give you a way to define a method that can take any kind of value type without having to box it. Generics are discussed in Chapter 12.
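As a rough sketch of the generic approach, the method below accepts any value type that implements IComparable<T>. The names GenericNoBoxingDemo and Max are hypothetical. Because T is constrained to struct and the arguments are never converted to Object, calling the method should not require boxing them; you can confirm this for your own compiler by inspecting the IL for box instructions.

```csharp
using System;

// Hypothetical demo class (not part of the original text).
public static class GenericNoBoxingDemo {
    // Returns the larger of two values of any value type that
    // implements IComparable<T>. T is never converted to Object.
    public static T Max<T>(T a, T b) where T : struct, IComparable<T> {
        return a.CompareTo(b) >= 0 ? a : b;
    }

    public static void Main() {
        Console.WriteLine(Max(3, 7));       // 7  (T is Int32)
        Console.WriteLine(Max(10L, 4L));    // 10 (T is Int64)
        Console.WriteLine(Max('a', 'z'));   // z  (T is Char)
    }
}
```

The results are passed to the Int32, Int64, and Char overloads of WriteLine, so the calling code does not box them either.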

One last point about boxing: if you know that the code that you’re writing is going to cause the compiler to box a single value type repeatedly, your code will be smaller and faster if you manually box the value type. Here’s an example.

using System; 
public sealed class Program { 
 public static void Main() { 
 Int32 v = 5; // Create an unboxed value type variable. 
#if INEFFICIENT 
 // When compiling the following line, v is boxed 
 // three times, wasting time and memory. 
 Console.WriteLine("{0}, {1}, {2}", v, v, v); 
#else 
 // The lines below have the same result, execute 
 // much faster, and use less memory. 
 Object o = v; // Manually box v (just once). 
 // No boxing occurs to compile the following line. 
 Console.WriteLine("{0}, {1}, {2}", o, o, o); 
#endif 
 } 
}

If this code is compiled with the INEFFICIENT symbol defined, the compiler will generate code to box v three times, causing three objects to be allocated from the heap! This is extremely wasteful because each object will have exactly the same contents: 5. If the code is compiled without the INEFFICIENT symbol defined, v is boxed just once, so only one object is allocated from the heap. Then, in the call to Console.WriteLine, the reference to the single boxed object is passed three times. This second version executes much faster and allocates less memory from the heap.

In these examples, it’s fairly easy to recognize when an instance of a value type requires boxing. Basically, if you want a reference to an instance of a value type, the instance must be boxed. Usually this happens because you have a value type instance and you want to pass it to a method that requires a reference type. However, this situation isn’t the only one in which you’ll need to box an instance of a value type.

Recall that unboxed value types are lighter-weight types than reference types for two reasons:

■ They are not allocated on the managed heap.

■ They don’t have the additional overhead members that every object on the heap has: a type object pointer and a sync block index.

Because unboxed value types don’t have a sync block index, you can’t have multiple threads synchronize their access to the instance by using the methods of the System.Threading.Monitor type (or by using C#’s lock statement).
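To see why this matters, consider what happens if you pass a value type to Monitor anyway. C#'s lock statement rejects a value type expression at compile time, but Monitor.Enter and Monitor.Exit take an Object, so the compiler silently boxes the argument. The sketch below (MonitorOnValueTypeDemo and ExitThrows are hypothetical names) shows that each call boxes a different object, so the lock that was entered is never the one being exited.

```csharp
using System;
using System.Threading;

// Hypothetical demo class (not part of the original text).
public static class MonitorOnValueTypeDemo {
    // Tries to use an Int32 as a lock. Each conversion to Object boxes v
    // into a new heap object, so Exit receives a different object than Enter.
    public static Boolean ExitThrows() {
        Int32 v = 5;
        Object entered = v;          // Box #1: keep the reference so the lock can be released.
        Monitor.Enter(entered);
        try {
            Monitor.Exit(v);         // Box #2: a different object that this thread never locked.
            return false;
        }
        catch (SynchronizationLockException) {
            return true;
        }
        finally {
            Monitor.Exit(entered);   // Release the lock that was actually taken.
        }
    }

    public static void Main() {
        Console.WriteLine(ExitThrows());   // True
    }
}
```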

Even though unboxed value types don’t have a type object pointer, you can still call virtual methods (such as Equals, GetHashCode, or ToString) inherited or overridden by the type. If your value type overrides one of these virtual methods, then the CLR can invoke the method nonvirtually because value types are implicitly sealed and cannot have any types derived from them. In addition, the value type instance being used to invoke the virtual method is not boxed. However, if your override of the virtual method calls into the base type's implementation of the method, then the value type instance does get boxed when calling the base type's implementation, so that a reference to a heap object can be passed as the this pointer to the base method.

However, calling a nonvirtual inherited method (such as GetType or MemberwiseClone) always requires the value type to be boxed because these methods are defined by System.Object, so the methods expect the this argument to be a pointer that refers to an object on the heap.

In addition, casting an unboxed instance of a value type to one of the type’s interfaces requires the instance to be boxed, because interface variables must always contain a reference to an object on the heap. (I’ll talk about interfaces in Chapter 13, “Interfaces.”) The following code demonstrates.

using System; 
internal struct Point : IComparable { 
 private readonly Int32 m_x, m_y; 
 // Constructor to easily initialize the fields 
 public Point(Int32 x, Int32 y) { 
 m_x = x; 
 m_y = y; 
 } 
 // Override ToString method inherited from System.ValueType 
 public override String ToString() { 
 // Return the point as a string. Note: calling ToString prevents boxing
 return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString()); 
 } 
 // Implementation of type-safe CompareTo method 
 public Int32 CompareTo(Point other) { 
 // Use the Pythagorean Theorem to calculate 
 // which point is farther from the origin (0, 0) 
 return Math.Sign(Math.Sqrt(m_x * m_x + m_y * m_y) 
 - Math.Sqrt(other.m_x * other.m_x + other.m_y * other.m_y)); 
 }
  // Implementation of IComparable's CompareTo method 
 public Int32 CompareTo(Object o) { 
 if (GetType() != o.GetType()) { 
 throw new ArgumentException("o is not a Point"); 
 } 
 // Call type-safe CompareTo method 
 return CompareTo((Point) o); 
 } 
} 
public static class Program { 
 public static void Main() { 
 // Create two Point instances on the stack. 
 Point p1 = new Point(10, 10); 
 Point p2 = new Point(20, 20); 
 // p1 does NOT get boxed to call ToString (a virtual method). 
 Console.WriteLine(p1.ToString());// "(10, 10)" 
 // p1 DOES get boxed to call GetType (a non-virtual method). 
 Console.WriteLine(p1.GetType());// "Point" 
 // p1 does NOT get boxed to call CompareTo. 
 // p2 does NOT get boxed because CompareTo(Point) is called. 
 Console.WriteLine(p1.CompareTo(p2));// "-1" 
 // p1 DOES get boxed, and the reference is placed in c. 
 IComparable c = p1; 
 Console.WriteLine(c.GetType());// "Point" 
 // p1 does NOT get boxed to call CompareTo. 
 // Because CompareTo is not being passed a Point variable, 
 // CompareTo(Object) is called, which requires a reference to 
 // a boxed Point. 
 // c does NOT get boxed because it already refers to a boxed Point. 
 Console.WriteLine(p1.CompareTo(c));// "0" 
 // c does NOT get boxed because it already refers to a boxed Point. 
 // p2 does get boxed because CompareTo(Object) is called. 
 Console.WriteLine(c.CompareTo(p2));// "-1" 
 // c is unboxed, and fields are copied into p2. 
 p2 = (Point) c; 
 // Proves that the fields got copied into p2. 
 Console.WriteLine(p2.ToString());// "(10, 10)" 
 } 
}

This code demonstrates several scenarios related to boxing and unboxing:

■ Calling ToString: In the call to ToString, p1 doesn’t have to be boxed. At first, you’d think that p1 would have to be boxed because ToString is a virtual method that is inherited from the base type, System.ValueType. Normally, to call a virtual method, the CLR needs to determine the object’s type in order to locate the type’s method table. Because p1 is an unboxed value type, there’s no type object pointer. However, the just-in-time (JIT) compiler sees that Point overrides the ToString method, and it emits code that calls ToString directly (nonvirtually) without having to do any boxing. The compiler knows that polymorphism can’t come into play here because Point is a value type, and no type can derive from it to provide another implementation of this virtual method. Note that if Point's ToString method internally calls base.ToString(), then the value type instance would be boxed when calling System.ValueType's ToString method.

■ Calling GetType: In the call to the nonvirtual GetType method, p1 does have to be boxed. The reason is that the Point type inherits GetType from System.Object. So to call GetType, the CLR must use a pointer to a type object, which can be obtained only by boxing p1.

■ Calling CompareTo (first time): In the first call to CompareTo, p1 doesn’t have to be boxed because Point implements the CompareTo method, and the compiler can just call it directly. Note that a Point variable (p2) is being passed to CompareTo, and therefore the compiler calls the overload of CompareTo that accepts a Point parameter. This means that p2 will be passed by value to CompareTo and no boxing is necessary.

■ Casting to IComparable: When casting p1 to a variable (c) that is of an interface type, p1 must be boxed because interfaces are reference types by definition. So p1 is boxed, and the pointer to this boxed object is stored in the variable c. The following call to GetType proves that c does refer to a boxed Point on the heap.

■ Calling CompareTo (second time): In the second call to CompareTo, p1 doesn’t have to be boxed because Point implements the CompareTo method, and the compiler can just call it directly. Note that an IComparable variable (c) is being passed to CompareTo, and therefore, the compiler calls the overload of CompareTo that accepts an Object parameter. This means that the argument passed must be a pointer that refers to an object on the heap. Fortunately, c does refer to a boxed Point, and therefore, that memory address in c can be passed to CompareTo, and no additional boxing is necessary.

■ Calling CompareTo (third time): In the third call to CompareTo, c already refers to a boxed Point object on the heap. Because c is of the IComparable interface type, you can call only the interface’s CompareTo method that requires an Object parameter. This means that the argument passed must be a pointer that refers to an object on the heap. So p2 is boxed, and the pointer to this boxed object is passed to CompareTo.

■ Casting to Point: When casting c to a Point, the object on the heap referred to by c is unboxed, and its fields are copied from the heap to p2, an instance of the Point type residing on the stack.

I realize that all of this information about reference types, value types, and boxing might be overwhelming at first. However, a solid understanding of these concepts is critical to any .NET Framework developer’s long-term success. Trust me: having a solid grasp of these concepts will allow you to build efficient applications faster and easier.

# Changing Fields in a Boxed Value Type by Using Interfaces (and Why You Shouldn’t Do This)

Let’s have some fun and see how well you understand value types, boxing, and unboxing. Examine the following code, and see whether you can figure out what it displays on the console.

using System; 
// Point is a value type. 
internal struct Point { 
 private Int32 m_x, m_y; 
 public Point(Int32 x, Int32 y) { 
 m_x = x; 
 m_y = y; 
 } 
 public void Change(Int32 x, Int32 y) { 
 m_x = x; m_y = y; 
 } 
 public override String ToString() { 
 return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString()); 
 } 
} 
public sealed class Program { 
 public static void Main() { 
 Point p = new Point(1, 1); 
 Console.WriteLine(p); 
 p.Change(2, 2); 
 Console.WriteLine(p); 
 Object o = p; 
 Console.WriteLine(o); 
 ((Point) o).Change(3, 3); 
 Console.WriteLine(o); 
 } 
}

Very simply, Main creates an instance (p) of a Point value type on the stack and sets its m_x and m_y fields to 1. Then, p is boxed before the first call to WriteLine, which calls ToString on the boxed Point, and (1, 1) is displayed as expected. Then, p is used to call the Change method, which changes the values of p’s m_x and m_y fields on the stack to 2. The second call to WriteLine requires p to be boxed again and displays (2, 2), as expected.

Now, p is boxed a third time, and o refers to the boxed Point object. The third call to WriteLine again shows (2, 2), which is also expected. Finally, I want to call the Change method to update the fields in the boxed Point object. However, Object (the type of the variable o) doesn’t know anything about the Change method, so I must first cast o to a Point. Casting o to a Point unboxes o and copies the fields in the boxed Point to a temporary Point on the thread’s stack! The m_x and m_y fields of this temporary point are changed to 3 and 3, but the boxed Point isn’t affected by this call to Change. When WriteLine is called the fourth time, (2, 2) is displayed again. Many developers do not expect this.

Some languages, such as C++/CLI, let you change the fields in a boxed value type, but C# does not. However, you can fool C# into allowing this by using an interface. The following code is a modified version of the previous code.

using System; 
// Interface defining a Change method 
internal interface IChangeBoxedPoint { 
 void Change(Int32 x, Int32 y); 
} 
// Point is a value type. 
internal struct Point : IChangeBoxedPoint { 
 private Int32 m_x, m_y; 
 public Point(Int32 x, Int32 y) { 
 m_x = x; 
 m_y = y; 
 } 
 public void Change(Int32 x, Int32 y) { 
 m_x = x; m_y = y; 
 } 
 public override String ToString() { 
 return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString()); 
 } 
} 
public sealed class Program { 
 public static void Main() { 
 Point p = new Point(1, 1); 
 Console.WriteLine(p); 
 p.Change(2, 2); 
 Console.WriteLine(p); 
 Object o = p; 
 Console.WriteLine(o); 
 ((Point) o).Change(3, 3); 
 Console.WriteLine(o); 
 // Boxes p, changes the boxed object and discards it 
 ((IChangeBoxedPoint) p).Change(4, 4); 
 Console.WriteLine(p); 
 // Changes the boxed object and shows it 
 ((IChangeBoxedPoint) o).Change(5, 5); 
 Console.WriteLine(o); 
 } 
}

This code is almost identical to the previous version. The main difference is that the Change method is defined by the IChangeBoxedPoint interface, and the Point type now implements this interface. Inside Main, the first four calls to WriteLine are the same and produce the same results I had before (as expected). However, I’ve added two more examples at the end of Main.

In the first example, the unboxed Point, p, is cast to an IChangeBoxedPoint. This cast causes the value in p to be boxed. Change is called on the boxed value, which does change its m_x and m_y fields to 4 and 4, but after Change returns, the boxed object is immediately ready to be garbage collected. So the fifth call to WriteLine displays (2, 2). Many developers won’t expect this result.

In the last example, the boxed Point referred to by o is cast to an IChangeBoxedPoint. No boxing is necessary here because o is already a boxed Point. Then Change is called, which does change the boxed Point’s m_x and m_y fields. The interface method Change has allowed me to change the fields in a boxed Point object! Now, when WriteLine is called, it displays (5, 5) as expected. The purpose of this whole example is to demonstrate how an interface method is able to modify the fields of a boxed value type. In C#, this isn’t possible without using an interface method.

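💡 Important: Earlier in this chapter, I mentioned that value types should be immutable; that is, they should not define any members that modify the type’s instance fields. In fact, I recommend marking a value type’s fields as readonly so that the compiler reports an error if you accidentally write a method that attempts to change a field. The preceding example makes it clear why. When a method attempts to modify a value type’s instance fields, calling that method produces unexpected behavior. If, after constructing a value type, you never call a method that modifies its state (or if no such method exists at all), you don’t have to worry about when boxing and unboxing/field copying occur. If the value type is immutable, you simply end up copying the same state around (with no method that could modify it), and every behavior of your code stays under your control.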

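A number of developers reviewed the chapters of this book. After reading some of my code samples (such as the preceding one), they told me that they would never dare to use value types again. I must say that these value type subtleties have cost me days of debugging, which is why I point them out so emphatically here. I hope you’ll remember the problems I’ve described so that you’re prepared when they show up in your own code. Even so, you shouldn’t be scared of value types. They are useful, and they have their place. After all, a program does need an Int32 now and then. Just keep in mind that value types and reference types behave very differently depending on how they’re used. In fact, declaring Point as a class instead of a struct in the preceding example produces the results you would expect. Finally, the good news is that the core value types that ship in the FCL (Byte, Int32, UInt32, Int64, UInt64, Single, Double, Decimal, BigInteger, Complex, and all enums) are immutable, so you won’t experience any surprising behavior when using these types.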
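One way to follow the immutability advice is sketched below. It is an illustration under stated assumptions rather than a prescribed design: ImmutablePoint, WithCoordinates, and ImmutablePointDemo are hypothetical names. The fields are readonly, and the operation that would have modified the instance returns a new value instead.

```csharp
using System;

// An immutable variant of Point: the fields are readonly, so no method
// can modify an existing instance after construction.
internal struct ImmutablePoint {
    private readonly Int32 m_x, m_y;

    public ImmutablePoint(Int32 x, Int32 y) {
        m_x = x;
        m_y = y;
    }

    // Instead of changing this instance, return a new one.
    public ImmutablePoint WithCoordinates(Int32 x, Int32 y) {
        return new ImmutablePoint(x, y);
    }

    public override String ToString() {
        return String.Format("({0}, {1})", m_x.ToString(), m_y.ToString());
    }
}

// Hypothetical demo class.
public static class ImmutablePointDemo {
    public static String Run() {
        ImmutablePoint p = new ImmutablePoint(1, 1);
        Object o = p;   // Boxes a copy of p.
        // Nothing is modified in place, so the result must be stored
        // explicitly: unbox, build a new value, box the new value.
        o = ((ImmutablePoint) o).WithCoordinates(3, 3);
        return p.ToString() + " " + o.ToString();
    }

    public static void Main() {
        Console.WriteLine(Run());   // (1, 1) (3, 3)
    }
}
```

With this shape, a call such as ((ImmutablePoint) o).WithCoordinates(3, 3) whose result is discarded visibly does nothing, whereas the mutable Change method silently updated a temporary copy.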

# Object Equality and Identity

Frequently, developers write code to compare objects with one another. This is particularly true when you place objects in a collection and write code to sort, search, or compare the items in it. In this section, I'll discuss object equality and identity, and I’ll also discuss how to define a type that properly implements object equality.

The System.Object type offers a virtual method named Equals, whose purpose is to return true if two objects contain the same value. The implementation of Object’s Equals method looks like this.

public class Object { 
 public virtual Boolean Equals(Object obj) { 
 // If both references point to the same object, 
 // they must have the same value. 
 if (this == obj) return true; 
 // Assume that the objects do not have the same value. 
 return false; 
 } 
}

At first, this seems like a reasonable default implementation of Equals: it returns true if the this and obj arguments refer to the same exact object. This seems reasonable because Equals knows that an object must have the same value as itself. However, if the arguments refer to different objects, Equals can’t be certain if the objects contain the same values, and therefore, false is returned. In other words, the default implementation of Object’s Equals method really implements identity, not value equality.

Unfortunately, as it turns out, Object’s Equals method is not a reasonable default, and it should have never been implemented this way. You immediately see the problem when you start thinking about class inheritance hierarchies and how to properly override Equals. Here is how to properly implement an Equals method internally:

  1. If the obj argument is null, return false because the current object identified by this is obviously not null when the nonstatic Equals method is called.
  2. If the this and obj arguments refer to the same object, return true. This step can improve performance when comparing objects with many fields.
  3. If the this and obj arguments refer to objects of different types, return false. Obviously, checking if a String object is equal to a FileStream object should result in a false result.
  4. For each instance field defined by the type, compare the value in the this object with the value in the obj object. If any fields are not equal, return false.
  5. Call the base class’s Equals method so it can compare any fields defined by it. If the base class’s Equals method returns false, return false; otherwise, return true.

So Microsoft should have implemented Object’s Equals like this.

public class Object { 
 public virtual Boolean Equals(Object obj) { 
 // The given object to compare to can't be null 
 if (obj == null) return false;
 // If objects are different types, they can't be equal. 
 if (this.GetType() != obj.GetType()) return false; 
 // If objects are same type, return true if all of their fields match 
 // Because System.Object defines no fields, the fields match 
 return true; 
 } 
}

But, because Microsoft didn’t implement Equals this way, the rules for how to implement Equals are significantly more complicated than you would think. When a type overrides Equals, the override should call its base class’s implementation of Equals unless it would be calling Object’s implementation. This also means that because a type can override Object’s Equals method, this Equals method can no longer be called to test for identity. To fix this, Object offers a static ReferenceEquals method, which is implemented like this.

public class Object { 
 public static Boolean ReferenceEquals(Object objA, Object objB) { 
 return (objA == objB); 
 } 
}

You should always call ReferenceEquals if you want to check for identity (if two references point to the same object). You shouldn’t use the C# == operator (unless you cast both operands to Object first) because one of the operands’ types could overload the == operator, giving it semantics other than identity.
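The difference between identity and an overloaded == operator is easy to see with String, which overloads == to compare text. The following is a minimal sketch; the class name IdentityDemo is hypothetical and exists only for this illustration.

```csharp
using System;

// Hypothetical demo class; not part of the FCL.
public static class IdentityDemo {
    public static void Main() {
        String a = "hi";
        // Build a second String at run time so that it is a distinct object with the same text.
        String b = new String(new[] { 'h', 'i' });

        Console.WriteLine(a == b);                        // True:  String overloads == to compare text
        Console.WriteLine(a.Equals(b));                   // True:  String overrides Equals
        Console.WriteLine(Object.ReferenceEquals(a, b));  // False: two different objects
        Console.WriteLine((Object) a == (Object) b);      // False: == on Object operands tests identity
        Console.WriteLine(Object.ReferenceEquals(a, a));  // True:  same object
    }
}
```

Casting both operands to Object works, but calling ReferenceEquals states the intent more directly.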

As you can see, the .NET Framework has a very confusing story when it comes to object equality and identity. By the way, System.ValueType (the base class of all value types) does override Object’s Equals method and is correctly implemented to perform a value equality check (not an identity check). Internally, ValueType’s Equals is implemented this way:

  1. If the obj argument is null, return false.
  2. If the this and obj arguments refer to objects of different types, return false.
  3. For each instance field defined by the type, compare the value in the this object with the value in the obj object by calling the field’s Equals method. If any fields are not equal, return false.
  4. Return true. Object’s Equals method is not called by ValueType’s Equals method.

Internally, ValueType’s Equals method uses reflection (covered in Chapter 23, “Assembly Loading and Reflection”) to accomplish step 3. Because the CLR’s reflection mechanism is slow, when defining your own value type, you should override Equals and provide your own implementation to improve the performance of value equality comparisons that use instances of your type. Of course, in your own implementation, do not call base.Equals.
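As a rough illustration of this advice, here is a sketch of a value type that compares its own fields directly instead of relying on ValueType's reflection-based Equals. The Rgb type and its fields are hypothetical names chosen for the example.

```csharp
using System;

// Hypothetical value type used only for illustration.
internal struct Rgb {
    private readonly Byte m_r, m_g, m_b;
    public Rgb(Byte r, Byte g, Byte b) { m_r = r; m_g = g; m_b = b; }

    public override Boolean Equals(Object obj) {
        // A null reference or an object of any other type can't equal this value.
        if (!(obj is Rgb)) return false;
        Rgb other = (Rgb) obj; // Unbox and copy the fields
        // Compare the fields directly; base.Equals (ValueType's reflection path) is not called.
        return m_r == other.m_r && m_g == other.m_g && m_b == other.m_b;
    }

    // Overridden as well so that equal values produce equal hash codes.
    public override Int32 GetHashCode() {
        return (m_r << 16) | (m_g << 8) | m_b;
    }
}

public static class RgbDemo {
    public static void Main() {
        Rgb red = new Rgb(255, 0, 0);
        Console.WriteLine(red.Equals(new Rgb(255, 0, 0))); // True
        Console.WriteLine(red.Equals(new Rgb(0, 0, 255))); // False
        Console.WriteLine(red.Equals(null));               // False
        Console.WriteLine(red.Equals("red"));              // False
    }
}
```

Note that each call to Equals(Object) still boxes the argument; a type-safe overload that avoids the boxing is a separate refinement.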

When defining your own type, if you decide to override Equals, you must ensure that it adheres to the four properties of equality:

■ Equals must be reflexive; that is, x.Equals(x) must return true.

■ Equals must be symmetric; that is, x.Equals(y) must return the same value as y.Equals(x).

■ Equals must be transitive; that is, if x.Equals(y) returns true and y.Equals(z) returns true, then x.Equals(z) must also return true.

■ Equals must be consistent. Provided that there are no changes in the two values being compared, Equals should consistently return true or false.

If your implementation of Equals fails to adhere to all of these rules, your application will behave in strange and unpredictable ways.
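To make the steps listed earlier in this section concrete, here is a sketch of a small class hierarchy whose Equals overrides follow them. Point2D and Point3D are hypothetical types; the exact-type check with GetType is what keeps the comparison symmetric across the base and derived classes.

```csharp
using System;

// Hypothetical types used only for illustration.
internal class Point2D {
    private readonly Int32 m_x, m_y;
    public Point2D(Int32 x, Int32 y) { m_x = x; m_y = y; }

    public override Boolean Equals(Object obj) {
        if (obj == null) return false;                       // Step 1: null is never equal
        if (Object.ReferenceEquals(this, obj)) return true;  // Step 2: same object
        if (this.GetType() != obj.GetType()) return false;   // Step 3: exact type match
        Point2D other = (Point2D) obj;
        // Step 4: compare fields. The base class is Object, so base.Equals is not called.
        return m_x == other.m_x && m_y == other.m_y;
    }
    public override Int32 GetHashCode() { return unchecked(m_x * 31 + m_y); }
}

internal sealed class Point3D : Point2D {
    private readonly Int32 m_z;
    public Point3D(Int32 x, Int32 y, Int32 z) : base(x, y) { m_z = z; }

    public override Boolean Equals(Object obj) {
        // Step 5: the base class checks null, identity, type, and its own fields.
        if (!base.Equals(obj)) return false;
        // base.Equals returned true, so obj is known to be a Point3D here.
        return m_z == ((Point3D) obj).m_z;
    }
    public override Int32 GetHashCode() { return unchecked(base.GetHashCode() * 31 + m_z); }
}

public static class EqualsRulesDemo {
    public static void Main() {
        Point3D a = new Point3D(1, 2, 3), b = new Point3D(1, 2, 3);
        Point2D c = new Point2D(1, 2);
        Console.WriteLine(a.Equals(a));                   // True:  reflexive
        Console.WriteLine(a.Equals(b) && b.Equals(a));    // True:  symmetric
        Console.WriteLine(a.Equals(c) || c.Equals(a));    // False: different types, in both directions
        Console.WriteLine(a.Equals(null));                // False
    }
}
```

Had Point2D tested the type with the is operator instead of GetType, c.Equals(a) would return true while a.Equals(c) returned false, breaking symmetry.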

When overriding the Equals method, there are a couple more things that you’ll probably want to do:

■ Have the type implement the System.IEquatable<T> interface’s Equals method. This generic interface allows you to define a type-safe Equals method. Usually, you’ll implement the Equals method that takes an Object parameter to internally call the type-safe Equals method.

■ Overload the == and != operator methods. Usually, you’ll implement these operator methods to internally call the type-safe Equals method.

Furthermore, if you think that instances of your type will be compared for the purposes of sorting, you’ll want your type to also implement System.IComparable’s CompareTo method and System.IComparable<T>’s type-safe CompareTo method. If you implement these methods, you’ll also want to overload the various comparison operator methods (<, <=, >, >=) and implement these methods internally to call the type-safe CompareTo method.
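Putting these recommendations together might look like the following sketch. The Cents type is hypothetical; the point is only that every Object-based method and every operator forwards to one type-safe Equals or CompareTo method.

```csharp
using System;

// Hypothetical value type used only for illustration.
internal struct Cents : IEquatable<Cents>, IComparable<Cents>, IComparable {
    private readonly Int64 m_value;
    public Cents(Int64 value) { m_value = value; }
    public Int64 Value { get { return m_value; } }

    // Type-safe Equals: no boxing and no cast.
    public Boolean Equals(Cents other) { return m_value == other.m_value; }
    // The Object-based Equals forwards to the type-safe one.
    public override Boolean Equals(Object obj) { return (obj is Cents) && Equals((Cents) obj); }
    public override Int32 GetHashCode() { return m_value.GetHashCode(); }
    public static Boolean operator ==(Cents a, Cents b) { return a.Equals(b); }
    public static Boolean operator !=(Cents a, Cents b) { return !a.Equals(b); }

    // Type-safe CompareTo, used by the non-generic version and by the operators.
    public Int32 CompareTo(Cents other) { return m_value.CompareTo(other.m_value); }
    public Int32 CompareTo(Object obj) {
        if (obj == null) return 1; // By convention, any instance sorts after null
        if (!(obj is Cents)) throw new ArgumentException("Object must be of type Cents.", "obj");
        return CompareTo((Cents) obj);
    }
    public static Boolean operator <(Cents a, Cents b) { return a.CompareTo(b) < 0; }
    public static Boolean operator <=(Cents a, Cents b) { return a.CompareTo(b) <= 0; }
    public static Boolean operator >(Cents a, Cents b) { return a.CompareTo(b) > 0; }
    public static Boolean operator >=(Cents a, Cents b) { return a.CompareTo(b) >= 0; }
}

public static class CentsDemo {
    public static void Main() {
        Console.WriteLine(new Cents(5) == new Cents(5)); // True
        Console.WriteLine(new Cents(3) < new Cents(5));  // True

        Cents[] prices = { new Cents(5), new Cents(1), new Cents(3) };
        Array.Sort(prices); // Sorts by calling the type's CompareTo
        Console.WriteLine(prices[0].Value + " " + prices[1].Value + " " + prices[2].Value); // 1 3 5
    }
}
```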

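💡 Summary: Converting a value type to a reference type uses boxing: memory is allocated from the managed heap, the value type's fields are copied into the newly allocated memory, and the address of the object is returned. Where there is boxing there is also unboxing, but unboxing is not simply boxing in reverse. Unboxing is much cheaper than boxing because it only obtains a pointer to the unboxed portion of the boxed instance (ignoring the object's two extra members, the type object pointer and the sync block index) and copies nothing in memory by itself. It is, however, usually followed by a field copy, and only then is the value copied from the heap into the stack-based value type instance. When unboxing an object, you can cast it only to the value type that was originally boxed; otherwise, an InvalidCastException is thrown. Some languages (such as C++/CLI) allow you to unbox a boxed value type without copying its fields. You should reduce the number of box and unbox operations in your program, because each boxing step allocates an additional object on the managed heap that must later be garbage collected. Many FCL methods are overloaded for different value type parameters, and most of these overloads exist solely to reduce boxing for the commonly used value types. When defining your own class, you can make its methods generic (with a constraint that restricts the type parameter to value types) so that they can take any value type without boxing. Because unboxed value types have no sync block index, you cannot use the methods of the System.Threading.Monitor type (or the C# lock statement) to have multiple threads synchronize access to an instance. Although unboxed value types have no type object pointer, you can still call the virtual methods (such as Equals, GetHashCode, or ToString) that the type inherits or overrides. If the value type overrides one of these virtual methods, the CLR can call it nonvirtually, because value types are implicitly sealed, no type can derive from them, and the instance is not boxed for the call. However, if the override calls the base type's implementation, the value type instance is boxed for that call so that a reference to a heap object can be passed to the base method through the this pointer. In addition, casting an unboxed instance of a value type to one of the type's interfaces requires boxing the instance, because an interface variable must contain a reference to an object on the heap. Some languages (such as C++/CLI) let you change the fields of a boxed value type, but C# does not, although you can use an interface to trick C# into allowing it. Value types should normally be defined as immutable so that nothing surprising happens when they are used. Objects also have the notions of equality and identity. The default implementation of Object's Equals method actually implements identity, not equality. Identity asks whether two references point to the same object; equality asks whether two objects contain equal values. Because a type can override Object's Equals method, that method can no longer be used to test for identity. To check for identity, always call ReferenceEquals; don't use the C# == operator (unless you first cast both operands to Object), because one of the operands' types could overload == with semantics other than identity. System.ValueType overrides Object's Equals method and implements it correctly to perform a value equality check. Internally, ValueType's Equals method uses reflection to compare each instance field defined by the type. Because the CLR's reflection mechanism is slow, when defining your own value type you should override Equals with your own implementation to improve the performance of value equality comparisons on instances of your type. When overriding Equals, you may also implement the Equals method of the System.IEquatable<T> interface; this generic interface lets you define a type-safe Equals method, which the Equals method that takes an Object parameter and the overloaded == and != operator methods can then call.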

# Object Hash Codes

The designers of the FCL decided that it would be incredibly useful if any instance of any object could be placed into a hash table collection. To this end, System.Object provides a virtual GetHashCode method so that an Int32 hash code can be obtained for any and all objects.

If you define a type and override the Equals method, you should also override the GetHashCode method. In fact, Microsoft’s C# compiler emits a warning if you define a type that overrides Equals without also overriding GetHashCode. For example, compiling the following type yields this warning: warning CS0659: 'Program' overrides Object.Equals(object o) but does not override Object.GetHashCode().

public sealed class Program { 
 public override Boolean Equals(Object obj) { ... } 
}

The reason a type that defines Equals must also define GetHashCode is that the implementation of the System.Collections.Hashtable type, the System.Collections.Generic.Dictionary type, and some other collections require that any two objects that are equal must have the same hash code value. So if you override Equals, you should override GetHashCode to ensure that the algorithm you use for calculating equality corresponds to the algorithm you use for calculating the object’s hash code.

Basically, when you add a key/value pair to a collection, a hash code for the key object is obtained first. This hash code indicates which “bucket” the key/value pair should be stored in. When the collection needs to look up a key, it gets the hash code for the specified key object. This code identifies the “bucket” that is now searched sequentially, looking for a stored key object that is equal to the specified key object. Using this algorithm of storing and looking up keys means that if you change a key object that is in a collection, the collection will no longer be able to find the object. If you intend to change a key object in a hash table, you should remove the original key/value pair, modify the key object, and then add the new key/value pair back into the hash table.
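The effect of changing a key while it is stored in a hash table can be reproduced in a few lines. This is a sketch under the assumption of a deliberately mutable key type; MutableKey is a hypothetical class whose hash code is simply its Id field.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type; Id is mutable on purpose to show the problem.
internal sealed class MutableKey {
    public Int32 Id;
    public MutableKey(Int32 id) { Id = id; }
    public override Boolean Equals(Object obj) {
        MutableKey other = obj as MutableKey;
        return other != null && Id == other.Id;
    }
    public override Int32 GetHashCode() { return Id; }
}

public static class KeyMutationDemo {
    public static void Main() {
        var table = new Dictionary<MutableKey, String>();
        var key = new MutableKey(1);
        table.Add(key, "one");
        Console.WriteLine(table.ContainsKey(key));               // True

        key.Id = 2; // The key's hash code changes while it is stored under the old one
        Console.WriteLine(table.ContainsKey(key));               // False: new hash code, wrong bucket
        Console.WriteLine(table.ContainsKey(new MutableKey(1))); // False: stored key no longer equals 1

        // The safe sequence: remove the pair, modify the key, add the pair back.
        key.Id = 1;                // Undo the damage so the pair can be found again
        String value = table[key];
        table.Remove(key);
        key.Id = 2;
        table.Add(key, value);
        Console.WriteLine(table.ContainsKey(key));               // True
    }
}
```

Using immutable fields for anything that feeds Equals and GetHashCode avoids the problem entirely.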

Defining a GetHashCode method can be easy and straightforward. But depending on your data types and the distribution of data, it can be tricky to come up with a hashing algorithm that returns a well-distributed range of values. Here’s a simple example that will probably work just fine for Point objects.

internal sealed class Point { 
 private readonly Int32 m_x, m_y; 
 public override Int32 GetHashCode() { 
 return m_x ^ m_y; // m_x XOR'd with m_y 
 } 
 ... 
}

When selecting an algorithm for calculating hash codes for instances of your type, try to follow these guidelines:

■ Use an algorithm that gives a good random distribution for the best performance of the hash table.

■ Your algorithm can also call the base type’s GetHashCode method, including its return value. However, you don’t generally want to call Object’s or ValueType’s GetHashCode method, because the implementation in either method doesn’t lend itself to high-performance hashing algorithms.

■ Your algorithm should use at least one instance field.

■ Ideally, the fields you use in your algorithm should be immutable; that is, the fields should be initialized when the object is constructed, and they should never again change during the object’s lifetime.

■ Your algorithm should execute as quickly as possible.

■ Objects with the same value should return the same code. For example, two String objects with the same text should return the same hash code value.
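As one possible way to apply these guidelines, the sketch below contrasts the XOR approach with a multiply-and-add combination of the same two immutable fields. GridPoint and its XorHash helper are hypothetical, and the constants 17 and 31 are a common convention rather than anything the FCL prescribes.

```csharp
using System;

// Hypothetical type used only for illustration.
internal sealed class GridPoint {
    private readonly Int32 m_x, m_y; // Immutable fields, as the guidelines suggest
    public GridPoint(Int32 x, Int32 y) { m_x = x; m_y = y; }

    // The XOR approach, kept here only for comparison.
    public Int32 XorHash() { return m_x ^ m_y; }

    public override Int32 GetHashCode() {
        unchecked { // Overflow simply wraps around; it is not an error for a hash code
            Int32 hash = 17;
            hash = hash * 31 + m_x;
            hash = hash * 31 + m_y;
            return hash;
        }
    }

    public override Boolean Equals(Object obj) {
        GridPoint other = obj as GridPoint;
        return other != null && m_x == other.m_x && m_y == other.m_y;
    }
}

public static class HashDemo {
    public static void Main() {
        GridPoint a = new GridPoint(1, 2), b = new GridPoint(2, 1);
        Console.WriteLine(a.XorHash() + " " + b.XorHash());         // 3 3: XOR can't tell (1,2) from (2,1)
        Console.WriteLine(a.GetHashCode() + " " + b.GetHashCode()); // 16370 16400
        // Equal values still produce equal hash codes.
        Console.WriteLine(a.GetHashCode() == new GridPoint(1, 2).GetHashCode()); // True
    }
}
```

XOR is symmetric and maps every point with equal coordinates to 0, so mirrored or diagonal points collide; the order-sensitive combination spreads them out at almost no extra cost.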

System.Object’s implementation of the GetHashCode method doesn’t know anything about its derived type and any fields that are in the type. For this reason, Object’s GetHashCode method returns a number that is guaranteed not to change for the lifetime of the object.

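💡 Important: If you implement your own hash table collection for some reason, or you call GetHashCode in code you write, never persist hash code values, because hash codes can easily change. For example, a future version of a type might use a different algorithm to calculate an object's hash code. One company did not heed this warning. On its website, users could create an account by choosing a user name and a password. The site then took the password String, called GetHashCode, and persisted the hash code in a database. When a user logged back in and entered the password, the site called GetHashCode again and compared the hash code with the stored value; a match granted access. Unfortunately, after the company upgraded to a new version of the CLR, String's GetHashCode method had changed and now returned a different hash code. As a result, no user could log in!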

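💡 Summary: System.Object provides a virtual GetHashCode method that obtains an Int32 hash code for any object. If you override Equals, you should also override GetHashCode; otherwise, Microsoft's C# compiler emits a warning. The reason is that the implementations of the System.Collections.Hashtable type, the System.Collections.Generic.Dictionary type, and some other collections require that any two objects that are equal have the same hash code. So when you override Equals, override GetHashCode as well so that the equality algorithm and the hash code algorithm agree. When a key/value pair is added to a collection, the hash code of the key object is obtained first; it indicates which bucket the pair is stored in. On lookup, the hash code identifies the bucket to search for a stored key object equal to the specified key object. Because a hash table stores and finds keys this way, once you change a key object that is in the collection, the collection can no longer find it. So to change a key object in a hash table, remove the original key/value pair, modify the key object, and then add the new key/value pair back. When choosing an algorithm to calculate hash codes for instances of your type, follow a few guidelines: for example, the algorithm should give a good random distribution, and it generally should not call Object's or ValueType's GetHashCode method (neither implementation lends itself to high-performance hashing). System.Object's GetHashCode implementation knows nothing about the derived type or its fields, so it returns a number that is guaranteed not to change for the lifetime of the object.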

# The dynamic Primitive Type

C# is a type-safe programming language. This means that all expressions resolve into an instance of a type and the compiler will generate only code that is attempting to perform an operation that is valid for this type. The benefit of a type-safe programming language over a non–type-safe programming language is that many programmer errors are detected at compile time, helping to ensure that the code is correct before you attempt to execute it. In addition, compile-time languages can typically produce smaller and faster code because they make more assumptions at compile time and bake those assumptions into the resulting IL and metadata.

However, there are also many occasions when a program has to act on information that it doesn’t know about until it is running. Although you can use type-safe programming languages (like C#) to interact with this information, the syntax tends to be clumsy, especially because you tend to work a lot with strings, and performance is hampered as well. If you are writing a pure C# application, then the only occasion you have for working with runtime-determined information is when you are using reflection (discussed in Chapter 23). However, many developers also use C# to communicate with components that are not implemented in C#. Some of these components could be .NET-dynamic languages such as Python or Ruby, or COM objects that support the IDispatch interface (possibly implemented in native C or C++), or HTML Document Object Model (DOM) objects (implemented using various languages and technologies). Communicating with HTML DOM objects is particularly useful when building a Microsoft Silverlight application.

To make it easier for developers using reflection or communicating with other components, the C# compiler offers you a way to mark an expression’s type as dynamic. You can also put the result of an expression into a variable and you can mark a variable’s type as dynamic. This dynamic expression/variable can then be used to invoke a member such as a field, a property/indexer, a method, delegate, and unary/binary/conversion operators. When your code invokes a member by using a dynamic expression/variable, the compiler generates special IL code that describes the desired operation. This special code is referred to as the payload. At run time, the payload code determines the exact operation to execute based on the actual type of the object now referenced by the dynamic expression/variable.

Here is some code to demonstrate what I’m talking about.

internal static class DynamicDemo {
 public static void Main() {
 dynamic value;
 for (Int32 demo = 0; demo < 2; demo++) {
 value = (demo == 0) ? (dynamic) 5 : (dynamic) "A";
 value = value + value;
 M(value);
 }
 }
 private static void M(Int32 n) { Console.WriteLine("M(Int32): " + n); }
 private static void M(String s) { Console.WriteLine("M(String): " + s); }
}

When I execute Main, I get the following output.

M(Int32): 10
M(String): AA

To understand what’s happening, let’s start by looking at the + operator. This operator has operands of the dynamic type. Because value is dynamic, the C# compiler emits payload code that will examine the actual type of value at run time and determine what the + operator should actually do.

The first time the + operator evaluates, value contains 5 (an Int32) and the result is 10 (also an Int32). This puts this result in the value variable. Then, the M method is called, passing it value. For the call to M, the compiler will emit payload code that will, at run time, examine the actual type of the argument being passed to M and determine which overload of the M method to call. When value contains an Int32, the overload of M that takes an Int32 parameter is called.

The second time the + operator evaluates, value contains “A” (a String) and the result is “AA” (the result of concatenating “A” with itself). Then, the M method is called again, passing it value. This time, the payload code determines that the actual type being passed to M is a String and calls the overload of M that takes a String parameter.

When the type of a field, method parameter, or method return type is specified as dynamic, the compiler converts this type to the System.Object type and applies an instance of System.Runtime.CompilerServices.DynamicAttribute to the field, parameter, or return type in metadata. If a local variable is specified as dynamic, then the variable’s type will also be of type Object, but the DynamicAttribute is not applied to the local variable because its usage is self-contained within the method. Because dynamic is really the same as Object, you cannot write methods whose signature differs only by dynamic and Object.

It is also possible to use dynamic when specifying generic type arguments to a generic class (reference type), a structure (value type), an interface, a delegate, or a method. When you do this, the compiler converts dynamic to Object and applies DynamicAttribute to the various pieces of metadata where it makes sense. Note that the generic code that you are using has already been compiled and will consider the type to be Object; no dynamic dispatch will be performed because the compiler did not produce any payload code in the generic code.
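Both claims, that dynamic becomes Object plus a DynamicAttribute in metadata and that a generic type argument of dynamic is really Object, can be checked with a little reflection. This is a sketch only; Holder and its two fields are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Hypothetical type used only for illustration.
internal sealed class Holder {
    public dynamic Value; // Emitted as an Object field marked with DynamicAttribute
    public Object Plain;  // An ordinary Object field, for comparison
}

public static class DynamicMetadataDemo {
    public static void Main() {
        FieldInfo dyn = typeof(Holder).GetField("Value");
        FieldInfo obj = typeof(Holder).GetField("Plain");

        Console.WriteLine(dyn.FieldType == typeof(Object));                // True
        Console.WriteLine(dyn.IsDefined(typeof(DynamicAttribute), false)); // True
        Console.WriteLine(obj.IsDefined(typeof(DynamicAttribute), false)); // False

        // At run time, List<dynamic> is just List<Object>.
        Console.WriteLine(new List<dynamic>().GetType() == typeof(List<Object>)); // True
    }
}
```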

Any expression can implicitly be cast to dynamic because all expressions result in a type that is derived from Object. Normally, the compiler does not allow you to write code that implicitly casts an expression from Object to another type; you must use explicit cast syntax. However, the compiler does allow you to cast an expression from dynamic to another type by using implicit cast syntax.

Object o1 = 123; // OK: Implicit cast from Int32 to Object (boxing)
Int32 n1 = o1; // Error: No implicit cast from Object to Int32
Int32 n2 = (Int32) o1; // OK: Explicit cast from Object to Int32 (unboxing)
dynamic d1 = 123; // OK: Implicit cast from Int32 to dynamic (boxing)
Int32 n3 = d1; // OK: Implicit cast from dynamic to Int32 (unboxing)

Although the compiler allows you to omit the explicit cast when casting from dynamic to some other type, the CLR will validate the cast at run time to ensure that type safety is maintained. If the object’s type is not compatible with the cast, the CLR will throw an InvalidCastException exception.

Note that the result of evaluating a dynamic expression is a dynamic expression. Examine this code.

dynamic d = 123;
var result = M(d); // Note: 'var result' is the same as 'dynamic result'

Here, the compiler allows the code to compile because it doesn’t know at compile time which M method it will call. Therefore, it also does not know what type of result M will return. And so, the compiler assumes that the result variable is of type dynamic itself. You can verify this by placing your mouse over var in the Visual Studio editor; the IntelliSense window will indicate 'dynamic: Represents an object whose operations will be resolved at runtime.' If the M method invoked at run time has a return type of void, a Microsoft.CSharp.RuntimeBinder.RuntimeBinderException exception is thrown.
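The void case can be demonstrated with two overloads, one that returns a value and one that does not. This is a minimal sketch; DynamicResultDemo, Describe, and the two M overloads are hypothetical names for this illustration.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical demo class.
public static class DynamicResultDemo {
    private static Int32 M(String s) { return s.Length; }
    private static void M(Int32 n) { /* returns nothing */ }

    public static String Describe(dynamic d) {
        try {
            var result = M(d); // 'result' is dynamic; the overload is chosen at run time
            return "returned " + result;
        }
        catch (RuntimeBinderException) {
            // The overload chosen at run time returned void, so there is no value to assign.
            return "RuntimeBinderException";
        }
    }

    public static void Main() {
        Console.WriteLine(Describe("abc")); // returned 3
        Console.WriteLine(Describe(123));   // RuntimeBinderException
    }
}
```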

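💡 Important: Do not confuse dynamic and var. Declaring a local variable with var is just a syntactic shortcut that has the compiler infer the specific data type from an expression. The var keyword can be used only to declare local variables inside a method, whereas the dynamic keyword can be used for local variables, fields, and parameters. You cannot cast an expression to var, but you can cast an expression to dynamic. You must explicitly initialize a variable declared with var, but you do not have to initialize a variable declared with dynamic. For more information about C#'s var keyword, see Section 9.2, “Implicitly Typed Local Variables.”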

However, when converting from dynamic to another static type, the result’s type is, of course, the static type. Similarly, when constructing a type by passing one or more dynamic arguments to its constructor, the result is the type of object you are constructing.

dynamic d = 123;
var x = (Int32) d; // Conversion: 'var x' is the same as 'Int32 x'
var dt = new DateTime(d); // Construction: 'var dt' is the same as 'DateTime dt'

If a dynamic expression is specified as the collection in a foreach statement or as a resource in a using statement, the compiler will generate code that attempts to cast the expression to the non-generic System.IEnumerable interface or to the System.IDisposable interface, respectively. If the cast succeeds, the expression is used and the code runs just fine. If the cast fails, a Microsoft.CSharp.RuntimeBinder.RuntimeBinderException exception is thrown.
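For the foreach case, the following sketch shows both outcomes: a dynamic expression that refers to an array, and one that refers to an Int32. DynamicForeachDemo and SumAll are hypothetical names.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

// Hypothetical demo class.
public static class DynamicForeachDemo {
    public static String SumAll(dynamic collection) {
        try {
            Int32 total = 0;
            // The compiler emits code that converts 'collection' to IEnumerable at run time.
            foreach (Int32 item in collection) total += item;
            return "total = " + total;
        }
        catch (RuntimeBinderException) {
            return "not enumerable";
        }
    }

    public static void Main() {
        Console.WriteLine(SumAll(new[] { 1, 2, 3 })); // total = 6
        Console.WriteLine(SumAll(5));                 // not enumerable
    }
}
```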

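💡 Important: A dynamic expression is really the same type as System.Object. The compiler assumes that whatever operation you attempt on the expression is legal, so it generates no warnings or errors. However, an exception is thrown at run time if you attempt an invalid operation. In addition, Visual Studio cannot offer any IntelliSense support to help you write code against a dynamic expression. Although you can define an extension method that extends Object (see Chapter 8, “Methods”), you cannot define one that extends dynamic. Also, you cannot pass a lambda expression or an anonymous method (both discussed in Chapter 17, “Delegates”) as an argument to a dynamic method call, because the compiler cannot infer the types to use.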

Here is an example of some C# code that uses COM IDispatch to create a Microsoft Excel workbook and places a string in cell A1.

using Microsoft.Office.Interop.Excel;
...
public static void Main() {
 Application excel = new Application();
 excel.Visible = true;
 excel.Workbooks.Add(Type.Missing);
 ((Range)excel.Cells[1, 1]).Value = "Text in cell A1"; // Put this string in cell A1
}

Without the dynamic type, the value returned from excel.Cells[1, 1] is of type Object, which must be cast to the Range type before its Value property can be accessed. However, when producing a runtime callable wrapper assembly for a COM object, any use of VARIANT in the COM method is really converted to dynamic; this is called dynamification. Therefore, because excel.Cells[1, 1] is of type dynamic, you do not have to explicitly cast it to the Range type before its Value property can be accessed. Dynamification can greatly simplify code that interoperates with COM objects. Here is the simpler code.

using Microsoft.Office.Interop.Excel;
...
public static void Main() {
 Application excel = new Application();
 excel.Visible = true;
 excel.Workbooks.Add(Type.Missing);
 excel.Cells[1, 1].Value = "Text in cell A1"; // Put this string in cell A1
}

The following code shows how to use reflection to call a method (“Contains”) on a String target (“Jeffrey Richter”) passing it a String argument (“ff”) and storing the Boolean result in a local variable (result).

Object target = "Jeffrey Richter";
Object arg = "ff";
// Find a method on the target that matches the desired argument types
Type[] argTypes = new Type[] { arg.GetType() };
MethodInfo method = target.GetType().GetMethod("Contains", argTypes);
// Invoke the method on the target passing the desired arguments
Object[] arguments = new Object[] { arg };
Boolean result = Convert.ToBoolean(method.Invoke(target, arguments));

Using C#’s dynamic type, this code can be rewritten with greatly improved syntax.

dynamic target = "Jeffrey Richter";
dynamic arg = "ff";
Boolean result = target.Contains(arg);

Earlier, I mentioned that the C# compiler emits payload code that, at run time, figures out what operation to perform based on the actual type of an object. This payload code uses a class known as a runtime binder. Different programming languages define their own runtime binders that encapsulate the rules of that language. The code for the C# runtime binder is in the Microsoft.CSharp.dll assembly, and you must reference this assembly when you build projects that use the dynamic keyword. This assembly is referenced in the compiler’s default response file, CSC.rsp. It is the code in this assembly that knows to produce code (at run time) that performs addition when the + operator is applied to two Int32 objects and concatenation when applied to two String objects.

At run time, the Microsoft.CSharp.dll assembly will have to load into the AppDomain, which hurts your application’s performance and increases memory consumption. Microsoft.CSharp.dll also loads System.dll and System.Core.dll. If you are using dynamic to help you interoperate with COM components, then System.Dynamic.dll will also load. And when the payload code executes, it generates dynamic code at run time; this code will be in an in-memory assembly called “Anonymously Hosted DynamicMethods Assembly.” The purpose of this code is to improve the performance of dynamic dispatch in scenarios where a particular call site is making many invocations using dynamic arguments that have the same runtime type.

Due to all the overhead associated with C#’s built-in dynamic evaluation feature, you should consciously decide that you are getting sufficient syntax simplification from the dynamic feature to make it worth the extra performance hit of loading all these assemblies and the extra memory that they consume. If you have only a couple places in your program where you need dynamic behavior, it might be more efficient to just do it the old-fashioned way, by calling reflection methods (for managed objects) or with manual casting (for COM objects).

At run time, the C# runtime binder resolves a dynamic operation according to the runtime type of the object. The binder first checks to see if the type implements the IDynamicMetaObjectProvider interface. If the object does implement this interface, then the interface’s GetMetaObject method is called, which returns a DynamicMetaObject-derived type. This type can process all of the member, method, and operator bindings for the object. Both the IDynamicMetaObjectProvider interface and the DynamicMetaObject base class are defined in the System.Dynamic namespace, and both are in the System.Core.dll assembly.

Dynamic languages, such as Python and Ruby, endow their types with DynamicMetaObject-derived types so that they can be accessed in a way appropriate for them when manipulated from other programming languages (like C#). Similarly, when accessing a COM component, the C# runtime binder will use a DynamicMetaObject-derived type that knows how to communicate with a COM component. The COM DynamicMetaObject-derived type is defined in the System.Dynamic.dll assembly.

If the type of the object being used in the dynamic expression does not implement the IDynamicMetaObjectProvider interface, then the C# compiler treats the object like an instance of an ordinary C#-defined type and performs operations on the object using reflection.
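The two binding paths can be seen side by side in a short sketch. PropertyBag is a hypothetical class that derives from System.Dynamic.DynamicObject (which implements IDynamicMetaObjectProvider), so member access on it is routed to its TryGetMember and TrySetMember overrides; the String at the end takes the ordinary path.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;

// Hypothetical type: stores whatever members the caller sets on it.
internal sealed class PropertyBag : DynamicObject {
    private readonly Dictionary<String, Object> m_values = new Dictionary<String, Object>();

    // Called for 'bag.Name = value'
    public override Boolean TrySetMember(SetMemberBinder binder, Object value) {
        m_values[binder.Name] = value;
        return true;
    }
    // Called for 'bag.Name'; returning false tells the binder the member doesn't exist
    public override Boolean TryGetMember(GetMemberBinder binder, out Object result) {
        return m_values.TryGetValue(binder.Name, out result);
    }
}

public static class BinderPathsDemo {
    public static void Main() {
        dynamic bag = new PropertyBag();
        bag.Title = "CLR"; // Routed to TrySetMember
        bag.Pages = 800;
        Console.WriteLine(bag.Title);                         // CLR
        Console.WriteLine(bag.Pages + 1);                     // 801
        Console.WriteLine(bag is IDynamicMetaObjectProvider); // True

        dynamic s = "abc"; // String doesn't implement IDynamicMetaObjectProvider
        Console.WriteLine(s.Length);                          // 3
        Console.WriteLine(s is IDynamicMetaObjectProvider);   // False
    }
}
```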

One of the limitations of dynamic is that you can only use it to access an object’s instance members because the dynamic variable must refer to an object. But, there are occasions when it would be useful to dynamically invoke static members of a type where the type is determined at run time. To accomplish this, I have created a StaticMemberDynamicWrapper class that derives from System.Dynamic.DynamicObject, which implements the IDynamicMetaObjectProvider interface. The class internally uses quite a bit of reflection (covered in Chapter 23, “Assembly Loading and Reflection”). Here is the code for my StaticMemberDynamicWrapper class.

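using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;
using System.Reflection;

internal sealed class StaticMemberDynamicWrapper : DynamicObject {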
 private readonly TypeInfo m_type;
 public StaticMemberDynamicWrapper(Type type) { m_type = type.GetTypeInfo(); }
 public override IEnumerable<String> GetDynamicMemberNames() {
 return m_type.DeclaredMembers.Select(mi => mi.Name);
 }
 public override Boolean TryGetMember(GetMemberBinder binder, out object result) {
 result = null;
 var field = FindField(binder.Name);
 if (field != null) { result = field.GetValue(null); return true; }
 var prop = FindProperty(binder.Name, true);
 if (prop != null) { result = prop.GetValue(null, null); return true; }
 return false;
 }
 public override Boolean TrySetMember(SetMemberBinder binder, object value) {
 var field = FindField(binder.Name);
 if (field != null) { field.SetValue(null, value); return true; }
 var prop = FindProperty(binder.Name, false);
 if (prop != null) { prop.SetValue(null, value, null); return true; }
 return false;
 }
 public override Boolean TryInvokeMember(InvokeMemberBinder binder, Object[] args, 
 out Object result) {
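 // Look up the static method by name and by the runtime types of the arguments
 MethodInfo method = FindMethod(binder.Name, args.Select(a => a.GetType()).ToArray());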
 if (method == null) { result = null; return false; }
 result = method.Invoke(null, args);
 return true;
 }
 private MethodInfo FindMethod(String name, Type[] paramTypes) {
 return m_type.DeclaredMethods.FirstOrDefault(mi => mi.IsPublic && mi.IsStatic 
 && mi.Name == name
 && ParametersMatch(mi.GetParameters(), paramTypes));
 }
 private Boolean ParametersMatch(ParameterInfo[] parameters, Type[] paramTypes) {
 if (parameters.Length != paramTypes.Length) return false;
 for (Int32 i = 0; i < parameters.Length; i++)
 if (parameters[i].ParameterType != paramTypes[i]) return false;
 return true;
 }
 private FieldInfo FindField(String name) {
 return m_type.DeclaredFields.FirstOrDefault(fi => fi.IsPublic && fi.IsStatic 
 && fi.Name == name);
 }
 private PropertyInfo FindProperty(String name, Boolean get) {
 if (get)
 return m_type.DeclaredProperties.FirstOrDefault(
 pi => pi.Name == name && pi.GetMethod != null &&
 pi.GetMethod.IsPublic && pi.GetMethod.IsStatic);
 return m_type.DeclaredProperties.FirstOrDefault(
 pi => pi.Name == name && pi.SetMethod != null &&
 pi.SetMethod.IsPublic && pi.SetMethod.IsStatic);
 }
}

To invoke a static member dynamically, construct an instance of this class by passing in the Type you want it to operate on and put the reference in a dynamic variable. Then, invoke the desired static member by using instance member syntax. Here is an example of how to invoke String’s static Concat(String, String) method.

dynamic stringType = new StaticMemberDynamicWrapper(typeof(String));
var r = stringType.Concat("A", "B"); // dynamically invoke String’s static Concat method
Console.WriteLine(r); // Displays "AB"

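💡 Summary: C# is a type-safe programming language: every expression resolves to an instance of a type, and the compiler generates only code that performs operations valid for that type. Still, a program often has to act on information that is not known until run time, for example when it uses reflection or communicates with components that are not implemented in C#. To make this easier, the C# compiler lets you mark an expression's type as dynamic. When code invokes a member through a dynamic expression or variable, the compiler generates special IL code that describes the desired operation. This special code is called the payload. At run time, the payload code determines the exact operation to execute based on the actual type of the object referenced by the dynamic expression or variable. If the type of a field, method parameter, or method return value is dynamic, the compiler converts it to System.Object and applies an instance of System.Runtime.CompilerServices.DynamicAttribute to the field, parameter, or return type in metadata. If a local variable is declared dynamic, its type also becomes Object, but DynamicAttribute is not applied to it because its use is confined to the method. Because dynamic is really Object, method signatures cannot differ only by dynamic versus Object. Generic type arguments for a generic class, structure, interface, delegate, or method can also be dynamic. Note that the generic code you are using has already been compiled and treats the type as Object; the compiler produces no payload code in the generic code, so no dynamic dispatch is performed. The compiler lets you cast an expression from dynamic to another type by using implicit cast syntax. Although the compiler lets you omit the explicit cast, the CLR validates the cast at run time to ensure type safety. C# can decide at run time which operation to perform based on an object's actual type because the payload code uses a class known as a **runtime binder**. Different programming languages define their own runtime binders that encapsulate their rules. The code for the C# runtime binder is in the Microsoft.CSharp.dll assembly, and projects that use the dynamic keyword must reference it. At run time, Microsoft.CSharp.dll must load into the AppDomain, which hurts application performance and increases memory consumption. When the payload code executes, it generates dynamic code at run time; this code goes into an in-memory assembly called the “Anonymously Hosted DynamicMethods Assembly.” Its purpose is to improve the performance of dynamic dispatch when a particular call site (the expression or line of code that invokes a target method) makes many invocations with dynamic arguments that have the same runtime type. The dynamic feature can simplify syntax, but you have to decide whether it is worth it: loading all these assemblies and the extra memory they consume has a performance cost. At run time, the C# runtime binder resolves a dynamic operation according to the runtime type of the object. The binder first checks whether the type implements the IDynamicMetaObjectProvider interface. If it does, the interface's GetMetaObject method is called, which returns a DynamicMetaObject-derived type. This type can process all of the member, method, and operator bindings for the object. Both the IDynamicMetaObjectProvider interface and the DynamicMetaObject base class are defined in the System.Dynamic namespace, and both are in the System.Core.dll assembly. If the type of an object used in a dynamic expression does not implement IDynamicMetaObjectProvider, the C# compiler treats the object as an instance of an ordinary C#-defined type and uses reflection to perform operations on it.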