# Chapter 16 Arrays

Arrays are mechanisms that allow you to treat several items as a single collection. The Microsoft .NET common language runtime (CLR) supports single-dimensional arrays, multi-dimensional arrays, and jagged arrays (that is, arrays of arrays). All array types are implicitly derived from the System.Array abstract class, which itself is derived from System.Object . This means that arrays are always reference types that are allocated on the managed heap and that your application’s variable or field contains a reference to the array and not the elements of the array itself. The following code makes this clearer.

Int32[] myIntegers; // Declares a reference to an array 
myIntegers = new Int32[100]; // Creates an array of 100 Int32s

On the first line, myIntegers is a variable that’s capable of pointing to a single-dimensional array of Int32s. Initially, myIntegers will be set to null because I haven’t allocated an array. The second line of code allocates an array of 100 Int32 values; all of the Int32s are initialized to 0. Because arrays are reference types, the memory block required to hold the 100 unboxed Int32s is allocated on the managed heap. Actually, in addition to the array’s elements, the memory block occupied by an array object also contains a type object pointer, a sync block index, and some additional overhead members as well. The address of this array’s memory block is returned and saved in the variable myIntegers.

You can also create arrays of reference types.

Control[] myControls; // Declares a reference to an array 
myControls = new Control[50]; // Creates an array of 50 Control references

On the first line, myControls is a variable capable of pointing to a single-dimensional array of Control references. Initially, myControls will be set to null because I haven’t allocated an array. The second line allocates an array of 50 Control references; all of these references are initialized to null. Because Control is a reference type, creating the array creates only a bunch of references; the actual objects aren’t created at this time. The address of this memory block is returned and saved in the variable myControls.
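The default element values described above can be checked with a minimal sketch. It uses String in place of Control, purely as an assumption to keep the example free of any UI library dependency.

```csharp
using System;

public static class Program {
    public static void Main() {
        Int32[] myIntegers = new Int32[100];  // value type elements, stored inline
        String[] myStrings = new String[50];  // reference type elements, only references

        Console.WriteLine(myIntegers[0]);         // 0: value type elements are zeroed
        Console.WriteLine(myStrings[0] == null);  // True: no String objects exist yet
        Console.WriteLine(myIntegers.Length);     // 100
    }
}
```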

Figure 16-1 shows how arrays of value types and arrays of reference types look in the managed heap.

[Figure 16-1: Arrays of value types and arrays of reference types in the managed heap]

In the figure, the Controls array shows the result after the following lines have executed.

myControls[1] = new Button(); 
myControls[2] = new TextBox(); 
myControls[3] = myControls[2]; // Two elements refer to the same object. 
myControls[46] = new DataGrid(); 
myControls[48] = new ComboBox(); 
myControls[49] = new Button();

Common Language Specification (CLS) compliance requires all arrays to be zero-based. This allows a method written in C# to create an array and pass the array’s reference to code written in another language, such as Microsoft Visual Basic .NET. In addition, because zero-based arrays are, by far, the most common arrays, Microsoft has spent a lot of time optimizing their performance. However, the CLR does support non-zero–based arrays even though their use is discouraged. For those of you who don’t care about a slight performance penalty or cross-language portability, I’ll demonstrate how to create and use non-zero–based arrays later in this chapter.

Notice in Figure 16-1 that each array has some additional overhead information associated with it. This information contains the rank of the array (number of dimensions), the lower bounds for each dimension of the array (almost always 0), and the length of each dimension. The overhead also contains the array’s element type. I’ll mention the methods that allow you to query this overhead information later in this chapter.
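As a rough illustration, the overhead information listed above can be read back through members that System.Array exposes. The array below is only an example.

```csharp
using System;

public static class Program {
    public static void Main() {
        Double[,] myDoubles = new Double[10, 20];

        Console.WriteLine(myDoubles.Rank);                        // 2 (number of dimensions)
        Console.WriteLine(myDoubles.GetLowerBound(0));            // 0 (lower bound of dimension 0)
        Console.WriteLine(myDoubles.GetLength(0));                // 10 (length of dimension 0)
        Console.WriteLine(myDoubles.GetLength(1));                // 20 (length of dimension 1)
        Console.WriteLine(myDoubles.Length);                      // 200 (total element count)
        Console.WriteLine(myDoubles.GetType().GetElementType());  // System.Double
    }
}
```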

So far, I’ve shown examples demonstrating how to create single-dimensional arrays. When possible, you should stick with single-dimensional, zero-based arrays, sometimes referred to as SZ arrays, or vectors. Vectors give the best performance because you can use specific Intermediate Language (IL) instructions—such as newarr, ldelem, ldelema, ldlen, and stelem—to manipulate them. However, if you prefer to work with multi-dimensional arrays, you can. Here are some examples of multi-dimensional arrays.

// Create a two-dimensional array of Doubles. 
Double[,] myDoubles = new Double[10, 20]; 
// Create a three-dimensional array of String references. 
String[,,] myStrings = new String[5, 3, 10];

The CLR also supports jagged arrays, which are arrays of arrays. Zero-based, single-dimensional jagged arrays have the same performance as normal vectors. However, accessing the elements of a jagged array means that two or more array accesses must occur. Here are some examples of how to create an array of polygons with each polygon consisting of an array of Point instances.

// Create a single-dimensional array of Point arrays. 
Point[][] myPolygons = new Point[3][]; 
// myPolygons[0] refers to an array of 10 Point instances. 
myPolygons[0] = new Point[10]; 
// myPolygons[1] refers to an array of 20 Point instances. 
myPolygons[1] = new Point[20]; 
// myPolygons[2] refers to an array of 30 Point instances. 
myPolygons[2] = new Point[30]; 
// Display the Points in the first polygon. 
for (Int32 x = 0; x < myPolygons[0].Length; x++) 
 Console.WriteLine(myPolygons[0][x]);

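💡Note: The CLR verifies that an index into an array is valid. In other words, you can’t create an array with 100 elements (indexed 0 through 99) and then try to access the element at index -5 or 100. Doing so causes a System.IndexOutOfRangeException to be thrown. Allowing access to memory outside the range of an array would breach type safety and open a potential security hole, so the CLR doesn’t allow verifiable code to do this. Usually, the performance cost of index range checking is negligible because the JIT compiler normally checks the array bounds once before a loop executes instead of on every loop iteration <sup>①</sup>. However, if you’re still concerned about the performance hit of the CLR’s index checks, you can use unsafe code in C# to access the array. Section 16.7, “Array Internals,” demonstrates how to do this.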

① Don’t confuse a “loop” with a “loop iteration.” For example, consider the following code:

Int32[] myArray = new Int32[100];
for (Int32 i = 0; i < myArray.Length; i++) myArray[i] = i;

This “for loop” performs “100 loop iterations,” sometimes described simply as “iterating 100 times.”
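The index check described in the note above can be observed directly. This is a small sketch; the variable names are illustrative only.

```csharp
using System;

public static class Program {
    public static void Main() {
        Int32[] myArray = new Int32[100];  // valid indexes are 0 through 99
        Int32 index = 100;                 // one past the last valid index

        try {
            myArray[index] = 1;            // the CLR validates the index before the store
        }
        catch (IndexOutOfRangeException e) {
            Console.WriteLine("Caught: " + e.GetType());  // Caught: System.IndexOutOfRangeException
        }
    }
}
```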

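💡Summary: All array types are implicitly derived from the System.Array abstract class, which is itself derived from System.Object. This means that arrays are always reference types allocated on the managed heap. In addition to the array’s elements, the memory block occupied by an array object also contains a type object pointer, a sync block index, and some additional members. Common Language Specification (CLS) compliance requires all arrays to be zero-based (that is, the lowest index is 0). This allows a method written in C# to create an array and pass its reference to code written in another language, such as Microsoft Visual Basic .NET. The CLR does support non-zero–based arrays, but their use is discouraged. Each array has some additional overhead information associated with it: the rank of the array (its number of dimensions), the lower bound of each dimension (almost always 0), and the length of each dimension. The overhead also contains the array’s element type. When possible, stick with single-dimensional, zero-based arrays, sometimes referred to as SZ (single-dimensional, zero-based) arrays or vectors. Vectors give the best performance because specific IL instructions (such as newarr, ldelem, ldelema, ldlen, and stelem) can manipulate them. Multi-dimensional arrays are available when needed. The CLR also supports jagged arrays, which are arrays of arrays. Zero-based, single-dimensional jagged arrays have the same performance as normal vectors, but accessing the elements of a jagged array means that two or more array accesses must occur.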

# Initializing Array Elements

In the previous section, I showed how to create an array object and then I showed how to initialize the elements of the array. C# offers syntax that allows you to do these two operations in one statement. The following shows an example.

String[] names = new String[] { "Aidan", "Grant" };

The comma-separated set of tokens contained within the braces is called an array initializer. Each token can be an arbitrarily complex expression or, in the case of a multi-dimensional array, a nested array initializer. In the preceding example, I used just two simple String expressions.
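To make the nested case concrete, here is a brief sketch of a nested array initializer for a two-dimensional array, alongside an initializer whose elements are non-trivial expressions. The variable names and values are illustrative assumptions.

```csharp
using System;

public static class Program {
    public static void Main() {
        // Each inner pair of braces is a nested array initializer for one row.
        Int32[,] grid = new Int32[,] { { 1, 2, 3 }, { 4, 5, 6 } };

        // Elements may be arbitrary expressions, not just literals.
        String[] items = new String[] { "Hello, " + "Aidan", Math.Max(1, 2).ToString() };

        Console.WriteLine(grid.GetLength(0));  // 2 rows
        Console.WriteLine(grid.GetLength(1));  // 3 columns
        Console.WriteLine(grid[1, 2]);         // 6
        Console.WriteLine(items[0]);           // Hello, Aidan
        Console.WriteLine(items[1]);           // 2
    }
}
```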

If you are declaring a local variable in a method to refer to the initialized array, then you can use C#’s implicitly typed local variable (var) feature to simplify the code a little.

// Using C#’s implicitly typed local variable feature:
var names = new String[] { "Aidan", "Grant" };

Here, the compiler is inferring that the names local variable should be of the String[] type because that is the type of the expression on the right of the assignment operator (=).

You can use C#’s implicitly typed array feature to have the compiler infer the type of the array’s elements. Notice the following line has no type specified between new and [].

// Using C#’s implicitly typed local variable and implicitly typed array features:
var names = new[] { "Aidan", "Grant", null };

In the preceding line, the compiler examines the types of the expressions being used inside the array to initialize the array’s elements, and the compiler chooses the closest base class that all the elements have in common to determine the type of the array. In this example, the compiler sees two Strings and null. Because null is implicitly castable to any reference type (including String), the compiler infers that it should be creating and initializing an array of String references.

If you had this code:

// Using C#’s implicitly typed local variable & implicitly typed array features: (error)
var names = new[] { "Aidan", "Grant", 123 };

the compiler would issue the message error CS0826: No best type found for implicitly-typed array. This is because the base type in common between the two Strings and the Int32 is Object, which would mean that the compiler would have to create an array of Object references and then box the 123 and have the last array element refer to a boxed Int32 with a value of 123. The C# compiler team thinks that boxing array elements is too heavy-handed for the compiler to do for you implicitly, and that is why the compiler issues the error.
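If an array of Object references really is what you want, one way around the error is to name the element type yourself so that the boxing becomes an explicit choice. A minimal sketch:

```csharp
using System;

public static class Program {
    public static void Main() {
        // Naming Object as the element type tells the compiler to box 123.
        var names = new Object[] { "Aidan", "Grant", 123 };

        Console.WriteLine(names.GetType());     // System.Object[]
        Console.WriteLine(names[2].GetType());  // System.Int32 (a boxed Int32)
    }
}
```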

As an added syntactical bonus when initializing an array, you can write the following.

String[] names = { "Aidan", "Grant" };

Notice that on the right of the assignment operator (=), only the array initializer expression is given with no new, no type, and no []s. This syntax is nice, but unfortunately, the C# compiler does not allow you to use implicitly typed local variables with this syntax.

// This is a local variable now (error)
var names = { "Aidan", "Grant" };

If you try to compile the preceding line of code, the compiler issues two messages: error CS0820: Cannot initialize an implicitly-typed local variable with an array initializer and error CS0622: Can only use array initializer expressions to assign to array types. Try using a new expression instead. Although the compiler could make this work, the C# team thought that the compiler would be doing too much for you here. It would be inferring the type of the array, new’ing the array, initializing the array, and inferring the type of the local variable, too.

The last thing I’d like to show you is how to use implicitly typed arrays with anonymous types and implicitly typed local variables. Anonymous types and how type identity applies to them are discussed in Chapter 10, “Properties.” Examine the following code.

// Using C#’s implicitly typed local, implicitly typed array, and anonymous type features:
var kids = new[] {new { Name="Aidan" }, new { Name="Grant" }};
// Sample usage (with another implicitly typed local variable):
foreach (var kid in kids)
 Console.WriteLine(kid.Name);

In this example, I am using an array initializer that has two expressions for the array elements. Each expression represents an anonymous type (because no type name is specified after the new operator). Because the two anonymous types have the identical structure (one field called Name of type String), the compiler knows that these two objects are of the exact same type. Now, I use C#’s implicitly typed array feature (no type specified between the new and the []s) so that the compiler will infer the type of the array itself, construct this array object, and initialize its references to the two instances of the one anonymous type. Finally, a reference to this array object is assigned to the kids local variable, the type of which is inferred by the compiler due to C#’s implicitly typed local variable feature.

I show the foreach loop as an example of how to use this array that was just created and initialized with the two anonymous type objects. I have to use an implicitly typed local variable (kid) for the loop, too. When I run this code, I get the following output.

Aidan
Grant

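💡Summary: C# lets you create an array object and initialize its elements in a single statement. The comma-separated set of items within the braces is called an array initializer. Each item can be an arbitrarily complex expression or, for a multi-dimensional array, a nested array initializer. You can also use C#’s implicitly typed local variable feature or its implicitly typed array feature to simplify the code. As an added syntactical bonus when initializing an array, the right side of the assignment operator (=) can consist of the array initializer alone, with no new, no type, and no []. This syntax is readable, but the C# compiler does not allow implicitly typed local variables to be used with it because the C# team thought the compiler would be doing too much for you: inferring the type of the array, creating the array object, initializing the array, and inferring the type of the local variable as well.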

# Casting Arrays

For arrays with reference type elements, the CLR allows you to implicitly cast the source array’s element type to a target type. For the cast to succeed, both array types must have the same number of dimensions, and an implicit or explicit conversion from the source element type to the target element type must exist. The CLR doesn’t allow the casting of arrays with value type elements to any other type. (However, by using the Array.Copy method, you can create a new array and populate its elements in order to obtain the desired effect.) The following code demonstrates how array casting works.

// Create a two-dimensional FileStream array. 
FileStream[,] fs2dim = new FileStream[5, 10]; 
// Implicit cast to a two-dimensional Object array 
Object[,] o2dim = fs2dim; 
// Can't cast from two-dimensional array to one-dimensional array 
// Compiler error CS0030: Cannot convert type 'object[*,*]' to 
// 'System.IO.Stream[]' 
Stream[] s1dim = (Stream[]) o2dim; 
// Explicit cast to two-dimensional Stream array 
Stream[,] s2dim = (Stream[,]) o2dim; 
// Explicit cast to two-dimensional String array 
// Compiles but throws InvalidCastException at runtime
String[,] st2dim = (String[,]) o2dim; 
// Create a one-dimensional Int32 array (value types). 
Int32[] i1dim = new Int32[5]; 
// Can't cast from array of value types to anything else 
// Compiler error CS0030: Cannot convert type 'int[]' to 'object[]' 
Object[] o1dim = (Object[]) i1dim; 
// Create a new array, then use Array.Copy to coerce each element in the 
// source array to the desired type in the destination array. 
// The following code creates an array of references to boxed Int32s. 
Object[] ob1dim = new Object[i1dim.Length]; 
Array.Copy(i1dim, ob1dim, i1dim.Length);

The Array.Copy method is not just a method that copies elements from one array to another. The Copy method handles overlapping regions of memory correctly, as does C’s memmove function. C’s memcpy function, on the other hand, doesn’t handle overlapping regions correctly. The Copy method can also convert each array element as it is copied if conversion is required. The Copy method is capable of performing the following conversions:

  • Boxing value type elements to reference type elements, such as copying an Int32[] to an Object[].

  • Unboxing reference type elements to value type elements, such as copying an Object[] to an Int32[].

  • Widening CLR primitive value types, such as copying elements from an Int32[] to a Double[].

  • Downcasting elements when copying between array types that can’t be proven to be compatible based on the array’s type, such as when casting from an Object[] to an IFormattable[]. If every object in the Object[] implements IFormattable, Copy will succeed.
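The four conversions listed above, together with the overlapping-copy behavior, can be sketched as follows. The sample values are arbitrary, and the String.Join call assumes .NET Framework 4 or later.

```csharp
using System;

public static class Program {
    public static void Main() {
        Int32[] ints = { 1, 2, 3 };

        // Boxing: Int32[] to Object[]
        Object[] boxed = new Object[ints.Length];
        Array.Copy(ints, boxed, ints.Length);

        // Unboxing: Object[] to Int32[]
        Int32[] unboxed = new Int32[boxed.Length];
        Array.Copy(boxed, unboxed, boxed.Length);

        // Widening: Int32[] to Double[]
        Double[] widened = new Double[ints.Length];
        Array.Copy(ints, widened, ints.Length);

        // Downcasting: Object[] to IFormattable[]; succeeds because every
        // element is a boxed Int32, and Int32 implements IFormattable.
        IFormattable[] formattables = new IFormattable[boxed.Length];
        Array.Copy(boxed, formattables, boxed.Length);

        // Overlapping regions within one array are handled like memmove.
        Int32[] overlap = { 1, 2, 3, 4, 5 };
        Array.Copy(overlap, 0, overlap, 1, 4);

        Console.WriteLine(unboxed[2]);                 // 3
        Console.WriteLine(widened[2]);                 // 3
        Console.WriteLine(formattables.Length);        // 3
        Console.WriteLine(String.Join(",", overlap));  // 1,1,2,3,4
    }
}
```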

Here’s another example showing the usefulness of Copy.

// Define a value type that implements an interface. 
internal struct MyValueType : IComparable { 
 public Int32 CompareTo(Object obj) { 
 ... 
 } 
} 
public static class Program { 
 public static void Main() { 
 // Create an array of 100 value types. 
 MyValueType[] src = new MyValueType[100]; 
 // Create an array of IComparable references. 
 IComparable[] dest = new IComparable[src.Length]; 
 // Initialize an array of IComparable elements to refer to boxed 
 // versions of elements in the source array. 
 Array.Copy(src, dest, src.Length); 
 } 
}

As you might imagine, the Framework Class Library (FCL) takes advantage of Array’s Copy method quite frequently.

In some situations, it is useful to cast an array from one type to another. This kind of functionality is called array covariance. When you take advantage of array covariance, you should be aware of an associated performance penalty. Let’s say you have the following code.

String[] sa = new String[100]; 
Object[] oa = sa; // oa refers to an array of String elements 
oa[5] = "Jeff"; // Perf hit: CLR checks oa's element type for String; OK 
oa[3] = 5; // Perf hit: CLR checks oa's element type for Int32; throws 
 // ArrayTypeMismatchException

In the preceding code, the oa variable is typed as an Object[]; however, it really refers to a String[]. The compiler will allow you to write code that attempts to put a 5 into an array element because 5 is an Int32, which is derived from Object. Of course, the CLR must ensure type safety, and when assigning to an array element, the CLR must ensure that the assignment is legal. So the CLR must check at run time whether the array contains Int32 elements. In this case, it doesn’t, and the assignment cannot be allowed; the CLR will throw an ArrayTypeMismatchException.
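The run-time check can be observed by catching the exception. This sketch reuses the sa and oa names from the code above and also shows that the covariant cast makes no copy of the array.

```csharp
using System;

public static class Program {
    public static void Main() {
        String[] sa = new String[100];
        Object[] oa = sa;      // array covariance: oa and sa refer to the same array
        oa[5] = "Jeff";        // allowed: a String can be stored in a String[]

        try {
            oa[3] = 5;         // a boxed Int32 is not a String
        }
        catch (ArrayTypeMismatchException) {
            Console.WriteLine("ArrayTypeMismatchException");
        }

        Console.WriteLine(sa[5]);                           // Jeff
        Console.WriteLine(Object.ReferenceEquals(sa, oa));  // True
    }
}
```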

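💡Note: If you just need to copy some array elements to another array, System.Buffer’s BlockCopy method executes faster than Array’s Copy method. However, Buffer’s BlockCopy method supports only primitive types; it does not offer the casting abilities of Array’s Copy method. The method’s Int32 parameters are byte offsets within the array, not element indexes. BlockCopy is really designed for copying bitwise-compatible data from one array type to another bitwise-compatible array type, such as copying a Byte[] containing Unicode characters (in the proper byte order) to a Char[]. This method partially makes up for the inability to treat an array as a block of memory of any type.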

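If you need to reliably copy the elements of one array to another array, use System.Array’s ConstrainedCopy method. This method either completes the copy or throws an exception without corrupting any data in the destination array. This allows ConstrainedCopy to be used in a constrained execution region (CER). To offer this guarantee, ConstrainedCopy requires that the source array’s element type be the same as or derived from the destination array’s element type. In addition, it does not perform any boxing, unboxing, or downcasting.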
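A hedged sketch of both methods follows. The byte values are an assumption chosen for illustration: they spell "Hi" in UTF-16 only when read on a little-endian machine, which is why no output is asserted for that line.

```csharp
using System;

public static class Program {
    public static void Main() {
        // Assumption: UTF-16 bytes for "Hi" in little-endian order.
        Byte[] bytes = { 0x48, 0x00, 0x69, 0x00 };
        Char[] chars = new Char[bytes.Length / sizeof(Char)];

        // Offsets and count are expressed in bytes, not in elements.
        Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        Console.WriteLine(new String(chars));

        // ConstrainedCopy accepts a source element type that derives from
        // the destination element type (String derives from Object).
        String[] src = { "a", "b" };
        Object[] dst = new Object[2];
        Array.ConstrainedCopy(src, 0, dst, 0, src.Length);

        // It refuses any copy that would need boxing.
        Int32[] ints = { 1, 2 };
        try {
            Array.ConstrainedCopy(ints, 0, dst, 0, ints.Length);
        }
        catch (ArrayTypeMismatchException) {
            Console.WriteLine("ConstrainedCopy does not box");
        }
    }
}
```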

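💡Summary: For arrays with reference type elements, the CLR allows the array’s elements to be cast from one type to another. For the cast to succeed, both array types must have the same number of dimensions, and an implicit or explicit conversion from the source element type to the target element type must exist. The CLR doesn’t allow arrays with value type elements to be cast to any other type. However, you can use the Array.Copy method to create a new array and populate its elements to obtain the desired effect. The Copy method also handles overlapping regions of memory correctly, as C’s memmove function does, and it can convert each array element as it is copied if conversion is required. In some situations it is useful to cast an array from one type to another; this functionality is called array covariance. When you take advantage of it, be aware of the associated performance penalty.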

# All Arrays Are Implicitly Derived from System.Array

When you declare an array variable like this:

FileStream[] fsArray;

then the CLR automatically creates a FileStream[] type for the AppDomain. This type will be implicitly derived from the System.Array type, and therefore, all of the instance methods and properties defined on the System.Array type will be inherited by the FileStream[] type, allowing these methods and properties to be called using the fsArray variable. This makes working with arrays extremely convenient because there are many helpful instance methods and properties defined by System.Array, such as Clone, CopyTo, GetLength, GetLongLength, GetLowerBound, GetUpperBound, Length, Rank, and others.

The System.Array type also exposes a large number of extremely useful static methods that operate on arrays. These methods all take a reference to an array as a parameter. Some of the useful static methods are AsReadOnly, BinarySearch, Clear, ConstrainedCopy, ConvertAll, Copy, Exists, Find, FindAll, FindIndex, FindLast, FindLastIndex, ForEach, IndexOf, LastIndexOf, Resize, Reverse, Sort, and TrueForAll. There are many overloads for each of these methods. In fact, many of the methods provide generic overloads for compile-time type safety as well as good performance. I encourage you to examine the SDK documentation to get an understanding of how useful and powerful these methods are.
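As a small taste of these static methods, the following sketch exercises a handful of them, using the generic overloads. The sample data is arbitrary, and the String.Join call assumes .NET Framework 4 or later.

```csharp
using System;

public static class Program {
    public static void Main() {
        Int32[] values = { 5, 3, 9, 1 };

        Array.Sort(values);                               // 1, 3, 5, 9
        Int32 pos = Array.BinarySearch(values, 5);        // index of 5 in the sorted array
        Int32 firstBig = Array.Find(values, v => v > 4);  // first element greater than 4
        String[] text = Array.ConvertAll(values, v => v.ToString());

        Array.Resize(ref values, 6);  // allocates a new array; new slots hold 0
        Array.Reverse(values);

        Console.WriteLine(pos);                       // 2
        Console.WriteLine(firstBig);                  // 5
        Console.WriteLine(String.Join("|", text));    // 1|3|5|9
        Console.WriteLine(String.Join(",", values));  // 0,0,9,5,3,1
    }
}
```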

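💡Summary: When you declare an array variable of type XXX, the CLR automatically creates an XXX[] type for the AppDomain. This type is implicitly derived from System.Array; therefore, all of the instance methods and properties defined on System.Array are inherited by XXX[].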

# All Arrays Implicitly Implement IEnumerable, ICollection, and IList

There are many methods that operate on various collection objects because the methods are declared with parameters such as IEnumerable, ICollection, and IList. It is possible to pass arrays to these methods because System.Array also implements these three interfaces. System.Array implements these non-generic interfaces because they treat all elements as System.Object. However, it would be nice to have System.Array implement the generic equivalent of these interfaces, providing better compile-time type safety as well as better performance.

The CLR team didn’t want System.Array to implement IEnumerable<T>, ICollection<T>, and IList<T>, though, because of issues related to multi-dimensional arrays and non-zero–based arrays. Defining these interfaces on System.Array would have enabled these interfaces for all array types. Instead, the CLR performs a little trick: when a single-dimensional, zero–lower bound array type is created, the CLR automatically makes the array type implement IEnumerable<T>, ICollection<T>, and IList<T> (where T is the array’s element type) and also implements the three interfaces for all of the array type’s base types as long as they are reference types. The following hierarchy diagram helps make this clear.

Object 
 Array (non-generic IEnumerable, ICollection, IList) 
 	Object[] (IEnumerable, ICollection, IList of Object) 
 		String[] (IEnumerable, ICollection, IList of String) 
 		Stream[] (IEnumerable, ICollection, IList of Stream) 
 			FileStream[] (IEnumerable, ICollection, IList of FileStream) 
 		. 
 		.		 (other arrays of reference types) 
		.

So, for example, if you have the following line of code:

FileStream[] fsArray;

then when the CLR creates the FileStream[] type, it will cause this type to automatically implement the IEnumerable<FileStream>, ICollection<FileStream>, and IList<FileStream> interfaces. Furthermore, the FileStream[] type will also implement the interfaces for the base types: IEnumerable<Stream>, IEnumerable<Object>, ICollection<Stream>, ICollection<Object>, IList<Stream>, and IList<Object>. Because all of these interfaces are automatically implemented by the CLR, the fsArray variable could be used wherever any of these interfaces exist. For example, the fsArray variable could be passed to methods that have any of the following prototypes.

void M1(IList<FileStream> fsList) {} 
void M2(ICollection<Stream> sCollection) {} 
void M3(IEnumerable<Object> oEnumerable) {}

Note that if the array contains value type elements, the array type will not implement the interfaces for the element’s base types. For example, if you have the following line of code:

DateTime[] dtArray; // An array of value types

then the DateTime[] type will implement IEnumerable<DateTime>, ICollection<DateTime>, and IList<DateTime> only; it will not implement versions of these interfaces that are generic over System.ValueType or System.Object. This means that the dtArray variable cannot be passed as an argument to the M3 method shown earlier. This is because arrays of value types are laid out in memory differently than arrays of reference types. Array memory layout was discussed earlier in this chapter.
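The following sketch puts the M1, M2, and M3 prototypes to work. Having them return an element count is an assumption made here so that the calls produce visible output, and the LINQ Cast call is one possible workaround for the value type case, not something the text above prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Program {
    // Same parameter types as the prototypes above; the return values are illustrative.
    static Int32 M1(IList<FileStream> fsList) { return fsList.Count; }
    static Int32 M2(ICollection<Stream> sCollection) { return sCollection.Count; }
    static Int32 M3(IEnumerable<Object> oEnumerable) { return oEnumerable.Count(); }

    public static void Main() {
        FileStream[] fsArray = new FileStream[3];
        Console.WriteLine(M1(fsArray));  // 3
        Console.WriteLine(M2(fsArray));  // 3
        Console.WriteLine(M3(fsArray));  // 3

        DateTime[] dtArray = new DateTime[2];
        // M3(dtArray);  // does not compile: DateTime[] is not an IEnumerable<Object>

        // Cast<Object>() boxes each element while enumerating.
        Console.WriteLine(M3(dtArray.Cast<Object>()));  // 2
    }
}
```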

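💡Summary: System.Array implements the non-generic IEnumerable, ICollection, and IList interfaces because they treat all elements as System.Object. To provide the generic forms of these interfaces, with better compile-time type safety and better performance, the CLR performs a little trick: when a single-dimensional, zero-based array type is created, the CLR automatically makes the array type implement IEnumerable<T>, ICollection<T>, and IList<T> (where T is the array’s element type). It also implements the three interfaces for all of the array type’s base types as long as they are reference types. Note that if the array contains value type elements, the array type will not implement the interfaces for the element’s base types.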

# Passing and Returning Arrays

When passing an array as an argument to a method, you are really passing a reference to that array. Therefore, the called method is able to modify the elements in the array. If you don’t want to allow this, you must make a copy of the array and pass the copy into the method. Note that the Array.Copy method performs a shallow copy, and therefore, if the array’s elements are reference types, the new array refers to the already existing objects.

Similarly, some methods return a reference to an array. If the method constructs and initializes the array, returning a reference to the array is fine. But if the method wants to return a reference to an internal array maintained by a field, you must decide if you want the method’s caller to have direct access to this array and its elements. If you do, just return the array’s reference. But most often, you won’t want the method’s caller to have such access, so the method should construct a new array and call Array.Copy, returning a reference to the new array. Again, be aware that Array.Copy makes a shallow copy of the original array.
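A minimal sketch of the defensive-copy pattern described above. The Team type and its member names are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical type that owns an internal array.
public sealed class Team {
    private readonly String[] m_members = { "Aidan", "Grant" };

    // Returns a shallow copy so that callers cannot modify the internal array.
    public String[] GetMembers() {
        String[] copy = new String[m_members.Length];
        Array.Copy(m_members, copy, m_members.Length);
        return copy;
    }
}

public static class Program {
    public static void Main() {
        Team team = new Team();
        String[] members = team.GetMembers();
        members[0] = "Changed";                   // modifies only the caller's copy

        Console.WriteLine(team.GetMembers()[0]);  // Aidan
    }
}
```

Because the copy is shallow, this protects the array itself but not any mutable objects the elements refer to.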

If you define a method that returns a reference to an array, and if that array has no elements in it, your method can return either null or a reference to an array with zero elements in it. When you’re implementing this kind of method, Microsoft strongly recommends that you implement the method by having it return a zero-length array because doing so simplifies the code that a developer calling the method must write. For example, this easy-to-understand code runs correctly even if there are no appointments to iterate over.

// This code is easier to write and understand. 
Appointment[] appointments = GetAppointmentsForToday(); 
for (Int32 a = 0; a < appointments.Length; a++) { 
 ... 
}

The following code also runs correctly if there are no appointments to iterate over. However, this code is slightly more difficult to write and understand.

// This code is harder to write and understand. 
Appointment[] appointments = GetAppointmentsForToday(); 
if (appointments != null) { 
 for (Int32 a = 0; a < appointments.Length; a++) { 
 // Do something with appointments[a] 
 } 
}

If you design your methods to return arrays with zero elements instead of null, callers of your methods will have an easier time working with them. By the way, you should do the same for fields. If your type has a field that’s a reference to an array, you should consider having the field refer to an array even if the array has no elements in it.
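On the implementing side, the recommendation might look like the sketch below. The Appointment type and the body of GetAppointmentsForToday are hypothetical stand-ins, since the text above shows only the calling code.

```csharp
using System;

// Hypothetical element type for this illustration.
public sealed class Appointment { }

public static class Program {
    // Hypothetical implementation: there are no appointments, so return a
    // zero-length array rather than null. On .NET Framework 4.6 and later,
    // Array.Empty<Appointment>() returns a shared empty array instead.
    static Appointment[] GetAppointmentsForToday() {
        return new Appointment[0];
    }

    public static void Main() {
        Appointment[] appointments = GetAppointmentsForToday();
        Int32 visited = 0;
        for (Int32 a = 0; a < appointments.Length; a++) {
            visited++;                // never runs, and no null check is needed
        }
        Console.WriteLine(visited);   // 0
    }
}
```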

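💡Summary: When you pass an array as an argument to a method, you are really passing a reference to that array, so the called method can modify the array’s elements. If you don’t want to allow this, you must make a copy of the array and pass the copy to the method. The Array.Copy method performs a shallow copy; in other words, if the array’s elements are reference types, the new array refers to the already existing objects. If you define a method that returns a reference to an array, and the array has no elements in it, the method can return either null or a reference to an array with zero elements. Microsoft strongly recommends returning the zero-length array because doing so simplifies the code that callers must write. You should do the same for fields: if your type has a field that’s a reference to an array, consider having the field always refer to an array even if the array has no elements in it.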

# Creating Non-Zero Lower Bound Arrays

Earlier I mentioned that it’s possible to create and work with arrays that have non-zero lower bounds. You can dynamically create your own arrays by calling Array’s static CreateInstance method. Several overloads of this method exist, allowing you to specify the type of the elements in the array, the number of dimensions in the array, the lower bounds of each dimension, and the number of elements in each dimension. CreateInstance allocates memory for the array, saves the parameter information in the overhead portion of the array’s memory block, and returns a reference to the array. If the array has two or more dimensions, you can cast the reference returned from CreateInstance to an ElementType[] variable (where ElementType is some type name), making it easier for you to access the elements in the array. If the array has just one dimension, in C#, you have to use Array’s GetValue and SetValue methods to access the elements of the array.
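For the single-dimensional case mentioned at the end of the paragraph above, access through GetValue and SetValue might look like this sketch. The bounds and values are arbitrary.

```csharp
using System;

public static class Program {
    public static void Main() {
        // One dimension, 5 elements, valid indexes 1 through 5.
        Array a = Array.CreateInstance(typeof(Int32), new Int32[] { 5 }, new Int32[] { 1 });

        for (Int32 i = a.GetLowerBound(0); i <= a.GetUpperBound(0); i++)
            a.SetValue(i * 10, i);              // the value is boxed, then stored at index i

        Console.WriteLine(a.GetLowerBound(0));  // 1
        Console.WriteLine(a.GetUpperBound(0));  // 5
        Console.WriteLine(a.GetValue(1));       // 10
        Console.WriteLine(a.GetValue(5));       // 50
    }
}
```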

Here’s some code that demonstrates how to dynamically create a two-dimensional array of System.Decimal values. The first dimension represents calendar years from 2005 to 2009 inclusive, and the second dimension represents quarters from 1 to 4 inclusive. The code iterates over all the elements in the dynamic array. I could have hard-coded the array’s bounds into the code, which would have given better performance, but I decided to use System.Array’s GetLowerBound and GetUpperBound methods to demonstrate their use.

using System; 
public static class DynamicArrays { 
 public static void Main() { 
 // I want a two-dimensional array [2005..2009][1..4]. 
 Int32[] lowerBounds = { 2005, 1 }; 
 Int32[] lengths = { 5, 4 }; 
 Decimal[,] quarterlyRevenue = (Decimal[,]) 
 Array.CreateInstance(typeof(Decimal), lengths, lowerBounds); 
 Console.WriteLine("{0,4} {1,9} {2,9} {3,9} {4,9}", 
 "Year", "Q1", "Q2", "Q3", "Q4"); 
 Int32 firstYear = quarterlyRevenue.GetLowerBound(0); 
 Int32 lastYear = quarterlyRevenue.GetUpperBound(0); 
 Int32 firstQuarter = quarterlyRevenue.GetLowerBound(1); 
 Int32 lastQuarter = quarterlyRevenue.GetUpperBound(1); 
 for (Int32 year = firstYear; year <= lastYear; year++) { 
 Console.Write(year + " "); 
 for (Int32 quarter = firstQuarter; quarter <= lastQuarter; quarter++) { 
 Console.Write("{0,9:C} ", quarterlyRevenue[year, quarter]); 
 } 
 Console.WriteLine(); 
 } 
 } 
}

If you compile and run this code, you get the following output.

Year Q1 Q2 Q3 Q4 
2005 $0.00 $0.00 $0.00 $0.00 
2006 $0.00 $0.00 $0.00 $0.00 
2007 $0.00 $0.00 $0.00 $0.00 
2008 $0.00 $0.00 $0.00 $0.00 
2009 $0.00 $0.00 $0.00 $0.00

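💡Summary: You can dynamically create arrays with non-zero lower bounds by calling Array’s static CreateInstance method. Several overloads of this method exist, allowing you to specify the type of the array’s elements, the number of dimensions, the lower bound of each dimension, and the number of elements in each dimension. CreateInstance allocates memory for the array, saves the parameter information in the overhead portion of the array’s memory block, and returns a reference to the array. If the array has two or more dimensions, you can cast the reference returned from CreateInstance to an ElementType[] variable (where ElementType is some type name), making it easier to access the elements in the array. If the array has just one dimension, C# requires you to use Array’s GetValue and SetValue methods to access the elements.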

# Array Internals

Internally, the CLR actually supports two different kinds of arrays:

  • Single-dimensional arrays with a lower bound of 0. These arrays are sometimes called SZ (for single-dimensional, zero-based) arrays or vectors.

  • Single-dimensional and multi-dimensional arrays with an unknown lower bound.

You can actually see the different kinds of arrays by executing the following code (the output is shown in the code’s comments).

using System; 
public sealed class Program { 
 public static void Main() { 
 Array a; 
 // Create a 1-dim, 0-based array, with no elements in it 
 a = new String[0]; 
 Console.WriteLine(a.GetType()); // "System.String[]" 
 // Create a 1-dim, 0-based array, with no elements in it 
 a = Array.CreateInstance(typeof(String), 
 new Int32[] { 0 }, new Int32[] { 0 }); 
 Console.WriteLine(a.GetType()); // "System.String[]" 
 // Create a 1-dim, 1-based array, with no elements in it 
 a = Array.CreateInstance(typeof(String), 
 new Int32[] { 0 }, new Int32[] { 1 }); 
 Console.WriteLine(a.GetType()); // "System.String[*]" <-- INTERESTING! 
 Console.WriteLine(); 
 // Create a 2-dim, 0-based array, with no elements in it 
 a = new String[0, 0]; 
 Console.WriteLine(a.GetType()); // "System.String[,]" 
 // Create a 2-dim, 0-based array, with no elements in it 
 a = Array.CreateInstance(typeof(String), 
 new Int32[] { 0, 0 }, new Int32[] { 0, 0 }); 
 Console.WriteLine(a.GetType()); // "System.String[,]" 
 // Create a 2-dim, 1-based array, with no elements in it 
 a = Array.CreateInstance(typeof(String), 
 new Int32[] { 0, 0 }, new Int32[] { 1, 1 }); 
 Console.WriteLine(a.GetType()); // "System.String[,]" 
 } 
}

Next to each Console.WriteLine is a comment that indicates the output. For the single-dimensional arrays, the zero-based arrays display a type name of System.String[], whereas the 1-based array displays a type name of System.String[*]. The * indicates that the CLR knows that this array is not zero-based. Note that C# does not allow you to declare a variable of type String[*], and therefore it is not possible to use C# syntax to access a single-dimensional, non-zero–based array. Although you can call Array’s GetValue and SetValue methods to access the elements of the array, this access will be slow due to the overhead of the method call.

For multi-dimensional arrays, the zero-based and 1-based arrays all display the same type name: System.String[,]. The CLR treats all multi-dimensional arrays as though they are not zero-based at run time. This would make you think that the type name should display as System.String[*,*]; however, the CLR doesn’t use the *s for multi-dimensional arrays because they would always be present, and the asterisks would just confuse most developers.

Accessing the elements of a single-dimensional, zero-based array is slightly faster than accessing the elements of a non-zero–based, single-dimensional array or a multi-dimensional array. There are several reasons for this. First, there are specific IL instructions—such as newarr, ldelem, ldelema, ldlen, and stelem—to manipulate single-dimensional, zero-based arrays, and these special IL instructions cause the JIT compiler to emit optimized code. For example, the JIT compiler will emit code that assumes that the array is zero-based, and this means that an offset doesn’t have to be subtracted from the specified index when accessing an element. Second, in common situations, the JIT compiler is able to hoist the index range–checking code out of the loop, causing it to execute just once. For example, look at the following commonly written code.

using System;

public static class Program {
   public static void Main() {
      Int32[] a = new Int32[5];
      for (Int32 index = 0; index < a.Length; index++) {
         // Do something with a[index]
      }
   }
}

The first thing to notice about this code is the call to the array’s Length property in the for loop’s test expression. Because Length is a property, querying the length actually represents a method call. However, the JIT compiler knows that Length is a property on the Array class, and the JIT compiler will actually generate code that calls the property just once and stores the result in a temporary variable that will be checked with each iteration of the loop. The result is that the JITted code is fast. In fact, some developers have underestimated the abilities of the JIT compiler and have tried to write “clever code” in an attempt to help the JIT compiler. However, any clever attempts that you come up with will almost certainly impact performance negatively and make your code harder to read, reducing its maintainability. You are better off leaving the call to the array’s Length property in the preceding code instead of attempting to cache it in a local variable yourself.

The second thing to notice about the preceding code is that the JIT compiler knows that the for loop is accessing array elements 0 through Length - 1. So the JIT compiler produces code that, at run time, tests that all array accesses will be within the array’s valid range. Specifically, the JIT compiler produces code to check if (0 >= a.GetLowerBound(0)) && ((Length - 1) <= a.GetUpperBound(0)). This check occurs just before the loop. If the check is good, the JIT compiler will not generate code inside the loop to verify that each array access is within the valid range. This allows array access within the loop to be very fast.

Unfortunately, as I alluded to earlier in this chapter, accessing elements of a non-zero–based single-dimensional array or of a multi-dimensional array is much slower than accessing elements of a single-dimensional, zero-based array. For these array types, the JIT compiler doesn’t hoist index checking outside of loops, so each array access validates the specified indexes. In addition, the JIT compiler adds code to subtract the array’s lower bounds from the specified index, which also slows the code down, even if you’re using a multi-dimensional array that happens to be zero-based. So if performance is a concern to you, you might want to consider using an array of arrays (a jagged array) instead of a rectangular array.

C# and the CLR also allow you to access an array by using unsafe (non-verifiable) code, which is, in effect, a technique that allows you to turn off the index bounds checking when accessing an array. Note that this unsafe array manipulation technique is usable with arrays whose elements are SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Char, Single, Double, Decimal, Boolean, an enumerated type, or a value type structure whose fields are any of the aforementioned types.

This is a very powerful feature that should be used with extreme caution because it allows you to perform direct memory accesses. If these memory accesses are outside the bounds of the array, an exception will not be thrown; instead, you will be corrupting memory, violating type safety, and possibly opening a security hole! For this reason, the assembly containing the unsafe code must either be granted full trust or at least have the Security Permission with Skip Verification turned on.

The following C# code demonstrates three techniques (safe, jagged, and unsafe) for accessing a two-dimensional array.

using System;
using System.Diagnostics;

public static class Program {
   private const Int32 c_numElements = 10000;

   public static void Main() {
      // Declare a two-dimensional array
      Int32[,] a2Dim = new Int32[c_numElements, c_numElements];

      // Declare a two-dimensional array as a jagged array (a vector of vectors)
      Int32[][] aJagged = new Int32[c_numElements][];
      for (Int32 x = 0; x < c_numElements; x++)
         aJagged[x] = new Int32[c_numElements];

      // 1: Access all elements of the array using the usual, safe technique
      Safe2DimArrayAccess(a2Dim);

      // 2: Access all elements of the array using the jagged array technique
      SafeJaggedArrayAccess(aJagged);

      // 3: Access all elements of the array using the unsafe technique
      Unsafe2DimArrayAccess(a2Dim);
   }

   private static Int32 Safe2DimArrayAccess(Int32[,] a) {
      Int32 sum = 0;
      for (Int32 x = 0; x < c_numElements; x++) {
         for (Int32 y = 0; y < c_numElements; y++) {
            sum += a[x, y];
         }
      }
      return sum;
   }

   private static Int32 SafeJaggedArrayAccess(Int32[][] a) {
      Int32 sum = 0;
      for (Int32 x = 0; x < c_numElements; x++) {
         for (Int32 y = 0; y < c_numElements; y++) {
            sum += a[x][y];
         }
      }
      return sum;
   }

   private static unsafe Int32 Unsafe2DimArrayAccess(Int32[,] a) {
      Int32 sum = 0;
      fixed (Int32* pi = a) {
         for (Int32 x = 0; x < c_numElements; x++) {
            Int32 baseOfDim = x * c_numElements;
            for (Int32 y = 0; y < c_numElements; y++) {
               sum += pi[baseOfDim + y];
            }
         }
      }
      return sum;
   }
}

The Unsafe2DimArrayAccess method is marked with the unsafe modifier, which is required to use C#’s fixed statement. To compile this code, you’ll have to specify the /unsafe switch when invoking the C# compiler or select the Allow Unsafe Code check box on the Build tab of the Project Properties pane in Microsoft Visual Studio.

Obviously, the unsafe technique has a time and place when it can best be used by your own code, but beware that there are three serious downsides to using this technique:

  • The code that manipulates the array elements is more complicated to read and write than that which manipulates the elements using the other techniques because you are using C#’s fixed statement and performing memory-address calculations.

  • If you make a mistake in the calculation, you are accessing memory that is not part of the array. This can result in an incorrect calculation, corruption of memory, a type-safety violation, and a potential security hole.

  • Due to the potential problems, the CLR forbids unsafe code from running in reduced-security environments (like Microsoft Silverlight).
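To judge whether any of these techniques is worth its cost, you have to measure on your own machine. The following is a minimal sketch of how the two safe techniques could be timed with the System.Diagnostics.Stopwatch class. The TimingSketch class, its method names, and the array size of 1,000 are assumptions I chose for illustration, and the elapsed times depend on the hardware, the CLR version, and the JIT compiler, so no output is shown.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch {
   private const Int32 c_size = 1000; // Hypothetical size chosen for a quick run

   public static void Main() {
      Int32[,] a2Dim = new Int32[c_size, c_size];
      Int32[][] aJagged = new Int32[c_size][];
      for (Int32 x = 0; x < c_size; x++)
         aJagged[x] = new Int32[c_size];

      // Time the rectangular array access
      Stopwatch sw = Stopwatch.StartNew();
      Int32 sum = Sum2Dim(a2Dim);
      Console.WriteLine("{0}: rectangular array (sum={1})", sw.Elapsed, sum);

      // Time the jagged array access
      sw = Stopwatch.StartNew();
      sum = SumJagged(aJagged);
      Console.WriteLine("{0}: jagged array (sum={1})", sw.Elapsed, sum);
   }

   // Sums every element of a rectangular array using safe indexing
   public static Int32 Sum2Dim(Int32[,] a) {
      Int32 sum = 0;
      for (Int32 x = 0; x < a.GetLength(0); x++)
         for (Int32 y = 0; y < a.GetLength(1); y++)
            sum += a[x, y];
      return sum;
   }

   // Sums every element of a jagged array using safe indexing
   public static Int32 SumJagged(Int32[][] a) {
      Int32 sum = 0;
      for (Int32 x = 0; x < a.Length; x++)
         for (Int32 y = 0; y < a[x].Length; y++)
            sum += a[x][y];
      return sum;
   }
}
```

A single pass like this also includes JIT compilation time for each method, so a more careful measurement would call each method once before starting the Stopwatch and then average several timed runs.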

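💡 Summary: Internally, the CLR supports two kinds of arrays: SZ arrays (single-dimensional, zero-based) and single-dimensional or multi-dimensional arrays whose lower bounds are unknown. The CLR uses the * symbol to indicate that it knows an array is not a single-dimensional, zero-based array. C# does not allow you to declare a variable of type String[*], so you cannot use C# syntax to access a single-dimensional, non-zero-based array. You can call Array’s GetValue and SetValue methods to access the elements of such an array, but this access is slow because of the method call overhead. For multi-dimensional arrays, zero-based and 1-based arrays display the same type name: System.String[,]. At run time, the CLR treats all multi-dimensional arrays as though they are not zero-based, and it chooses not to use the * symbol for them. Accessing the elements of a single-dimensional, zero-based array is slightly faster than accessing the elements of a non-zero-based single-dimensional array or a multi-dimensional array, for several reasons. First, there are special IL instructions, such as newarr, ldelem, ldelema, ldlen, and stelem, for manipulating single-dimensional, zero-based arrays, and these instructions cause the JIT compiler to emit optimized code. Second, in common situations, the JIT compiler can hoist the index range-checking code out of a loop so that it executes just once. For non-zero-based single-dimensional arrays and multi-dimensional arrays, the JIT compiler does not hoist index checking out of loops, so each array access validates the specified indexes. In addition, the JIT compiler adds code to subtract the array’s lower bounds from the specified index, which slows the code further, even if the multi-dimensional array happens to be zero-based. So if performance is a concern, consider using an array of arrays (a jagged array) instead of a rectangular array. C# and the CLR also allow you to access an array by using unsafe (non-verifiable) code, which in effect turns off index bounds checking when accessing the array. This feature is powerful but must be used with caution because it allows direct memory access. An out-of-bounds access does not throw an exception; instead, it corrupts memory, violates type safety, and possibly opens a security hole. For this reason, the assembly containing the unsafe code must be granted full trust or at least have the Skip Verification security permission turned on. Finally, the unsafe modifier is often used together with the fixed keyword.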

# Unsafe Array Access and Fixed-Size Array

Unsafe array access is very powerful because it allows you to access:

  • Elements within a managed array object that resides on the heap (as the previous section demonstrated).

  • Elements within an array that resides on an unmanaged heap. The SecureString example in Chapter 14, “Chars, Strings, and Working with Text,” demonstrated using unsafe array access on an array returned from calling the System.Runtime.InteropServices.Marshal class’s SecureStringToCoTaskMemUnicode method.

  • Elements within an array that resides on the thread’s stack.
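The second case in the preceding list can be sketched without SecureString. The following code is a minimal sketch, assuming a block obtained from Marshal.AllocHGlobal stands in for any array that lives on an unmanaged heap; the UnmanagedArraySketch class and its SumOfSquares method are hypothetical names used only for illustration. Compiling it requires the /unsafe switch.

```csharp
using System;
using System.Runtime.InteropServices;

public static class UnmanagedArraySketch {
   // Hypothetical helper: fills an unmanaged block with the squares
   // 0, 1, 4, ... and returns their sum
   public static unsafe Int32 SumOfSquares(Int32 count) {
      // Allocate room for 'count' Int32 values on an unmanaged heap
      IntPtr block = Marshal.AllocHGlobal(count * sizeof(Int32));
      try {
         Int32* p = (Int32*) block.ToPointer();

         // No bounds checking happens here; staying inside
         // the block is entirely this code's responsibility
         for (Int32 i = 0; i < count; i++)
            p[i] = i * i;

         Int32 sum = 0;
         for (Int32 i = 0; i < count; i++)
            sum += p[i];
         return sum;
      }
      finally {
         // The garbage collector does not track unmanaged memory,
         // so it must be freed explicitly
         Marshal.FreeHGlobal(block);
      }
   }

   public static void Main() {
      Console.WriteLine(SumOfSquares(4)); // 0 + 1 + 4 + 9 = 14
   }
}
```

No fixed statement is needed in this sketch because the garbage collector never moves memory on an unmanaged heap; the try/finally block takes over the cleanup that the collector would otherwise perform for a managed array.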

In cases in which performance is extremely critical, you could avoid allocating a managed array object on the heap and instead allocate the array on the thread’s stack by using C#’s stackalloc statement (which works a lot like C’s alloca function). The stackalloc statement can be used to create a single-dimensional, zero-based array of value type elements only, and the value type must not contain any reference type fields. Really, you should think of this as allocating a block of memory that you can manipulate by using unsafe pointers, and therefore, you cannot pass the address of this memory buffer to the vast majority of FCL methods. Of course, the stack-allocated memory (array) will automatically be freed when the method returns; this is where we get the performance improvement. Using this feature also requires you to specify the /unsafe switch to the C# compiler.

The StackallocDemo method in the following code shows an example of how to use C#’s stackalloc statement.

using System;

public static class Program {
   public static void Main() {
      StackallocDemo();
      InlineArrayDemo();
   }

   private static void StackallocDemo() {
      unsafe {
         const Int32 width = 20;
         Char* pc = stackalloc Char[width]; // Allocates array on stack

         String s = "Jeffrey Richter"; // 15 characters

         for (Int32 index = 0; index < width; index++) {
            pc[width - index - 1] =
               (index < s.Length) ? s[index] : '.';
         }

         // The following line displays ".....rethciR yerffeJ"
         Console.WriteLine(new String(pc, 0, width));
      }
   }

   private static void InlineArrayDemo() {
      unsafe {
         CharArray ca; // Allocates array on stack
         Int32 widthInBytes = sizeof(CharArray);
         Int32 width = widthInBytes / 2;

         String s = "Jeffrey Richter"; // 15 characters

         for (Int32 index = 0; index < width; index++) {
            ca.Characters[width - index - 1] =
               (index < s.Length) ? s[index] : '.';
         }

         // The following line displays ".....rethciR yerffeJ"
         Console.WriteLine(new String(ca.Characters, 0, width));
      }
   }
}

internal unsafe struct CharArray {
   // This array is embedded inline inside the structure
   public fixed Char Characters[20];
}

Normally, because arrays are reference types, an array field defined in a structure is really just a pointer or reference to an array; the array itself lives outside of the structure’s memory. However, it is possible to embed an array directly inside a structure as shown by the CharArray structure in the preceding code. To embed an array directly inside a structure, there are several requirements:

  • The type must be a structure (value type); you cannot embed an array inside a class (reference type).

  • The field or its defining structure must be marked with the unsafe keyword.

  • The array field must be marked with the fixed keyword.

  • The array must be single-dimensional and zero-based.

  • The array’s element type must be one of the following types: Boolean, Char, SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Single, or Double.

Inline arrays are typically used for scenarios that involve interoperating with unmanaged code where the unmanaged data structure also has an inline array. However, inline arrays can be used in other scenarios as well. The InlineArrayDemo method in the code shown earlier offers an example of how to use an inline array. The InlineArrayDemo method performs the same function as the StackallocDemo method; it just does it in a different way.

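💡 Summary: Unsafe array access is very powerful: it can access not only the elements of a managed array object on the heap but also the elements of an array on an unmanaged heap and the elements of an array on the thread’s stack. For an array on an unmanaged heap, see the SecureString example; an array on the thread’s stack is allocated by using C#’s stackalloc statement (which works a lot like C’s alloca function). The stackalloc statement can create only a single-dimensional, zero-based array of value type elements, and the value type must not contain any reference type fields. Really, you should think of it as allocating a block of memory that you manipulate by using unsafe pointers, so you cannot pass the address of this memory buffer to most FCL methods. Normally, because arrays are reference types, an array field defined in a structure is really just a pointer or reference to an array; the array itself lives outside of the structure’s memory. However, as the preceding example shows, an array can be embedded directly inside a structure if several requirements are met: 1. The type must be a structure (value type); you cannot embed an array inside a class (reference type). 2. The field or its defining structure must be marked with the unsafe keyword. 3. The array field must be marked with the fixed keyword. 4. The array must be single-dimensional and zero-based. 5. The array’s element type must be one of the following types: Boolean, Char, SByte, Byte, Int16, UInt16, Int32, UInt32, Int64, UInt64, Single, or Double. Inline arrays are especially well suited to interoperating with unmanaged code where the unmanaged data structure also has an inline array.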