Week 9 - Semantic Part 3 - Expression
Value, Type and Object¶
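What is a value? A first approximation: it is the number we read out of a variable's (an object's) memory when we access it.
More precisely: Value is the precise meaning of the contents of an object when interpreted as having a specific type.
Read that sentence as follows: a value is the precise meaning obtained by interpreting the contents of an object under one specific type.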
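- Here an object does not mean an object in the object-oriented sense. It means a region of memory, and the contents of the object are the raw binary data stored in that region.
- How should we read "when interpreted as having a specific type"?
  - The contents of an object can be interpreted in different ways, that is, as different types. For example:
  - `'A'` and `65` share the bit pattern `01000001` (in ASCII). They are the same memory contents read under two different types.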
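The idea that one stored bit pattern yields different values under different types can be sketched in C as follows. The helper names `as_char` and `as_int` are hypothetical, and the expected results assume an ASCII execution character set.

```c
#include <string.h>

/* Read one byte of storage as a character. */
char as_char(const unsigned char *byte) {
    char c;
    memcpy(&c, byte, sizeof c);   /* copy the raw contents, then view them as char */
    return c;
}

/* Read the same byte of storage as a small integer. */
int as_int(const unsigned char *byte) {
    return (int)*byte;
}
```

Given a byte holding `0x41`, `as_char` yields `'A'` and `as_int` yields `65`: the contents are identical, only the type used to interpret them differs.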
Expression¶
We all know that an expression can be evaluated. What exactly happens during evaluation?
We can refer to the definition of an expression in the C standard:
What an Expression Does¶
(1) An expression is a sequence of operators and operands that specifies computation of a value, or that designates an object or a function, or that generates side effects, or that performs a combination thereof. The value computations of the operands of an operator are sequenced before the value computation of the result of the operator.
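An expression is a sequence of operators and operands that is used to:
- specify the computation of a value (for example `2 + 3`),
- or designate an object (for example `x`, which refers to a region of memory) or a function (for example `main`),
- or generate side effects (for example `x++`, which changes the value of `x`),
- or perform any combination of the above (for example `x = y + 5`).
(Second sentence) The value computations of an operator's operands are sequenced before the value computation of that operator's result.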
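The four roles listed above can be seen side by side in a small C function. This is only an illustrative sketch; the function name `demo_expression_roles` is hypothetical, and `puts` stands in for the function designator where the text uses `main`.

```c
#include <stdio.h>

int demo_expression_roles(void) {
    int x = 1, y = 2;
    int sum = 2 + 3;                 /* computes a value: 5 */
    int *where = &x;                 /* `x` designates an object, so its address can be taken */
    int (*fn)(const char *) = puts;  /* `puts` designates a function */
    x++;                             /* side effect: the object x now holds 2 */
    x = y + 5;                       /* combination: compute y + 5, then store it into x */
    (void)fn;                        /* fn is only here to show the designator */
    return sum + *where;             /* 5 + 7 */
}
```

The function returns 12, because `*where` reads the object `x` after both side effects have been applied.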
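What are side effects?
This is a core concept: a side effect is a change to the state of the program (in the standard's words, "changes in the state of the execution environment").
How should we understand "the state of the program"? You can think of it as the values of all variables.
If an expression produces no side effects, it does not modify the program state. Could we then simply delete it?
The second sentence settles the chicken-and-egg question. Take `a + b`: you must first obtain the value of `a` and the value of `b`, and only then can you compute their sum. That is, if evaluation A is sequenced before evaluation B, then A must complete before B begins.
The sentence may look naive, but it is the first place where an evaluation order (sequenced before) is defined, and the other ordering rules described later build on it.
The precise definition of sequenced before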
Sequenced before is an asymmetric, transitive, pair-wise relation between evaluations executed by a single thread, which induces a partial order among those evaluations. Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B. (Conversely, if A is sequenced before B, then B is sequenced after A.)
If A is not sequenced before or after B, then A and B are unsequenced.
Evaluations A and B are indeterminately sequenced when A is sequenced either before or after B, but it is unspecified which. (The executions of unsequenced evaluations can interleave. Indeterminately sequenced evaluations cannot interleave, but can be executed in any order.)
The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.
If you have studied concurrent programming and memory ordering, or distributed systems, you may have met a similar notion: happens-before.
Order of Evaluation¶
(2) The grouping of operators and operands is indicated by the syntax. Except as specified later, side effects and value computations of subexpressions are unsequenced.
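The grouping of operators and operands is determined by the syntax (for example parentheses and precedence). Unless the standard says otherwise, the side effects and value computations of subexpressions are unsequenced by default.
This sentence separates two completely different concepts: the order of grouping and the order of value computation.
- Grouping: determines the structure of the expression
  Grouping is determined by the syntax. In `a + b * c`, the syntax gives `*` higher precedence than `+`, so the expression groups as `a + (b * c)`. However, this does not say whether `a` or `(b * c)` is computed first. It only requires that `a + (b * c)` is computed after both of them.
- Sequencing: determines the order of computation
  The evaluations of subexpressions are unsequenced by default. In the example above, `a` and `(b * c)` are subexpressions of `a + (b * c)`, and this rule says that their evaluations are unsequenced relative to each other.
  "Except as specified later" refers to operators with explicit sequencing rules, which introduce sequence points. The classic examples are `&&` and `||`. For `&&`, the first operand is evaluated before the second; if the first operand compares equal to 0, the second operand is not evaluated. This is the familiar short-circuit evaluation.
  the && operator guarantees left-to-right evaluation; if the second operand is evaluated, there is a sequence point between the evaluations of the first and second operands. If the first operand compares equal to 0, the second operand is not evaluated.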
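The difference between grouping and sequencing, and the extra guarantee given by `&&`, can be observed with a small tracing sketch. All names here (`note`, `op_a`, `op_b`, `op_c`, `grouped_sum`, `short_circuit`, `last_trace`) are hypothetical helpers introduced for illustration.

```c
#include <string.h>

static char trace[8];            /* records the order in which the helpers run */

static int note(char tag, int value) {
    size_t n = strlen(trace);
    if (n + 1 < sizeof trace) {  /* keep room for the terminating '\0' */
        trace[n] = tag;
        trace[n + 1] = '\0';
    }
    return value;
}

static int op_a(void) { return note('a', 1); }
static int op_b(void) { return note('b', 2); }
static int op_c(void) { return note('c', 3); }

/* Grouping is fixed: op_a() + (op_b() * op_c()) is 7 for every call order.
   Which of the three calls runs first is not specified by the language. */
int grouped_sum(void) {
    trace[0] = '\0';
    return op_a() + op_b() * op_c();
}

/* && evaluates its left operand first and skips the right one when it is 0. */
int short_circuit(int left) {
    trace[0] = '\0';
    return left && op_b();
}

const char *last_trace(void) { return trace; }
```

After `grouped_sum()` the trace always holds three letters, but their order may differ between compilers. After `short_circuit(0)` the trace is empty, because `op_b` is never called.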
Order of Evaluation, Side Effects and UB¶
(3) If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.
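If a side effect on a scalar object (arithmetic types and pointer types, such as `int`, `char`, or a pointer) is unsequenced relative to either of the following:
- another side effect on the same scalar object
- a value computation that uses the value of the same scalar object
then the behavior is undefined (Undefined Behavior, UB).
(Second sentence) If the subexpressions of an expression can be evaluated in several allowable orders, the behavior is undefined as soon as such an unsequenced side effect occurs in any one of those orders.
This is a classic C pitfall. Put differently: "between two sequence points, do not modify the same variable twice, and do not both modify and read it in one expression when the order of the read and the write is not fixed."
"Unsequenced" means the compiler guarantees no order. For example, in `f(a, b)` the evaluations of `a` and `b` are unsequenced: the compiler may evaluate `a` first, evaluate `b` first, or interleave the two.
Classic examples that cause undefined behavior:
- Side effect vs. side effect (rule 1)
  `i = i++;` has two side effects on the same object `i` (the increment and the assignment), and they are unsequenced relative to each other.
- Side effect vs. value computation (rule 2)
  `x[i] = i++;` has a side effect on `i` in `i++` (which modifies `i`) and a value computation on `i` in `x[i]` (which reads `i` to determine the array index). The assignment operator does not specify which of its two operands is evaluated first ("The evaluations of the operands are unsequenced.").
- Example in a function call
  `printf("%d %d\n", i++, i);` The evaluations of function arguments are unsequenced.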
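Each of the three UB examples can be rewritten so that every read and write of `i` is separated by a sequence point. The sketch below shows one possible reading of the intended meaning in each case; the function names are hypothetical, and other readings of the original statements are equally plausible.

```c
#include <stdio.h>

/* `i = i++;` has no agreed meaning; pick one effect and write only that. */
int increment_once(int i) {
    i++;                      /* a single side effect on i */
    return i;
}

/* One replacement for `x[i] = i++;`: store at the old index, then advance. */
int store_then_advance(int *x, int i) {
    x[i] = i;                 /* every read of i finishes in this statement */
    i++;                      /* the modification happens in the next one */
    return i;
}

/* One replacement for `printf("%d %d\n", i++, i);` */
int print_old_and_new(int i) {
    int old = i;              /* read first */
    i++;                      /* then modify, in a separate full expression */
    printf("%d %d\n", old, i);
    return i;
}
```

The end of each full expression (each `;` here) is a sequence point, so no side effect on `i` is unsequenced relative to another access to `i`.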
Undefined Behavior¶
What is undefined behavior?
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements.
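That is: when a program uses a nonportable or erroneous construct, or erroneous data, this document (the C standard) imposes no requirements on the resulting behavior.
The key is the second half. Once your code triggers UB, the C standard no longer guarantees anything about what happens. It does not require the program to crash, it does not require a wrong answer, and it does not require any particular action at all.
"Imposes no requirements" means that anything can happen. When UB occurs, every one of the following outcomes is "correct" behavior as far as the C standard is concerned:
- The program crashes (for example with a segmentation fault). This is the best case, because it tells you immediately that something is wrong.
- The program computes a wrong result (for example 2 + 2 yields 5).
- The program appears to work. This is the worst case. It may work on your machine today, and then crash tomorrow after you change an unrelated line of code, switch compilers, or turn on an optimization option such as `-O2`.
- The compiler generates code that formats your hard disk. Really.
Where did my disk go?
Signed int overflow is a classic case of UB. With `-O2` enabled, LLVM may end up dropping a function's entire epilogue.
(Unsigned int arithmetic, in contrast, is well defined: a result that does not fit wraps around modulo 2^N, where N is the width of the type.)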
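The contrast between unsigned wraparound and signed overflow can be sketched as follows. The names `wrap_increment` and `checked_add` are hypothetical; the check simply tests the operands before adding, so that the overflowing addition is never executed.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so this is well defined. */
unsigned wrap_increment(unsigned u) {
    return u + 1u;
}

/* Signed overflow is UB, so test the operands before adding. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;         /* a + b would not fit in int */
    }
    *out = a + b;
    return true;
}
```

`wrap_increment(UINT_MAX)` yields 0, while `checked_add(INT_MAX, 1, &r)` reports failure instead of performing an addition whose behavior is undefined.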
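Lvalue¶
We said earlier that one role of an expression is to designate an object or a function. Expressions that designate an object are so important and so common that they have a name of their own: lvalue.
What Is an Lvalue¶
Full paragraph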
An lvalue is an expression (with an object type other than void) that potentially designates an object; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.
When an object is said to have a particular type, the type is specified by the lvalue used to designate the object.
A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
"An lvalue is an expression (with an object type other than void) that potentially designates an object... A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member ... with a const-qualified type."
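- An lvalue is an expression...that potentially designates an object
  - Core meaning: an lvalue is **an expression that refers to an object (a piece of data) in memory**.
  - You can think of it as a "location" or "address". The name "lvalue" originally comes from "left-hand side of an assignment", because you can only assign to a location.
  - Example: in `int x; x = 10;`, `x` is an lvalue because it designates the object that stores the value `10`. `10` itself is not an lvalue (it is an rvalue): it is only a value with no corresponding memory location, so you cannot write `10 = x;`.
  - Undefined behavior: the standard says "if an lvalue does not designate an object... the behavior is undefined." One example is dereferencing a `NULL` pointer: `int *p = NULL; *p = 5;`. `*p` is an lvalue expression, but it does not refer to any valid object, so the behavior is undefined (UB).
- A modifiable lvalue is an lvalue that...
  - Core meaning: a modifiable lvalue is an lvalue for which you may **legally modify the object it refers to**. It is the kind of expression that can appear on the left of an assignment operator (`=`, `+=`, and so on).
  - What is not a modifiable lvalue:
    - Array type: you cannot assign to an array name. `int a[10]; int b[10]; a = b;` is invalid.
    - Incomplete type: if a type has only been declared and not defined (for example `struct Foo;`), an lvalue of that type cannot be modified. `struct Foo *f; f->member = 10;` is invalid as long as `struct Foo` has not been defined.
    - Const-qualified type, and structures or unions that contain a const member.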
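A short sketch of which lvalues accept assignment. The names `struct point` and `modifiable_lvalue_demo` are hypothetical, and the rejected statements are kept as comments because a conforming compiler refuses them.

```c
#include <string.h>

struct point { int x, y; };

int modifiable_lvalue_demo(void) {
    int v = 0;
    int *p = &v;
    int arr[3] = {1, 2, 3};
    int copy[3];
    struct point pt = {0, 0};
    const int limit = 10;

    v = 1;            /* identifier naming an object */
    *p = 2;           /* dereferenced pointer designates the same object v */
    arr[1] = 20;      /* array element */
    pt.x = 5;         /* structure member */

    /* copy = arr;    rejected: an array is an lvalue, but not a modifiable one */
    /* limit = 11;    rejected: const-qualified lvalue */
    memcpy(copy, arr, sizeof copy);   /* arrays are copied element-wise instead */

    return v + copy[1] + pt.x + limit;    /* 2 + 20 + 5 + 10 */
}
```

The function returns 37. Note that `v` ends up as 2, since `v` and `*p` are two lvalues designating the same object.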
Lvalue Conversion¶
Full paragraph
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion. If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; additionally, if the lvalue has atomic type, the value has the non-atomic version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
"Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue); this is called lvalue conversion."
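This is one of the most important and most frequent implicit conversions in C. It is commonly called "value-of" or lvalue conversion, and it carries the semantics of a memory load.
- Core meaning: when an lvalue (a location) appears in almost any context that needs a **value**, it is **automatically converted to the value stored at that location**. After the conversion it is no longer an lvalue (a location) but an rvalue (a value).
- Example: `int x = 10; int y; y = x;`
  - In `y = x;`:
    - `y` is the *left operand* of the assignment. It is an lvalue (a location), so it does **not** undergo lvalue conversion.
    - `x` is the *right operand* of the assignment. It sits in a context that needs a value, so the lvalue `x` (the memory location of `x`) is automatically converted to the **value** it stores, namely `10`.
    - Finally, the value `10` is stored into the lvalue (location) `y`.
In addition, the standard explicitly lists the cases in which an lvalue keeps its lvalue nature (that is, remains a location):
- The `sizeof` operator: `sizeof(x)`. `sizeof` cares about the size of the **type** of `x` (the lvalue), not about the **value** stored in `x`.
- The `&` (address-of) operator: `&x`. The purpose of this operator is to obtain the address of the lvalue (location), so it clearly cannot convert the lvalue to a value first.
- The `++` and `--` operators: `++x` or `x++`. These operators modify the value stored at the location of `x`, so they must operate on the lvalue (location).
- The left side of the `.` (member access) operator: `my_struct.member`. The `.` operator needs the lvalue `my_struct` (the location of the structure) in order to produce the lvalue of its member `member` (the location of the member).
- The left side of an assignment operator: `x = ...`. The purpose of assignment is to store a value into an lvalue (location).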
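The contexts above can be compared in one place. In this sketch (with the hypothetical names `struct pair` and `lvalue_conversion_demo`), only the initializer of `y` loads the value of `x`; every other use of `x` or `s` works on the location.

```c
#include <stddef.h>

struct pair { int first, second; };

int lvalue_conversion_demo(void) {
    int x = 10;
    struct pair s = {1, 2};

    int y = x;               /* lvalue conversion: the value 10 is loaded from x */
    size_t n = sizeof x;     /* no conversion: only the type of x matters */
    int *addr = &x;          /* no conversion: & needs the location of x */
    x++;                     /* no conversion: ++ updates the object in place */
    s.second = 7;            /* s is not converted; s.second is the left operand of = */

    /* y kept the loaded value; *addr reads the updated object. */
    return (n == sizeof(int)) && y == 10 && *addr == 11 && s.second == 7;
}
```

The function returns 1: `y` holds a copy of the old value, which is exactly what distinguishes a loaded value from the location it came from.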
Value Category¶
The material above comes from the C standard. Now let us look at something more convenient in practice:
See also: https://en.cppreference.com/w/c/language/value_category.html
- Each expression in C (an operator with its arguments, a function call, a constant, a variable name, etc) is characterized by two independent properties: a type and a value category.
- Every expression belongs to one of three value categories: lvalue, non-lvalue object (rvalue), and function designator.
The content below has been abridged: material that Splc does not include has been removed.
Lvalue expressions¶
In short: an lvalue expression is any expression of object type that can designate an object, and evaluating it yields the **object identity** rather than the stored value. The cppreference wording is:
Lvalue expression is any expression with object type, which potentially designates an object (the behavior is undefined if an lvalue does not actually designate an object when it is evaluated). In other words, lvalue expression evaluates to the object identity.
The name of this value category (“left value”) is historic and reflects the use of lvalue expressions as the left-hand operand of the assignment operator in the CPL programming language.
Lvalue expressions can be used in the following lvalue contexts:
- as the operand of the address-of (`&`) operator.
- as the operand of the pre/post increment and decrement operators.
- as the left-hand operand of the member access (dot) operator.
- as the left-hand operand of the assignment operators.
If an lvalue expression is used in any context other than the operators listed above, non-array lvalues of any complete type undergo lvalue conversion, which models the memory load of the value of the object from its location.
The following expressions are lvalues:
- identifiers, including function named parameters, provided they were declared as designating objects (not functions or enumeration constants)
- parenthesized expression if the unparenthesized expression is an lvalue
- the result of a member access (dot) operator if its left-hand argument is lvalue
- the result of a member access through pointer (`->`) operator
- the result of the indirection (unary `*`) operator applied to a pointer to object
- the result of the subscription operator (`[]`)
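Since only an lvalue (or a function designator) can be the operand of `&`, taking addresses is a convenient way to check the forms listed above. The following is a sketch with the hypothetical names `struct node` and `lvalue_forms_demo`.

```c
struct node { int value; };

int lvalue_forms_demo(void) {
    int x = 0;
    int arr[2] = {0, 0};
    struct node n = {0};
    struct node *pn = &n;
    int *px = &x;
    int ok = 1;

    ok &= (&(x) == &x);               /* a parenthesized lvalue is still an lvalue */
    ok &= (&n.value == &pn->value);   /* `.` on an lvalue and `->` name the same member */
    ok &= (&*px == &x);               /* unary * applied to a pointer to object */
    ok &= (&arr[1] == arr + 1);       /* the subscript result designates the element */

    (x) = 1;                          /* each form can also be assigned through */
    pn->value = 2;
    *px += 3;
    arr[1] = 5;

    return ok && x == 4 && n.value == 2 && arr[1] == 5;
}
```

The function returns 1. Every address comparison holds because each pair of expressions designates the same object.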
In addition, since Splc has no `const` keyword, the vast majority of lvalues are modifiable lvalues (A modifiable lvalue is any lvalue expression of complete, non-array type which is not const-qualified).
Non-lvalue object expressions¶
In short: an **rvalue** is a value with no object identity or storage location, so the address-of operator (`&`) cannot be applied to it. The cppreference wording is:
Known as rvalues, non-lvalue object expressions are the expressions of object types that do not designate objects, but rather values that have no object identity or storage location. The address of a non-lvalue object expression cannot be taken.
The following expressions are non-lvalue object expressions:
- integer, character constants
- all operators not specified to return lvalues, including
  - any function call expression
  - member access operator (dot) applied to a non-lvalue structure/union, `f().x`, `(s1 = s2).m`
  - results of all arithmetic, relational, logical, and bitwise operators
  - results of increment and decrement operators
  - results of assignment operators
  - the address-of operator, even if neutralized by application to the result of unary `*` operator
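A few of the non-lvalue forms listed above, sketched in C. The names `struct vec2`, `make_vec2`, and `rvalue_demo` are hypothetical, and the statements that a C compiler rejects are kept inside a comment.

```c
struct vec2 { int x, y; };

static struct vec2 make_vec2(int x, int y) {
    struct vec2 v = {x, y};
    return v;
}

int rvalue_demo(void) {
    int a = 1, b = 2;
    struct vec2 s1 = {0, 0}, s2 = {3, 4};

    int sum = a + b;               /* arithmetic result: a value, not an object */
    int fx = make_vec2(5, 6).x;    /* member of a function-call result: non-lvalue */
    int m = (s1 = s2).y;           /* in C the result of = is not an lvalue */
    int old = a++;                 /* neither is the result of a++ */

    /* None of these compile, because the operand is not an lvalue:
       &(a + b);   (a = b) = 3;   a++ = 0;   make_vec2(1, 2).x = 9;   */

    return sum + fx + m + old;     /* 3 + 5 + 4 + 1 */
}
```

The function returns 13. Each right-hand side can be read as a value, but none of them can be assigned to or have its address taken.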