Defining A Typesystem For Your Language
What is a typesystem?
A typesystem is a part of a language definition used to assign types to the nodes in the models written in your language. The typesystem language is also used to check certain constraints on nodes and their types. Information about types of nodes is useful for:
- finding type errors
- checking conditions on nodes' types during generation to apply only appropriate generator rules
- providing information required for certain refactorings (e.g. for the "extract variable" refactoring)
- and more
Any MPS node may serve as a type. To enable MPS to assign types to nodes of your language, you should create a language aspect for typesystem. The typesystem model for your language will be written in the typesystem language.
The main concept of the typesystem language is an inference rule. An inference rule for a certain concept is mainly responsible for computing a type for instances of that concept.
An inference rule consists of a condition and a body. A condition is used to determine whether a rule is applicable to a certain node. A condition may be of two kinds: a concept reference or a pattern. A rule with a condition in the form of concept reference is applicable to every instance of that concept and its subconcepts. A rule with a pattern is applicable to nodes that match the pattern. A node matches a pattern if it has the same properties and references as the pattern, and if its children match the pattern's children. A pattern also may contain several variables which match everything.
The body of an inference rule is a list of statements which are executed when a rule is applied to a node. The main kind of statements of typesystem language are statements used for creating equations and inequations between types.
To avoid duplications, one may want to extract identical parts of code of several inference rules to a method. An inference method is just a simple Base Language method marked with an annotation "
@InferenceMethod". There are several language constructions you may use only inside inference rules and replacement rules and inference methods, they are: "typeof" expressions, equations and inequations, "when concrete" statements, type variable declarations and type variable references, and invocations of inference methods. That is made for not to use such constructions in arbitrary methods which may be called in arbitrary context, maybe not during type checking.
Equations And Inequations
The main process which is performed by typesystem engine is the process of solving equations and inequations between types. A language designer tells the engine which equations it should solve by writing them in inference rules. To add an equation into an engine, the following statement is used:
expr1 :==: expr2, where
expr2 are expressions which evaluate to a node.
Consider the following use case. You want to say that the type of a local variable reference is equal to a type of variable declaration. So, you write
typeof (varRef) :==: typeof (varRef.localVariableDeclaration), and that's all. The typesystem engine will solve such equations automatically.
The above-mentioned expression
expr must evaluate to an MPS node) is a language construction which returns a so-called type variable which serves as a type of that node. Type variables become concrete types during the process of equation solving.
In certain situations you want to say that a certain type doesn't have to exactly equal another type, but also may be a subtype or a supertype of that type. For instance, the type of the actual parameter of a method call does not necessarily have to be the same type as that of the method's formal parameter - it can be its subtype. That is, a method which requires Object as a parameter may be applied to a String.
To express such a constraint, you may use an inequation instead of an equation. An inequation expresses the fact that a certain type should be a subtype of another type. It is written as follows:
expr1 :<=: expr2.
Weak And Strong Subtyping
A relationship of subtyping is useful for several different cases. You want a type of an actual parameter to be a subtype of formal parameter type, or you want a type of an assigned value to be a subtype of variable's declared type; in method calls or field access operations you want a type of an operand to be a subtype of a method's declaring class.
Sometimes such demands are somewhat controversial: consider, for instance, two types, int and Integer, which you want to be interchangeable when you pass parameters of such types to a method: if a method is
doSomething(int i), it is both legal to call
doSomething(new Integer(1)). But when these types are used as types of operand of, say, method call, the situation is the different: you shouldn't be able to call a method of an expression of type int, of an integer constant for example. So, we have to conclude that in one sense
Integer are subtypes of each other, while in other sense they are not.
For solving such a controversy, we introduce two relationships of subtyping: namely, weak and strong subtyping. Weak subtyping will follow from strong subtyping: if a node is a strong subtype of another node, then it is it's weak subtype also.
Then, we can say about our example, that int and Integer are weak subtypes of each other, but they are not strong subtypes. Assignment and parameters passing require weak subtyping only, method calls require strong subtyping.
When you create an inequation in you typesystem, you may choose it to be a strong or weak inequation. Also subtyping rules, those which state subtyping relationship (see below), can be either weak or strong. A weak inequation looks like
:<=:, a strong inequation looks like
In most cases you want to state strong subtyping, and to check weak subtyping. If you are not sure which subtyping you need, use weak one for inequations, strong one for subtyping rules.
When the typesystem engine solves inequations, it requires information about whether a type is a subtype of another type. But how does the typesystem engine know about that? It uses subtyping rules. Subtyping rules are used to express subtyping relationship between types. In fact, a subtyping rule is a function which, given a type, returns its immediate supertypes.
A subtyping rule consists of a condition (which can be either a concept reference or a pattern) and a body, which is a list of statements that compute and return a node or a list of nodes that are immediate supertypes of the given node. When checking whether some type A is a supertype of another type B, the typesystem engine applies subtyping rules to B and computes its immediate supertypes, then applies subtyping rules to those supertypes and so on. If type A is among the computed supertypes of type B, the answer is "yes".
By default, subtyping stated by subtyping rules is a strong one. If you want to state only weak subtyping, set "is weak" property of a rule to "true".
Comparison Inequations And Comparison Rules
Consider you want to write a rule for EqualsExpression (operator
== in Java, BaseLanguage and some other languages): you want left operand and right operand of EqualsExpression to be comparable, that is either type of a left operand should be a (non-strict) subtype of a right operand, or vice versa. To express this, you write a comparison inequation, in a form
expr1 :~: expr2, where expr1 and expr2 are expressions which represent types. Such an inequation is fulfilled if
expr1 is a subtype of
expr2 (expr1 <: expr2), or
expr2 <: expr1.
Then, consider that, say, any Java interfaces should also be comparable, even if such interfaces are not subtypes of one another. That is because there always can be written a class which implements both of interfaces, so variables of interface types can contain the same node, and variable of an interface type can be cast to any other interface. hence a equation, cast, instanceof expressions with both types being interface types should be legal (and in Java they are such).
To state such a comparability which does not follow from subtyping relationship, one should use comparison rules. A comparison rule consists of two conditions for the two applicable types and a body which returns true if the types are comparable or false if they are not.
Here's the comparison rule for interface types:
A quotation is a language construction that lets you easily create a node with a required structure. Of course, you can create a node using smodelLanguage and then populate it with appropriate children, properties and references, using the same smodelLanguage. But there's a simpler - and more visual - way to accomplish that.
A quotation is an expression whose value is the MPS node written inside the quotation. Think about a quotation as a "node literal", a construction similar to numeric constants and string literals. That is, you write a literal if you statically know what value do you mean. So inside a quotation you don't write an expression which evaluates to a node, you rather write the node itself. That is, an expression
2 + 3 evaluates to
5, an expression
< 2 + 3 > (angled braces being quotation braces) evaluates to a node PlusExpression with leftOperand being IntegerConstant 3 and rightOperand being IntegerConstant
For it is a literal, a value of quotation should be known statically. So in case where you know some parts (i.e. children, referents or properties) of your node only dynamically i.e. those parts should be evaluated in runtime and are not known in design time, then you can't use just a quotation to create a node with such parts.
The good news, however, is that if you know the most part of a node statically and you want to replace only several parts of by dynamically-evaluated nodes you can use antiquotations. An antiquotation can be of 4 types: child, reference, property and list antiquotation. They both contain an expression which evaluates dynamically to replace a part of quoted node by its result. Child and referent antiquotations evaluate to a node, property antiquotation evaluates to string and list antiquotation evaluates to list of nodes.
For instance, you want to create a ClassifierType with the class ArrayList, but its type parameter is known only dynamically, for instance by calling a method, say, "computeMyTypeParameter()".
Thus, you write the following expression: < ArrayList < %( computeMyTypeParameter() )% > >. The construction %(...)% here is that very antiquotation.
One also may antiquotate reference targets and property values, with ^(...)^ and $(...)$, respectively; or a list of children of one role, using *(...)* .
a) If you want to replace a node somewhere inside a quoted node with a node evaluated by an expression, you use node antiquotation, that is %( )%. As you may guess there's no sense to replace the whole quoted node with an antiquotation with an expression inside, because you can write such an expression simply in your program instead.
So such an antiquotation is used to replace children, grandchildren, great-grandchildren and other descendants of a quoted node. Thus, an expression inside of antiquotation should return a node. To write such an antiquotation, position your caret on a cell for a child and type "%".
b) If you want to replace a target of a reference from somewhere inside a quoted node with a node evaluated by an expression, you use reference antiquotation, that is ^(...)^. To write such an antiquotation, position your caret on a cell for a referent and type "^".
c) If you want to replace a child (or some more deeply located descendant) which is of a multiple-cardinality role, and if for that reason you may want to replace it not with a single node but rather with several ones, then use child list (simply list for brevity) antiquotations, *( )*. An expression inside a list antiquotation should return a list of nodes, that is of type nlist<..> or compatible type (i.e. list<node<..>> is ok, too, as well as some others). To write such an antiquotation, position your caret on a cell for a child inside a child collection and type "*". You can not use it on an empty child collection, so before you press "*" you have to enter a single child inside it.
d) If you want to replace a property value of a quoted node by a dynamicаlly calculated value, use property antiquotation $( )$. An expression inside a quotation should return string, which will be a value for an antiquoted property of a quoted node. To write such an antiquotation, position your caret on a cell for a property and type "$".
Examples Of Inference Rules
Here are the simplest basic use cases of an inference rule:
- to assign the same type to all instances of a concept (useful mainly for literals):
- to equate a type of a declaration and the references to it (for example, for variables and their usages):
- to give a type to a node with a type annotation (for example, type of a variable declaration):
- to establish a restriction for a type of a certain node: useful for actual parameters of a method, an initializer of a type variable, the right-hand part of an assignment, etc.
Inside the typesystem engine, a type may be either a concrete type (a node) or a so-called type variable. Also, it may be a node which contains some type variables as its children or further descendants. A type variable is an undefined type which may then become a concrete type, as a result of solving equations that contain this type variable.
Type variables appear at runtime mainly as a result of a "typeof" operation, but you can create them manually if you want to. There's a statement TypeVarDeclaration in typesystem lanuage to do so. You write it like "var T" or "var X" or "var V", i.e. "var" followed by the name of a type variable. Then you may use your variable, for example, in antiquotations to create a node with type variables inside.
Example: an inference rule for "for each" loop. A "for each" loop in Java consists of a loop body, an iterable to iterate over, and a variable into which the next member of an iterable is assigned before the next iteration. An iterable should be either an instance of a subclass of the Iterable interface, or an array. To simplify the example, we don't consider the case where the iterable is an array. Therefore, we need to express the following: an iterable's type should be a subtype of an Iterable of something, and the variable's type should be a supertype of that very something. For instance, you can write the following:
or the following:
Iterables in both examples above have the type ArrayList<String> which is a subtype of Iterable<String>. Variables have types String and Object, respectively, both of which are subtypes of String.
As we see, an iterable's type should be a subtype of an Iterable of something, and the variable's type should be a supertype of that very something. But how to say "that very something" in typesystem language? The answer is, it's a type variable that we use to express the link between the type of an iterable and the type of a variable. So we write the following inference rule:
Meet and Join types
Meet and Join types are special types which are treated differently by the typesystem engine. Technically Meet and Join types are instances of MeetType and JoinType concepts, respectively. They may have an arbitrary number of argument types which could be any nodes. Semantically, a Join type is a type which is a supertype of all its arguments, and a node which has a type Join(T1|T2|..Tn) can be regarded as if it has type T1 or type T2 or... or type Tn. a Meet type is a type which is a subtype of its every argument, so one can say that a node which has a type Meet(T1&T2&..&Tn) inhabits type T1 and type T2 and.. and type Tn. The separators of arguments of Join and Meet types (i.e. "|" and "&") are chosen respectively to serve as a mnemonics.
Meet and Join types are not unuseful. Meet types appear even in MPS Base Language (which is very close to Java), for instance the type of such an expression:
is Meet(Serializable & Comparable), because both Integer (the type of new Integer(1)) and String (the type of "hello") implement both Serializable and Comparable.
Join type is useful when, say, you want some function-like concept return values of two different types (node or list of nodes, for instance). Then you should make type of its invocation be Join(node<> | list<node<>>).
You can create Meet and Join types by yourself, if you need to. Use quotations to create them, just as with other types and other nodes. The concepts of Meet and Join types are MeetType and JoinType, as it is said above.
"When Concrete" Blocks
Sometimes one may want not only to write equations and inequations for a certain types, but to perform some complex analysis with type structure. That is, inspect inner structure of a concrete type: its children, children of children, referents, etc.
It may seem that one just may write typeof(some expression), and then analyze this type. The problem is, however, that one can't just inspect a result of "typeof" expression because it may be a type variable at that moment. Although a type variable usually will become a concrete type at some moment, it can't be guaranteed that it is concrete in some given point of your typesystem code.
To solve such a problem you can use a "when concrete" block.
Here, "expr" is an expression which will evaluate to a mere type you want to inspect (not to a node type of which you want to inspect), and "var" is a variable to which an expression will be assigned. Then this variable may be used inside a body of "when concrete" block. A body is a list of statements which will be executed only when a type denoted by "expr" becomes concrete, thus inside the body of a when concrete block you may safely inspect its children, properties, etc. if you need to.
If you have written a when concrete block and look into its inspector you will see two options: "is shallow" and "skip error". If you set "is shallow" to "true", the body of your when concrete block will be executed when expression becomes shallowly concrete, i.e. becomes not a type variable itself but possibly has type variables as children or referents. Normally, if your expression in condition of when concrete block is never concrete, then an error is reported. If it is normal for a type denoted by your expression to never become a concrete type, you can disable such error reporting by setting "skip error" to true.
Sometimes an operator (like +, -, etc.) has different semantics when applied to different values. For example, + in Java means addition when applied to numbers, and it means string concatenation if one of its operands is of type String. When the semantics of an operator depends on the types of its operands, it's called operator overloading. In fact, we have many different operators denoted by the same syntactic construction.
Let's try to write an inference rule for a plus expression. First, we should inspect the types of operands, because if we don't know the types of operands (whether they are numbers or Strings), we cannot choose the type for an operation (it will be either a number or a String). To be sure that types of operands are concrete we'll surround our code with two when concrete blocks, one for left operand's type and another one for right operand's type.
Then, we can write some inspections, where we check whether our types are strings or numbers and choose an appropriate type of operation. But there will be a problem here: if someone writes an extension of baseLanguage, where one wants to use plus expression for addition of some other entities, say, matrices or dates, this person won't be able to use plus expression because types for plus expression are hard-coded in the inference rule for it. So, we need an extension point to allow language-developers to overload existing binary operations.
Typesystem language has such an extension point. It consists of:
- overloading operation rules and
- a construct which provides a type of operation by operation and types of its operands.
For instance, a rule for PlusExpression in baseLanguage is written as follows:
Here, "operation type" is a construct which provides a type of an operation according to types of operation's left operand's type, right operand's type and the operation itself. For such a purpose it uses overloading operation rules.
Overloaded Operation Rules
Overloaded operation rules reside within a root node of concept OverloadedOpRulesContainer. Each overloaded operation rule consists of:
- an applicable operation concept, i.e. a reference to a concept of operation to which a rule is applicable (e.g. PlusExpression);
- left and right operand type restrictions, which contain a type which restricts a type of left/right operand, respectively. A restriction can be either exact or not, which means that a type of an operand should be exactly a type in a restriction(if restriction is exact), or its subtype(if not exact), for a rule to be applicable to such operand types;
- a function itself, which returns a type of operation by operation, left and right operand types.
Here's an example of one of overloading operation rules for PlusExpression in baseLanguage:
Consider the following use case: you have types for functions in your language, e.g. (a 1, a 2,...a N ) -> r, where a 1, a 2, .., a N, and r are types: a K is a type of K-th function argument and r is a type of a result of a function. Then you want to say that your function types are covariant by their return types and countervariant by their argument types. That is, a function type F = (T 1, .., T N) -> R is a subtype of a function type G = (S 1, .., S N) -> Q (written as F <: G) if and only if R <: Q (covariant by return type) and for any K from 1 to N, T K :> S K (that is, countervariant by arguments' types).
The problem is, how to express covariance and countervariance in typesystem language? Using subtyping rules you may express covariance by writing something like this:
Okay, we have collected all immediate supertypes for a function's return type and have created a list of function types with those collected types as return types and with original argument types. But, first, if we have many supertypes of return type, it's not very efficient to perform such an action each time we need to solve an inequation and second, although now we have covariance by function's return type, we still don't have countervariance by function's arguments' types. We can't collect immediate subtypes of a certain type because subtyping rules give us supertypes, not subtypes.
In fact, we just want to express the abovementioned property: F = (T 1, .., T N) -> R <: G = (S 1, .., S N) -> Q (written as F <: G) if and only if R <: Q and for any K from 1 to N, T K :> S K . For this and similar purposes typesystem language has a notion called "replacement rule."
What's a replacement rule?
A replacement rule is another way to solve inequations. While the standard way is to transitively apply subtyping rules to a should-be-subtype until a should-be-supertype is found among the results (or is never found among the results), a replacement rule, if applicable to a inequation, removes the inequation and then executes its body (which usually contains "create equation" and "create inequation" statements).
A replacement rule for above-mentioned example is written as follows:
Here we say that a rule is applicable to a should-be-subtype of concept FunctionType and a should-be-supertype of concept FunctionType. The body of a rule ensures that the number of parameter types of function types are equal, otherwise it reports an error and returns. If the numbers of parameter types of both function types are equal, a rule creates an inequation with return types and appropriate inequation for appropriate parameter types.
Another simple example of replacement rules usage is a rule which states that a Null type (a type of null literal) is a subtype of every type except primitive ones. Of course, we can't write a subtyping rule for Null type which returns a list of all types. Instead, we write the following replacement rule:
This rule is applicable to any should-be-supertype and to those should-be-subtypes which are Null types. The only thing this rule does is checking whether should-be-supertype is an instance of PrimitiveType concept. If it is, the rule reports an error. If is not, the rule does nothing, therefore the inequation to solve is simply removed from the typesystem engine with no further effects.
A semantic of a replacement rule, as explained above, is to replace an inequation with some other equations and inequations or to perform some other actions when applied. This semantic really doesn't state that a certain type is a subtype of another type under some conditions. It just defines how to solve an inequation with those two types.
For example, suppose that during generation you need to inspect whether some statically unknown type is a subtype of String. What will an engine answer when a type to inspect is Null type? When we have an inequation, a replacement rule can say that it is true, but for this case its abovementioned semantics is unuseful: we have no inequations, we have a question to answer yes or no. With function types, it is worse because a rule says that we should create some inequations. So, what do we have to do with them in our use case?
To make replacement rules usable when we want to inspect whether a type is a subtype of an another type, a different semantic is given to replacement rules in such a case.
This semantic is as follows: each "add equation" statement is treated as an inspection of whether two nodes match; each "add inequation" statement is treated as an inspection of whether one node is a subtype of another; each report error statement is treated as "return false."
Consider the above replacement rule for function types:
In a different semantic, it will be treated as follows:
So, as we can see, another semantic is quite an intuitive mapping between creating equations/inequations and performing inspections.
Advanced features of typesystem language
Basically, inequations may affect nodes' types, for instance if one of the inequation part is a type variable, it may become a concrete type because of this inequation. But, sometimes one does not want a certain inequation to create types, only to check whether such an inequation is satisfied. We call such inequations check-only inequations. To mark an inequation as a check-only, one should go to this inequation's inspector and should set a flag "check-only" to "true". An "less or equals" sign for check-only inequation visually is gray, while for normal ones it is black, so you can see whether an inequation is check-only without looking at its inspector.
When writing a generator for a certain language (see generator), one may want to ask for a type of a certain node in generator queries. When generator generates a model, such a query will make typesystem engine do some typechecking to find out the type needed. Performing full typechecking of a node's containing root to obtain the node's type is expensive and almost always unnecessary. In most cases, the typechecker should check only the node given. In more difficult cases, obtaining the type of a given node may require checking its parent or maybe a further ancestor. The typechecking engine will check a given node if the computed type is not fully concrete (i.e. contains one or more type variables); then the typechecker will check the node's parent, and so on.
Sometimes there's an even more complex case: the type of a certain node being computed in isolation is fully concrete; and the type of the same node - in a certain environment - is fully concrete also but differs from the first one. In such a case, the abovementioned algorithm will break, returning the type of an node as if being isolated, which is not the correct type for the given node.
To solve this kind of problem, you can give some hints to the typechecker. Such hints are called dependencies - they express a fact that a node's type depends on some other node. Thus, when computing a type of a certain node during generation, if this node has some dependencies they will be checked also, so the node will be type-checked in an appropriate environment.
A dependency consists of a "target" concept (a concept of a node being checked, whose type depends on some other node), an optional "source" concept (a concept of another node on which a type of a node depends), and a query which returns dependencies for a node which is being checked, i.e. a query returns a node or a set of nodes.
For example, sometimes a type of a variable initializer should be checked with the enclosing variable declaration to obtain the correct type. A dependency which implements such a behavior may be written as follows:
That means the following: if the typechecker is asked for a type of a certain Expression during generation, it will check whether such an expression is of a role initializer, and if it is, will say that not only the given Expression but also its parent should be checked to get the correct type for the given Expression.