New Language Support


WORK IN PROGRESS

In this part of the guide we'll look at developing ReSharper support for a new language.

Introduction

ReSharper supports a wide variety of languages and also supports mixtures of languages (e.g., .cshtml files are a mixture of C# and HTML). Plugin writers can use the existing infrastructure to support new languages within ReSharper. A language implementation can be a plug-in or part of a plug-in, and in most cases no special action is required for it to be picked up and recognized by ReSharper.

To interpret your language, you can use any parser/lexer technology (e.g., CsLex or ANTLR) to read in the file structure. However, ReSharper does place constraints on the data types (e.g., interfaces) that the parsed structure needs to support - this is especially so if you're parsing a CLR language.

This guide shows you how to put in place the necessary infrastructure for supporting a new language under ReSharper.

Language Definition

First of all, you need to create a class inheriting from KnownLanguage. The constraints on this class (let's call it MyLanguage here) are that it should:

  • Be declared public
  • Have one public static, non-readonly, non-initialized field of type MyLanguage
  • Have a parameterless constructor
  • Have a set of protected constructors to allow language inheritance

The MyLanguage class must also be marked with the [LanguageDefinition] attribute. This attribute takes two parameters:

  • The first and only required parameter is the name of the language.
  • (Optional) The Edition parameter specifies the ReSharper edition that supports this language. If you need to constrain the language to a particular R# edition, use an element of the ReSharperEditions.Ids enumeration here. Reminder: R# currently comes in three editions -- C#, VB.NET and Full.

Here is an example of a language definition for the C# language:
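
As a rough illustration of the constraints listed above, the following self-contained C# sketch shows the shape such a class might take. KnownLanguageSketch and LanguageDefinitionSketchAttribute are hypothetical stand-ins declared here so that the snippet compiles without the SDK; they are not the SDK's KnownLanguage or LanguageDefinition types. The name "MYLANG" is invented, and the public visibility of the parameterless constructor is an assumption.

```csharp
using System;

// Hypothetical stand-in for the SDK's LanguageDefinition attribute.
[AttributeUsage(AttributeTargets.Class)]
public sealed class LanguageDefinitionSketchAttribute : Attribute
{
    public LanguageDefinitionSketchAttribute(string name) { Name = name; }

    public string Name { get; }          // required: the language name
    public string Edition { get; set; }  // optional named parameter
}

// Hypothetical stand-in for the SDK's KnownLanguage base class.
public abstract class KnownLanguageSketch
{
    protected KnownLanguageSketch(string name) { Name = name; }

    public string Name { get; }
}

[LanguageDefinitionSketch(MyLanguage.LanguageName)]
public class MyLanguage : KnownLanguageSketch
{
    public const string LanguageName = "MYLANG"; // invented name

    // Public, static, non-readonly and deliberately left unassigned,
    // as the constraints above require.
    public static MyLanguage Instance;

    // Parameterless constructor (public visibility is an assumption).
    public MyLanguage() : this(LanguageName) { }

    // Protected constructor so that a derived language can pass its own name.
    protected MyLanguage(string name) : base(name) { }
}
```

A language that extends MyLanguage would derive from it and call the protected constructor with its own name.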

Project File Type

Now, we need to tell ReSharper how to support a project of a particular type. This is another class that contains primarily metadata, and is used mainly to indicate the file extensions that correspond to a particular language. Just like the Language Definition class, this class (let's call it MyLanguageProjectFileType) requires that:

  • It is declared public
  • It has a public static, non-readonly field of type MyLanguageProjectFileType
  • It has a parameterless constructor. This constructor calls its base class' constructor, passing the array of file extensions that this project file type supports.
  • It has a set of protected methods for inheritance

In addition, the project file type class has to be decorated with the ProjectFileTypeDefinition attribute. This attribute has the following parameters:

  • The Type parameter requires the name of the language (as per the language definition)
  • (Optional) The Edition parameter determines the R# edition that supports this project file type.
  • (Optional) The Internal boolean parameter determines whether this project file type is only supported in internal mode. Plugin writers should not define this parameter.

Here is an example of a project file type class for HTML files:
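
A minimal sketch of the class shape described above, assuming a made-up ".mylang" extension. ProjectFileTypeSketch is a hypothetical stand-in base class declared here so that the snippet compiles on its own; the real base class and the ProjectFileTypeDefinition attribute come from the SDK and are left out.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the SDK's project file type base class.
public abstract class ProjectFileTypeSketch
{
    protected ProjectFileTypeSketch(string name, string[] extensions)
    {
        Name = name;
        Extensions = extensions;
    }

    public string Name { get; }
    public IReadOnlyList<string> Extensions { get; }
}

public class MyLanguageProjectFileType : ProjectFileTypeSketch
{
    public const string TypeName = "MYLANG"; // invented; would match the language name

    // Public, static and non-readonly, as the constraints above require.
    public static MyLanguageProjectFileType Instance;

    // The parameterless constructor hands the supported extensions to the base class.
    public MyLanguageProjectFileType() : base(TypeName, new[] { ".mylang" }) { }
}
```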


Project File Language Service

Let's create a project file language service. This entity is used to tell us which language tree needs to be constructed for a particular file. For example, the XamlProjectFileLanguageService would create a XAML tree if the XAML file is part of the project, and only an XML tree if it is not.

The project file language service is a class which implements the IProjectFileLanguageService interface and is decorated with the ProjectFileType attribute relating to the project file type. The attribute takes a single parameter -- the typeof(MyLanguageProjectFileType) that we created earlier.

We are now getting into the thick of language development, this being the last stop before we go on to define the overall language service (the one that exposes lexers, parsers and other supplementary information). Let us go through the members of our MyProjectFileLanguageService implementation.

  • First of all, we need to have a public constructor that takes a parameter of the MyLanguageProjectFileType type. We'll need to store this parameter so that we can return it later.
  • The LanguageType property is precisely the location where we would typically return the above stored value.
  • The Icon property needs to return the icon for this type of file. If you are storing the icon internally as an embedded resource, use ImageLoader.GetImage("icon.resource.name", null) to return the icon.
  • The GetPsiLanguageType() method takes an IProjectFile parameter. If the LanguageType property of this parameter matches our language, then we return the Instance field of our language. Otherwise, we return UnknownLanguage.Instance. Here is an example:
  • There is also an overload of the GetPsiLanguageType() method that takes a ProjectFileType as a parameter. The implementation of this method is similar to the one above:
  • The GetMixedLexerFactory() method returns the mixed lexer factory. The mixed lexer (mixed refers to the possibility of having mixed languages in a file) is returned by the language service, which we haven't defined. The typical implementation of this method is as follows:

    In the above code, LanguageService() is an extension method that resides in the PsiLanguageTypeExtensions class.

  • The GetPreprocessorDefines() method is used to return a set of PreProcessingDirective definitions. If your language doesn't have preprocessing directives, you can simply return EmptyArray<PreProcessingDirective>.Instance.
  • The GetPsiProperties() method is used to return a new instance of the PSI properties type for this file as follows:
    We haven't seen the PSI Properties class, so that's coming up next.
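
The constructor, the LanguageType property and the two GetPsiLanguageType() overloads from the list above can be sketched roughly as follows. Every type ending in Sketch is a hypothetical stand-in declared here so that the snippet compiles without the SDK; the real service implements IProjectFileLanguageService and returns the SDK's own language types, neither of which is reproduced.

```csharp
// Hypothetical stand-ins for the project model and PSI language types.
public class ProjectFileTypeSketch { }

public sealed class MyLanguageProjectFileType : ProjectFileTypeSketch { }

public sealed class PsiLanguageSketch
{
    public PsiLanguageSketch(string name) { Name = name; }
    public string Name { get; }
}

public static class LanguagesSketch
{
    public static readonly PsiLanguageSketch MyLanguage = new PsiLanguageSketch("MYLANG");
    public static readonly PsiLanguageSketch Unknown = new PsiLanguageSketch("UNKNOWN");
}

public sealed class ProjectFileSketch
{
    public ProjectFileSketch(ProjectFileTypeSketch languageType) { LanguageType = languageType; }
    public ProjectFileTypeSketch LanguageType { get; }
}

public class MyProjectFileLanguageService
{
    private readonly MyLanguageProjectFileType myFileType;

    // Store the project file type so that LanguageType can return it later.
    public MyProjectFileLanguageService(MyLanguageProjectFileType fileType)
    {
        myFileType = fileType;
    }

    public ProjectFileTypeSketch LanguageType => myFileType;

    // Overload taking a project file: dispatch on the file's own type.
    public PsiLanguageSketch GetPsiLanguageType(ProjectFileSketch projectFile)
        => GetPsiLanguageType(projectFile.LanguageType);

    // Overload taking a project file type: our language on a match, unknown otherwise.
    public PsiLanguageSketch GetPsiLanguageType(ProjectFileTypeSketch fileType)
        => ReferenceEquals(fileType, LanguageType)
            ? LanguagesSketch.MyLanguage
            : LanguagesSketch.Unknown;
}
```

Comparing by reference is a choice made for this sketch; it assumes a single instance exists per project file type.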

PSI Properties

Yet another service entity that is required for successful language support is the PSI Properties entity. To understand why it is needed, it's important to understand that when working with files, we essentially operate on two different entities: IProjectFile and IPsiSourceFile:

  • IProjectFile is part of the project model, i.e., it holds information about the file in the context of the project it's in. Among other things, it yields information regarding its ProjectFileType (we have defined this earlier), and also yields an IProjectFileProperties value that contains the types of file properties you see when you open the Properties window (or press F4) in Visual Studio.
  • IPsiSourceFile is part of the PSI. It also yields a ProjectFileType as one of its properties, but it also yields a language (a PsiLanguageType inheritor such as MyLanguage) as well as a set of IPsiSourceFileProperties. These properties correspond to the code model, i.e., the AST that represents the file.

Thus, the PSI Properties entity is a kind of glue that takes as parameters both an IProjectFile and an IPsiSourceFile and binds the two together.
The PSI Properties class is a class deriving from DefaultPsiProjectFileProperties, and is often located as an inner class of the above language service type. Its constructor takes two parameters - the project file and the PSI source file. Typically, the constructor simply passes them up to the parent:

The above is a minimal implementation. Often, the PSI Properties type contains overrides for some of the properties defined by the base class. Here are a few such properties:

  • ShouldBuildPsi determines whether the PSI tree needs to be built for this file. By default, the value is true for any type that isn't null or unknown. This property should be overridden in certain cases where it's not worth building a PSI - for example, in the case where you have a XAML file that's not part of a project.
  • ProvidesCodeModel determines whether the file provides a code model. Not all files do - for example, an XML file does not. By default, this has the value of ShouldBuildPsi.
  • IsNonUserFile determines whether this file is owned by the user or not. Typically this has the value of !IsCompile, i.e. it depends on the file's build action.
  • IsGeneratedFile determines whether this file is generated or not. By default, R# tries to get this information from the provided projectFile.
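
The pass-through constructor and the way one override feeds the other defaults can be sketched as follows. DefaultPropertiesSketch, ProjectFileSketch and PsiSourceFileSketch are hypothetical stand-ins, not the SDK's DefaultPsiProjectFileProperties, IProjectFile or IPsiSourceFile; the IsInProject flag is invented for the example, and the defaults are modelled only as far as the list above describes them.

```csharp
// Hypothetical stand-ins for the project file and the PSI source file.
public sealed class ProjectFileSketch
{
    public bool IsInProject { get; set; } // invented flag for this sketch
    public bool IsCompile { get; set; }   // models the file's build action
}

public sealed class PsiSourceFileSketch { }

// Hypothetical stand-in for the default properties base class.
public class DefaultPropertiesSketch
{
    protected readonly ProjectFileSketch ProjectFile;
    protected readonly PsiSourceFileSketch SourceFile;

    public DefaultPropertiesSketch(ProjectFileSketch projectFile, PsiSourceFileSketch sourceFile)
    {
        ProjectFile = projectFile;
        SourceFile = sourceFile;
    }

    public virtual bool ShouldBuildPsi => true;
    public virtual bool ProvidesCodeModel => ShouldBuildPsi; // follows ShouldBuildPsi by default
    public virtual bool IsNonUserFile => !ProjectFile.IsCompile;
}

public class MyLanguagePsiProperties : DefaultPropertiesSketch
{
    // Minimal implementation: pass both files up to the parent.
    public MyLanguagePsiProperties(ProjectFileSketch projectFile, PsiSourceFileSketch sourceFile)
        : base(projectFile, sourceFile) { }

    // Do not build a PSI tree for a file that is not part of a project.
    public override bool ShouldBuildPsi => ProjectFile.IsInProject;
}
```

Because ProvidesCodeModel defaults to ShouldBuildPsi, overriding the latter is enough to switch both off for files outside a project.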

Language Service

We are finally ready to create a language service, the entity that lets us work with the language AST. This is a class inheriting from LanguageService. The requirements for this class are:

  • It must be declared public
  • It must have a public constructor taking at least the language definition (i.e., the MyLanguage type) and an IConstantValueService.
  • It must call the base constructor with the above parameters.

After implementing the above, check that your breakpoints actually fire when opening the file with your chosen extension. An empty implementation of the above should be sufficient for ReSharper to identify the feature-related language service.

The language service is where all the activity occurs. In particular, the language service exposes both the lexer and the parser, as well as a number of complementary services that may or may not be required for the particular language.

We begin our language implementation with the lexer. The language service type has two members relating to the creation of lexers.

The first is the GetPrimaryLexerFactory() method. This method returns a lexer factory, which is simply a class that implements the ILexerFactory interface and whose CreateLexer() method returns an instance of a lexer (which we'll define in a moment):

Thus, the implementation of MyLanguageService.GetPrimaryLexerFactory() reduces to:

Of course, we have not yet mentioned the ILexer. Before we get on to actually making a lexer, it's also worth mentioning another LanguageService member -- the CreateFilteringLexer() method. Just as the name suggests, a filtering lexer is a lexer that filters (i.e. ignores) certain token types. A filtering lexer inherits from the FilteringLexer class. Apart from taking an ILexer as a constructor parameter and passing it up to its base class, it has a single method called Skip() that you need to override.

The Skip() method is simple: it takes a TokenNodeType and determines whether this is the kind of token that needs to be skipped. Typical tokens to be skipped often include whitespace, line breaks, comments or code within certain preprocessor directives. Now, the simplest way to provide support for this is to create a NodeTypeSet of all the tokens that you intend to skip as follows:

With this definition in place, the implementation of the Skip() method can be as follows:
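
The idea behind the skip set and Skip() can be shown with a self-contained sketch. FilteringLexerSketch, Token and TokenKind are hypothetical stand-ins invented for this example: a plain HashSet plays the role of the NodeTypeSet, and an enumerable of tokens plays the role of the wrapped ILexer. The real FilteringLexer API is not reproduced here.

```csharp
using System.Collections.Generic;

public enum TokenKind { Whitespace, NewLine, Comment, Keyword, Identifier, Operator }

public sealed class Token
{
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public TokenKind Kind { get; }
    public string Text { get; }
}

// Hypothetical stand-in for the FilteringLexer base class: it wraps an
// underlying token stream and drops every token for which Skip() returns true.
public abstract class FilteringLexerSketch
{
    private readonly IEnumerable<Token> inner;

    protected FilteringLexerSketch(IEnumerable<Token> inner) { this.inner = inner; }

    protected abstract bool Skip(TokenKind kind);

    public IEnumerable<Token> Tokens()
    {
        foreach (Token token in inner)
            if (!Skip(token.Kind))
                yield return token;
    }
}

public sealed class MyFilteringLexer : FilteringLexerSketch
{
    // Plays the role of the set of token types to be skipped.
    private static readonly HashSet<TokenKind> SkippedKinds = new HashSet<TokenKind>
    {
        TokenKind.Whitespace, TokenKind.NewLine, TokenKind.Comment
    };

    public MyFilteringLexer(IEnumerable<Token> inner) : base(inner) { }

    protected override bool Skip(TokenKind kind) => SkippedKinds.Contains(kind);
}
```

Keeping the skipped kinds in one static set means Skip() stays a single lookup, and extending the filter is a matter of adding an entry to the set.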

Now, I'm sure you can't wait to see the implementation of the ILexer type and the actual tokens (like the ones we used above), so let's take a look at lexer construction.

Lexer

What is a lexer? In essence, a lexer is a program that takes source code as text and breaks it down into a sequence of tokens - individually identifiable lexical constructs. An example of a lexical construct would be a keyword (e.g., catch), an operator (e.g., &&) or an identifier.
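
To make the idea concrete, here is a tiny hand-written tokenizer. It is purely illustrative: it is neither generated by CsLex nor an ILexer implementation, and the keyword list, operator list and token names are invented for the example.

```csharp
using System.Collections.Generic;

public static class TinyLexer
{
    private static readonly HashSet<string> Keywords = new HashSet<string> { "try", "catch" };
    private static readonly string[] Operators = { "&&", "||", "!=", "==" };

    // Returns one "KIND text" entry per token; whitespace only separates tokens.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        int pos = 0;
        while (pos < text.Length)
        {
            char c = text[pos];
            if (char.IsWhiteSpace(c)) { pos++; continue; }

            if (char.IsLetter(c) || c == '_')
            {
                // Consume a whole word, then decide: keyword or identifier?
                int start = pos;
                while (pos < text.Length && (char.IsLetterOrDigit(text[pos]) || text[pos] == '_'))
                    pos++;
                string word = text.Substring(start, pos - start);
                tokens.Add((Keywords.Contains(word) ? "KEYWORD " : "IDENTIFIER ") + word);
                continue;
            }

            // Try each known operator at the current position.
            string op = null;
            foreach (string candidate in Operators)
            {
                if (string.CompareOrdinal(text, pos, candidate, 0, candidate.Length) == 0)
                {
                    op = candidate;
                    break;
                }
            }
            if (op != null) { tokens.Add("OPERATOR " + op); pos += op.Length; continue; }

            // Anything else is reported as a single bad character.
            tokens.Add("BAD_CHARACTER " + c);
            pos++;
        }
        return tokens;
    }
}
```

For the input "catch && x", Tokenize() returns three entries: "KEYWORD catch", "OPERATOR &&" and "IDENTIFIER x".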

Lexical analyzers are typically not created by hand - instead, both the lexer and the various token types are automatically generated by ready-made software called a lexer generator. Many free lexer generators are available, but the one we are going to work with is called CsLex, and is available to download from the following address: http://www.cybercom.net/~zbrad/DotNet/Lex/. CsLex is an open-source, C# implementation of the original Lex lexer generator.

The challenge in creating the actual lexer for your language thus becomes twofold:

  • First, you need to adapt CsLex to become part of your build process.
  • Next, you need to prepare a specifically formatted specification file from which the appropriate token analyzer (i.e., an ILexer) is generated.

The first of the above steps is fairly trivial and is not described in this guide. Suffice it to say, you can either simply use CsLex from the command line or, if necessary, write a custom tool or an MSBuild task that will call it for you. As for the second step, this does require quite a bit of explaining.

To start with, your lexer consists of three things:

  • Metadata regarding the lexer class that is generated. This typically includes information about the namespace, name of class, visibility, implemented interfaces (in our case, just ILexer), and so on.
  • Code that you want to inject in situ. Keep in mind that the lexer type is declared partial, so all the missing methods of ILexer can be implemented in a separate file.
  • Information about the tokens which constitute the language. This forms the bulk of the lexer definition.

Let's deal with the lexer metadata first. Here is an example of a metadata block:

The above basically states that the lexer class should be called FSharpLexerGenerated, should be created within the MyLexer namespace, should be public and implement ILexer. Its primary lexing function shall be called _locateToken(), and it is precisely this function that will be subsequently called from the parts of the ILexer interface that we have to implement by hand. Also, the above defines the type of our token to be TokenNodeType and determines the steps to be taken when we reach the end of the data to be parsed.

An explanation of TokenNodeType is in order here. This type is a ReSharper-defined type from which the types corresponding to lexical nodes are derived. For example, a typical inheritance chain looks like this: TokenNodeType -> CSharpTokenNodeType -> KeywordTokenNodeType -> CSharpTokenType.AsyncKeywordNodeType. The bottom type of this chain would then correspond to the async keyword.

Let's not dwell on this for now. Fast-forwarding to the actual lexical definitions, most of the various tokens would actually be defined as follows:

What this implies, at the very least, is that the non-generated part of your lexer has a field of a TokenNodeType class and a method called makeToken(). In fact, here they are:

Keywords

Of course, the case of the != operator is simple. But what if we come across a standalone string instead? Is it a keyword? Or an identifier? How do we know? In actual fact, the definition in our .lex file would appear as follows:

The above code implies that our lexer has a (non-generated) Dictionary called keywords that contains mappings from keywords (e.g., "async") to actual token types (e.g., AsyncKeywordNodeType). The yytext() call actually gets you the text you're on. The dictionary is best kept static, and it makes sense to initialize it in a static constructor:
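
A minimal sketch of such a static keyword table, assuming that any word not found in it is treated as an identifier. TokenTypeSketch, the KeywordLookup class and its Classify() method are hypothetical names invented for this example; in the real lexer the values would be TokenNodeType instances and the lookup would be fed by yytext().

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a token node type.
public sealed class TokenTypeSketch
{
    public TokenTypeSketch(string name) { Name = name; }
    public string Name { get; }
}

public static class KeywordLookup
{
    public static readonly TokenTypeSketch LET_KEYWORD = new TokenTypeSketch("LET_KEYWORD");
    public static readonly TokenTypeSketch ASYNC_KEYWORD = new TokenTypeSketch("ASYNC_KEYWORD");
    public static readonly TokenTypeSketch IDENTIFIER = new TokenTypeSketch("IDENTIFIER");

    private static readonly Dictionary<string, TokenTypeSketch> keywords;

    // Static constructor: runs once, after the field initializers above.
    static KeywordLookup()
    {
        keywords = new Dictionary<string, TokenTypeSketch>
        {
            { "let", LET_KEYWORD },
            { "async", ASYNC_KEYWORD }
        };
    }

    // Keyword if the text is in the table, identifier otherwise (an assumption).
    public static TokenTypeSketch Classify(string text)
    {
        TokenTypeSketch type;
        return keywords.TryGetValue(text, out type) ? type : IDENTIFIER;
    }
}
```

The lookup is case-sensitive because Dictionary uses ordinal string comparison by default, so "Let" would be classified as an identifier here.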

Now, where did that LET_KEYWORD come from? In actual fact, the explanation is a bit complicated. Let's start with the idea that in our, somewhat synthetic, example, we have a partial static class called FSharpTokenType which serves as a container for the token node type (i.e., the base type) as well as the actual node types and their instances. Starting with the definition of the base classes first, we get:

So now, all you have to do is inherit from the above inner type, but there are nuances: you need to call the base class' constructor with a string identifier that's unique for this lexeme, and you also need to override the Create() method to return an actual token element.

The token element itself is yet another class! That class, in our case, derives from FSharpTokenBase which is, once again, our own class which we'll talk about later. For now, let's take a look at its definition:

There's that LET_KEYWORD again, and we promised to explain it. Okay, so here's the final bit of explanation: LET_KEYWORD is nothing more than a static instance of the LetKeywordNodeType:
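
The relationship between the node type, its token element and the LET_KEYWORD instance can be sketched in a self-contained form. TokenNodeTypeSketch and TokenElementSketch are hypothetical stand-ins for TokenNodeType and FSharpTokenBase, the container is named FSharpTokenTypeSketch to avoid suggesting it is the real class, and the Create() signature shown is an assumption.

```csharp
// Hypothetical stand-in for the SDK's TokenNodeType.
public abstract class TokenNodeTypeSketch
{
    protected TokenNodeTypeSketch(string id) { Id = id; }

    public string Id { get; }

    // Produces the token element for a piece of lexed text.
    public abstract TokenElementSketch Create(string text);
}

// Hypothetical stand-in for the token element base class.
public abstract class TokenElementSketch
{
    protected TokenElementSketch(string text) { Text = text; }

    public string Text { get; }
    public abstract TokenNodeTypeSketch NodeType { get; }
}

public static partial class FSharpTokenTypeSketch
{
    // Base type for every token node type of this language.
    public abstract class FSharpTokenNodeType : TokenNodeTypeSketch
    {
        protected FSharpTokenNodeType(string id) : base(id) { }
    }

    private sealed class LetKeywordNodeType : FSharpTokenNodeType
    {
        // The identifier passed to the base class must be unique per lexeme.
        public LetKeywordNodeType() : base("LET_KEYWORD") { }

        public override TokenElementSketch Create(string text) => new LetKeywordElement(text);
    }

    private sealed class LetKeywordElement : TokenElementSketch
    {
        public LetKeywordElement(string text) : base(text) { }

        // The element points back at the shared node type instance.
        public override TokenNodeTypeSketch NodeType => LET_KEYWORD;
    }

    // LET_KEYWORD is a static instance of the node type.
    public static readonly FSharpTokenNodeType LET_KEYWORD = new LetKeywordNodeType();
}
```

The node type and the element refer to each other: the node type creates elements, and each element reports the single static node type instance as its type.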

Testing

ReSharper comes with a class called LexerTestBase, which is a base class useful for testing lexers. To get started using this class, you need to override the CreateLexer(StreamReader) method to return a lexer of the type that you intend to test. For example:

You might also wish to decorate your test fixture with the TestFileExtension attribute, letting the test system know the extension of test data files.

Once you're done, you need to provide test data in pairs. For example, you might have an input file test01.fs as follows:

and a corresponding test01.fs.gold file such as:

When the tests are executed, the test will compare the lexeme names in the .gold file with what the lexer has actually produced.
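
The comparison itself can be pictured with a small sketch. It assumes a gold format of one lexeme name per line, which is an assumption made for this example and not necessarily the layout LexerTestBase uses; GoldComparison and its methods are hypothetical names.

```csharp
using System.Collections.Generic;

public static class GoldComparison
{
    // Dumps lexeme names one per line (assumed gold layout).
    public static string Dump(IEnumerable<string> tokenNames)
        => string.Join("\n", tokenNames);

    // True when the dump equals the gold text, ignoring line-ending style
    // and any trailing newlines in the gold file.
    public static bool Matches(IEnumerable<string> tokenNames, string goldText)
    {
        string expected = goldText.Replace("\r\n", "\n").TrimEnd('\n');
        return Dump(tokenNames) == expected;
    }
}
```

A mismatch in either the names or their order makes Matches() return false, which is the situation a failing lexer test would report.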

If your lexer has any errors and is unable to parse the input file, it is likely you'll get an exception without any indication of what went wrong.

2 Comments

  1. Overriding of Extension property of LexerTestBase is obsolete. Use TestFileExtension attribute per class or specific test method.