The set of IntelliJ IDEA features which are supported for custom languages includes:
In addition, IDEA provides a powerful framework on which additional intelligence features, like refactorings and code analysis, can be implemented.
If you have any questions or comments related to the Language API or any other aspects of IntelliJ IDEA plugin development, feel free to ask them in the jetbrains.intellij.openapi newsgroup on the news.jetbrains.com news server, or in the corresponding Web forum. The newsgroup is monitored by JetBrains developers who will be able to help you with the development.
The information in this document has been updated to cover the API changes and new features of IntelliJ IDEA 8.0.
The first step in developing a custom language plugin is registering a file type the language will be associated with. IDEA determines the type of a file by looking at its file name. Thus, a custom language can only be associated with specific file names or extensions. It is not currently possible to create a language that is applied to files based on their content, such as a specific XML root namespace.
A custom language file type is a class derived from LanguageFileType, which passes a Language implementation class to its base class constructor. To register a file type, the plugin developer provides an implementation of the FileTypeFactory interface, which is registered via the com.intellij.fileTypeFactory extension point.
To verify that the file type is registered correctly, you can implement the LanguageFileType.getIcon() method and check that the correct icon is displayed for files which have the extension associated with your file type.
The lexer (lexical analyzer) defines how the contents of a file are broken into tokens. The lexer serves as a foundation for nearly all of the features of custom language plugins, from basic syntax highlighting to advanced code analysis features. The API for the lexer is defined by the Lexer interface.
IDEA invokes the lexer in three main contexts, and the plugin can provide different lexer implementations for these contexts:
The lexer used for syntax highlighting can be invoked incrementally to process only the changed part of a file, whereas lexers used in other contexts are always called to process an entire file, or a complete language construct embedded in a file in a different language. An important requirement for a syntax highlighting lexer, needed for incremental lexing, is that its state must be represented by a single integer (returned from Lexer.getState()). That state will be passed to the Lexer.start() method, along with the start offset of the fragment to process, when lexing is resumed from the middle of a file. Lexers used in other contexts can always return 0 from the getState() method if their state has a more complex internal representation.
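The single-integer state requirement can be illustrated with a minimal sketch. RestartableLexerSketch and its methods are hypothetical stand-ins written for this example; they do not implement the real Lexer interface and only mirror the idea of getState() and start() described above. The toy language has words, whitespace and double-quoted strings, and the state records whether the lexer is inside a string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, NOT the IDEA Lexer interface: the whole lexer state is one int. */
final class RestartableLexerSketch {
    static final int NORMAL = 0;     // outside a string literal
    static final int IN_STRING = 1;  // between an opening and a closing quote

    private CharSequence buffer;
    private int offset;
    private int state;

    /** Begins lexing at a token boundary, using a state saved earlier at that boundary. */
    void start(CharSequence buffer, int startOffset, int initialState) {
        this.buffer = buffer;
        this.offset = startOffset;
        this.state = initialState;
    }

    /** The state a caller would save in order to resume lexing at the current offset. */
    int getState() {
        return state;
    }

    /** Returns the next token as "TYPE:text", or null at the end of the input. */
    String nextToken() {
        int length = buffer.length();
        if (offset >= length) return null;
        int begin = offset;
        char c = buffer.charAt(offset);
        String type;
        if (c == '"') {
            offset++;
            state = (state == NORMAL) ? IN_STRING : NORMAL; // a quote toggles the state
            type = "QUOTE";
        } else if (state == IN_STRING) {
            while (offset < length && buffer.charAt(offset) != '"') offset++;
            type = "STRING_TEXT";
        } else if (Character.isWhitespace(c)) {
            while (offset < length && Character.isWhitespace(buffer.charAt(offset))) offset++;
            type = "WHITE_SPACE";
        } else {
            while (offset < length && buffer.charAt(offset) != '"'
                    && !Character.isWhitespace(buffer.charAt(offset))) offset++;
            type = "WORD";
        }
        return type + ":" + buffer.subSequence(begin, offset);
    }

    /** Convenience helper: lexes from startOffset to the end of the text. */
    static List<String> lex(CharSequence text, int startOffset, int initialState) {
        RestartableLexerSketch lexer = new RestartableLexerSketch();
        lexer.start(text, startOffset, initialState);
        List<String> tokens = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) tokens.add(t);
        return tokens;
    }
}
```

Resuming at the offset just after an opening quote with the IN_STRING state yields the same tokens as the tail of a full pass. Resuming at the same offset with the wrong state treats the string contents as ordinary words, which is why the saved state has to travel with the offset.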
The easiest way to create a lexer for a custom language plugin is to use JFlex. IDEA contains adapter classes (FlexLexer and FlexAdapter) that adapt JFlex lexers to the IDEA lexer API. The Plugin Development package includes a patched version of JFlex 1.4.1 (tools/jflex) and a lexer skeleton file (tools/jflex/idea-flex.skeleton) which can be used for creating lexers compatible with FlexAdapter. The patched version of JFlex provides a new command line option, --charat, which changes the JFlex generated code so that it works with the IDEA skeleton (which passes the source data for lexing as a CharSequence and not as an array of characters).
For developing lexers using JFlex, the JFlex Support plugin can be useful. It provides syntax highlighting and other useful features for editing JFlex files.
Note that lexers, and in particular JFlex-based lexers, need to be created in such a way that they always match the entire contents of the file, without any gaps between tokens, and generate special tokens for characters which are not valid at their location. Lexers must never abort prematurely because of an invalid character.
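The following sketch shows one way to satisfy this requirement in a hand-written lexer. GapFreeLexerSketch, its Token class and the token type names are assumptions made for this example and are not part of the IDEA API; the BAD_CHARACTER and WHITE_SPACE names merely echo the common token types. Any character the toy grammar does not recognize becomes a one-character BAD_CHARACTER token, so the lexer always reaches the end of the input.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy lexer, not IDEA API: covers the whole input and never aborts. */
final class GapFreeLexerSketch {
    /** One token: a type name plus the half-open range [start, end) it covers. */
    static final class Token {
        final String type;
        final int start;
        final int end;

        Token(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    static List<Token> tokenize(CharSequence text) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int start = pos;
            char c = text.charAt(pos);
            String type;
            if (Character.isDigit(c)) {
                while (pos < text.length() && Character.isDigit(text.charAt(pos))) pos++;
                type = "NUMBER";
            } else if (Character.isWhitespace(c)) {
                while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
                type = "WHITE_SPACE";
            } else if (c == '+') {
                pos++;
                type = "PLUS";
            } else {
                pos++; // unknown character: emit a token for it instead of stopping
                type = "BAD_CHARACTER";
            }
            tokens.add(new Token(type, start, pos));
        }
        return tokens;
    }

    /** True when the tokens start at 0, end at length, and each begins where the previous ended. */
    static boolean coversWithoutGaps(List<Token> tokens, int length) {
        int expected = 0;
        for (Token t : tokens) {
            if (t.start != expected || t.end <= t.start) return false;
            expected = t.end;
        }
        return expected == length;
    }
}
```

A check like coversWithoutGaps() is a convenient assertion for lexer unit tests, since a gap or an early stop tends to show up later as confusing highlighting or parsing behavior.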
Types of tokens for lexers used in IDEA are defined by instances of IElementType. A number of token types common for all languages are defined in the TokenType interface; custom language plugins should reuse these token types wherever applicable. For all other token types, the plugin needs to create new IElementType instances and associate them with the language in which the token type is used. The same IElementType instance should be returned every time a particular token type is encountered by the lexer.
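The "same instance every time" rule is usually met by creating each token type once as a constant. The sketch below uses a hypothetical ElementType class in place of IElementType, and the language name "Simple" is an assumption for illustration; it only shows the pattern of shared instances tied to a language, not the real constructor or API.

```java
/** Hypothetical stand-ins, not IDEA classes: token types as shared constants. */
final class TokenTypeSketch {
    /** Simplified counterpart of a token type: a debug name plus the owning language. */
    static final class ElementType {
        final String debugName;
        final String languageId;

        ElementType(String debugName, String languageId) {
            this.debugName = debugName;
            this.languageId = languageId;
        }

        @Override
        public String toString() {
            return languageId + ":" + debugName;
        }
    }

    // Each token type is created exactly once and associated with its language.
    static final ElementType IDENTIFIER = new ElementType("IDENTIFIER", "Simple");
    static final ElementType NUMBER = new ElementType("NUMBER", "Simple");

    /** Classifies non-empty token text, always returning one of the shared constants. */
    static ElementType classify(String tokenText) {
        return Character.isDigit(tokenText.charAt(0)) ? NUMBER : IDENTIFIER;
    }
}
```

Because the lexer never allocates a new type object per token, later stages can compare token types by identity (==) rather than by name.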
An important feature which can be implemented at the lexer level is mixing languages within a file (for example, embedding fragments of Java code in some template language). If a language supports embedding its fragments in another language, it needs to define the chameleon token types for the different types of fragments which can be embedded, and these token types need to implement the IChameleonElementType interface. The lexer of the enclosing language needs to return the entire fragment of the embedded language as a single chameleon token, of the type defined by the embedded language. To parse the contents of the chameleon token, IDEA will call the parser of the embedded language through a call to IChameleonElementType.parseContents().
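The division of work between the two languages can be sketched as follows. Everything here is hypothetical: the "<%" and "%>" delimiters, the ChameleonSketch class and the parseEmbedded() method are assumptions for this example, with parseEmbedded() only playing the role that the text assigns to IChameleonElementType.parseContents(). The outer lexer does not look inside the embedded fragment; it hands it over as one EMBEDDED token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch, not IDEA API: an outer lexer that emits one token per embedded fragment. */
final class ChameleonSketch {
    /** Token of the enclosing language; an EMBEDDED token carries unparsed inner-language text. */
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Outer lexer: the text between "<%" and "%>" becomes a single EMBEDDED token. */
    static List<Token> lexTemplate(String text) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            if (text.startsWith("<%", pos)) {
                tokens.add(new Token("OPEN", "<%"));
                pos += 2;
                int close = text.indexOf("%>", pos);
                int end = close < 0 ? text.length() : close; // unterminated: run to end of input
                if (end > pos) tokens.add(new Token("EMBEDDED", text.substring(pos, end)));
                pos = end;
                if (close >= 0) {
                    tokens.add(new Token("CLOSE", "%>"));
                    pos += 2;
                }
            } else {
                int open = text.indexOf("<%", pos);
                int end = open < 0 ? text.length() : open;
                tokens.add(new Token("TEMPLATE_TEXT", text.substring(pos, end)));
                pos = end;
            }
        }
        return tokens;
    }

    /** The embedded language processes its own fragment; here it just splits on whitespace. */
    static List<String> parseEmbedded(Token chameleon) {
        String body = chameleon.text.trim();
        return body.isEmpty() ? new ArrayList<String>() : Arrays.asList(body.split("\\s+"));
    }
}
```

For the input "Hello <% a + b %>!" the outer lexer produces TEMPLATE_TEXT, OPEN, EMBEDDED, CLOSE and TEMPLATE_TEXT tokens, and only the EMBEDDED token is passed on to the embedded language for further processing.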
to be continued