IntelliJ IDEA supports two main types of indexes: file-based indexes and stub indexes. File-based indexes are built directly over the content of files, and stub indexes are built over serialized stub trees. A stub tree for a source file is a subset of its PSI tree which contains only externally visible declarations and is serialized in a compact binary format. Querying a file-based index gets you the set of files matching a certain condition, and querying a stub index gets you the set of matching PSI elements. Therefore, custom language plugin developers should typically use stub indexes in their plugin implementations.
File-based indexes in IntelliJ IDEA are based on a map/reduce architecture. Each index has a certain type of key and a certain type of value. The key is what's later used to retrieve data from the index; for example, in the word index the key is the word itself. The value is arbitrary data which is associated with the key in the index; for example, in the word index the value is a mask indicating in which context the word occurs (code, string literal or comment). In the simplest case (when we only need to know in what files some data occurs), the value has type Void and is not stored in the index.
When the index implementation indexes a file, it receives the content of a file and returns a map from the keys found in the file to the associated values. When the index is accessed, you specify the key that you're interested in and get back the list of files in which the key occurs and the value associated with each file.
Implementing a File-based Index
The standard word index has a fairly straightforward implementation; you can refer to it as an example to understand this discussion better.
Each specific index implementation is a class extending FileBasedIndexExtension, registered in the <fileBasedIndex> extension point. The implementation contains of the following main parts:
- getIndexer() returns the indexer class, which is responsible for actually building a set of key/value pairs based on the file content.
- getKeyDescriptor() returns the key descriptor, which is responsible for comparing the keys and storing them in a serialized binary format. Probably the most commonly used KeyDescriptor implementation is EnumeratorStringDescriptor implementation, designed for storing identifiers in an efficient way.
- getValueExternalizer() returns the value serializer, which takes care of storing values in a serialized binary format.
- getInputFilter() allows to restrict the indexing only to a certain set of files.
- getVersion() returns the version of the index implementation. The index is automatically rebuilt if the current version differs from the version of the index implementation used to build the index.
If you don't need to associate any value with the files (i.e. your value type is Void), you can simplify the implementation by using ScalarIndexExtension as the base class.