As the name suggests, a Transformer is a kind of object that transforms other objects. In pliers, every Transformer takes a single Stim as its input, though the type of output it returns varies. The Transformer API in pliers is modeled loosely on the widely used scikit-learn API; as such, what defines a Transformer, from a user’s perspective, is that one can always pass a Stim instance to the Transformer’s .transform() method and expect to get another object as a result.

In practice, most users should never have any reason to directly instantiate the base Transformer class. We will almost invariably work with one of three different Transformer sub-classes: Extractor, Converter, and Filter. These classes are distinguished by the type of output that their respective .transform() methods produce:

Transformer class    Input    Output
Extractor            AStim    ExtractorResult
Converter            AStim    BStim
Filter               AStim    AStim

Here, AStim and BStim are different Stim subclasses. So an Extractor always returns an ExtractorResult, no matter what type of Stim it receives as input. A Converter and a Filter are distinguished by the fact that a Converter always returns a Stim of a different class than its input, while a Filter always returns a Stim of the same type as its input. This simple hierarchy turns out to be extremely powerful, as it enables us to operate in a natural, graph-like way over Stims, by filtering and converting them as needed before applying one or more Extractors to obtain extracted feature values.
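The rule in the table can be sketched in a few lines of plain Python. This is a toy model only: the Stim, ImageStim, TextStim, and ExtractorResult classes below are minimal stand-ins written for this example (not the pliers classes), and transformer_kind is a hypothetical helper that names the Transformer sub-class implied by an input/output pair.

```python
# Toy stand-ins for illustration only; these are NOT the pliers classes.
class Stim:
    def __init__(self, data):
        self.data = data


class ImageStim(Stim):
    pass


class TextStim(Stim):
    pass


class ExtractorResult:
    def __init__(self, features):
        self.features = features


def transformer_kind(stim_in, output):
    """Name the Transformer sub-class implied by an input/output pair."""
    if isinstance(output, ExtractorResult):
        return "Extractor"   # AStim -> ExtractorResult
    if not isinstance(output, Stim):
        raise TypeError("output must be a Stim or an ExtractorResult")
    if type(output) is type(stim_in):
        return "Filter"      # AStim -> AStim
    return "Converter"       # AStim -> BStim


image = ImageStim("pixels")
kinds = [
    transformer_kind(image, ExtractorResult({"brightness": 0.5})),
    transformer_kind(image, TextStim("text found in the image")),
    transformer_kind(image, ImageStim("cropped pixels")),
]
print(kinds)
```

The three calls cover the three rows of the table in order: a result object, a Stim of a different class, and a Stim of the same class.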

Let’s examine each of these Transformer types more carefully.


Extractors

Extractors are the most important kind of Transformer in pliers, and in many cases, users will never have to touch any other kind of Transformer directly. Every Extractor implements a transform() method that takes a Stim object as its first argument, and returns an object of class ExtractorResult (see below). For example:

# Google Cloud Vision API face detection
from pliers.extractors import GoogleVisionAPIFaceExtractor

ext = GoogleVisionAPIFaceExtractor()
result = ext.transform('my_image.jpg')

List of Extractor classes

At present, pliers implements several dozen Extractor classes that span a wide variety of input modalities and types of extracted features. These include:

Audio feature extractors

ChromaCENSExtractor([n_chroma]) Extracts a chroma variant “Chroma Energy Normalized” (CENS) chromagram from audio (via Librosa).
ChromaCQTExtractor([n_chroma]) Extracts a constant-q chromagram from audio using the Librosa library.
ChromaSTFTExtractor([n_chroma]) Extracts a chromagram from an audio’s waveform using the Librosa library.
MeanAmplitudeExtractor([name]) Mean amplitude extractor for blocks of audio with transcription.
MelspectrogramExtractor([n_mels]) Extracts mel-scaled spectrogram from audio using the Librosa library.
MFCCExtractor([n_mfcc]) Extracts Mel Frequency Cepstral Coefficients from audio using the Librosa library.
PolyFeaturesExtractor([order]) Extracts the coefficients of fitting an nth-order polynomial to the columns of an audio’s spectrogram (via Librosa).
RMSExtractor([feature, hop_length]) Extracts root mean square (RMS) from audio using the Librosa library.
SpectralCentroidExtractor([feature, hop_length]) Extracts the spectral centroids from audio using the Librosa library.
SpectralBandwidthExtractor([feature, hop_length]) Extracts the p’th-order spectral bandwidth from audio using the Librosa library.
SpectralContrastExtractor([n_bands]) Extracts the spectral contrast from audio using the Librosa library.
SpectralRolloffExtractor([feature, hop_length]) Extracts the roll-off frequency from audio using the Librosa library.
STFTAudioExtractor([frame_size, hop_size, …]) Short-time Fourier Transform extractor.
TempogramExtractor([win_length]) Extracts a tempogram from audio using the Librosa library.
TonnetzExtractor([feature, hop_length]) Extracts the tonal centroids (tonnetz) from audio using the Librosa library.
ZeroCrossingRateExtractor([feature, hop_length]) Extracts the zero-crossing rate of audio using the Librosa library.

Image feature extractors

BrightnessExtractor([name]) Gets the average luminosity of the pixels in the image
ClarifaiAPIImageExtractor([api_key, model, …]) Uses the Clarifai API to extract tags of images.
FaceRecognitionFaceEncodingsExtractor(…) Uses the face_recognition package to extract a 128-dimensional encoding for every face detected in an image.
FaceRecognitionFaceLandmarksExtractor(…) Uses the face_recognition package to extract the locations of named features of faces in the image.
FaceRecognitionFaceLocationsExtractor(…) Uses the face_recognition package to extract bounding boxes for all faces in an image.
GoogleVisionAPIFaceExtractor([…]) Identifies faces in images using the Google Cloud Vision API.
GoogleVisionAPILabelExtractor([…]) Labels objects in images using the Google Cloud Vision API.
GoogleVisionAPIPropertyExtractor([…]) Extracts image properties using the Google Cloud Vision API.
GoogleVisionAPISafeSearchExtractor([…]) Extracts safe search detection using the Google Cloud Vision API.
GoogleVisionAPIWebEntitiesExtractor([…]) Extracts web entities using the Google Cloud Vision API.
IndicoAPIImageExtractor([api_key, models, …]) Uses the Indico API to extract features from images, such as facial emotion recognition or content filtering.
MicrosoftAPIFaceExtractor([face_id, …]) Extracts face features (location, emotion, accessories, etc.).
MicrosoftAPIFaceEmotionExtractor([face_id, …]) Extracts facial emotions from images using the Microsoft API
MicrosoftVisionAPIExtractor([features, …]) Base MicrosoftVisionAPIExtractor class.
MicrosoftVisionAPITagExtractor([…]) Extracts image tags using the Microsoft API
MicrosoftVisionAPICategoryExtractor([…]) Extracts image categories using the Microsoft API
MicrosoftVisionAPIImageTypeExtractor([…]) Extracts image types (clipart, etc.) using the Microsoft API
MicrosoftVisionAPIColorExtractor([…]) Extracts image color attributes using the Microsoft API
MicrosoftVisionAPIAdultExtractor([…]) Extracts the presence of adult content using the Microsoft API
SaliencyExtractor([name]) Determines the saliency of the image using Itti & Koch (1998) algorithm
SharpnessExtractor([name]) Gets the degree of blur/sharpness of the image
VibranceExtractor([name]) Gets the variance of color channels of the image

Text feature extractors

ComplexTextExtractor([name]) Base ComplexTextStim Extractor class; all subclasses can only be applied to ComplexTextStim instance.
DictionaryExtractor(dictionary[, variables, …]) A generic dictionary-based extractor that supports extraction of arbitrary features contained in a lookup table.
IndicoAPITextExtractor([api_key, models, …]) Uses the Indico API to extract features from text, such as sentiment extraction.
LengthExtractor([name]) Extracts the length of the text in characters.
NumUniqueWordsExtractor([tokenizer]) Extracts the number of unique words used in the text.
PartOfSpeechExtractor([batch_size]) Tags parts of speech in text with nltk.
PredefinedDictionaryExtractor(variables[, …]) A generic Extractor that maps words onto values via one or more pre-defined dictionaries accessed via the web.
TextVectorizerExtractor([vectorizer]) Uses a scikit-learn Vectorizer to extract bag-of-features from text.
VADERSentimentExtractor() Uses nltk’s VADER lexicon to extract (0.0-1.0) values for the positive, neutral, and negative sentiment of a TextStim.
WordEmbeddingExtractor(embedding_file[, …]) An extractor that uses a word embedding file to look up embedding vectors for text.

Video feature extractors

FarnebackOpticalFlowExtractor([pyr_scale, …]) Extracts total amount of dense optical flow between every pair of video frames.

Note that, in practice, the number of features one can extract using the above classes is extremely large, because many of these Extractors return open-ended feature sets that are determined by the contents of the input Stim and/or the specified initialization arguments. For example, most of the image-labeling Extractors that rely on deep learning-based services (e.g., GoogleVisionAPILabelExtractor and ClarifaiAPIImageExtractor) will return feature information for any of the top N objects detected in the image. And the PredefinedDictionaryExtractor provides a standardized interface to a large number of online word lookup dictionaries (e.g., word norms for written frequency, age-of-acquisition, emotionality ratings, etc.).

Working with Extractor results

Extractor classes differ from other Transformers in an important way: they return feature data rather than Stim objects. Pliers imposes a standardized representation on these results; in particular, calling transform on any Extractor returns an aptly named object of class ExtractorResult. This object contains all kinds of useful internal references and logged data; however, it can also be easily converted to a pandas DataFrame. There’s much more to say about feature extraction results in pliers, but to keep things focused, we’ll say it in a separate Results section rather than here.


Converters

Converters, as their name suggests, convert Stim classes from one type to another. For example, the IBMSpeechAPIConverter, which is a subclass of AudioToTextConverter, takes an AudioStim as input, queries IBM’s Watson speech-to-text API, and returns a transcription of the audio as a ComplexTextStim object. Most Converter classes have sensible names that clearly indicate what they do, but to prevent any ambiguity (and support type-checking), every concrete Converter class must define _input_type and _output_type properties that indicate what Stim classes they take and return as input and output, respectively.
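To make the role of _input_type and _output_type concrete, here is a minimal sketch of how such declarations can support type-checking. Everything in it is a toy written for this example: ToyConverter, FakeTranscriptionConverter, and the stand-in Stim classes are hypothetical, and the way pliers itself enforces these properties may differ.

```python
# Toy stand-ins for illustration only; these are NOT the pliers classes.
class Stim:
    def __init__(self, data):
        self.data = data


class AudioStim(Stim):
    pass


class TextStim(Stim):
    pass


class ToyConverter:
    """Base class that checks input and output against declared types."""
    _input_type = Stim
    _output_type = Stim

    def transform(self, stim):
        if not isinstance(stim, self._input_type):
            raise TypeError(
                f"{type(self).__name__} expects {self._input_type.__name__}, "
                f"got {type(stim).__name__}"
            )
        result = self._convert(stim)
        if not isinstance(result, self._output_type):
            raise TypeError("converter returned an unexpected type")
        return result

    def _convert(self, stim):
        raise NotImplementedError


class FakeTranscriptionConverter(ToyConverter):
    _input_type = AudioStim
    _output_type = TextStim

    def _convert(self, stim):
        # Placeholder: a real converter would call a speech-to-text service.
        return TextStim("transcript of " + stim.data)


text = FakeTranscriptionConverter().transform(AudioStim("clip.wav"))
print(type(text).__name__, "-", text.data)
```

Passing a TextStim to FakeTranscriptionConverter raises a TypeError in this sketch, which is the kind of early failure that declared input and output types make possible.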

Implicit Stim conversion

Although Converters play a critical role in pliers, they usually don’t need to be invoked explicitly by users, as pliers can usually figure out what conversions must be performed and carry them out implicitly. For example, suppose we want to run the STFTAudioExtractor—which computes the short-time Fourier transform on an audio clip and returns its power spectrum—on the audio track of a movie clip. We don’t need to explicitly convert the VideoStim to an AudioStim, because pliers is clever enough to determine that it can get the appropriate input for the STFTAudioExtractor by executing the VideoToAudioConverter. In practice, then, the following two snippets produce identical results:

from pliers.extractors import STFTAudioExtractor
from pliers.stimuli import VideoStim
video = VideoStim('my_movie.mp4')

# Option A: explicit conversion
from pliers.converters import VideoToAudioConverter
conv = VideoToAudioConverter()
audio = conv.transform(video)
ext = STFTAudioExtractor(freq_bins=10)
result = ext.transform(audio)

# Option B: implicit conversion
ext = STFTAudioExtractor(freq_bins=10)
result = ext.transform(video)

Because pliers contains a number of “multistep” Converter classes, which chain together multiple standard Converters, implicit Stim conversion will typically work not only for a single conversion, but also for a whole series of them. For example, if you feed a video file to a LengthExtractor (which just counts the number of characters in each TextStim’s text), pliers will use the built-in VideoToTextConverter class to transform your VideoStim into a TextStim, and everything should work smoothly in most cases.
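The idea behind a multistep Converter can be sketched as a plain chain of conversion steps. This is a toy model under stated assumptions: a stim is represented as a (type_name, payload) tuple, the step functions are placeholders that do no real audio extraction or transcription, and ToyMultiStepConverter is a hypothetical name, not a pliers class.

```python
# Toy stand-ins: a "stim" here is just a (type_name, payload) tuple.
def video_to_audio(stim):
    kind, payload = stim
    if kind != "VideoStim":
        raise TypeError("expected a VideoStim")
    return ("AudioStim", payload + " [audio track]")


def audio_to_text(stim):
    kind, payload = stim
    if kind != "AudioStim":
        raise TypeError("expected an AudioStim")
    # Placeholder for a speech-to-text call.
    return ("TextStim", payload + " [transcript]")


class ToyMultiStepConverter:
    """Applies a fixed sequence of conversion steps, one after another."""

    def __init__(self, steps):
        self.steps = list(steps)

    def transform(self, stim):
        for step in self.steps:
            stim = step(stim)
        return stim


video_to_text = ToyMultiStepConverter([video_to_audio, audio_to_text])
result = video_to_text.transform(("VideoStim", "my_movie.mp4"))
print(result)
```

Each step consumes the output of the previous one, so the chain as a whole behaves like a single Converter from the first input type to the last output type.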

I say “most” cases, because there are two important gotchas to be aware of when relying on implicit conversion. First, sometimes there’s an inherent ambiguity about what trajectory a given stimulus should take through converter space; in such cases, the default conversions pliers performs may not line up with your expectations. For example, a VideoStim can be converted to a TextStim either by (a) extracting the audio track from the video and then transcribing it into text via a speech recognition service, or (b) extracting the frames from the video and then attempting to detect any text labels within each image. Because pliers has no way of knowing which of these you’re trying to accomplish, it will default to the first. The upshot is that if you think there’s any chance of ambiguity in the conversion process, it’s probably a good idea to explicitly chain the Converter steps (you can do this very easily using the Graph interface discussed separately). The explicit approach also gives you more control: you can initialize a particular Converter with non-default arguments, and/or specify exactly which of several candidate Converter classes to use (e.g., pliers defaults to performing speech-to-text conversion via the IBM Watson API, but also supports the Wit.ai and Google Cloud Speech API services).
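The ambiguity described above can be illustrated by treating Converters as edges in a graph of Stim types and listing every route from one type to another. This is a sketch only: the EDGES table is a hand-written, partial registry assembled from the class descriptions on this page, conversion_routes is a hypothetical helper, and nothing here reflects how pliers actually selects a route.

```python
# Hypothetical, partial registry: converter name -> (input type, output type).
EDGES = {
    "VideoToAudioConverter": ("VideoStim", "AudioStim"),
    "IBMSpeechAPIConverter": ("AudioStim", "ComplexTextStim"),
    "ComplexTextIterator": ("ComplexTextStim", "TextStim"),
    "VideoFrameIterator": ("VideoStim", "ImageStim"),
    "TesseractConverter": ("ImageStim", "TextStim"),
}


def conversion_routes(source, target, path=()):
    """Yield every chain of converter names leading from source to target."""
    if source == target:
        yield list(path)
        return
    for name, (src, dst) in EDGES.items():
        # Skip converters already used, to guard against cycles.
        if src == source and name not in path:
            yield from conversion_routes(dst, target, path + (name,))


routes = list(conversion_routes("VideoStim", "TextStim"))
for route in routes:
    print(" -> ".join(route))
```

With this registry, two distinct routes exist: one through the audio track and one through the video frames. Chaining the steps explicitly removes any doubt about which one is used.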

Package-wide conversion defaults

Alternatively, you can set the default Converter(s) to use for any implicit Stim conversion at a package-wide level, via the config.default_converters attribute. By default, this is something like:

default_converters = {
    'AudioStim->TextStim': ('IBMSpeechAPIConverter', 'WitTranscriptionConverter'),
    'ImageStim->TextStim': ('GoogleVisionAPITextConverter', 'TesseractConverter')
}

Here, each entry in the default_converters dictionary lists the Converter(s) to use, in order of preference. For example, the above indicates that any conversion between ImageStim and TextStim should first try to use the GoogleVisionAPITextConverter, and then, if that fails (e.g., because the user has no Google Cloud Vision API key set up), fall back on the TesseractConverter. If all selections specified in the config fail, pliers will still try to use any matching Converters it finds, but you’ll lose the ability to control the order of selection.
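The order-of-preference behavior can be sketched as a simple loop over the configured names. This is an illustration, not the pliers selection logic: pick_converter and the is_available callback are hypothetical, and availability is simulated with a plain set rather than a real API-key check.

```python
# Mirrors the shape of the config entry shown above.
DEFAULT_CONVERTERS = {
    "ImageStim->TextStim": ("GoogleVisionAPITextConverter", "TesseractConverter"),
}


def pick_converter(key, is_available, defaults=DEFAULT_CONVERTERS):
    """Return the first configured converter name that is usable, else None."""
    for name in defaults.get(key, ()):
        if is_available(name):
            return name
    # Nothing configured is usable; a caller would then try any other match.
    return None


# Simulate a machine with no Google Cloud Vision API key set up.
usable = {"TesseractConverter"}
chosen = pick_converter("ImageStim->TextStim", lambda name: name in usable)
print(chosen)
```

Here the first preference is skipped because it is not usable, and the second entry in the tuple is chosen instead.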

The second gotcha with implicit conversion is that many Converters call API-based services. If you’re going to rely on implicit conversion, you should make sure that any API keys you might need are properly set up as environment variables in your local environment, since you won’t be able to pass those keys to the Converter as initialization arguments. For example, by default, pliers uses the IBM Watson API for speech-to-text conversion (i.e., when converting an AudioStim to a ComplexTextStim). But since you won’t necessarily know this ahead of time, you won’t be able to initialize the Converter with the correct credentials, i.e., by calling IBMSpeechAPIConverter(username='my_username', password='my_password'). Instead, the Converter will get initialized without any arguments (IBMSpeechAPIConverter()), which means the initialization logic will immediately look for IBM_USERNAME and IBM_PASSWORD variables in the environment, and will raise an exception if either of these variables is missing. So make sure as many API keys as possible are appropriately set in the environment. You can read more about this in the API keys section.
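One way to catch a missing credential before an implicit conversion fails is to check the environment up front. The sketch below uses only the standard library; missing_env_vars is a hypothetical helper, and the variable names are the two mentioned above. A stand-in dictionary is used in place of the real environment so the example is reproducible.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Stand-in environment for the example; omit `environ` to check the real one.
fake_env = {"IBM_USERNAME": "my_username"}
missing = missing_env_vars(["IBM_USERNAME", "IBM_PASSWORD"], environ=fake_env)
print(missing)
```

Running a check like this at the top of a script makes it clear which variables still need to be set before any API-backed Converter is triggered.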

List of Converter classes

Pliers currently implements the following Converter classes:

ComplexTextIterator([name]) Iterates elements in a ComplexTextStim as TextStims.
IBMSpeechAPIConverter([username, password, …]) Uses the IBM Watson Speech to Text API to run speech-to-text transcription on an audio file.
GoogleSpeechAPIConverter([language_code, …]) Uses the Google Speech API to do speech-to-text transcription.
GoogleVisionAPITextConverter([…]) Detects text within images using the Google Cloud Vision API.
MicrosoftAPITextConverter([language, …]) Detects text within images using the Microsoft Vision API.
TesseractConverter([name]) Uses the Tesseract library to extract text from images.
VideoFrameCollectionIterator([name]) Iterates frames in a DerivedVideoStim as ImageStims.
VideoFrameIterator([name]) Iterates frames in a VideoStim as ImageStims.
VideoToAudioConverter([name]) Convert a VideoStim to an AudioStim by extracting the audio track using moviepy.
VideoToComplexTextConverter([steps]) Converts a VideoStim directly to a ComplexTextStim.
VideoToTextConverter([steps]) Converts a VideoStim directly to a TextStim.
WitTranscriptionConverter([api_key, rate_limit]) Speech-to-text transcription via the Wit.ai API.


Filters

A Filter is a kind of Transformer that returns an object of the same Stim class as its input. Filters can be used for tasks like image or audio filtering, text tokenization or sanitization, and many other things. The defining feature of a Filter class is simply that it must return a Stim of the same type as the input passed to the .transform() method (e.g., passing in an ImageStim and getting back another, modified, ImageStim).
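Because a Filter returns the same type it receives, Filters compose naturally. The sketch below shows this with two toy text filters; the TextStim class and the lower_case and remove_punctuation functions are stand-ins written for this example, not the pliers implementations of LowerCasingFilter or PunctuationRemovalFilter.

```python
import string


# Toy stand-in for illustration only; this is NOT the pliers TextStim.
class TextStim:
    def __init__(self, text):
        self.text = text


def lower_case(stim):
    """Return a new TextStim with lower-cased text."""
    return TextStim(stim.text.lower())


def remove_punctuation(stim):
    """Return a new TextStim with ASCII punctuation stripped."""
    table = str.maketrans("", "", string.punctuation)
    return TextStim(stim.text.translate(table))


stim = TextStim("Hello, World!")
cleaned = remove_punctuation(lower_case(stim))
print(cleaned.text)
```

Each function takes a TextStim and returns a new TextStim, so the output of one can be fed directly into the next, and the original stim is left unmodified.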

List of Filter classes

Pliers currently implements the following Filter classes:

AudioTrimmingFilter([start, end, frames, …])
FrameSamplingFilter([every, hertz, top_n]) Samples frames from video stimuli, to improve efficiency.
ImageCroppingFilter([box]) Crops an image.
LowerCasingFilter([name]) Lower cases the text in a TextStim.
PillowImageFilter([image_filter]) Uses the ImageFilter module from PIL to run a pre-defined image enhancement filter on an ImageStim.
PunctuationRemovalFilter([name]) Removes punctuation from a TextStim.
TemporalTrimmingFilter([start, end, frames, …]) Temporally trims the contents of the audio stimulus using the provided start and end points.
TokenizingFilter([tokenizer]) Tokenizes a TextStim into several word TextStims.
TokenRemovalFilter([tokens, language]) Removes tokens (e.g., stopwords, common words, punctuation) from a TextStim.
VideoTrimmingFilter([start, end, frames, …])
WordStemmingFilter([stemmer, tokenize]) Nltk-based word stemming Filter.

Iterable-aware transformations

A useful feature of the Transformer API is that it’s inherently iterable-aware: every pliers Transformer (including all Extractors, Converters, and Filters) can be passed an iterable (specifically, a list, tuple, or generator) of Stim objects rather than just a single Stim. The transformation will then be applied independently to each Stim.
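The dispatch this implies can be sketched as follows. This is a toy model: transform_one and transform are hypothetical functions operating on plain strings, and the container type that pliers returns for iterable input may differ from the list used here.

```python
from collections.abc import Iterator


def transform_one(stim):
    # Placeholder transformation applied to a single toy "stim".
    return stim.upper()


def transform(stims):
    """Apply transform_one to a single stim or to each stim in an iterable."""
    # Generators are Iterators; a plain string is neither a list, a tuple,
    # nor an Iterator, so it is treated as a single stim.
    if isinstance(stims, (list, tuple, Iterator)):
        return [transform_one(s) for s in stims]
    return transform_one(stims)


single = transform("a")
from_list = transform(["a", "b"])
from_generator = transform(s for s in ("c", "d"))
print(single, from_list, from_generator)
```

The same call works for one stim or many, and each element is transformed independently of the others.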