Quickstart

The fastest way to learn how pliers works is to work through a few short examples. In this section, we’ll demonstrate how pliers can be used to quickly tackle three different feature extraction challenges. We start with very simple examples, and gradually scale up in complexity.

An executable Jupyter Notebook version of this document can be found in the /examples folder of the GitHub repo.

Face detection

This first example uses the face_recognition package’s location extraction method to detect the location of Barack Obama’s face within a single image. The tools used to do this are completely local (i.e., the image isn’t sent to an external API).

We output the result as a pandas DataFrame; the ‘face_locations’ column contains the coordinates of the bounding box in CSS format (i.e., top, right, bottom, and left edges).

from pliers.extractors import FaceRecognitionFaceLocationsExtractor
from os.path import join
from pliers.tests.utils import get_test_data_path

# A picture of Barack Obama
image = join(get_test_data_path(), 'image', 'obama.jpg')

# Initialize Extractor
ext = FaceRecognitionFaceLocationsExtractor()

# Apply Extractor to image
result = ext.transform(image)

result.to_df()
   onset  order  duration  object_id       face_locations
0    NaN    NaN       NaN          0  (142, 349, 409, 82)
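Many image libraries expect a box as an origin plus a size rather than four edges. The following is a minimal sketch of that conversion, assuming the (top, right, bottom, left) ordering described above; the helper name css_to_xywh is hypothetical and is not part of pliers or face_recognition.

```python
def css_to_xywh(box):
    """Convert a (top, right, bottom, left) box to (x, y, width, height)."""
    top, right, bottom, left = box
    return (left, top, right - left, bottom - top)


# Bounding box reported for obama.jpg in the output above
obama_box = (142, 349, 409, 82)
x, y, width, height = css_to_xywh(obama_box)
print(f"origin=({x}, {y}), size={width}x{height}")
```

Here x and y refer to the top-left corner of the box, measured in pixels from the top-left corner of the image.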

Face detection with multiple inputs

What if we want to run the face detector on multiple images? Naively, we could just loop over the input images and apply the Extractor to each one. But pliers makes this even easier for us by natively accepting iterables as inputs. The following code is almost identical to the above snippet. The only notable difference is that the result we get back is now also a list (the features extracted from each image are stored separately), so we need to explicitly combine the results using the merge_results utility.

from pliers.extractors import FaceRecognitionFaceLocationsExtractor, merge_results

images = ['apple.jpg', 'obama.jpg', 'thai_people.jpg']
images = [join(get_test_data_path(), 'image', img) for img in images]

ext = FaceRecognitionFaceLocationsExtractor()
results = ext.transform(images)
df = merge_results(results)
df
source_file onset class filename stim_name history duration order object_id FaceRecognitionFaceLocationsExtractor#face_locations
0 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... NaN ImageStim /Users/tal/Dropbox/Code/pliers/pliers/tests/da... obama.jpg NaN NaN 0 (142, 349, 409, 82)
1 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... NaN ImageStim /Users/tal/Dropbox/Code/pliers/pliers/tests/da... thai_people.jpg NaN NaN 0 (236, 862, 325, 772)
2 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... NaN ImageStim /Users/tal/Dropbox/Code/pliers/pliers/tests/da... thai_people.jpg NaN NaN 1 (104, 581, 211, 474)
3 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... NaN ImageStim /Users/tal/Dropbox/Code/pliers/pliers/tests/da... thai_people.jpg NaN NaN 2 (365, 782, 454, 693)
4 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... NaN ImageStim /Users/tal/Dropbox/Code/pliers/pliers/tests/da... thai_people.jpg NaN NaN 3 (265, 444, 355, 354)

Note how the merged pandas DataFrame contains 5 rows, even though there were only 3 input images. The reason is that there are 5 detected faces across the inputs (0 in the first image, 1 in the second, and 4 in the third). You can discern the original sources from the stim_name and source_file columns.
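To make the row-to-image bookkeeping concrete, here is a small standard-library sketch that tallies faces per input image. The stim_name values are copied by hand from the output above rather than read from the real DataFrame, so treat it as an illustration only.

```python
from collections import Counter

# The three input images, in the order they were passed to the Extractor
inputs = ["apple.jpg", "obama.jpg", "thai_people.jpg"]

# stim_name of each row in the merged output above (one row per detected face)
row_stim_names = [
    "obama.jpg",
    "thai_people.jpg",
    "thai_people.jpg",
    "thai_people.jpg",
    "thai_people.jpg",
]

counts = Counter(row_stim_names)
# Images with no detected face produce no rows, so default their count to 0
faces_per_image = {name: counts.get(name, 0) for name in inputs}
print(faces_per_image)
```

On the actual merged DataFrame, df.groupby('stim_name').size() should give the same non-zero counts; images without any detected face simply do not appear in the result.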

Face detection using a remote API

The above examples use an entirely local package (face_recognition) for feature extraction. In this next example, we use the Google Cloud Vision API to extract various face-related attributes from an image of Barack Obama. The syntax is identical to the first example, save for the use of the GoogleVisionAPIFaceExtractor instead of the FaceRecognitionFaceLocationsExtractor. Note, however, that successful execution of this code requires you to have a GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to your Google credentials JSON file. See the documentation for more details.

from pliers.extractors import GoogleVisionAPIFaceExtractor

ext = GoogleVisionAPIFaceExtractor()
image = join(get_test_data_path(), 'image', 'obama.jpg')
result = ext.transform(image)

result.to_df(format='long', timing=False, object_id=False)
feature value
0 face1_boundingPoly_vertex1_x 34
1 face1_boundingPoly_vertex1_y 3
2 face1_boundingPoly_vertex2_x 413
3 face1_boundingPoly_vertex2_y 3
4 face1_boundingPoly_vertex3_x 413
5 face1_boundingPoly_vertex3_y 444
6 face1_boundingPoly_vertex4_x 34
7 face1_boundingPoly_vertex4_y 444
8 face1_fdBoundingPoly_vertex1_x 81
9 face1_fdBoundingPoly_vertex1_y 112
10 face1_fdBoundingPoly_vertex2_x 367
11 face1_fdBoundingPoly_vertex2_y 112
12 face1_fdBoundingPoly_vertex3_x 367
13 face1_fdBoundingPoly_vertex3_y 397
14 face1_fdBoundingPoly_vertex4_x 81
15 face1_fdBoundingPoly_vertex4_y 397
16 face1_landmark_LEFT_EYE_x 165.82545
17 face1_landmark_LEFT_EYE_y 209.29224
18 face1_landmark_LEFT_EYE_z -0.0012580488
19 face1_landmark_RIGHT_EYE_x 277.2751
20 face1_landmark_RIGHT_EYE_y 200.76282
21 face1_landmark_RIGHT_EYE_z -2.2834022
22 face1_landmark_LEFT_OF_LEFT_EYEBROW_x 124.120514
23 face1_landmark_LEFT_OF_LEFT_EYEBROW_y 183.2301
24 face1_landmark_LEFT_OF_LEFT_EYEBROW_z 10.437931
25 face1_landmark_RIGHT_OF_LEFT_EYEBROW_x 191.6638
26 face1_landmark_RIGHT_OF_LEFT_EYEBROW_y 184.7009
27 face1_landmark_RIGHT_OF_LEFT_EYEBROW_z -23.860262
28 face1_landmark_LEFT_OF_RIGHT_EYEBROW_x 246.78976
29 face1_landmark_LEFT_OF_RIGHT_EYEBROW_y 180.80664
... ... ...
100 face1_landmark_LEFT_EAR_TRAGION_x 94.670586
101 face1_landmark_LEFT_EAR_TRAGION_y 261.28238
102 face1_landmark_LEFT_EAR_TRAGION_z 144.7621
103 face1_landmark_RIGHT_EAR_TRAGION_x 354.20724
104 face1_landmark_RIGHT_EAR_TRAGION_y 254.42862
105 face1_landmark_RIGHT_EAR_TRAGION_z 139.51318
106 face1_landmark_FOREHEAD_GLABELLA_x 218.83662
107 face1_landmark_FOREHEAD_GLABELLA_y 179.9332
108 face1_landmark_FOREHEAD_GLABELLA_z -29.149652
109 face1_landmark_CHIN_GNATHION_x 225.09085
110 face1_landmark_CHIN_GNATHION_y 404.05176
111 face1_landmark_CHIN_GNATHION_z -0.870588
112 face1_landmark_CHIN_LEFT_GONION_x 108.6293
113 face1_landmark_CHIN_LEFT_GONION_y 336.2217
114 face1_landmark_CHIN_LEFT_GONION_z 100.71832
115 face1_landmark_CHIN_RIGHT_GONION_x 342.96274
116 face1_landmark_CHIN_RIGHT_GONION_y 329.56253
117 face1_landmark_CHIN_RIGHT_GONION_z 96.03735
118 face1_rollAngle -1.6782061
119 face1_panAngle -1.1388631
120 face1_tiltAngle -2.0583308
121 face1_face_detectionConfidence 0.999946
122 face1_face_landmarkingConfidence 0.84057003
123 face1_joyLikelihood VERY_LIKELY
124 face1_sorrowLikelihood VERY_UNLIKELY
125 face1_angerLikelihood VERY_UNLIKELY
126 face1_surpriseLikelihood VERY_UNLIKELY
127 face1_underExposedLikelihood VERY_UNLIKELY
128 face1_blurredLikelihood VERY_UNLIKELY
129 face1_headwearLikelihood VERY_UNLIKELY

130 rows × 2 columns

Notice that the output in this case contains many more features. That’s because the Google face recognition service gives us back a lot more information than just the location of the face within the image. Also, the example illustrates our ability to control the format of the output, by returning the data in “long” format, and suppressing output of columns that are uninformative in this context.
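One practical benefit of the long format is that related features can be regrouped with plain string handling. The sketch below collects the landmark rows into (x, y, z) tuples. It assumes the face1_landmark_<NAME>_<axis> naming pattern visible in the output above, and uses a handful of rows copied by hand instead of a live API result.

```python
from collections import defaultdict

# A few (feature, value) rows copied from the long-format output above
rows = [
    ("face1_boundingPoly_vertex1_x", 34),
    ("face1_landmark_LEFT_EYE_x", 165.82545),
    ("face1_landmark_LEFT_EYE_y", 209.29224),
    ("face1_landmark_LEFT_EYE_z", -0.0012580488),
    ("face1_landmark_RIGHT_EYE_x", 277.2751),
    ("face1_landmark_RIGHT_EYE_y", 200.76282),
    ("face1_landmark_RIGHT_EYE_z", -2.2834022),
    ("face1_joyLikelihood", "VERY_LIKELY"),
]

PREFIX = "face1_landmark_"

# Group landmark coordinates by landmark name, ignoring all other features
coords = defaultdict(dict)
for feature, value in rows:
    if feature.startswith(PREFIX):
        name, axis = feature[len(PREFIX):].rsplit("_", 1)
        coords[name][axis] = value

landmarks = {name: (c["x"], c["y"], c["z"]) for name, c in coords.items()}
print(landmarks)
```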

Sentiment analysis on text

Here we use the VADER sentiment analyzer (Hutto & Gilbert, 2014) implemented in the nltk package to extract sentiment for (a) a coherent block of text, and (b) each word in the text separately. This example also introduces the Stim hierarchy of objects explicitly, whereas the initialization of Stim objects was implicit in the previous examples.

Treat text as a single block

from pliers.stimuli import TextStim, ComplexTextStim
from pliers.extractors import VADERSentimentExtractor, merge_results

raw = """We're not claiming that VADER is a very good sentiment analysis tool.
Sentiment analysis is a really, really difficult problem. But just to make a
point, here are some clearly valenced words: disgusting, wonderful, poop,
sunshine, smile."""

# First example: we treat all text as part of a single token
text = TextStim(text=raw)

ext = VADERSentimentExtractor()
results = ext.transform(text)
results.to_df()
   onset  order  duration  object_id  sentiment_neg  sentiment_neu  sentiment_pos  sentiment_compound
0    NaN    NaN       NaN          0           0.19           0.51            0.3              0.6787

Analyze each word individually

# Second example: we construct a ComplexTextStim, which will
# cause each word to be represented as a separate TextStim.
text = ComplexTextStim(text=raw)

ext = VADERSentimentExtractor()
results = ext.transform(text)

# Because results is a list of ExtractorResult objects
# (one per word), we need to merge the results explicitly.
df = merge_results(results, object_id=False)
df.head(10)
source_file onset class filename stim_name history duration order VADERSentimentExtractor#sentiment_compound VADERSentimentExtractor#sentiment_neg VADERSentimentExtractor#sentiment_neu VADERSentimentExtractor#sentiment_pos
0 NaN 0.0 TextStim NaN text[We] ComplexTextStim->ComplexTextIterator/TextStim NaN 0 0.0000 0.0 1.0 0.0
1 NaN 0.0 TextStim NaN text['re] ComplexTextStim->ComplexTextIterator/TextStim NaN 1 0.0000 0.0 1.0 0.0
2 NaN 0.0 TextStim NaN text[not] ComplexTextStim->ComplexTextIterator/TextStim NaN 2 0.0000 0.0 1.0 0.0
3 NaN 0.0 TextStim NaN text[claiming] ComplexTextStim->ComplexTextIterator/TextStim NaN 3 0.0000 0.0 1.0 0.0
4 NaN 0.0 TextStim NaN text[that] ComplexTextStim->ComplexTextIterator/TextStim NaN 4 0.0000 0.0 1.0 0.0
5 NaN 0.0 TextStim NaN text[VADER] ComplexTextStim->ComplexTextIterator/TextStim NaN 5 0.0000 0.0 1.0 0.0
6 NaN 0.0 TextStim NaN text[is] ComplexTextStim->ComplexTextIterator/TextStim NaN 6 0.0000 0.0 1.0 0.0
7 NaN 0.0 TextStim NaN text[a] ComplexTextStim->ComplexTextIterator/TextStim NaN 7 0.0000 0.0 0.0 0.0
8 NaN 0.0 TextStim NaN text[very] ComplexTextStim->ComplexTextIterator/TextStim NaN 8 0.0000 0.0 1.0 0.0
9 NaN 0.0 TextStim NaN text[good] ComplexTextStim->ComplexTextIterator/TextStim NaN 9 0.4404 0.0 0.0 1.0
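With one row per word, it is easy to pick out the words that carry any sentiment at all. The sketch below does this for the ten rows shown above, copied by hand. The helper word_from_stim_name is hypothetical and simply assumes the text[...] naming pattern seen in the stim_name column.

```python
import re

# (stim_name, sentiment_compound) pairs copied from the first ten rows above
rows = [
    ("text[We]", 0.0),
    ("text['re]", 0.0),
    ("text[not]", 0.0),
    ("text[claiming]", 0.0),
    ("text[that]", 0.0),
    ("text[VADER]", 0.0),
    ("text[is]", 0.0),
    ("text[a]", 0.0),
    ("text[very]", 0.0),
    ("text[good]", 0.4404),
]


def word_from_stim_name(stim_name):
    """Strip the 'text[...]' wrapper; return the input unchanged if it is absent."""
    match = re.fullmatch(r"text\[(.*)\]", stim_name)
    return match.group(1) if match else stim_name


# Keep only words whose compound score is non-zero
valenced = [(word_from_stim_name(name), score) for name, score in rows if score != 0.0]
print(valenced)
```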

Extract chromagram from an audio clip

We have an audio clip, and we’d like to compute its chromagram (i.e., to extract the normalized energy in each of the 12 pitch classes). This is trivial thanks to pliers’ support for the librosa package, which contains all kinds of useful functions for spectral feature extraction.

from pliers.extractors import ChromaSTFTExtractor

audio = join(get_test_data_path(), 'audio', 'barber.wav')
# Audio is sampled at 11 kHz; let's compute power in 1-second bins
ext = ChromaSTFTExtractor(hop_length=11025)
result = ext.transform(audio).to_df()
result.head(10)
onset order duration object_id chroma_0 chroma_1 chroma_2 chroma_3 chroma_4 chroma_5 chroma_6 chroma_7 chroma_8 chroma_9 chroma_10 chroma_11
0 0.0 NaN 1.0 0 0.893229 0.580649 0.537203 0.781329 0.791074 0.450180 0.547222 0.344074 0.396035 0.310631 0.338300 1.000000
1 1.0 NaN 1.0 0 0.294194 0.197414 0.183005 0.218851 0.393326 0.308403 0.306165 0.470528 1.000000 0.352208 0.299830 0.551487
2 2.0 NaN 1.0 0 0.434900 0.235230 0.210706 0.299252 0.480551 0.393670 0.380633 0.400774 1.000000 0.747835 0.565902 0.905888
3 3.0 NaN 1.0 0 0.584723 1.000000 0.292496 0.280725 0.126438 0.141413 0.095718 0.051614 0.169491 0.159829 0.104278 0.152245
4 4.0 NaN 1.0 0 0.330675 0.093160 0.050093 0.110299 0.124181 0.195670 0.176633 0.154360 0.799665 1.000000 0.324705 0.299411
5 5.0 NaN 1.0 0 0.163303 0.166029 0.137458 0.674934 0.307667 0.444728 1.000000 0.363117 0.051563 0.056137 0.257512 0.311271
6 6.0 NaN 1.0 0 0.429001 0.576284 0.477286 0.629205 1.000000 0.683207 0.520680 0.550905 0.463083 0.136868 0.139903 0.516497
7 7.0 NaN 1.0 0 0.153344 0.061214 0.071127 0.156032 1.000000 0.266781 0.061097 0.100614 0.277248 0.080686 0.102179 0.560139
8 8.0 NaN 1.0 0 1.000000 0.179003 0.003033 0.002940 0.007769 0.001853 0.012441 0.065445 0.013986 0.002070 0.008418 0.250575
9 9.0 NaN 1.0 0 1.000000 0.195387 0.021611 0.028680 0.019289 0.018033 0.054944 0.047623 0.011615 0.031029 0.274826 0.840266
# And a plot of the chromagram...
import matplotlib.pyplot as plt

plt.imshow(result.iloc[:, 4:].values.T, aspect='auto')
[Figure: chromagram of the audio clip (_images/chromagram.png)]
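Two bits of arithmetic sit behind this example: how hop_length translates into bin width, and how a chroma column maps to a note name. The sketch below illustrates both under two assumptions that are not stated in the original: the clip's sample rate is exactly 11025 Hz, and chroma_0 corresponds to the pitch class C (which, as far as I know, is the librosa default). The helper names are hypothetical.

```python
# Assumed mapping: chroma_0 = C, chroma_1 = C#, ..., chroma_11 = B
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def seconds_per_bin(hop_length, sample_rate):
    """Width of one analysis bin in seconds."""
    return hop_length / sample_rate


def dominant_pitch_class(chroma_row):
    """Name of the pitch class with the highest normalized energy."""
    strongest = max(range(len(chroma_row)), key=lambda i: chroma_row[i])
    return PITCH_CLASSES[strongest]


# chroma_0 ... chroma_11 from the first row of the output above
first_bin = [0.893229, 0.580649, 0.537203, 0.781329, 0.791074, 0.450180,
             0.547222, 0.344074, 0.396035, 0.310631, 0.338300, 1.000000]

print(seconds_per_bin(11025, 11025))
print(dominant_pitch_class(first_bin))
```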

Sentiment analysis on speech transcribed from audio

So far all of our examples involve the application of a feature extractor to an input of the expected modality (e.g., a text sentiment analyzer applied to text, a face recognizer applied to an image, etc.). But we often want to extract features that require us to first convert our input to a different modality. Let’s see how pliers handles this kind of situation.

Say we have an audio clip. We want to run sentiment analysis on the audio. This requires us to first transcribe any speech contained in the audio. As it turns out, we don’t have to do anything special here; we can just feed an audio clip directly to an Extractor class that expects a text input (e.g., the VADER sentiment analyzer we used earlier). How? Magic! Pliers is smart enough to implicitly convert the audio clip to a ComplexTextStim internally. By default, it does this using IBM’s Watson speech transcription API, which means you’ll need to make sure your API key is set up properly in order for the code below to work. (But if you’d rather use, say, Google’s Cloud Speech API, you could easily configure pliers to make that the default for audio-to-text conversion.)

audio = join(get_test_data_path(), 'audio', 'homer.wav')
ext = VADERSentimentExtractor()
result = ext.transform(audio)
df = merge_results(result, object_id=False)
df
source_file onset class filename stim_name history duration order VADERSentimentExtractor#sentiment_compound VADERSentimentExtractor#sentiment_neg VADERSentimentExtractor#sentiment_neu VADERSentimentExtractor#sentiment_pos
0 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 0.04 TextStim NaN text[engage] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.46 0 0.34 0.0 0.0 1.0
1 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 0.50 TextStim NaN text[because] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.37 1 0.00 0.0 1.0 0.0
2 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 0.87 TextStim NaN text[we] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.22 2 0.00 0.0 1.0 0.0
3 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 1.09 TextStim NaN text[obey] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.51 3 0.00 0.0 1.0 0.0
4 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 1.60 TextStim NaN text[the] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.16 4 0.00 0.0 1.0 0.0
5 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 1.76 TextStim NaN text[laws] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.40 5 0.00 0.0 1.0 0.0
6 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 2.16 TextStim NaN text[of] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.14 6 0.00 0.0 1.0 0.0
7 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 2.30 TextStim NaN text[thermodynamics] AudioStim->IBMSpeechAPIConverter/ComplexTextSt... 0.99 7 0.00 0.0 1.0 0.0
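Because each transcribed word comes back with an onset and a duration, word-level scores can be placed on the audio timeline. As a rough illustration, the sketch below computes a duration-weighted mean of the compound score and the time at which the last word ends. The values are copied by hand from the output above, and the weighting scheme is just one possible choice, not something pliers does for you.

```python
# (word, onset, duration, sentiment_compound) copied from the output above
words = [
    ("engage", 0.04, 0.46, 0.34),
    ("because", 0.50, 0.37, 0.0),
    ("we", 0.87, 0.22, 0.0),
    ("obey", 1.09, 0.51, 0.0),
    ("the", 1.60, 0.16, 0.0),
    ("laws", 1.76, 0.40, 0.0),
    ("of", 2.16, 0.14, 0.0),
    ("thermodynamics", 2.30, 0.99, 0.0),
]

# Total time spent speaking, in seconds
total_duration = sum(duration for _, _, duration, _ in words)

# Mean compound score, weighting each word by how long it was spoken
weighted_compound = sum(duration * score for _, _, duration, score in words) / total_duration

# Time at which the last word ends
speech_end = max(onset + duration for _, onset, duration, _ in words)

print(round(total_duration, 2), round(weighted_compound, 4), round(speech_end, 2))
```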

Object recognition on selectively sampled video frames

A common scenario when analyzing video is to want to apply some kind of feature extraction tool to individual video frames (i.e., still images). Often, there’s little to be gained by analyzing every single frame, so we want to sample frames with some specified frequency. The following example illustrates how easily this can be accomplished in pliers. It also demonstrates the concept of chaining multiple Transformer objects. We first convert a video to a series of images, and then apply an object-detection Extractor to each image.

Note, as with other examples above, that the ClarifaiAPIImageExtractor wraps the Clarifai object recognition API, so you’ll need to have an API key set up appropriately (if you don’t have an API key, and don’t want to set one up, you can replace ClarifaiAPIImageExtractor with TensorFlowInceptionV3Extractor to get similar, though not quite as accurate, results).

from pliers.filters import FrameSamplingFilter
from pliers.extractors import ClarifaiAPIImageExtractor, merge_results

video = join(get_test_data_path(), 'video', 'small.mp4')

# Sample 2 frames per second
sampler = FrameSamplingFilter(hertz=2)
frames = sampler.transform(video)

ext = ClarifaiAPIImageExtractor()
results = ext.transform(frames)
df = merge_results(results)
df
source_file onset class filename stim_name history duration order object_id ClarifaiAPIImageExtractor#Lego ... ClarifaiAPIImageExtractor#power ClarifaiAPIImageExtractor#precision ClarifaiAPIImageExtractor#production ClarifaiAPIImageExtractor#research ClarifaiAPIImageExtractor#robot ClarifaiAPIImageExtractor#science ClarifaiAPIImageExtractor#still life ClarifaiAPIImageExtractor#studio ClarifaiAPIImageExtractor#technology ClarifaiAPIImageExtractor#toy
0 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 0.0 VideoFrameStim NaN frame[0] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.949353 ... NaN 0.767964 NaN NaN 0.892890 0.823121 0.898390 0.714794 0.946736 0.900628
1 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 0.5 VideoFrameStim NaN frame[15] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.948389 ... NaN 0.743388 NaN NaN 0.887668 0.826262 0.900226 0.747545 0.951705 0.892195
2 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 1.0 VideoFrameStim NaN frame[30] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.951566 ... NaN 0.738823 NaN NaN 0.885989 0.801925 0.908438 0.756304 0.948202 0.903330
3 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 1.5 VideoFrameStim NaN frame[45] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.951050 ... NaN 0.794678 0.710889 0.749307 0.893252 0.892987 0.877005 NaN 0.962567 0.857956
4 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 2.0 VideoFrameStim NaN frame[60] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.872721 ... 0.756543 0.802734 NaN NaN 0.866742 0.816107 0.802523 NaN 0.956920 0.803250
5 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 2.5 VideoFrameStim NaN frame[75] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.930966 ... NaN 0.763779 NaN NaN 0.841595 0.755196 0.885707 0.713024 0.937848 0.876500
6 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 3.0 VideoFrameStim NaN frame[90] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.866936 ... 0.749151 0.749939 NaN NaN 0.862391 0.824693 0.806569 NaN 0.948547 0.793848
7 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 3.5 VideoFrameStim NaN frame[105] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.957496 ... NaN 0.775053 NaN NaN 0.895434 0.839599 0.890773 0.720677 0.949031 0.898136
8 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 4.0 VideoFrameStim NaN frame[120] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.954910 ... NaN 0.785069 NaN NaN 0.888534 0.833464 0.895954 0.752757 0.948506 0.897712
9 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 4.5 VideoFrameStim NaN frame[135] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.957653 ... NaN 0.796410 0.711184 NaN 0.897311 0.854389 0.899367 0.726466 0.951222 0.893269
10 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 5.0 VideoFrameStim NaN frame[150] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.50 NaN 0 0.954066 ... NaN 0.793047 0.717981 NaN 0.904960 0.861293 0.905260 0.754906 0.956006 0.894970
11 /Users/tal/Dropbox/Code/pliers/pliers/tests/da... 5.5 VideoFrameStim NaN frame[165] VideoStim->FrameSamplingFilter/VideoFrameColle... 0.07 NaN 0 0.932649 ... NaN 0.818984 0.758780 NaN 0.876721 0.882386 0.887411 NaN 0.958058 0.872935

12 rows × 41 columns

The resulting data frame has 41 columns (!), most of which are individual object labels like ‘lego’, ‘toy’, etc., selected for us by the Clarifai API on the basis of the content detected in the video (we could have also forced the API to return values for specific labels).
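The frame indices and onsets in the output follow directly from the sampling rate. The sketch below reproduces that arithmetic. It is not how FrameSamplingFilter is implemented; the function name is hypothetical, and the frame rate (30 fps) and frame count (167) are assumptions inferred from the frame numbers and onsets shown above.

```python
def sampled_frames(n_frames, fps, hertz):
    """Return (frame_index, onset_in_seconds) pairs when sampling at `hertz`."""
    step = round(fps / hertz)  # number of source frames between samples
    return [(index, index / fps) for index in range(0, n_frames, step)]


# Assumed video properties, inferred from the output above
frames = sampled_frames(n_frames=167, fps=30, hertz=2)

for index, onset in frames[:3]:
    print(f"frame[{index}] starts at {onset:.1f} s")
```

Under these assumptions the sketch yields 12 samples, frame[0] through frame[165], matching the 12 rows in the merged DataFrame.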

Multiple extractors

So far we’ve only used a single Extractor at a time to extract information from our inputs. Now we’ll start to get a little more ambitious. Let’s say we have a video that we want to extract lots of different features from, in multiple modalities. Specifically, we want to extract all of the following:

  • Object recognition and face detection applied to every 10th frame of the video;
  • A second-by-second estimate of spectral power in the speech frequency band;
  • A word-by-word speech transcript;
  • Estimates of several lexical properties (e.g., word length, written word frequency, etc.) for every word in the transcript;
  • Sentiment analysis applied to the entire transcript.

We’ve already seen some of these features extracted individually, but now we’re going to extract all of them at once. As it turns out, the code looks almost exactly like a concatenated version of several of our examples above.

from pliers.tests.utils import get_test_data_path
from os.path import join
from pliers.filters import FrameSamplingFilter
from pliers.converters import GoogleSpeechAPIConverter
from pliers.extractors import (ClarifaiAPIImageExtractor, GoogleVisionAPIFaceExtractor,
                               ComplexTextExtractor, PredefinedDictionaryExtractor,
                               STFTAudioExtractor, VADERSentimentExtractor,
                               merge_results)

video = join(get_test_data_path(), 'video', 'obama_speech.mp4')

# Store all the returned features in a single list (nested lists
# are fine, the merge_results function will flatten everything)
features = []

# Sample video frames and apply the image-based extractors
sampler = FrameSamplingFilter(every=10)
frames = sampler.transform(video)

obj_ext = ClarifaiAPIImageExtractor()
obj_features = obj_ext.transform(frames)
features.append(obj_features)

face_ext = GoogleVisionAPIFaceExtractor()
face_features = face_ext.transform(frames)
features.append(face_features)

# Power in speech frequencies
stft_ext = STFTAudioExtractor(freq_bins=[(100, 300)])
speech_features = stft_ext.transform(video)
features.append(speech_features)

# Explicitly transcribe the video--we could also skip this step
# and it would be done implicitly, but this way we can specify
# that we want to use the Google Cloud Speech API rather than
# the package default (IBM Watson)
text_conv = GoogleSpeechAPIConverter()
text = text_conv.transform(video)

# Text-based features
text_ext = ComplexTextExtractor()
text_features = text_ext.transform(text)
features.append(text_features)

dict_ext = PredefinedDictionaryExtractor(
    variables=['affect/V.Mean.Sum', 'subtlexusfrequency/Lg10WF'])
norm_features = dict_ext.transform(text)
features.append(norm_features)

sent_ext = VADERSentimentExtractor()
sent_features = sent_ext.transform(text)
features.append(sent_features)

# Ask for data in 'long' format, and code extractor name as a separate
# column instead of prepending it to feature names.
df = merge_results(features, format='long', extractor_names='column')

# Output rows in a sensible order
df.sort_values(['extractor', 'feature', 'onset', 'duration', 'order']).head(10)
object_id onset order duration feature value extractor stim_name class filename history source_file
2 0 0.000000 NaN 0.833333 administration 0.970786 ClarifaiAPIImageExtractor frame[0] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
296 0 0.833333 NaN 0.833333 administration 0.976996 ClarifaiAPIImageExtractor frame[10] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
592 0 1.666667 NaN 0.833333 administration 0.972223 ClarifaiAPIImageExtractor frame[20] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
887 0 2.500000 NaN 0.833333 administration 0.98288 ClarifaiAPIImageExtractor frame[30] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
1198 0 3.333333 NaN 0.833333 administration 0.94764 ClarifaiAPIImageExtractor frame[40] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
1492 0 4.166667 NaN 0.833333 administration 0.952409 ClarifaiAPIImageExtractor frame[50] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
1795 0 5.000000 NaN 0.833333 administration 0.951445 ClarifaiAPIImageExtractor frame[60] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
2096 0 5.833333 NaN 0.833333 administration 0.954552 ClarifaiAPIImageExtractor frame[70] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
2392 0 6.666667 NaN 0.833333 administration 0.953084 ClarifaiAPIImageExtractor frame[80] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
2695 0 7.500000 NaN 0.833333 administration 0.947371 ClarifaiAPIImageExtractor frame[90] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...

The resulting pandas DataFrame is quite large; even for our 9-second video, we get back over 3,000 rows! Importantly, though, the DataFrame contains all kinds of metadata that makes it easy to filter and sort the results in whatever way we might want to (e.g., we can filter on the extractor, stim class, onset or duration, etc.).
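As a small illustration of that kind of filtering, the sketch below selects the rows from one extractor that fall within a time window and orders them by onset. It uses plain dictionaries holding a few rows copied by hand from the output above, so it runs without pliers or any API keys.

```python
# A few rows copied from the sorted output above
rows = [
    {"stim_name": "frame[0]", "onset": 0.000000, "feature": "administration",
     "value": 0.970786, "extractor": "ClarifaiAPIImageExtractor"},
    {"stim_name": "frame[10]", "onset": 0.833333, "feature": "administration",
     "value": 0.976996, "extractor": "ClarifaiAPIImageExtractor"},
    {"stim_name": "frame[20]", "onset": 1.666667, "feature": "administration",
     "value": 0.972223, "extractor": "ClarifaiAPIImageExtractor"},
    {"stim_name": "frame[30]", "onset": 2.500000, "feature": "administration",
     "value": 0.98288, "extractor": "ClarifaiAPIImageExtractor"},
    {"stim_name": "frame[40]", "onset": 3.333333, "feature": "administration",
     "value": 0.94764, "extractor": "ClarifaiAPIImageExtractor"},
]

# Keep rows from one extractor with onsets in [1.0, 4.0) seconds, sorted by onset
selected = sorted(
    (row for row in rows
     if row["extractor"] == "ClarifaiAPIImageExtractor" and 1.0 <= row["onset"] < 4.0),
    key=lambda row: row["onset"],
)
selected_names = [row["stim_name"] for row in selected]
print(selected_names)
```

On the real DataFrame, the equivalent selection would be a boolean mask such as df[(df['extractor'] == 'ClarifaiAPIImageExtractor') & (df['onset'] >= 1.0) & (df['onset'] < 4.0)].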

Multiple extractors with a Graph

The above code listing is already pretty terse, and has the advantage of being explicit about every step. But if it’s brevity we’re after, pliers is happy to oblige us. The package includes a Graph abstraction that allows us to load an arbitrary number of Transformers into a graph, and execute them all in one shot. The code below is functionally identical to the last example, but only about a third of the length. It also requires fewer imports, since Transformer objects that we don’t need to initialize with custom arguments can be passed to the Graph as strings.

The upshot of all this is that, in just a few lines of Python code, we’re able to extract a broad range of multimodal features from video, image, audio or text inputs, using state-of-the-art tools and services!

from pliers.tests.utils import get_test_data_path
from os.path import join
from pliers.graph import Graph
from pliers.filters import FrameSamplingFilter
from pliers.extractors import (PredefinedDictionaryExtractor, STFTAudioExtractor,
                               merge_results)


video = join(get_test_data_path(), 'video', 'obama_speech.mp4')

# Define nodes
nodes = [
    (FrameSamplingFilter(every=10),
         ['ClarifaiAPIImageExtractor', 'GoogleVisionAPIFaceExtractor']),
    (STFTAudioExtractor(freq_bins=[(100, 300)])),
    ('GoogleSpeechAPIConverter',
         ['ComplexTextExtractor',
          PredefinedDictionaryExtractor(['affect/V.Mean.Sum',
                                         'subtlexusfrequency/Lg10WF']),
         'VADERSentimentExtractor'])
]

# Initialize and execute Graph
g = Graph(nodes)

# Arguments to merge_results can be passed in here
df = g.transform(video, format='long', extractor_names='column')

# Output rows in a sensible order
df.sort_values(['extractor', 'feature', 'onset', 'duration', 'order']).head(10)
object_id onset order duration feature value extractor stim_name class filename history source_file
2 0 0.000000 NaN 0.833333 administration 0.970786 ClarifaiAPIImageExtractor frame[0] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
296 0 0.833333 NaN 0.833333 administration 0.976996 ClarifaiAPIImageExtractor frame[10] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
592 0 1.666667 NaN 0.833333 administration 0.972223 ClarifaiAPIImageExtractor frame[20] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
887 0 2.500000 NaN 0.833333 administration 0.98288 ClarifaiAPIImageExtractor frame[30] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
1198 0 3.333333 NaN 0.833333 administration 0.94764 ClarifaiAPIImageExtractor frame[40] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
1492 0 4.166667 NaN 0.833333 administration 0.952409 ClarifaiAPIImageExtractor frame[50] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
1795 0 5.000000 NaN 0.833333 administration 0.951445 ClarifaiAPIImageExtractor frame[60] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
2096 0 5.833333 NaN 0.833333 administration 0.954552 ClarifaiAPIImageExtractor frame[70] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
2392 0 6.666667 NaN 0.833333 administration 0.953084 ClarifaiAPIImageExtractor frame[80] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...
2695 0 7.500000 NaN 0.833333 administration 0.947371 ClarifaiAPIImageExtractor frame[90] VideoFrameStim None VideoStim->FrameSamplingFilter/VideoFrameColle... /Users/tal/Dropbox/Code/pliers/pliers/tests/da...