
12-tone equal temperament. Harmonicity is a characteristic that
differentiates harmonic sounds from inharmonic sounds.

D. Cepstral Features

Cepstral features are log-magnitude representations in which the
frequency spectrum is smoothed; they capture both timbral properties
and pitch. They have an orthogonal basis, which helps in performing
similarity comparisons. They are widely used throughout audio feature
extraction. The most popular cepstral features are the Mel-Frequency
Cepstral Coefficients (MFCC), the first-order MFCC derivatives, the
second-order MFCC derivatives, the Bark-Frequency Cepstral Coefficients
(BFCC), and the Homomorphic Cepstral Coefficients (HCC). All of these
primarily represent the timbral properties of a signal.
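
The first- and second-order MFCC derivatives mentioned above can be
illustrated with a minimal sketch. It assumes the common regression
formula over a window of +/- 2 frames with edge frames repeated; the
function name `delta`, the window width, and the toy coefficient
matrix are illustrative choices, not taken from the surveyed work.

```python
def delta(frames, width=2):
    """Regression-based time derivative of per-frame coefficient lists.

    frames: list of frames, each a list of coefficients.
    width:  number of neighbouring frames used on each side (assumed value).
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Clamp indices at the edges (repeat the first/last frame).
                nxt = frames[min(t + n, n_frames - 1)][k]
                prv = frames[max(t - n, 0)][k]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


# Toy "MFCC" matrix: 5 frames x 2 coefficients (illustrative values only).
mfcc = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
d1 = delta(mfcc)  # first-order derivative
d2 = delta(d1)    # second-order derivative
```

In practice the static coefficients, `d1`, and `d2` are usually
concatenated frame by frame into a single feature vector.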

Computing MFCCs involves mapping the Fourier coefficients onto the
Mel scale. The resulting vectors are then log-compressed and
decorrelated by a Discrete Cosine Transform (DCT), which helps in
removing redundant information.
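
The MFCC steps described above (Fourier coefficients, Mel-scale
mapping, logarithm, DCT) can be sketched for a single frame as follows.
This is a minimal illustration under stated assumptions rather than a
reference implementation: the Hamming window, the naive DFT, the
triangular filter shape, the common 2595/700 Mel formula, the filter
count, and all function names are choices made here for the example.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (adequate for a short demo frame)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append(abs(s) ** 2 / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, expressed as fractional DFT bin positions.
    edges = [mel_to_hz(mel_max * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising slope
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, n_coeffs):
    """Unnormalised DCT-II, used to decorrelate the log filter-bank energies."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    n = len(frame)
    # Hamming window (an assumption; the text does not specify a window).
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = power_spectrum(windowed)                      # Fourier coefficients
    bank = mel_filterbank(n_filters, n, sample_rate)     # Mel-scale mapping
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
    return dct2(log_energies, n_coeffs)                  # decorrelation


# Example: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

Keeping only the first few DCT coefficients (13 here, an assumed
value) is what discards the redundant, fine-grained spectral detail.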

V. MODELING PARADIGMS

An event can be defined as any human-visible occurrence that is
important for representing video content fused with audio. Each video
can consist of many events. Current research aims at models that
handle this problem. Classification is the technique of building a
model from a set of labeled (training) instances and then using that
model to assign a test instance to one of the classes. Table III shows
the various modeling paradigms used in the literature, and Table IV
shows the comparisons, data sets, and references.

A. Hidden Markov Model

Hidden Markov models (HMMs) have been extensively used for modeling
the temporal dynamics of varying-length patterns of short duration.
An HMM is a finite state machine characterized by the number of states
in the model, the state-transition probability distribution, the
observation symbol probability distribution for each state, and the
initial state probability distribution. Continuous-density HMMs use
probability densities to represent the continuous observation
distributions of the states. The continuous observation density for a
state is estimated by assuming that it can be represented by a mixture
of Gaussian density functions. Estimating the continuous density for a
state then involves estimating the mean vector and covariance matrix
of each component of the Gaussian mixture, together with the mixture
coefficients. The HMM for a class is trained on the varying-length
sequences of feature vectors extracted from multiple examples of that
class, so as to maximize the likelihood of the model generating those
sequences. During recognition, the sequence of a test pattern is given
as input to the HMM of each class to compute the probability of the
test sequence being generated by that model. Then the class of the
model with the highest probability is assigned to the test...
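
The recognition step described above, scoring a sequence under each
class HMM and picking the most likely class, can be sketched with the
forward algorithm in the log domain. The sketch assumes
one-dimensional observations, Gaussian-mixture emissions, and
hand-set parameters; the model layout, class labels, and helper names
are hypothetical, and training (for example by Baum-Welch
re-estimation) is deliberately left out.

```python
import math


def safe_log(p):
    """log(p) that maps zero probabilities (e.g. forbidden transitions) to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def logsumexp(values):
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gmm(x, components):
    """Log density of a 1-D Gaussian mixture; components = [(weight, mean, variance), ...]."""
    terms = [math.log(w) - 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
             for w, mu, var in components]
    return logsumexp(terms)


def log_likelihood(obs, hmm):
    """Forward algorithm: log P(obs | hmm) for a sequence of any length >= 1."""
    n_states = len(hmm["initial"])
    alpha = [safe_log(hmm["initial"][s]) + log_gmm(obs[0], hmm["emissions"][s])
             for s in range(n_states)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[r] + safe_log(hmm["transition"][r][s])
                            for r in range(n_states)])
                 + log_gmm(x, hmm["emissions"][s])
                 for s in range(n_states)]
    return logsumexp(alpha)


def make_hmm(means):
    """Hypothetical two-state left-to-right HMM with a two-component mixture per state."""
    return {
        "initial": [1.0, 0.0],
        "transition": [[0.7, 0.3], [0.0, 1.0]],
        "emissions": [[(0.5, m - 0.5, 1.0), (0.5, m + 0.5, 1.0)] for m in means],
    }


def classify(obs, models):
    """Return the label of the model with the highest log-likelihood, plus all scores."""
    scores = {label: log_likelihood(obs, hmm) for label, hmm in models.items()}
    return max(scores, key=scores.get), scores


# One hand-set model per class (illustrative parameters, not trained).
models = {
    "low_to_high": make_hmm([0.0, 5.0]),
    "high_to_low": make_hmm([5.0, 0.0]),
}
test_sequence = [0.2, -0.1, 0.4, 4.8, 5.3, 5.1]
label, scores = classify(test_sequence, models)
```

Working with log-probabilities avoids numerical underflow on longer
sequences, and because the recursion runs once per observation, the
same models can score sequences of different lengths.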
