12-tone equal temperament. Harmonicity is a characteristic that
differentiates harmonic sounds from inharmonic sounds.
D. Cepstral Features
Cepstral features are log-magnitude representations of the
spectrum in which the frequency content is smoothed; they
capture timbral properties and pitch. They have an orthogonal
basis, which helps in performing similarity comparisons. They
are widely used across audio feature extraction. The most
popular cepstral features are the Mel-Frequency Cepstral
Coefficients (MFCC), the first-order and second-order
derivatives of the MFCCs, the Bark Frequency Cepstral
Coefficients (BFCC), and the Homomorphic Cepstral Coefficients
(HCC).
Computing MFCCs involves mapping the Fourier coefficients
onto the Mel scale. The resulting vectors are then
log-compressed and decorrelated by the Discrete Cosine
Transform (DCT), which helps in removing redundant information.
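The MFCC steps described above (Fourier coefficients, Mel-scale mapping, logarithm, DCT) can be illustrated with a minimal NumPy sketch for a single frame. This is an illustrative outline under stated assumptions, not the procedure of any particular system surveyed here: the Hamming window, the 2595/700 Mel formula, the triangular filter shape, the 26 filters and 13 coefficients, and all function names are assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(hz):
    # One common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fbank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_coeffs):
    # Unnormalized DCT-II of a 1-D vector, keeping the first n_coeffs terms.
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    # 1) Fourier coefficients of the windowed frame (power spectrum).
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    # 2) Map the spectrum onto the Mel scale.
    mel_energies = mel_filterbank(n_filters, frame.size, sample_rate) @ power
    # 3) Take the logarithm (small offset avoids log(0)).
    log_mel = np.log(mel_energies + 1e-10)
    # 4) Decorrelate with the DCT and keep the low-order coefficients.
    return dct_ii(log_mel, n_coeffs)

# Usage: one 512-sample frame of a 440 Hz tone sampled at 16 kHz.
t = np.arange(512) / 16000.0
coeffs = mfcc_frame(np.sin(2.0 * np.pi * 440.0 * t), 16000)
print(coeffs.shape)
```

Keeping only the low-order DCT coefficients is what discards the redundant fine spectral detail; the first- and second-order derivative features would be computed as frame-to-frame differences of these coefficients.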
V. MODELING PARADIGMS
An event can be defined as any human-visible occurrence
that is important for representing video content fused with
audio. Each video can consist of many events. Current research
aims at models that handle this problem. Classification is
the technique of building a model from a set of labeled
(training) instances and then using that model to assign a test
instance to one of the classes. Table III shows various model paradigms used
in literature and Table IV shows the comparisons, data sets
A. Hidden Markov Model
Hidden Markov models (HMMs) have been extensively used for
modeling the temporal dynamics of varying-length patterns of
short duration. An HMM is a finite-state machine characterized
by the number of states in the model, the state-transition
probability distribution, the observation symbol probability
distribution for each state, and the initial state probability distribution.
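The four components listed above can be made concrete with a small sketch that stores them as arrays and computes the likelihood of an observation sequence using the scaled forward algorithm. For simplicity this sketch assumes a discrete-observation HMM; the two-state, two-symbol parameter values and the function name are hypothetical and chosen only for illustration.

```python
import numpy as np

def forward_log_likelihood(obs, initial, transition, emission):
    """Return log P(obs | model) via the scaled forward algorithm.

    obs        : sequence of observation symbol indices
    initial    : (N,)   initial state probability distribution
    transition : (N, N) state-transition probabilities, rows sum to 1
    emission   : (N, M) observation symbol probabilities per state
    """
    initial = np.asarray(initial, dtype=float)
    transition = np.asarray(transition, dtype=float)
    emission = np.asarray(emission, dtype=float)

    log_lik = 0.0
    alpha = initial * emission[:, obs[0]]
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = (alpha @ transition) * emission[:, obs[t]]
        # Rescale at every step to avoid numerical underflow.
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

# Hypothetical model: N = 2 states, M = 2 observation symbols.
initial = [0.6, 0.4]
transition = [[0.7, 0.3],
              [0.4, 0.6]]
emission = [[0.9, 0.1],
            [0.2, 0.8]]

print(forward_log_likelihood([0, 1, 1, 0], initial, transition, emission))
```

The number of states is implicit in the array shapes. A likelihood of this kind is the quantity that can be compared across several trained models for the same observation sequence.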
Continuous-density HMMs use probability densities to represent
the continuous observation distributions of the states.
The continuous observation density for a state is estimated by
assuming that it can be represented by a mixture of Gaussian
density functions. Estimating the continuous density for a
state then involves estimating the mean vector and covariance
matrix of each component of the Gaussian mixture, along with
the mixture coefficients. The HMM for a class is trained on
the varying-length sequences of feature vectors extracted from
multiple examples of that class, so as to maximize the
likelihood of the model generating those sequences.
During recognition, the sequence of a test pattern
is given as input to the HMM of each class, to compute
the probability of the test sequence being generated by that
model. Then the class of the model with the highest probability
is assigned to the test...