User profile acquisition approaches can be classified into five groups: (1) data mining, (2) statistics and network analysis, (3) information retrieval, (4) machine learning and (5) cognitive approaches. Most of the methods deal with static Websites; only a few can be applied to dynamic Websites (Nasraoui & Rojas, 2003).
The data mining approaches employ techniques such as frequent pattern mining and preference mining, as found in (Holland et al., 2003; Kießling & Köstler, 2002) and (Ivancy & Vajk, 2006). Frequent pattern and preference mining is a heavily researched area of data mining with a wide range of applications, including the discovery of patterns in Web log data to obtain information about the navigational behavior of users. The patterns from this research can be classified into page sets, page sequences and page graphs.
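To make the notion of a frequent page set concrete, the following is a minimal sketch, assuming the Web log has already been split into sessions and that a page set counts as frequent when it occurs in at least a given number of sessions. The function name `frequent_page_sets`, the `min_support` and `max_size` parameters and the sample sessions are hypothetical. The sketch enumerates candidate sets by brute force and is not the algorithm of any of the cited works.

```python
from collections import Counter
from itertools import combinations


def frequent_page_sets(sessions, min_support, max_size=2):
    """Return page sets that occur in at least `min_support` sessions."""
    counts = Counter()
    for session in sessions:
        # A page set ignores visit order and repeated visits.
        pages = sorted(set(session))
        for size in range(1, max_size + 1):
            for page_set in combinations(pages, size):
                counts[page_set] += 1
    return {ps: n for ps, n in counts.items() if n >= min_support}


# Hypothetical sessions, each a list of requested pages in visit order.
sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/sports"],
    ["/home", "/news"],
    ["/news", "/sports", "/home"],
]
frequent = frequent_page_sets(sessions, min_support=3)
```

Page sequences and page graphs would additionally need the visit order and the link structure, which this sketch deliberately discards.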
The use of descriptive statistics to extract knowledge from Web logs was introduced by Srivastava, Deshpende & Phang (2000), who analyze session files and compute statistics of user interaction, such as frequency, mean and median, on variables such as page views, viewing time and length of the navigational path. Additionally, the statistical approach to Web log file analysis proposed by Stermsek et al. (2007) allows for a broader perception of user behavior and has the potential to improve user profiling. Their approach includes several methods such as statistical inference, graph analysis and profile generation: (1) statistical inference from pre-processed Web log data, (2) structure analysis, which selects a certain part of the Website's structure (e.g., soccer news) and relates that structure to the user’s interest in the topic, and (3) graph analysis of Web log data by converting the Web logs to an adjacency matrix.
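The graph-analysis step can be illustrated with a minimal sketch, assuming each Web log record of a single user has already been reduced to a (referer, request URL) pair and that a matrix entry counts how often the user moved from one page to another. The function name `adjacency_matrix`, the sample records and the choice of counts rather than binary entries are assumptions made for illustration, not details taken from Stermsek et al. (2007).

```python
def adjacency_matrix(requests):
    """Build a page-to-page count matrix from (referer, request_url) pairs."""
    # Collect every page that appears as a referer or as a requested URL.
    pages = sorted({p for pair in requests for p in pair if p is not None})
    index = {page: i for i, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in pages]
    for referer, url in requests:
        if referer is None:
            # Entry request without a referer: there is no edge to record.
            continue
        matrix[index[referer]][index[url]] += 1
    return pages, matrix


# Hypothetical log records of one user: (referer, request URL).
requests = [
    (None, "/home"),
    ("/home", "/news"),
    ("/news", "/sports"),
    ("/home", "/sports"),
    ("/home", "/news"),
]
pages, matrix = adjacency_matrix(requests)
```

Row i and column j of `matrix` refer to `pages[i]` and `pages[j]`, so a row describes where the user went from a given page.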
An adjacency matrix is a matrix representation of the individual user requests (request URLs) that captures the usage pattern of a Website for a certain user. The matrix records the user’s requested pages by using the request URL and referer fields in the Web log data. As an alternative to descriptive statistics, a probabilistic grammar-based approach can be found in (Jin et al., 2004), who developed a unified framework for the discovery and analysis of Web navigational patterns based on Probabilistic Latent Semantic Analysis (PLSA). PLSA uses latent semantic factors to associate page views with probability values. A regression technique, Support Vector Regression (SVR), was applied by Jun (2005). SVR is used to address the sparseness problem that arises in the rich and dynamic collection of hyperlink information, Web page accesses and usage data in Web log records. The approach extracts usage patterns from the user’s click stream and estimates the dependency between Web pages on huge data sets. SVR is the regression version of the Support Vector Machine (SVM), a type of statistical learning model, that enables to...