Analysis of large log files
Kasper Laursen s093078
Kongens Lyngby 2012 IMM-B.Sc.-2012-37

Technical University of Denmark
Informatics and Mathematical Modelling
Building 321, DK-2800 Kongens Lyngby, Denmark
Phone +45 4525 3351, Fax +45 4588 2673
reception@imm.dtu.dk
www.imm.dtu.dk

Summary
This thesis covers pattern recognition in large log files using clustering analysis, in the form of mini-batch K-means clustering and data fitting, to find abnormal traffic in network flows provided by DeIC (formerly The Danish Research Network).
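
Mini-batch K-means updates the cluster centres from small random batches of records instead of a full pass over the data, so the cost of one iteration depends on the batch size rather than on the number of flows. The following is a minimal Python sketch of the standard mini-batch update with a per-centre learning rate of 1/count. It is not the thesis implementation: it uses plain Euclidean distance, and the function names and the farthest-point seeding are illustrative assumptions.

```python
import random
from typing import List, Sequence

Point = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centres: List[Point], x: Sequence[float]) -> int:
    """Index of the centre closest to x."""
    return min(range(len(centres)), key=lambda j: sq_dist(centres[j], x))


def mini_batch_kmeans(data: List[Point], k: int, batch_size: int,
                      iterations: int, seed: int = 0) -> List[Point]:
    rng = random.Random(seed)
    # Farthest-point seeding: an assumption, since the summary does not say
    # how the thesis chooses its initial centres.
    centres = [list(rng.choice(data))]
    while len(centres) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centres))
        centres.append(list(far))
    counts = [0] * k  # number of points each centre has absorbed so far
    for _ in range(iterations):
        # Draw a small random batch instead of scanning the whole dataset.
        batch = [rng.choice(data) for _ in range(batch_size)]
        # Assign the batch against the centres as they were before this batch.
        assignments = [nearest(centres, x) for x in batch]
        for x, j in zip(batch, assignments):
            counts[j] += 1
            eta = 1.0 / counts[j]  # per-centre learning rate that decays over time
            centres[j] = [(1 - eta) * c + eta * xi for c, xi in zip(centres[j], x)]
    return centres
```

Each centre ends up as a running average of the points assigned to it, so with well-separated groups the centres settle inside their own group. For the log files, each point would presumably be one flow's scaled numeric fields.
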
The implementation is a modified clustering algorithm that uses the Mahalanobis distance. In the analysis, more than 10⁹ network flows from a single day were split into different clusters, and outliers were detected. The clustering analysis took less than 13 hours to compute, which means that outliers can be detected the following day. The implementation and analysis could be further improved by selecting a different set of fields from the log files, by a parallel implementation of the mini-batch K-means clustering algorithm, and by a more thorough analysis of the detected outliers.
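
The Mahalanobis distance measures how far a point lies from a centre relative to the spread and correlation of the surrounding points, d(x) = sqrt((x - μ)ᵀ Σ⁻¹ (x - μ)), where μ is the mean and Σ the covariance matrix. A minimal pure-Python sketch, assuming the distance is computed over the points of a single cluster and that a point counts as an outlier when its distance exceeds a fixed threshold. The threshold and all function names are hypothetical; the summary does not state how the thesis flags outliers.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean(points: Sequence[Sequence[float]]) -> Vector:
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def covariance(points: Sequence[Sequence[float]], mu: Vector) -> Matrix:
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(points), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        diff = [x - m for x, m in zip(p, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j] / (n - 1)
    return cov


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis(x: Sequence[float], mu: Vector, cov: Matrix) -> float:
    diff = [a - b for a, b in zip(x, mu)]
    y = solve(cov, diff)  # y = cov^-1 @ diff, without forming the inverse
    return sum(d * v for d, v in zip(diff, y)) ** 0.5


def outliers(points: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of points whose Mahalanobis distance exceeds the threshold."""
    mu = mean(points)
    cov = covariance(points, mu)
    return [i for i, p in enumerate(points) if mahalanobis(p, mu, cov) > threshold]
```

On a 5×5 grid of points plus one distant point at (20, 20), a threshold of 3.0 flags only the distant point. Solving Σy = x - μ instead of inverting Σ is the usual numerically safer choice; with many fields, a library solver would replace the hand-written elimination.
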

Preface
This bachelor thesis was prepared at the Department of Informatics and Mathematical Modelling at the Technical University of Denmark in fulfillment of the requirements for acquiring a B.Sc.Eng. degree in Software Technology.
Lyngby, 14 December 2012
Kasper Laursen

Acknowledgements
I would like to thank my supervisor Robin Sharp for weekly meetings and support throughout the whole project.
Thanks to The Danish Research Network for providing the network log files for this analysis.
I would like to give special thanks to Rasmus Jul Hansen for proofreading this project, to Simon Laursen for discussion and for finalizing the report, and to Søren Løvborg for proofreading, help and discussion throughout the whole project phase.

Contents
1 Introduction
2 Preliminaries
  2.1 Machine learning
  2.2 Clustering analysis
  2.3 Network flow
  2.4 Summary
3 Problem analysis
4 Handling large datasets
  4.1 Large log files
  4.2 Log file variables
  4.3 Scaling of variables
  4.4 Packed format
  4.5 Summary
5 Implementation
  5.1 Mini-batch K-means
  5.2 Data fitting
  5.3 Summary
6 Analysis of log files
  6.1...
