Data Preprocessing: A preliminary step for web data mining

Huma Jamshed; Sadiq Ali Khan; Muhammad Khurrum; Syed Inayatullah; Sameen Athar

Huma Jamshed [1]; M. Sadiq Ali Khan [1]; Muhammad Khurram [1]; Syed Inayatullah [1]; Sameen Athar [1]
[1] University of Karachi, Pakistan
Published in: 3c Tecnología: glosas de innovación aplicadas a la pyme, ISSN-e 2254-4143, Vol. 8, Special Issue 1, 2019 (Issue dedicated to: "2nd International Multi-Topic Conference on Engineering and Science"), pp. 206-221
Language: English
Abstract
- Recent years have seen immense growth in data, i.e. big data, which promises a brighter and more optimized future. Big data demands large computational infrastructure with high-performance processing capabilities. Preparing big data for mining and analysis is a challenging task that requires preprocessing to improve the quality of the raw data. The representation and quality of data instances are of foremost importance. Data preprocessing is a preliminary data mining practice in which raw data is transformed into a format suitable for further processing. It improves data quality by cleaning, normalizing, transforming and extracting relevant features from raw data. Data preprocessing significantly improves the performance of machine learning algorithms, which in turn leads to more accurate data mining. Knowledge discovery from noisy, irrelevant and redundant data is difficult; precisely identifying extreme values and outliers and filling in missing values are therefore challenging. This paper discusses various big data preprocessing techniques that prepare data for mining and analysis tasks.
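
The steps named in the abstract (filling missing values, identifying outliers, normalizing) can be illustrated with a minimal sketch. This is not the paper's method: the median fill, the 1.5 × IQR outlier rule, min-max scaling, and all function names below are assumptions chosen for illustration, and the sample values are made up.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Replace None entries with the median of the observed values (assumed strategy)."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outlier_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (assumed outlier rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(values):
    """Fill missing values, drop flagged outliers, then normalize the rest."""
    filled = fill_missing(values)
    flags = iqr_outlier_flags(filled)
    kept = [v for v, is_outlier in zip(filled, flags) if not is_outlier]
    return min_max_normalize(kept)


if __name__ == "__main__":
    raw = [10, 12, None, 11, 13, 500, 12]  # made-up sample with a gap and an extreme value
    print(preprocess(raw))
```

Dropping flagged outliers is only one option; depending on the mining task, they could instead be capped or kept for separate inspection.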