Description
BD-101 Big Data Experiment Lab
The rich and comprehensive training courseware provided by BD-101 enables students to learn big data analytical skills effectively and efficiently.
Big data is no longer just data processing; it has become an unprecedented tool for business intelligence.
The Characteristics of Big Data
Volume
In fields such as financial services, energy management, biomedicine, and multimedia communities, large numbers of data sets are being generated every second.
Velocity
Data is analyzed as soon as it reaches the servers, and previous results are updated in real time so that the latest findings draw the most value from the data.
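As a small illustration of updating earlier results when new data arrives, the sketch below keeps a running mean and variance using Welford's method. The class name `RunningStats` and the sample readings are assumptions made for this example; they are not part of BD-101.

```python
class RunningStats:
    """Incrementally updated count, mean, and variance (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new value into the existing statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has arrived
        return self._m2 / self.count if self.count else 0.0


# Hypothetical stream of sensor readings, processed one at a time
stats = RunningStats()
for reading in [10.0, 12.0, 14.0]:
    stats.update(reading)
```

Each call to `update` adjusts the stored results without revisiting earlier values, which is the property that makes this style of computation suitable for data that arrives continuously.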
Variety
Diverse data includes structured and unstructured data, such as text, location, sound, videos, and pictures, all of which can be interactively analyzed to reveal correlations between data sets.
Veracity
Is the data source correct? Even if the source is genuine, is the data recorded accurately? Are there any anomalies in the data sets? Incorrect data sources can introduce deviations into the analysis results and reduce the accuracy of predictions. Ensuring the authenticity of data sources is therefore one of the key points of big data analysis.
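A minimal sketch of one such check is shown below: it separates readings whose value is missing, non-numeric, or outside an expected range. The function name `validate_readings`, the record fields, and the range limits are hypothetical and chosen only for illustration.

```python
def validate_readings(readings, low, high):
    """Split readings into (valid, rejected) lists.

    A reading is rejected when its value is missing, not numeric,
    or outside the inclusive range [low, high].
    """
    valid, rejected = [], []
    for reading in readings:
        value = reading.get("value")
        # bool is a subclass of int, so exclude it explicitly
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and low <= value <= high:
            valid.append(reading)
        else:
            rejected.append(reading)
    return valid, rejected


# Hypothetical temperature readings in degrees Celsius
sample = [
    {"sensor": "t1", "value": 21.5},
    {"sensor": "t2", "value": None},    # missing value
    {"sensor": "t3", "value": 999.0},   # outside the plausible range
]
valid, rejected = validate_readings(sample, low=-40.0, high=60.0)
```

Keeping the rejected records rather than discarding them makes it possible to inspect why a source produced bad data.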
Value
The greatest value of big data analysis lies in extracting, from massive data sets, the data that reveals future trends, and in analyzing it in depth with artificial intelligence and machine learning to improve accuracy.
Hadoop – The System Foundation of Big Data
Hadoop is an open source software framework that addresses problems such as file storage, file backup, and data processing.
As a result, it is widely used and has become a mainstream technology for big data analysis.
Spark – Data Processing for Big Data
Speed is very important when processing big data. An important feature of Spark is that it can process data in memory, which makes it more efficient at data analysis and calculation than MapReduce.
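To make the map and reduce steps concrete, here is a word count written in plain Python. It is only an illustration of the processing pattern, with intermediate results kept in an in-memory list; it does not use the Spark or Hadoop APIs, and the function names and sample lines are assumptions for this example.

```python
from collections import Counter
from functools import reduce


def map_phase(line):
    """Map step: count the words in a single line."""
    return Counter(line.lower().split())


def reduce_phase(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


lines = ["big data big value", "data in memory"]

# Intermediate results stay in memory instead of being written to disk
partial_counts = [map_phase(line) for line in lines]
word_counts = reduce(reduce_phase, partial_counts, Counter())
```

In a disk-based pipeline the intermediate counts would be written out and read back between the two steps; holding them in memory avoids that round trip.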
Python – The Extraction of Big Data
Python is a programming language commonly used in many fields. It can crawl large amounts of useful data from the web in a low-cost, automated manner. Its powerful data processing capability is the main reason Python has become an important programming language for big data analysis.
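As a small sketch of the extraction step, the example below collects link targets from an HTML page using only the Python standard library. The page content is a hard-coded, hypothetical string so that the example runs without a network connection; a real crawler would download pages first and should respect each site's terms of use.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content standing in for a downloaded document
page = '<html><body><a href="/a.html">A</a> <a href="/b.html">B</a></body></html>'
collector = LinkCollector()
collector.feed(page)
```

After `feed` returns, `collector.links` holds the link targets in document order, ready to be queued for the next crawl step.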
The System Features of BD-101
- System independence: it can be operated without an internet connection or any additional hardware or software installation. The cabinet design makes it easy to move.
- Convenience: the troubleshooting function includes system restore, so users can recover from problems quickly. With 6 different models of random data generation, users can easily generate data sets suitable for different algorithms.
- Expansion: it can be applied to different research projects and experiments in big data analysis, and it can also be combined with IoT-enabled devices to store and analyze data sets from sensors for cross-domain applications.
- Richness: it provides comprehensive courseware for big data analysis
- 9 different algorithms and 20+ classic big data analysis examples
- Tools such as Hadoop, YARN, Spark, Hive, and HBase are introduced and applied.
Learning Objectives of BD-101
- Data cleaning, regularization and standardization
- Architecture and configuration of the big data ecosystem
- Comparison of assorted databases
- Use of various algorithms to extract, store, retrieve, and analyze big data
- Integration of big data analysis and artificial intelligence
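For the first objective, standardization can be sketched as a z-score transform: each value is shifted by the mean and divided by the standard deviation. This is a minimal illustration using the Python standard library; the function name `standardize` and the sample values are assumptions, and the BD-101 courseware may use different tools or definitions.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so every z-score is zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements
scores = standardize([2.0, 4.0, 6.0])
```

The resulting values have a mean of zero and a standard deviation of one, which puts features measured on different scales on a comparable footing before analysis.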
List of Modules
Video