Fundamentals of Scalable Data Science

Apache Spark is the de facto standard for large-scale data processing. This is the first course in a series of courses leading to the IBM Advanced Data Science Specialization. We strongly believe that it is crucial for success to start by learning a scalable data science platform, since memory and CPU constraints are among the most limiting factors when it comes to building advanced machine learning models. In this course we teach you the fundamentals of Apache Spark using Python and PySpark. We’ll introduce Apache Spark in the first two weeks, and in the last two weeks you’ll learn how to apply it to basic exploratory data analysis and data pre-processing tasks. Through these exercises you’ll also be introduced to the most fundamental statistical measures and data visualization technologies.

This gives you enough knowledge to take on the role of a data engineer in any modern environment. It also gives you the basis for advancing your career towards data science.

Please have a look at the full specialization curriculum: https://www.coursera.org/specializations/advanced-data-science-ibm

If you choose to take this course and earn the Coursera course certificate, you will also earn an IBM digital badge. To find out more about IBM digital badges follow the link ibm.biz/badging.

After completing this course, you will be able to:
• Describe how basic statistical measures are used to reveal patterns within the data
• Recognize data characteristics, patterns, trends, deviations or inconsistencies, and potential outliers
• Identify useful techniques for working with big data, such as dimension reduction and feature selection methods
• Use advanced tools and charting libraries to:
  o improve the efficiency of big data analysis with partitioning and parallel analysis
  o visualize the data in a number of 2D and 3D formats (Box Plot, Run Chart, Scatter Plot, Pareto Chart, and Multidimensional Scaling)
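The course itself does this with Apache Spark; the sketch below is plain Python, not Spark code, and does not use the PySpark API. It only illustrates, under our own assumptions, the idea behind partitioning and parallel analysis for basic statistical measures: each partition computes a small partial aggregate (count, sum, sum of squares) independently, the partials are merged, and the resulting mean and standard deviation are used to flag potential outliers. The function names partition, partial_stats, combine, mean_and_std and flag_outliers, the two-standard-deviation threshold, and the sample values are all hypothetical.

```python
from functools import reduce
from math import sqrt


def partition(data, n_parts):
    """Split a list into at most n_parts contiguous chunks.

    Plain-Python stand-in for data partitions; this is not the Spark API.
    """
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(chunk):
    """Aggregate computed independently on one partition: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def combine(a, b):
    """Merge two partial aggregates; the merge is associative and commutative."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_std(data, n_parts=4):
    """Population mean and standard deviation from merged per-partition aggregates."""
    n, s, sq = reduce(combine, map(partial_stats, partition(data, n_parts)))
    mean = s / n
    variance = max(sq / n - mean * mean, 0.0)  # guard against tiny negative rounding error
    return mean, sqrt(variance)


def flag_outliers(data, k=2.0, n_parts=4):
    """Return values more than k standard deviations from the mean (a simple z-score rule)."""
    mean, std = mean_and_std(data, n_parts)
    return [x for x in data if std > 0 and abs(x - mean) > k * std]


if __name__ == "__main__":
    values = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # hypothetical sample data
    print(mean_and_std(values))
    print(flag_outliers(values))  # → [95]
```

Because combine is associative, the partial aggregates can be merged in any order, which is what lets this kind of statistic be computed on separate partitions and then brought together. The sum-of-squares formula is used here for brevity; it can lose precision on large values, and more numerically stable merge formulas for variance exist.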

For successful completion of the course, the following prerequisites are recommended:
• Basic programming skills in Python
• Basic math
• Basic SQL (you can easily get it from https://www.coursera.org/learn/sql-data-science if needed)

To complete this course, you will use the following technologies (they are introduced in the course as needed, so no previous knowledge is required):
• Jupyter notebooks (provided for free by IBM Watson Studio)
• Apache Spark (provided for free by IBM Watson Studio)
• Python

Some learners have reported that parts of this course are too advanced. If you feel the same, please work through the following materials before starting this course; learners have reported that this really helps.

Of course, you can also try this course first and then, if needed, take the following courses and materials. They are free.

https://cognitiveclass.ai/learn/spark

https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/f8982db1-5e55-46d6-a272-fd11b670be38/view?access_token=533a1925cd1c4c362aabe7b3336b3eae2a99e0dc923ec0775d891c31c5bbbc68

This course takes four weeks, at 4-6 hours per week.

Syllabus - What you will learn from this course
Week 1
Introduction to the course and grading environment
Week 2
Tools that support Big Data solutions
Week 3
Scaling Math for Statistics on Apache Spark
Week 4
Data Visualization of Big Data

FAQ

When will I have access to the lectures and assignments?

Access to lectures and assignments depends on your type of enrollment. If you take a course in audit mode, you will be able to see most course materials for free. To access graded assignments and to earn a Certificate, you will need to purchase the Certificate experience, during or after your audit. If you don't see the audit option:

The course may not offer an audit option. You can try a Free Trial instead, or apply for Financial Aid.

The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade. This also means that you will not be able to purchase a Certificate experience.

What will I get if I subscribe to this Specialization?

When you enroll in the course, you get access to all of the courses in the Specialization, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page - from there, you can print your Certificate or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

Is financial aid available?

Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.

I am in the middle of taking the course, and my IBM Bluemix trial has expired. What do I do now?

If you have started a course that depends on IBM Bluemix and your trial has expired, you can continue taking the course in the same environment by providing your credit card information. To avoid being charged, close any application instances you are not using and monitor your environment’s usage details.

Alternatively, you can export any projects you are working on, register for a new trial using a different email address that has not been used on IBM Bluemix before, and then import the projects into the new account.

When exporting your projects, for Node-RED use the same process as when submitting assignments (export the flow from the old project, then import it into the new project via the clipboard). For Node.js, you can redeploy the code to Bluemix using your new account credentials.

If you have customized your Git repository or registered devices, you will need to redo those steps in the new environment after migrating.

I am about to start the course and my IBM Bluemix trial has expired. How do I proceed with this course?

If you already have an IBM Bluemix account, but your trial period has expired, you can always create a new account with a different email address.

Reviews

Great introductory course for Big Data Analytics. The exercises and the assignments had the appropriate level of difficulty considering this was an advanced course. Thank you IBM and Coursera.

Pretty fun introduction, assignments were mostly copy-paste from instruction videos, so you don't get to 'learn' the right way in my opinion

It feels really good when you get to learn something new and Coursera helped me to achieve something and learn something new and good.

A bit on the easy side, especially if you are proficient with SQL. But otherwise a decent intro to Spark and a nice flavour of data analysis with Python.

Duration Course 1 of 4 in the

Start your Free Trial

Self-paced

65,644 already enrolled

Rated 4.3 out of 5 stars (1,978 ratings on Coursera)
