Starting with R for Data Science


    R and Python, both solid and robust for data analytics and machine learning, are the most popular languages used in Data Science.[1] I embarked on my learning journey with Python (indeed, it’s my first programming language) and immediately fell in love with it for its elegant syntax, extensive flexibility, and powerful data science packages: NumPy, Pandas, Matplotlib, Seaborn, and more. It enables me to navigate the world in a way that I had never imagined.

    There is a lot of discussion about which language to learn first in Data Science for people who have no experience in the field at all. I’m not qualified to make detailed, well-justified comparisons of the pros and cons of the two languages.[2] However, to summarise what I’ve learnt, Python is ideal for programmers, as well as for people without any programming experience (like me!), while R is perfect for statisticians and for anyone with a solid mathematical background.

    Beginners are often advised to become an expert in one language first, then move on to a second, and so on. In Data Science, it is important to learn one of the two essential languages extremely well, to sharpen the Swiss Army knife in one’s analytic toolbox.

    I haven’t been learning Python and its Data Science packages for long, and R was always on my future roadmap. However, due to a recent turn of events, I’m preparing to land a junior data scientist position, and the work requires R. So I am beginning my study of R with the well-regarded interactive courses at DataCamp and the renowned book R for Data Science, co-written by Hadley Wickham, one of the best R programmers in the world.

    I always enjoy writing programming notes, and I reckon writing is an incredible way to consolidate knowledge and deepen understanding of a domain. Hence, despite the myriad of R tutorials out there, I hope to share my notes and my journey here with other passionate Data Science learners.

    1. From “The State of Data Science & Machine Learning 2017”, an industry-wide survey by Kaggle: among 7,955 valid responses, 76.3% chose Python and 59.2% chose R as the tool most used at work.

    2. “Every data scientist has an opinion on what language you should learn first. As it turns out, people who solely use Python or R feel like they made the right choice. But if you ask people that use both R and Python, they are twice as likely to recommend Python.” Quoted from “The State of Data Science & Machine Learning 2017”. For a comparison of Data Science languages in a nutshell, read “Which language should you learn for Data Science?”