14 important data science books, according to experts

Updated: March 15, 2020

Written by Digital Shivam

Chico Camargo, a postdoctoral researcher in data science at the Oxford Internet Institute, came to data science from a background in biology.

He told Built In, "Biology is huge, confusing, and complex, so I was attracted to tools that could help me make some sense of it."

Humans often understand the complexity of the natural world through our own natural tools, namely the brain and the senses. Data science enhances these innate capabilities through algorithms and predictive models.

Camargo is particularly drawn to unsupervised machine learning and natural language processing, which can help humans do everything from detecting signs of metastatic cancer to learning foreign languages with Google Translate.

In fact, data science has become so sophisticated that it can not only enhance our natural abilities but also imitate them.

Take deep learning as an example. Camargo explained: "It uses multiple layers [of algorithms] to progressively extract higher-level features from the raw input."

Human vision works in a similar layered manner. Camargo said: "The first layer of neurons in the visual system is responsible for identifying light and dark, while the deeper layers respond to patterns such as curves and lines." Finally, the "nth" layer of neurons can identify what is being seen: "Aha, this is a face!"

In a way, data science has become humankind's sixth sense. However, it may also be the sense that most laypeople understand least. So for those who want to learn more, we asked three experts to recommend their favorite data science books. Our panel includes:

Zach Miller, chief data scientist at CreditNinja

Jeff Herman, Chief Data Science Instructor, Flatiron School

Chico Camargo, Postdoctoral Researcher in Data Science, Oxford Internet Institute

The final reading list includes technical machine learning and math textbooks, as well as sociological research on how algorithms affect our daily lives.

General-Interest Books

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

CAMARGO: This book is like Freakonomics for the age of data science. It is 100% non-technical. Each chapter tells an odd story that illustrates a data science concept. For example, one chapter is about Google searches, another is about news, and another is about image data. It's a bunch of stories about people being creative and looking for patterns in the most random things, because those random things actually reveal a lot. The book gets its name from the fact that you can lie about what you eat and what you read, and you can lie about who you want to vote for. This book is for anyone curious about what data science is and what it can do, especially when it comes to social data. The author concludes that the next Freud will be a data scientist, the next Foucault will be a data scientist, and the next Marx will be a data scientist. I think that might be a bit much, because data science cannot answer every question. But it is an interesting book, best read with a grain of salt.

Naked Statistics: Stripping the Dread from the Data by Charles Wheelan

HERMAN: This book provides many examples of how statistical concepts can be applied in the real world. Wheelan doesn't cover much theory, but he has some very interesting examples and a dry sense of humor. This is the only statistics book that has made me laugh, and it is the book we recommend that students read before they start at Flatiron School. Our students come from a variety of statistical backgrounds, but I have always received very positive feedback on it. It is ideal for beginners, but I think it is also a good read if you are already proficient in data science and have never picked it up.

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

CAMARGO: The author of the book, Cathy O’Neil, was an academic mathematician. Then she went to Wall Street, then to Occupy Wall Street, and now she is an activist, raising awareness of how algorithms govern our lives and how they are not as neutral or fair as we would like. This book is a collection of stories about algorithms applied in practice, many of them about people whom an algorithm has classified as unworthy. For example, someone buys a product at a particular store and automatically has their credit card limit reduced, or a college student cannot get a job at a local grocery store because of an algorithm.

She doesn't just say, "Shoo, bad algorithm, bad machine!" She works hard to explain the mechanisms that can lead to algorithmic racial discrimination. So why would a policing algorithm send police to Black neighborhoods more often? Well, what happened in that case was that the algorithm was trained on data from previous police patrols, which were concentrated in Black neighborhoods. So the algorithm learned that those were the blocks that should get more patrols. The algorithm just replicated what it was taught. This book makes you think about how to design algorithms and data science practices.
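The feedback loop Camargo describes can be made concrete with a small Python sketch. This is a toy illustration under stated assumptions, not a model taken from the book: the two neighborhoods, the patrol numbers, and the function names are all hypothetical, and both neighborhoods are given the same underlying incident rate on purpose.

```python
def allocate_patrols(recorded_incidents, total_patrols):
    """Hypothetical 'algorithm': split patrols in proportion to recorded incidents."""
    total = sum(recorded_incidents.values())
    return {area: total_patrols * count / total
            for area, count in recorded_incidents.items()}


def simulate(initial_patrols, true_rate, rounds):
    """Repeatedly retrain the allocation on the records it produced itself."""
    patrols = dict(initial_patrols)
    total_patrols = sum(patrols.values())
    history = [dict(patrols)]
    for _ in range(rounds):
        # Incidents are only recorded where patrols are present, so the
        # records reflect patrol intensity as much as the underlying rate.
        recorded = {area: patrols[area] * true_rate[area] for area in patrols}
        patrols = allocate_patrols(recorded, total_patrols)
        history.append(dict(patrols))
    return history


# Assumed starting point: historical patrols are skewed 70/30,
# while the true incident rate is identical in both neighborhoods.
history = simulate(
    initial_patrols={"A": 70.0, "B": 30.0},
    true_rate={"A": 0.1, "B": 0.1},
    rounds=5,
)
for round_number, patrols in enumerate(history):
    print(round_number, {area: round(share, 1) for area, share in patrols.items()})
```

Because the records only reflect where patrols were sent, the initial skew is reproduced round after round even though the two neighborhoods are identical by construction.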

Algorithms of Oppression by Safiya Noble

CAMARGO: This book has some stories built on very simple "data" that the author digs into deeply. I found it a very interesting read because the author's background is almost exactly the opposite of mine. She is 100% qualitative and tells stories based on "small data" with a lot of context.

In one of the stories, author Safiya Noble is organizing a party for her niece and other children, and she searches Google for things like "black girls." To her surprise, she does not find pictures of children. Instead, she finds sites like "Hot Singles in Your Area." She found the same for other search terms, such as "Latina girls" and "Asian girls."

She explains that this happens because of Google's revenue model. The algorithm serves the ads that pay the most. The situation becomes disturbing because, even though Google is an advertising company, we use it like a public library, as some kind of publicly accessible repository. I found it a sobering read.

Beginner-Friendly Textbooks

An Introduction to Statistical Learning: With Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

HERMAN: When I first studied data science, most statistics textbooks were unreadable. They dove deep into the theory and didn't really show the applied side. This book does not go as deep into the statistics as many other books, but it gives you enough knowledge to work as a data scientist, and it covers the key machine learning algorithms. One of the problems people face in data science is that algorithms are these black boxes, where you put data in and get data out without knowing what happens in the middle. This book gives you enough statistical knowledge to understand what is happening inside that black box.

It is aimed at people without a programming or statistics background. Having said that, I have actually read this book many times. Even if you are an experienced data scientist, you will forget many statistical concepts over time. At work, you don't use every algorithm. You get comfortable. This book lets you say, "Well, maybe I should try some other algorithms."

Data Science from Scratch: First Principles with Python by Joel Grus

MILLER: This book is about how to write data science algorithms in Python. It's a cross between a textbook and a regular book: a great introduction, perfect for the layperson. So, for example, if I want to learn the machine learning algorithm Naive Bayes, the book says: "We will program Naive Bayes as if it didn't exist in the world. We will first learn the math and then write the code for it. We will build the algorithm in Python."

You will want to know some Python and some statistics, but this book assumes you have almost no in-depth knowledge. It is not one of those books that says, "This part is easy, so it is left to the reader." It teaches you all the standard machine learning algorithms, maybe 10 or 15 different ones.
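In the spirit of the approach Miller describes, here is a minimal sketch of Naive Bayes written with only the Python standard library. It is not code from the book: the class name, the whitespace tokenizer, the add-one smoothing value, and the tiny review dataset are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very simple tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes for text, with additive (Laplace) smoothing."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.class_counts = Counter()            # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def log_scores(self, text):
        """Return log P(label) + sum of log P(word | label) for each label."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                score += math.log(
                    (count + self.smoothing)
                    / (total_words + self.smoothing * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up training data, purely for illustration.
texts = [
    "great food and friendly staff",
    "terrible service and cold food",
    "friendly staff great place",
    "cold and terrible",
]
labels = ["positive", "negative", "positive", "negative"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("great friendly place"))
print(model.predict("cold terrible service"))
```

Working in log probabilities avoids multiplying many small numbers together, and the smoothing term keeps a word that never appeared under one label from forcing that label's probability to zero.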

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Géron

HERMAN: This book will teach you how to run predictive analytics. In the data science world, there are two main programming languages: Python and R. Both have advantages and disadvantages, but this book is specific to Python. Scikit-Learn, Keras, and TensorFlow are all libraries for machine learning and deep learning functions in the Python programming language.

You must be proficient in these libraries to become a data scientist. When I first started, I would refer to this book every day. Even now, I probably look at it at least once a month for reference, because he really does dive deep into how each algorithm works. Many algorithms have a lot of knobs or levers you can adjust, so depending on the data, you may tweak the algorithm slightly. The author explains these knobs and levers in a way that beginners can understand, while those with more experience can appreciate the level of detail he goes into.

Think Stats by Allen B. Downey

MILLER: Data science is a combination of three different disciplines. One is programming and computer science; another is linear algebra, statistics, and a lot of mathematical analysis; and then there is machine learning and algorithms. The ideal data scientist is really good at all of these things. But that does not always happen, so this book is about building up the analytical, mathematical, and statistical side of your data science knowledge. How do you run tests? How do you determine whether a solution works and whether a distribution is correct? How do you use mathematical tools to solve business problems?

It's a textbook, but not a rigid one. It also ties statistical analysis to how you would write it in Python. Earlier in my career, I found statistics fairly easy, but working the statistics into my programs was more challenging. I found this book very helpful for making that connection.

Grokking Deep Learning by Andrew W. Trask

CAMARGO: This book is an introductory textbook for beginners who want to go beyond the surface and understand the principles of deep learning. People who develop deep learning tools usually draw on many areas of mathematics: multivariate calculus, linear algebra, optimization, and often some physics. But you don't need all of these to understand what deep learning is doing. In the words of the author, "If you have passed high school math and know Python, then you can start on this book." It covers some very general and basic pieces, such as gradient descent, backpropagation, and regularization. These pieces are used in many advanced tools, and you cannot use those tools well without a thorough understanding of them.

I think books like this are important because, with online tutorials, you can implement something complex without actually understanding it: all you need is Python and an internet connection. Sometimes that causes trouble. People may waste resources by using a deep neural network where linear regression would do (in a sense, using a rocket launcher to kill a fruit fly), or by implementing algorithms that lead to decisions that harm people, without the programmers even being aware that this is happening.
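To show how small one of those basic pieces is, here is a minimal sketch of gradient descent fitting a plain linear regression in pure Python. It is not taken from the book: the function name, the learning rate, the step count, and the toy data (generated from y = 2x + 1) are assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point under the current w and b.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Take a small step downhill along the negative gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_line(xs, ys)
print(f"slope: {w:.4f}, intercept: {b:.4f}")
```

A neural network is trained with the same loop in principle; backpropagation is the bookkeeping that computes these gradients through many layers. For data like this, the two-parameter model is all that is needed.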

Linear Algebra Done Right by Sheldon Axler

MILLER: This book is an undergraduate math textbook. It is designed for intermediate linear algebra courses and would be useful to every data scientist. It's not sexy. It's not machine learning, and it's not flashy programming. But what I use more than anything else is my ability to take a matrix or a high-dimensional space and think about it. This is one of those books where, when you are done, you will know inside and out how to work with matrices, how to handle vector spaces, and how to perform pure mathematical operations on high-dimensional spaces. However, I would not say that it is suitable for everyone. If this is your first math book, you will find it daunting. This is for a 200- or 300-level course.

More Advanced Textbooks

Pattern Recognition and Machine Learning by Christopher M. Bishop

MILLER: This book is definitely a textbook. If you took Data Science from Scratch and then turned the math up to 11, this would be that book. It's all based on the so-called Bayesian point of view, and it says it includes an introduction to Bayesian learning. Technically, it does, but any beginner will quickly get lost. As much as I hate to say it, when I talk to other data scientists, this is the book we often end up talking about.

As for what pattern recognition means here: any machine learning is pattern recognition, right? Looking at the past performance of the stock market and predicting what will happen next is pattern recognition. But looking at a bunch of signs and learning that this pattern means "stop" is pattern recognition too. Machine learning is a big, fancy term that basically means using old data to reason about data you have never seen before. In terms of depth and clarity of expression, this is probably the best book I have ever read. He doesn't hide anything or try to make it beginner-friendly. This is how it works, and you can take it or leave it.

Deep Learning with Python by François Chollet

HERMAN: The author of this book is the creator of a library called Keras, which makes it much easier to build neural networks in Python. Usually, in deep learning, you are using neural networks on unstructured data. So if you are trying to predict whether someone is in an image, or whether a Yelp review is positive or negative, you can use a deep neural network. I remember that when I read it, you build your first neural network in Chapter 2. He gives the code in the book, you try it yourself on your computer, and you get 98% accuracy. The dataset is a collection of handwritten digits, and even though everyone's handwriting is different, you try to predict what each digit is. The ones the algorithm got wrong were ones I might have gotten wrong myself. I was able to do this in Chapter 2, and I thought, "Well, I will definitely finish this book."

Designing Data-Intensive Applications by Martin Kleppmann

MILLER: This book is not a standard pick for a list of data science books, because it sits firmly in data engineering, the computer science corner of data science's three pillars. It is more about designing databases and making sure data can flow into and out of a system. If I want to build a system to store every Yelp review that has ever existed, every Yelp user, and all of that information, then this book is about how to store it. How do you ensure that data can go in and out? How do you ensure data is consistent and reliable? When you have one million users instead of 100,000, how do you ensure that the system doesn't break?

This is not strictly data science, but I think it is a hard problem that many data scientists overlook, and the book explains very clearly why your systems should be built the way they are. It does not assume that you are a data engineer or an administrator. What I would say is that any data scientist owes it to themselves to understand how the systems they rely on work. But you probably won't sit down and read it cover to cover. It is more of a reference.

© Built-in 15/03/2020