Data Science

Regression-Discontinuity Analysis

https://conjointly.com/kb/regression-discontinuity-analysis/

Deep Learning with Structured Data

https://www.manning.com/books/deep-learning-with-structured-data

Data Analysis

https://towardsdatascience.com/the-six-types-of-data-analysis-75517ba7ea61
https://digital.gov/2015/04/16/using-a-hypothesis-driven-approach-in-analyzing-and-making-sense-of-your-website-traffic-data/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4552232/
https://towardsdev.com/exploratory-data-analysis-a-walkthrough-using-python-libraries-670ad3cf3659

Data Viz

https://extremepresentation.typepad.com/blog/2008/06/index.html

Intro to statistics

https://michael-bar.github.io/Introduction-to-statistics/

Transformers

https://e2eml.school/transformers.html
https://jalammar.github.io/illustrated-transformer/
https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
https://nlp.seas.harvard.edu/2018/04/03/attention.html
https://aman.ai/primers/ai/transformers/
https://nn.labml.ai/transformers/index.html

Hypothesis Testing

https://www.visual-design.net/post/an-interactive-guide-to-hypothesis-testing-in-python

Data Profiling

https://oralytics.com/2022/04/04/python-data-profiling-libraries/

Machine Learning for Retail Demand Forecasting

https://towardsdatascience.com/machine-learning-for-store-demand-forecasting-and-inventory-optimization-part-1-xgboost-vs-9952d8303b48

SQL Visualizer

https://sqlflow.gudusoft.com/#/

JSON Visualizer

https://jsoncrack.com/editor

Public APIs

https://github.com/public-apis/public-apis

30 books to learn advanced math

https://abakcus.com/30-best-math-books-to-learn-advanced-mathematics-for-self-learners/

Quantile Regression

https://towardsdatascience.com/quantile-regression-ff2343c4a03

Data Analysis and Visualization Course Notes

Udemy - Python Data Analysis & Visualization Masterclass - Colt Steele

Question	Answer
Where does the name pandas come from?	The econometrics term, “panel data”, which refers to multidimensional data sets.
What is pandas?	Open-source data analysis and manipulation package for Python.
How do you provide column names while reading a dataset that doesn’t come with them?	pd.read_csv("filename.csv", names=["col1", "col2", …, "colN"])
What’s the output of df.info()?	df shape, column names, column data types, non-null count per column, memory usage
What’s a pandas DataFrame?	2D, size-mutable, potentially heterogeneous tabular data structure with labeled axes.
How do you generate descriptive statistics for string columns?	df.describe(include=['object'])
What are the return types of the following dataframe methods? info(), head()/tail(), describe(), mode(), sum()	NoneType, DataFrame, DataFrame, DataFrame, Series
Can you give a few examples where the dot notation for selecting columns doesn’t work?	Dot notation doesn’t work when the column name contains a space or a dot, when selecting multiple columns at once, when the column name is held in a variable, or when the column name is the same as a DataFrame method or attribute (e.g. a column named “head”).
What’s a pandas Series?	`class pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)`. One-dimensional ndarray with axis labels (including time series).
What are the properties of Labels in a pandas Series?	Labels need not be unique but must be a hashable type.
What kinds of indexing does the Series object support?	The Series object supports both integer- and label-based indexing.
What kind of methods does a pandas Series provide?	Series provides methods for performing operations involving the index.
What data structure is the pandas Series built on?	NumPy array (ndarray).
How do the statistical methods in a pandas Series deal with NaNs?	Statistical methods from ndarray have been overridden to automatically exclude missing data (currently represented as NaN).
Why is it not necessary for pandas Series to be of the same length for performing operations on them?	Operations between Series (+, -, /, *, **) align values based on their associated index values, so the Series need not be the same length.
What would be the result index of an operation on two Series objects?	The result index will be the sorted union of the two indexes.
How does pandas Series handle different dtypes in the same series?	When working with heterogeneous data, the dtype of the resulting ndarray will be chosen to accommodate all of the data involved. For example, if strings are involved, the result will be of object dtype. If there are only floats and integers, the resulting array will be of float dtype. Try: `s = pd.Series([1, 2.0, "3"])`, `type(s[0]), type(s[1]), type(s[2])`
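How do `nlargest` and `nsmallest` work on DataFrames?	df.nlargest(n, ["col1", "col2", …, "colN"]) returns the n rows with the largest values in col1, in descending order, using col2, …, colN to break ties. df.nsmallest takes the same arguments and returns the n rows with the smallest values, in ascending order.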

Analysis Operations Steps	Methods	Documentation	Source Code
1. Read the dataset as a DataFrame	pd.read_csv("filename.csv")	https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html	https://github.com/pandas-dev/pandas/blob/main/pandas/io/parsers/readers.py
2. Inspect the DataFrame	df.info(), df.head(), df.tail(), df.dtypes	https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html	https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py
3. Generate Descriptive Statistics	df.describe(), df.sum(), df.mode(), df.describe(include=['object'])	https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html	https://github.com/pandas-dev/pandas/blob/main/pandas/core/describe.py
4. Work with individual columns	col = df["COLUMN NAME"], col.min(), col.max(), col.shape, col.values, col.index, col.head(), col.tail(), col.describe(), col.unique(), col.nunique(dropna=True), col.nlargest(n=5, keep="first"), col.nsmallest(n=5, keep="first"), col.value_counts(), col.plot(), col.dtype
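
The four steps above can be strung together roughly as follows. This is a sketch, not the course's own code: the CSV content is invented and inlined through `io.StringIO` so the example runs without a file, and the column names are assumptions.

```python
import io

import pandas as pd

# Hypothetical header-less CSV, inlined so the sketch is self-contained.
raw = io.StringIO("Alice,34,NY\nBob,29,LA\nCara,41,NY\n")

# 1. Read the dataset; names= supplies the missing header row.
df = pd.read_csv(raw, names=["name", "age", "city"])

# 2. Inspect the DataFrame.
info_result = df.info()  # prints a summary and returns None
print(df.head(2))
print(df.dtypes)

# 3. Generate descriptive statistics.
print(df.describe())              # numeric columns only by default
print(df.sum(numeric_only=True))  # a Series, one entry per numeric column
print(df.mode())                  # a DataFrame, since a column can have several modes

# 4. Work with individual columns.
age = df["age"]
print(age.min(), age.max(), age.shape)
print(age.nlargest(n=2, keep="first"))

city = df["city"]
print(city.unique())       # distinct values
print(city.nunique())      # number of distinct values
print(city.value_counts())
print(city.describe())     # count, unique, top, freq for a string column
```

Bracket selection (`df["age"]`) is used throughout because it also works for names containing spaces or names that clash with DataFrame methods.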

LinkedIn Posts

Daniel Lee, DataInterview.com (Ex-Google)

While cleaning my closet, I found a stack of lecture notes I collected over the course of 8 years of studying data science.

These notes detail the theory, formulas and application of various subjects including matrix algebra, Neural networks, Bayesian statistics, experimental design and recommender systems.

Some of these pages have coffee stains from the early days of my career when I went to Starbucks right after work to learn about data science.

It’s a great reminder of why I love data science. I am glad that I had a chance to study it as an undergraduate at Virginia Tech, apply it as a data scientist at Google, and teach it as an interview coach on DataInterview.com.

If you are wondering what resources I personally used to learn data science, here’s a detailed list:

✅ Machine Learning

Stanford CS224 (NLP)
Stanford CS229 (ML)
Stanford CS231 (Computer Vision)
Intro to Deep Learning by Lex Fridman
Deep Learning by Ian Goodfellow

✅ Statistics

Probability by Jim Pitman
Penn State Design of Experiments
Time Series Analysis & Forecasting by Douglas C. Montgomery

✅ Software Engineering

Harvard CS50 (Intro to Computer Science)
Effective Python by Brett Slatkin
Seven Databases in Seven Weeks
Stanford SQL & DB course
Automate the Boring Stuff with Python

✅ Specializations

Fraud Analytics by Bart Baesens
Practical Recommender Systems by Kim Falk

Other notable mentions include Khan Academy, edX, Coursera and Udacity.

Fixed Effects / Random Effects / Mixed Models and Omitted Variable Bias

Source: https://www.statisticshowto.com/experimental-design/fixed-effects-random-mixed-omitted-variable-bias/