Data Science Interview Questions and Answers
What is cluster sampling?
Non-probability data sampling methods include:
Q six. What is a statistical interaction?
Q seven. What is selection bias?
Q eight. What is an example of a data set with a non-Gaussian distribution?
The differences between supervised and unsupervised learning are:
The types of selection bias include:
Q three. What is the bias-variance trade-off?
Possible high variance - polynomial regression
Question Four. What is a confusion matrix?
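A minimal sketch of how the four cells of a binary confusion matrix are counted, assuming labels coded as 1 (positive) and 0 (negative). The function name `confusion_matrix` and the sample labels are illustrative only; scikit-learn ships its own `sklearn.metrics.confusion_matrix`, which returns the counts in a different layout.

```python
from collections import Counter


def confusion_matrix(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter()
    for y, y_hat in zip(actual, predicted):
        if y_hat == positive:
            # Predicted positive: correct hit or false alarm (Type I error).
            counts["tp" if y == positive else "fp"] += 1
        else:
            # Predicted negative: missed positive (Type II error) or correct rejection.
            counts["fn" if y == positive else "tn"] += 1
    return counts["tp"], counts["fp"], counts["fn"], counts["tn"]


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Accuracy, precision, and recall are all read straight off these four counts.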
Question Five. What is the difference between "long" and "wide" format data?
What do you understand by the term Normal Distribution?
Question Seven. What are correlation and covariance in statistics?
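To make the distinction concrete, here is a small sketch computing both from scratch: covariance measures how two variables move together in their original units, while correlation rescales it by both standard deviations so it always lies between -1 and 1. The data and function names are hypothetical; Python 3.10+ also provides `statistics.covariance` and `statistics.correlation`.

```python
import math


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


hours = [1, 2, 3, 4, 5]       # hypothetical hours studied
score = [52, 55, 61, 64, 70]  # hypothetical test scores
print(covariance(hours, score), correlation(hours, score))
```

Doubling every score would double the covariance but leave the correlation unchanged, which is why correlation is the one to compare across datasets.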
Question Eight. What is the difference between a Point Estimate and a Confidence Interval?
Question Nine. What is the goal of A/B Testing?
Question Ten. What is a p-value?
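A p-value is the probability, assuming the null hypothesis is true, of seeing a result at least as extreme as the one observed. As a sketch, assume a coin is tossed 100 times and lands heads 60 times, with the null hypothesis that the coin is fair. The exact two-sided binomial p-value below treats "at least as extreme" as "no more likely than the observed count", which is one common convention; the function name is illustrative.

```python
from math import comb


def binomial_two_sided_p_value(k, n, p=0.5):
    """Exact two-sided binomial test p-value.

    Adds up the null-hypothesis probability of every outcome that is no more
    likely than the observed count k under Binomial(n, p).
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so outcomes tied with k are not lost to rounding.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical experiment: 60 heads in 100 tosses, null hypothesis "fair coin".
p_value = binomial_two_sided_p_value(60, 100)
print(round(p_value, 4))
print("reject at 5%" if p_value < 0.05 else "fail to reject at 5%")
```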
Question Twelve. How can you generate a random number between one and seven with only a die?
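One standard answer is rejection sampling: two rolls give 36 equally likely outcomes; discard one of them so 35 remain, and map those 35 evenly onto 1 to 7 (five outcomes each). The sketch below assumes a fair six-sided die simulated with Python's `random` module; the function names are hypothetical.

```python
import random


def map_two_rolls(first, second):
    """Map two die rolls (1-6 each) to a number 1-7, or None meaning "roll again"."""
    index = (first - 1) * 6 + (second - 1)  # 0..35, all equally likely
    if index == 35:  # discard (6, 6) so 35 outcomes remain, 5 per number
        return None
    return index % 7 + 1


def random_one_to_seven(rng=random):
    """Keep rolling pairs until one maps to a number; each of 1-7 has chance 1/7."""
    while True:
        result = map_two_rolls(rng.randint(1, 6), rng.randint(1, 6))
        if result is not None:
            return result


rng = random.Random(0)
print([random_one_to_seven(rng) for _ in range(10)])
```

On average a re-roll is needed only once every 36 pairs, so the loop almost always ends after the first pair.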
Q fifteen. What do you understand by statistical power, or sensitivity, and how do you calculate it?
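Statistical power (sensitivity) is the probability that a test rejects the null hypothesis when the alternative is actually true, i.e. 1 - β. How it is calculated depends on the test. The sketch below assumes a one-sided, one-sample z-test with a known population standard deviation, where power = 1 - Φ(z_{1-α} - δ√n / σ); the function name and parameters are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test of H0: mu = mu0 vs H1: mu > mu0.

    effect: true difference mu1 - mu0; sigma: known population standard
    deviation; n: sample size; alpha: significance level.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_critical = std_normal.inv_cdf(1 - alpha)  # reject H0 above this z
    shift = effect * sqrt(n) / sigma  # mean of the z statistic under H1
    return 1 - std_normal.cdf(z_critical - shift)


for n in (10, 25, 50):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With no true effect the power collapses to α, and it grows with the sample size and the effect size.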
Q sixteen. Why is Resampling done?
Resampling is done in any of these cases:
Q seventeen. What are the differences between over-fitting and under-fitting?
Q eighteen. How do you combat Overfitting and Underfitting?
Q nineteen. What is regularization? Why is it useful?
Q twenty. What Is the Law of Large Numbers?
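The law of large numbers says that the average of many independent trials converges to the expected value as the number of trials grows. Here is a quick simulation, assuming a fair six-sided die (expected value 3.5) and a fixed seed so the run is reproducible; the function name is illustrative.

```python
import random


def running_means(n_rolls, seed=0):
    """Sample mean of simulated fair-die rolls after each roll."""
    rng = random.Random(seed)  # fixed seed for a reproducible run
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means


means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    # The gap to 3.5 tends to shrink as n grows, though not monotonically.
    print(n, round(means[n - 1], 4), round(abs(means[n - 1] - 3.5), 4))
```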
Q twenty-one. What Are Confounding Variables?
Q twenty-two. What Are the Types of Biases That Can Occur During Sampling?
What is Survivorship Bias?
Q twenty-four. What is Selection Bias? What is undercoverage bias?
Q twenty-five. Explain how a ROC curve works.
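A ROC curve plots the true positive rate against the false positive rate as the classification threshold is lowered from the highest score to the lowest. The sketch below builds those points by hand and computes the area under the curve with the trapezoidal rule. The function names and example data are hypothetical; in practice scikit-learn's `roc_curve` and `roc_auc_score` do the same job.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold over the distinct scores.

    labels: 1 for positive, 0 for negative; scores: higher means
    "more likely positive".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    ranked = sorted(zip(scores, labels), reverse=True)
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after every example tied at this score is counted.
        if i == len(ranked) - 1 or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
print(points)
print(auc(points))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of positives from negatives.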
What is TF-IDF vectorization?
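TF-IDF weights a term by how often it appears in a document (term frequency) times how rare it is across the corpus (inverse document frequency). The sketch below uses the plain textbook form tf * log(N / df) on pre-tokenized documents. This formula is an assumption, since libraries vary: scikit-learn's `TfidfVectorizer`, for example, uses a smoothed idf and L2-normalizes each row by default. The function name and documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(documents):
    """TF-IDF weights for a list of tokenized documents.

    tf  = term count / document length
    idf = log(N / df), where df = number of documents containing the term
    """
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
weights = tf_idf(docs)
for w in weights:
    print({t: round(v, 3) for t, v in w.items()})
```

A term that appears in every document gets idf = log(1) = 0, so very common words carry no weight.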
Q two. How does data cleaning play a vital role in the analysis?
Q three. Differentiate between univariate, bivariate and multivariate analysis.
Q four. Explain Star Schema.
Q six. What is Systematic Sampling?
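Systematic sampling picks a random starting point and then takes every k-th member of an ordered population, where k is the population size divided by the sample size. A minimal sketch, assuming the population is already an ordered list; the function name and `seed` parameter are illustrative.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Every k-th element after a random start, where k = N // sample_size."""
    k = len(population) // sample_size  # sampling interval
    if k == 0:
        raise ValueError("sample_size cannot exceed the population size")
    start = random.Random(seed).randrange(k)  # random start in [0, k)
    # Slicing with step k takes every k-th element; trim any extra at the end.
    return population[start::k][:sample_size]


print(systematic_sample(list(range(1, 101)), 10, seed=7))
```

If the list has a periodic pattern that lines up with k, the sample can be biased, so the ordering of the population matters.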
Q seven. What are Eigenvectors and Eigenvalues?
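An eigenvector of a square matrix A is a non-zero vector v whose direction is unchanged by A, so that Av = λv; the eigenvalue λ is the scaling factor. One way to see this numerically is power iteration. The sketch below assumes the matrix has a single eigenvalue of largest magnitude, and it is only an illustration: in practice you would call a linear-algebra routine such as `numpy.linalg.eig`. The helper names are hypothetical.

```python
import math


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def power_iteration(matrix, iterations=100):
    """Approximate the dominant eigenvalue and unit eigenvector of a matrix.

    Repeatedly applying A and re-normalizing pulls the vector towards the
    eigenvector with the largest |eigenvalue|, assuming that eigenvalue is
    unique and the start vector is not orthogonal to its eigenvector.
    """
    vector = [1.0] * len(matrix)
    for _ in range(iterations):
        product = mat_vec(matrix, vector)
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    # Rayleigh quotient v . (A v) gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v * av for v, av in zip(vector, mat_vec(matrix, vector)))
    return eigenvalue, vector


A = [[2.0, 1.0], [1.0, 3.0]]
value, vector = power_iteration(A)
print(value, vector)
# Check the defining property A v = lambda v.
print(mat_vec(A, vector), [value * x for x in vector])
```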
Q eight. Can you cite some examples where a false positive is more important than a false negative?
Q nine. Can you cite some examples where a false negative is more important than a false positive? And vice versa?
Q ten. Can you cite some examples where both false positives and false negatives are equally important?
Q eleven. Can you explain the difference between a Validation Set and a Test Set?
Q twelve. Explain cross-validation.
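In k-fold cross-validation the data is split into k folds. Each fold is used once as the test set while the remaining k - 1 folds train the model, and the k scores are averaged. The sketch below generates the folds by hand and scores a deliberately trivial baseline that predicts the training mean. The function name, data, and baseline are hypothetical, and the sketch does not shuffle, which real pipelines usually do (for example with scikit-learn's `KFold(shuffle=True)`).

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Every sample lands in exactly one test fold; fold sizes differ by at
    most one when n_samples is not divisible by k. No shuffling is done.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Example: score a baseline that always predicts the training-set mean.
y = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 5.0, 6.0, 7.0]
fold_errors = []
for train, test in k_fold_indices(len(y), k=5):
    prediction = sum(y[i] for i in train) / len(train)
    fold_errors.append(sum(abs(y[i] - prediction) for i in test) / len(test))
cv_error = sum(fold_errors) / len(fold_errors)
print(fold_errors, cv_error)
```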
Q two. What is Supervised Learning?
Q three. What is Unsupervised learning?
Q four. What are the various algorithms?
Q five. What is 'Naive' in Naive Bayes?
Gini Impurity and Information Gain - CART
Entropy and Information Gain - ID3
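Both criteria score how mixed the class labels in a node are, and a split is chosen to maximize the drop in impurity (the information gain). Here is a small sketch of the two impurity measures and the gain computation; the function names and the toy yes/no split are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (used by CART)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum p * log2(p) (used by ID3)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]
print(gini(parent), entropy(parent))
print(information_gain(parent, split), information_gain(parent, split, impurity=gini))
```

A pure node scores 0 under both measures; a 50/50 binary node scores 0.5 Gini and 1 bit of entropy.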
Q thirteen. What is pruning in a Decision Tree?
Q fourteen. What is logistic regression? State an example when you have used logistic regression recently.
Q fifteen. What is Linear Regression?
Q sixteen. What are the Drawbacks of the Linear Model?
Q seventeen. What is the difference between Regression and classification ML techniques?
Q eighteen. What are Recommender Systems?
Q nineteen. What is Collaborative filtering? And content-based filtering?
Q twenty. How can outlier values be treated?
Q twenty-one. What are the various steps involved in an analytics project?
Q twenty-two. During analysis, how do you treat missing values?
Q twenty-three. How will you define the number of clusters in a clustering algorithm?
For SVM: Partial fit will work. Steps:
What is reinforcement learning?
Question five. What are Artificial Neural Networks?
Question six. Describe the structure of Artificial Neural Networks.
Question seven. How Are Weights Initialized in a Network?
Question nine. What Are Hyperparameters?
Question ten. What Will Happen If the Learning Rate Is Set Inaccurately (Too Low or Too High)?
Question eleven. What Is The Difference Between Epoch, Batch, and Iteration in Deep Learning?
Question twelve. What Are the Different Layers in a CNN?
There are four layers in a CNN:
Sequential convolutional layers after the first one
Q thirteen. What Is Pooling in a CNN, and How Does It Work?
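Pooling downsamples a feature map by summarizing each small window, most commonly with its maximum. A minimal sketch of 2 x 2 max pooling with stride 2, assuming a plain list-of-lists feature map (real frameworks operate on batched tensors with channels); the function name is illustrative.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Max pooling over a 2-D feature map given as a list of rows.

    Each output cell is the maximum of a size x size window; windows move by
    `stride`. Windows that would run past the edge are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


fmap = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 1],
    [3, 4, 0, 8],
]
print(max_pool_2d(fmap))
```

The 4 x 4 input shrinks to 2 x 2, keeping the strongest activation in each region.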
Q fourteen. What are Recurrent Neural Networks?
Encoder-Decoder Sequence-to-Sequence RNNs
Question fifteen. How Does an LSTM Network Work?
Recurrent Neural Networks
The Problem of Long-Term Dependencies
The Core Idea Behind LSTMs
Q sixteen. What Is a Multi-layer Perceptron (MLP)?
Q seventeen. Explain Gradient Descent.
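Gradient descent repeatedly moves the parameters a small step against the gradient of the loss. As an illustration, the sketch below fits a straight line by batch gradient descent on mean squared error. The toy data, learning rate, and epoch count are assumptions chosen so that it converges, and the function name is hypothetical.

```python
def gradient_descent_linear(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2).
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = gradient_descent_linear(xs, ys)
print(round(w, 4), round(b, 4))
```

A much larger learning rate on this data makes the updates overshoot and diverge, while a much smaller one needs many more epochs to get close.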
Q eighteen. What are exploding gradients?
Q nineteen. What are vanishing gradients?
Q twenty. What is Back Propagation? Explain How It Works.
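Back propagation applies the chain rule from the loss back to each weight. The sketch below does one forward and one backward pass for a single sigmoid neuron with squared-error loss, then checks the chain-rule gradient against a finite-difference estimate. The neuron, loss, function names, and values are hypothetical and far simpler than a real network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward_backward(w, b, x, y):
    """One forward and backward pass for a single sigmoid neuron.

    Loss L = 0.5 * (a - y)^2 with a = sigmoid(w * x + b).
    Chain rule: dL/dw = (a - y) * a * (1 - a) * x, dL/db = (a - y) * a * (1 - a).
    """
    z = w * x + b  # forward: pre-activation
    a = sigmoid(z)  # forward: activation
    loss = 0.5 * (a - y) ** 2
    delta = (a - y) * a * (1 - a)  # backward: dL/dz
    return loss, delta * x, delta


def numerical_grad_w(w, b, x, y, eps=1e-6):
    """Central finite difference, used here only to check the chain rule."""
    plus, _, _ = forward_backward(w + eps, b, x, y)
    minus, _, _ = forward_backward(w - eps, b, x, y)
    return (plus - minus) / (2 * eps)


loss, grad_w, grad_b = forward_backward(w=0.5, b=-0.2, x=1.5, y=1.0)
print(loss, grad_w, grad_b, numerical_grad_w(0.5, -0.2, 1.5, 1.0))
```

In a multi-layer network the same `delta` term is passed backwards layer by layer, which is what makes back propagation efficient.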
Q twenty-one. What are the variants of Back Propagation?
Q twenty-two. What are the different Deep Learning Frameworks?
Q twenty-three. What is the role of the Activation Function?
Q twenty-four. Name a few Machine Learning libraries for various purposes.
Q twenty-six. What is a Boltzmann Machine?
What Is Dropout and Batch Normalization?
Q twenty-eight. Why Is TensorFlow the Most Preferred Library in Deep Learning?
Q twenty-nine. What Do You Mean by Tensor in TensorFlow?
Q thirty. What is the Computational Graph?
Q thirty-one. How is logistic regression done?
Q two. How do you build a random forest model?
Steps to build a random forest model:
Q three. Differentiate between univariate, bivariate, and multivariate analysis.
Q four. What are the feature selection methods used to select the right variables?