Machine Learning Interview Questions

Top Machine Learning Interview Questions

Material resources: link

What Are the Different Types of Machine Learning?

There are three types of machine learning:

- Supervised Learning: In supervised machine learning, a model learns from past, labeled data to make predictions or decisions. Labeled data is data in which each example carries a tag or label giving the expected output, which makes it meaningful to learn from.
- Unsupervised Learning: In unsupervised learning, we don’t have labeled data. The model identifies patterns, anomalies, and relationships in the input data on its own.
- Reinforcement Learning: In reinforcement learning, the model learns from the rewards it receives for its previous actions. Consider an environment in which an agent is working toward a target. Every time the agent takes an action that moves it toward the target, it is given positive feedback. If the action moves it away from the target, it is given negative feedback.
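
The feedback loop described for reinforcement learning can be sketched in a few lines of Python. This is only an illustration of the reward signal, under assumptions not in the original: the environment is a number line, the actions are steps of -1 or +1, and the names `reward` and `run_episode` are hypothetical. A real reinforcement learning agent would have to learn which action to prefer from experience instead of checking the reward in advance.

```python
def reward(position, action, target):
    """Return +1 if the action moves the agent closer to the target, else -1."""
    old_distance = abs(target - position)
    new_distance = abs(target - (position + action))
    return 1 if new_distance < old_distance else -1


def run_episode(start, target, max_steps=20):
    """Greedy agent: at each step, take the action that earns positive feedback."""
    position, total_reward = start, 0
    for _ in range(max_steps):
        if position == target:
            break
        # Compare the feedback for stepping left (-1) and right (+1).
        action = max((-1, 1), key=lambda a: reward(position, a, target))
        total_reward += reward(position, action, target)
        position += action
    return position, total_reward


final_position, total = run_episode(start=0, target=5)
print(final_position, total)
```
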
What is Overfitting, and How Can You Avoid It?

Overfitting occurs when a model learns the training set too well, treating random fluctuations in the training data as concepts. These fluctuations don’t apply to new data, so they hurt the model’s ability to generalize. On the training data, an overfitted model shows close to 100 percent accuracy (technically, a very small loss). On the test data, however, the error is much higher and accuracy drops. This gap between training and test performance is known as overfitting.
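
A minimal sketch of this gap, assuming a toy one-dimensional dataset with deliberately noisy labels (the data, the 30% noise rate, and all function names are illustrative assumptions). A 1-nearest-neighbor classifier memorizes every training point, noise included, so it scores perfectly on the training set while its accuracy on fresh data is lower.

```python
import random

random.seed(0)


def make_data(n, noise=0.3):
    """Points x in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = random.random()
        label = int(x > 0.5)
        if random.random() < noise:
            label = 1 - label  # a random fluctuation, not a real pattern
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Predict the label of the single closest training point (pure memorization)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of examples in `data` that the 1-NN model labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)


train_set = make_data(200)
test_set = make_data(200)

train_acc = accuracy(train_set, train_set)  # each point is its own nearest neighbor
test_acc = accuracy(train_set, test_set)    # new points expose the memorized noise
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```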

There are multiple ways of avoiding overfitting, such as:
- Regularization. It adds a cost term to the objective function that penalizes the weights of the features involved.
- Making a simpler model. With fewer variables and parameters, the variance can be reduced.
- Cross-validation methods such as k-fold can be used to estimate how well the model generalizes and to choose among models or hyperparameters.
- If some model parameters are likely to cause overfitting, regularization techniques such as LASSO can be used to penalize those parameters.
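
Two of the ideas above can be sketched in plain Python. The first function computes a LASSO-style cost, here assumed to be mean squared error plus `lam` times the sum of absolute weights (libraries scale these terms differently). The second generates k-fold train/validation index splits without shuffling. The names `lasso_cost`, `k_fold_indices`, and `lam` are hypothetical.

```python
def lasso_cost(weights, features, targets, lam):
    """Mean squared error of a linear model plus an L1 penalty: lam * sum(|w|)."""
    n = len(targets)
    mse = sum(
        (sum(w * x for w, x in zip(weights, row)) - y) ** 2
        for row, y in zip(features, targets)
    ) / n
    penalty = lam * sum(abs(w) for w in weights)
    return mse + penalty


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k consecutive folds."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds each take one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop


features = [[1.0, 0.0], [0.0, 1.0]]
targets = [2.0, 0.0]
# These weights fit the data exactly, so the whole cost comes from the penalty.
cost = lasso_cost([2.0, 0.0], features, targets, lam=0.1)

folds = list(k_fold_indices(n_samples=5, k=2))
```

Each sample appears in exactly one validation fold, so every example is used for both training and validation across the k rounds.
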
What Are the ‘Training Set’ and ‘Test Set’ in a Machine Learning Model? How Much Data Will You Allocate for Your Training, Validation, and Test Sets?

There is a three-step process followed to create a model:
- Train the model
- Test the model
- Deploy the model

Training Set
- The training set consists of the examples given to the model to analyze and learn from.
- A common choice is to take about 70% of the total data as the training set.
- This is labeled data used to train the model.
Test Set
- The test set is used to test the accuracy of the hypothesis generated by the model.
- The remaining 30% of the data is typically taken as the test set.
- The labels are withheld from the model during testing, and its predictions are then verified against them.
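
A 70/30 split along these lines might look like the sketch below. It uses only the standard library. The function name `split_train_test`, the fixed seed, and the toy dataset are assumptions made for illustration.

```python
import random


def split_train_test(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `examples` and split it into (train, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # shuffle so the split is not order-dependent
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical labeled data: (feature, label) pairs.
dataset = [(x, x % 2) for x in range(10)]
train_set, test_set = split_train_test(dataset)

# At test time the model sees only the inputs; labels are kept aside for scoring.
test_inputs = [x for x, _ in test_set]
test_labels = [y for _, y in test_set]
print(len(train_set), len(test_set))
```

In practice, libraries such as scikit-learn provide an equivalent helper (`sklearn.model_selection.train_test_split`), which is usually preferable to a hand-rolled split.
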
How Do You Handle Missing or Corrupted Data in a Dataset?