Before starting this, please check the r square blog below.

Adding a variable to your model increases the r square value, whether or not that new variable is actually useful. Since r square only searches for the best fit, it doesn't care about the new variable; it just keeps increasing whenever you add one.

**So the question is:** how can we solve this issue?

**Answer:** using adjusted r square.

When a useless variable is added:

- **“k” increases**, so the overall **denominator** gets **reduced**, and adjusted r square falls instead of rising.
- We saw that adding a variable to your model increases the r square value…
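A quick sketch of why this works, using the standard adjusted r square formula with n observations and k predictors (the numbers below are made up for illustration):

```python
def adjusted_r2(r2, n, k):
    """Adjusted r square = 1 - (1 - r2) * (n - 1) / (n - k - 1),
    where n = number of observations, k = number of predictors.
    As k grows, the denominator (n - k - 1) shrinks, so the
    penalty term grows."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical numbers: r square creeps up from 0.80 to 0.81 after
# adding seven useless predictors, yet adjusted r square drops.
print(adjusted_r2(0.80, 100, 3))   # ~0.7938
print(adjusted_r2(0.81, 100, 10))  # ~0.7886
```

So adjusted r square rewards a new variable only when the gain in r square outweighs the penalty for adding it.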

We all know about correlation (R): values close to 1 or -1 tell you two quantitative variables are strongly related, whereas variables with correlation 0 are not correlated.

What is r square, and why should we use it?

r square is very similar to R, but r square allows interpretations like the following:

- R = 0.7 is not twice as good as R = 0.5
- but R square = 0.7 is 1.4 times as good as R square = 0.5
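A small sketch of why the ratio works for r square but not for R, using made-up height/weight numbers: the square of Pearson's R is the fraction of variance explained, and fractions can be compared directly.

```python
import numpy as np

# hypothetical data: weight roughly increases with height
height = np.array([150, 155, 160, 165, 170, 175, 180])
weight = np.array([52, 56, 59, 63, 68, 72, 77])

r = np.corrcoef(height, weight)[0, 1]  # Pearson correlation R
r2 = r ** 2                            # coefficient of determination

# r2 is the fraction of weight's variance explained by height,
# which is why r square = 0.7 really is 1.4x as good as 0.5.
print(round(r, 4), round(r2, 4))
```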

Let's consider an example: here we are plotting person Id on the x axis and weight…

A model's performance must be considered to choose the best model for our data. It is also important to choose the right performance metric for each scenario. In this blog, I will explain

- confusion matrix
- TPR, FPR, TNR, FNR
- Type I and II errors
- Sensitivity, specificity and miss rate
- Precision and Recall
- positive predictive value (PPV)
- F beta score

and when to use them. Looks like a long list? But it's a short blog.
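Before diving in, here is a minimal sketch (plain Python, made-up counts) of how every metric in the list falls out of the four cells of a binary confusion matrix:

```python
# Hypothetical counts from a binary classifier's confusion matrix.
tp, fp, fn, tn = 40, 10, 5, 45

tpr = tp / (tp + fn)        # sensitivity / recall
fnr = fn / (fn + tp)        # miss rate; Type II error rate (1 - TPR)
fpr = fp / (fp + tn)        # Type I error rate
tnr = tn / (tn + fp)        # specificity (1 - FPR)
precision = tp / (tp + fp)  # positive predictive value (PPV)

beta = 1.0                  # F beta; beta = 1 gives the usual F1 score
f_beta = ((1 + beta ** 2) * precision * tpr
          / (beta ** 2 * precision + tpr))

print(tpr, fpr, tnr, fnr)
print(precision, round(f_beta, 3))
```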

The confusion matrix represents classification results, and this representation helps in analyzing the performance of the model.

Follow the steps below to avoid confusion while building a confusion matrix:

- In…

In this blog, let's discuss all the preprocessing methods. This blog may look long, but it's very effective.

The quality of the data determines the performance of a machine learning algorithm, and quality here refers to preprocessed data. Hence preprocessing is essential for building a model.

Here are the methods of preprocessing:

Most machine learning algorithms don't support data with null values, so there is a need to handle them.

Imputation is the process of replacing missing data with substituted values.
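A minimal imputation sketch with pandas, assuming a small hypothetical dataset: the column mean for a numeric variable and the most frequent value (mode) for a categorical one.

```python
import numpy as np
import pandas as pd

# hypothetical dataset with missing values in both column types
df = pd.DataFrame({
    "age": [25, np.nan, 31, 40, np.nan],
    "city": ["Chennai", "Madurai", None, "Chennai", "Chennai"],
})

# numeric column: replace nulls with the column mean
df["age"] = df["age"].fillna(df["age"].mean())

# categorical column: replace nulls with the most frequent value
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
print(df.isnull().sum().sum())  # no missing values left
```

Mean imputation is only one choice; median imputation is more robust when the column has outliers.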

Whenever we build an ML model, we need to know about the variables in our dataset to decide which ML model to use. Since we are about to work with variables, let's learn about the types of variables.

There are two major types of variables: categorical and numerical.

Categorical or qualitative data is divided into categories or groups. It measures 'types' and may be represented by a name or symbol.

**Example:** gender, level of education, animal name, brand name, etc.

A categorical variable gets further classified into two types,

The normal distribution, also known as the Gaussian distribution, was discovered by Carl Friedrich Gauss.

It is the most important probability distribution in statistics because many natural phenomena, such as heights, blood pressure, measurement error, and IQ scores, follow the **normal distribution**.

It is a probability distribution that is symmetric about the mean:

- Mean = median = mode
- Symmetric around the center
- Forms a bell-shaped curve, and the area under the curve = 1

Consider a variable x which belongs to a normal/Gaussian distribution with the mean, variance, and standard deviation given below,
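For reference, the standard statement: if x follows a normal distribution with mean μ, variance σ², and standard deviation σ, its probability density function is

```latex
x \sim \mathcal{N}(\mu, \sigma^{2}), \qquad
f(x) = \frac{1}{\sigma \sqrt{2\pi}}\,
       \exp\!\left( -\frac{(x - \mu)^{2}}{2\sigma^{2}} \right)
```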

In this blog, let's look at cross validation and its types.

While running an ML model we may get a certain accuracy; for example, let's consider 95%. We report to the manager that our model gives this accuracy, but the same model may give 93% while running in front of the client. So we can't fix a specific accuracy for our model; to solve this issue we use **cross validation**.

Whenever we do a train test split, we use the **random state** variable. When the random state value changes, the accuracy will also change.

**Cross-validation** is a resampling technique for evaluating ML models by building multiple…
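The core idea can be sketched in pure Python (the helper name `k_fold_indices` is my own, not from a library): split the data into k folds, hold each fold out once for testing, and average the k scores instead of trusting a single random split.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists so that every sample is
    tested exactly once across the k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

# hypothetical dataset of 10 samples, 5 folds
for train_idx, test_idx in k_fold_indices(10, 5):
    print(train_idx, test_idx)
```

Reporting the mean score across the k folds is far more honest than quoting the accuracy of one split tied to one random state.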

In this blog, I will explain the concepts of bias and variance.

Let's get clear about overfitting and underfitting:

Each and every data point in the training data is satisfied by the best fit line, but the same best fit line cannot satisfy the testing data. This inability of the best fit line to satisfy the testing data while satisfying the training data is called **overfitting**. In other words, overfitting is a scenario in which a model performs very well on the training set but poorly on the test set.

In underfitting, the error is very high with…
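The overfitting scenario can be sketched with numpy on synthetic data of my own construction: a very flexible degree-9 polynomial drives the training error toward zero while typically doing worse on held-out points than a simple straight line.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(0, 0.2, size=x.size)  # truly linear signal + noise

x_train, y_train = x[::2], y[::2]            # 10 training points
x_test, y_test = x[1::2], y[1::2]            # 10 testing points

def errors(degree):
    """Mean squared error on train and test for a polynomial fit."""
    poly = np.poly1d(np.polyfit(x_train, y_train, degree))
    return (np.mean((poly(x_train) - y_train) ** 2),
            np.mean((poly(x_test) - y_test) ** 2))

train_lo, test_lo = errors(1)  # simple line
train_hi, test_hi = errors(9)  # can pass through all 10 training points

print(train_lo, test_lo)
print(train_hi, test_hi)       # near-zero train error: the overfit signature
```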

A measure of the **directional relationship** (whether +ve or -ve) between two variables.

Co(two) + variance = covariance

**Variance**: how much a **single variable** varies

**Covariance:** how much **two random variables** vary together

The covariance between two variables x and y is given by:
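In the usual sample form, cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (n − 1). A quick numpy check with made-up values:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # moves with x: positive covariance

# sample covariance: sum((x_i - mean(x)) * (y_i - mean(y))) / (n - 1)
n = x.size
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

print(cov_manual)
print(np.cov(x, y)[0, 1])  # numpy's built-in agrees
```

A positive value means the variables tend to rise together; a negative value means one tends to fall as the other rises.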

**Disclaimer**: this story may waste your time 😂. I love humor, so I will use "😂" in places where I feel that sense; if you don't feel the same humor in certain places, forgive me 😂.

I'm from Dindigul, Tamil Nadu. Nothing interesting to tell about me. I'm a twin (this may be interesting). I just want to live my life to the maximum level of satisfaction. I love exploring new places and climates.

Student at Coimbatore Institute of Technology; R&D engineer trainee.