Photo by Arseny Togulev on Unsplash

Data Analysis Using the k-NN Algorithm (Machine Learning)

Rajeev Raveendran
Jul 8, 2021


The analysis of data is important for any research because it gives a clear explanation of the workings and theories behind the results. Data analysis helps the researcher reach the final result easily.

The machine learning algorithm used here to analyse the data is k-nearest neighbours (k-NN). k-NN is usually used to classify data by assigning each item to the group of data that is most similar to the query given to it for analysis. For example, when given a query, k-NN works by calculating the distance between that particular query and all the data in the dataset. It then selects the k records that most closely resemble the query. The characteristics of both the training and test datasets are collected.

The implementation of k-Nearest Neighbour for the analysis of data follows a particular structure. The first step is to calculate the Euclidean distance between rows of the dataset. The second step is to use those distances to get the nearest neighbours of a new piece of data, that is, the records most similar to it. The last step is to make predictions from the neighbours found in the training data set. In a typical data set the rows mostly contain numbers, so it is easy to calculate the distance between two rows. The method used to calculate that distance is the Euclidean distance: the two rows are treated as points, and the distance is the length of the straight line between them. In practice, the smaller the distance, the more similar the two records are. If the distance between two points is zero, there is no difference between the values of the two records.

When the k-NN function is used in the analysis, the Euclidean distance between two points is found using the following equation.

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$

Here p is the first row of data, q is the second row of data, and i = 1, 2, 3, …, n indexes the columns, since the sum runs across all columns.
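As a minimal sketch of the Euclidean distance described above, the function below sums the squared differences across all columns of two rows and takes the square root. The name `euclidean_distance` and the sample rows are illustrative assumptions, not taken from the original project.

```python
from math import sqrt


def euclidean_distance(p, q):
    """Straight-line distance between two equal-length rows of numbers."""
    # Sum the squared column-wise differences, then take the square root.
    return sqrt(sum((q_i - p_i) ** 2 for p_i, q_i in zip(p, q)))


# Hypothetical rows, used only to illustrate the calculation.
row_a = [1.0, 2.0]
row_b = [4.0, 6.0]

print(euclidean_distance(row_a, row_b))  # 5.0
print(euclidean_distance(row_a, row_a))  # 0.0, identical rows have no difference
```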

In the k-NN algorithm, k is an important parameter: it defines the number of nearest neighbours considered for the selected data. The number of neighbours is the main deciding factor. If the value of k is large, the algorithm grossly approximates the function from its nearest values, and as a result it may ignore a small but very important pattern in the data during analysis. The idea is to choose an appropriate value of k that helps decrease the impact of errors in the data.
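The effect of k can be illustrated with a small sketch. The data set, the function name `knn_predict`, and the chosen k values below are all made up for illustration: five points of class "A" form one group and two points of class "B" form a small, separate pattern. With a small k the query next to the "B" points is classified as "B", but once k covers the whole data set the small pattern is outvoted.

```python
from collections import Counter
from math import sqrt


def euclidean_distance(p, q):
    """Straight-line distance between two equal-length rows of numbers."""
    return sqrt(sum((q_i - p_i) ** 2 for p_i, q_i in zip(p, q)))


def knn_predict(train, query, k):
    """Return the majority class among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: euclidean_distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (features, class label).
train = [
    ([1.0, 1.0], "A"), ([1.2, 0.8], "A"), ([0.8, 1.1], "A"),
    ([1.1, 1.3], "A"), ([0.9, 0.9], "A"),
    ([3.0, 3.0], "B"), ([3.1, 2.9], "B"),
]
query = [2.9, 3.0]

predictions = {k: knn_predict(train, query, k) for k in (1, 3, 7)}
print(predictions)  # {1: 'B', 3: 'B', 7: 'A'}
```

In this toy case k = 7 uses every record, so the prediction is simply the most common class overall, regardless of where the query lies.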

Performing the k-NN algorithm

When performing the k-NN algorithm, the aim is to get the nearest neighbours of a new piece of data. The neighbours are the k closest instances in the dataset. To identify them, the first step is to calculate the distance between each record in the dataset and the new input data, using the Euclidean distance formula mentioned above. After the distances are calculated, the next step is to sort all records in the dataset according to their distance from the input data. To identify the most similar neighbours, the distance for every single record is kept together with the record as a tuple, since a tuple can store multiple values as a single variable.
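The neighbour search described above can be sketched as follows. This is one possible implementation, not necessarily the one used in the original project: the function name `get_neighbours`, the sample data, and the convention that the class label is the last element of each training row are assumptions.

```python
from math import sqrt


def euclidean_distance(p, q):
    """Straight-line distance between two equal-length rows of numbers."""
    return sqrt(sum((q_i - p_i) ** 2 for p_i, q_i in zip(p, q)))


def get_neighbours(train, new_row, k):
    """Return the k training rows closest to new_row.

    Assumes the last element of every training row is its class label,
    so only row[:-1] is compared with new_row.
    """
    distances = []
    for row in train:
        dist = euclidean_distance(row[:-1], new_row)
        # Keep the record and its distance together as a tuple.
        distances.append((row, dist))
    # Sort all records by their distance to the new input data.
    distances.sort(key=lambda pair: pair[1])
    return [row for row, _ in distances[:k]]


# Hypothetical training data: two features followed by a class label.
train = [
    [1.0, 1.0, "A"],
    [1.5, 2.0, "A"],
    [5.0, 5.0, "B"],
    [6.0, 5.5, "B"],
]

neighbours = get_neighbours(train, [1.2, 1.1], k=2)
print(neighbours)  # [[1.0, 1.0, 'A'], [1.5, 2.0, 'A']]
```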

Prediction

In the final step, the most similar neighbours selected from the training data set are used to make predictions.

Since this project performs classification, the most represented class among the selected neighbours is returned. To predict the class for a given input, the max() function is used: it is applied to the set of unique class values obtained from the nearest neighbours and picks the value that occurs most often. Repeating this for every input row in the dataset gives the list of predicted values.
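A minimal sketch of this voting step, assuming the neighbours have already been found and that each neighbour row ends with its class label. The function name `predict_classification` and the sample neighbours are illustrative, and the exact call used in the original project is not shown in the text.

```python
def predict_classification(neighbours):
    """Return the class that occurs most often among the neighbours.

    Assumes the last element of every neighbour row is its class label.
    """
    labels = [row[-1] for row in neighbours]
    # max() walks over the unique labels and keeps the one with the highest count.
    return max(set(labels), key=labels.count)


# Hypothetical neighbours, as they might be returned by a neighbour search with k=3.
neighbours = [
    [1.0, 1.0, "A"],
    [1.5, 2.0, "A"],
    [5.0, 5.0, "B"],
]

print(predict_classification(neighbours))  # A
```

When two classes are tied, which one max() returns depends on the iteration order of the set, so an odd k is a common way to avoid ties in two-class problems.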

Therefore, one can use the k-NN algorithm for applications that require high accuracy but do not require a human-readable model. The quality of the predictions depends on the distance measure used to compare neighbours.

Photo by Kevin Ku on Unsplash
