Machine Learning Assignment Sample
3 weeks ago
Data pre-processing

Data preprocessing improves the quality of the data in the dataset so that it can be processed. It is carried out through data cleaning, data reduction and data transformation.

Data cleaning checks the dataset for null and missing values, which improves performance and efficiency (Pearson, 2018). Data reduction shortens the running time, and values outside a chosen threshold can be removed from the dataset. Data transformation converts data from alphabetical to numerical form, which makes each attribute easier to analyse.

The above image shows the dataset after the null and missing values have been removed from the original dataset, which improves the performance and efficiency of the overall model. No missing values remain in the new dataset, and all other values are unchanged.

The above dataset is the result of the data reduction process, in which the missing data are removed from the dataset. The reduced data gives a shorter running time with high performance and accuracy. It is a condensed representation of a much larger dataset that yields similar output despite the lower data volume.

The above image shows the conversion of data from string values to numeric values, where the weekend data has been changed to reduce its complexity at output time (Batch and Elmqvist, 2017). The weekend attribute originally contained only string values, which were converted into numerical values for better performance and accuracy. All the generated values are implemented in a Jupyter notebook, which produces the predicted results for the relationships between the attributes relevant to an E-commerce business.
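The cleaning and transformation steps described above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the notebook code from the assignment: the sample rows, the column names (ExitRates, Weekend, Revenue) and the "TRUE"/"FALSE" string encoding are all assumptions made for the example.

```python
# Hypothetical sample rows; column names and values are assumptions
# made for illustration, not data from the assignment.
rows = [
    {"ExitRates": 0.20, "Weekend": "FALSE", "Revenue": "FALSE"},
    {"ExitRates": None, "Weekend": "TRUE", "Revenue": "FALSE"},
    {"ExitRates": 0.05, "Weekend": "TRUE", "Revenue": "TRUE"},
    {"ExitRates": 0.10, "Weekend": "", "Revenue": "FALSE"},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def drop_missing(records):
    """Cleaning and reduction: keep only rows with no missing values."""
    return [r for r in records if not any(is_missing(v) for v in r.values())]


def encode_boolean(records, column):
    """Transformation: convert "TRUE"/"FALSE" strings in one column to 1/0."""
    mapping = {"TRUE": 1, "FALSE": 0}
    # Build new dicts so the input records are left unchanged.
    return [{**r, column: mapping[r[column].strip().upper()]} for r in records]


cleaned = drop_missing(rows)
transformed = encode_boolean(cleaned, "Weekend")

print(len(rows), "->", len(transformed), "rows")
print([r["Weekend"] for r in transformed])
```

Only the rows with a complete set of values survive, and the other columns pass through unchanged, which matches the behaviour described for the cleaned dataset. In a notebook the same steps would more commonly be done with a data-frame library.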
Data analysis and modeling

The representative methods that have been implemented in this assignment are as follows:

K-means clustering

The dataset used in this application consists of many rows of data that cannot be fully assessed by simple inspection. Here the aim of the program is to find unlabeled groups in the data, which requires an algorithm that can represent the data according to its patterns. The “K means clustering” algorithm was therefore used to find the groupings that could inform changes to the application.

Linear regression

Linear regression can be used to draw the graph of the selected values. It can also be used to predict the values from which such a graph is drawn. This program took the second approach: no graphs were created; instead, the values needed to plot the graph were produced, as depicted in the figure below. The predicted values, represented above, determine the path that the graph would follow.

Logistic regression

Using the logistic regression algorithm, an accuracy score was achieved, as depicted in the figure below: This accuracy score could have been higher if the dataset had been trimmed to a smaller section. Logistic regression is a statistical method for modelling the probability of a binary outcome (geeksforgeeks.org, 2021).

Performance evaluation

The accuracy scores reported by the program measure the performance of the algorithms. For example, the logistic regression algorithm depicted in Figure 3 shows the level of accuracy that was achieved. This level of accuracy was only reached after the initial step of dropping unnecessary data.
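For reference, an accuracy score of the kind discussed above is the fraction of predictions that match the true labels. The sketch below shows that calculation; the function name and the label lists are made up for illustration and are not results from the assignment.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels for illustration only (1 = revenue, 0 = no revenue).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_pred = [0, 0, 1, 0, 0, 0, 1, 1]

print(accuracy(y_true, y_pred))  # 6 of 8 predictions match
```

Because accuracy counts every prediction equally, it is worth reading alongside the class balance of the dataset: a model that always predicts the majority class can still score highly.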
The level of accuracy might also be improved if the algorithm used a smaller dataset. The heat map created earlier in the program can be used to assess the dataset as a whole. This heat map shows how strongly the revenue and exit rate attributes are related.

Result analysis and discussion

In this Machine Learning Assignment, the logistic regression algorithm implemented in the program achieved an accuracy score of nearly 85%. This level of accuracy is notable given the large size of the dataset used. It would not have been possible without effective preprocessing, including the trimming of the initial dataset. To prepare the data, redundant columns were removed and all null values were dropped, giving a clean and relevant dataset for analysis. These preprocessing steps played a crucial role in the algorithm's performance.

During the data exploration phase, various visualizations were created, including scatter plots, heat maps, histograms, and box plots, to illustrate relationships between different variables. Logistic regression results were further represented using a heat map to highlight the accuracy scores. Additionally, linear regression was conducted and visualized using line plots to reflect its accuracy score.

References

Batch, A. and Elmqvist, N., 2017. The interactive visualization gap in initial exploratory data analysis. IEEE Transactions on Visualization and Computer Graphics, 24(1), pp.278-287.
Pearson, R.K., 2018. Exploratory data analysis using R. CRC Press.
Sahoo, K., Samal, A.K., Pramanik, J. and Pani, S.K., 2019. Exploratory data analysis using Python. International Journal of Innovative Technology and Exploring Engineering (IJITEE), 8(12), p.2019.