Quick Machine Learning Template
I find infographics really appealing because they convey the key points of a topic efficiently and vividly. So it was time to get familiar with the subject using Canva.
At the same time, Machine Learning is becoming increasingly popular in software development. So why not create a template for basic first steps in Machine Learning and bring both worlds together? The following infographic is the result.
I described the code snippets in a bit more detail on LinkedIn.
I recently started exploring how to create infographics, so I thought it would be a good idea to create a template for Machine Learning. Of course, it's just quick coding, but it shows the essential steps for using Logistic Regression, one of the Machine Learning algorithms. The infographic was created with Canva, and I used Python for the code snippets.
Logistic Regression is used for classification and is one of the supervised learning approaches in Machine Learning: the program learns from labeled input data and uses what it has learned to classify new observations. You can use Logistic Regression to decide whether an e-mail is spam or not, to identify whether a person is male or female, and so on.
The columns you select for the independent variable x depend on the structure of your dataset. In our case I chose an imaginary set of 4 features: x = fitData.iloc[:, [1, 2, 3, 4]].values. The dependent variable y is taken from the column right after the four features, at zero-based index 5 of the imported dataset: y = fitData.iloc[:, 5].values
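To make the column selection concrete, here is a minimal sketch. The small DataFrame is a made-up stand-in for the imported dataset (the real column names and contents are not given here), with an ID in column 0, four features in columns 1 to 4, and a binary label in column 5.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the imported dataset (all names and values are assumptions):
# column 0 is an ID, columns 1-4 are features, column 5 is the binary label.
rng = np.random.default_rng(0)
fitData = pd.DataFrame({
    "id": np.arange(10),
    "f1": rng.normal(size=10),
    "f2": rng.normal(size=10),
    "f3": rng.normal(size=10),
    "f4": rng.normal(size=10),
    "label": rng.integers(0, 2, size=10),
})

# Independent variables: the four feature columns (zero-based positions 1 to 4).
x = fitData.iloc[:, [1, 2, 3, 4]].values
# Dependent variable: the column at zero-based position 5.
y = fitData.iloc[:, 5].values

print(x.shape, y.shape)  # (10, 4) (10,)
```

Note that iloc counts positions from 0, so position 5 is the sixth column of the table.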
Before feeding the data to the machine learning algorithm, we have to split it into one part for fitting the model and another for testing the model's accuracy. This is done with train_test_split(x, y, test_size = 0.2 …). Here we reserve 80% of the data for training the model and 20% for the test.
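As a rough illustration of the split, the following sketch uses synthetic data in place of the real dataset. The random_state argument is my own addition to make the example repeatable; it is not necessarily part of the original template.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (an assumption): 100 rows with 4 features each.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 4))
y = (x[:, 0] + x[:, 1] > 0).astype(int)

# test_size=0.2 reserves 20% of the rows for testing, the remaining 80% for training.
# random_state is fixed here only so that the split is reproducible.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

print(x_train.shape, x_test.shape)  # (80, 4) (20, 4)
```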
Now we can use Logistic Regression to train the model. The result is stored in the classifier object. After you predict the vector for y (stored in y_predict) from the reserved test data (x_test), you can assess the accuracy with a so-called confusion matrix by passing the already known y_test results and the predicted y_predict vector to the confusion_matrix function.
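Put together, the training and evaluation steps might look like the sketch below. It again runs on synthetic data (an assumption), uses scikit-learn's default LogisticRegression settings, and derives a simple accuracy value from the confusion matrix; the original template may use different parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic placeholder data (an assumption): the label depends on the first two features.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 4))
y = (x[:, 0] + x[:, 1] > 0).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Train the model on the 80% training part.
classifier = LogisticRegression()
classifier.fit(x_train, y_train)

# Predict labels for the reserved 20% test part.
y_predict = classifier.predict(x_test)

# Rows are the true classes, columns the predicted classes.
cm = confusion_matrix(y_test, y_predict, labels=[0, 1])
print(cm)

# Correct predictions sit on the diagonal of the confusion matrix.
accuracy = cm.trace() / cm.sum()
print(f"accuracy: {accuracy:.2f}")
```

The diagonal of the matrix counts the correctly classified test samples, and the off-diagonal entries count the two kinds of misclassification.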
This is just a brief overview and a really quick approach. The template does not handle missing data, nor does it consider error handling, a correlation matrix between the independent x-values (usually visualized as a heatmap), or a lot of other useful techniques. There is a lot of room for improvement. Anyway, I hope you like it.