Learn modern methods for evaluating and testing AI systems. This course will teach you cutting-edge approaches and tools for analyzing and monitoring AI performance. Perfect for IT professionals, developers, QA engineers, and anyone looking to deepen their expertise in this field.
I've spent 12+ years in software testing and currently lead the AI Evaluation practice at FirstLineSoftware. I'm the creator of eval-ai-library, an open-source AI evaluation framework, and have built custom evaluation tools, methodologies, and applications for AI systems. This course brings together the practical knowledge and tools I've developed working on real-world AI projects.
What You'll Learn
Understanding AI basics
Introduction to evaluation and its importance
Python data types and base classes
Functions and custom classes in Python
Integrating external Python libraries
Introduction to Machine Learning
What is a Confusion Matrix
Classic ML metrics for classification (ROC-AUC, PR-AUC)
Classic ML metrics for regression (R-squared, RMSE, MAE)
Introduction to deep learning and neural networks
Neural network evaluation specifics
Validation in deep learning (Softmax outputs, Confidence scores, Top-k accuracy, Entropy)
Training monitoring tools: TensorBoard and W&B
How LLMs work
Approaches to LLM evaluation
Popular evaluation metrics (Reference-based, Reference-free, Embedding-based)
Evaluation using benchmark datasets (MMLU, TruthfulQA, BIG-Bench Hard, HellaSwag, etc.)
The program of the AI evaluation course is comprehensive and logically structured. I would especially like to note the presentation of the material: clear, structured, and without filler. The teacher clearly explains key concepts and shows practical examples, which really help you learn the material. Feedback comes promptly; after homework, there were always helpful comments and recommendations for improvement. The practical part is one of the main advantages of the course. The assignments are well designed and let you apply the theory to real-life scenarios (LLM evaluation, RAG tests, AI agent scenarios, etc.), which is especially valuable for those who want to work on real-world problems. At the same time, it is essential to plan your schedule, as working through the practical tasks for each block on your own may take up to 16 hours.
Arthur Kim
Thank you very much for this course. I had been looking for one like it for a long time and have never regretted my choice. Everything was presented in great detail, with many practical tasks. I especially liked that enough time was allocated for each sprint, rather than everything being rushed through in a week, as often happens. The practice notes were also very convenient: they were super detailed and precise, and any questions were promptly resolved. Previously, I did not know this area, but now I understand exactly how you can work with AI and evaluate it. Independent, in-depth study lies ahead, and I hope to quickly apply the knowledge I acquire in practice.
Arina Rodina
This course was one of the most rewarding and structured educational experiences I've had in AI. The program is logically and thoroughly structured, without unnecessary theoretical load, but with an emphasis on practical application. The material is presented clearly and is accessible, even for those who have not previously encountered AI evaluation. The teacher not only explains key concepts but also reinforces them with real-life examples, which makes learning much easier. The course really helps you go from zero understanding to confident use of AI evaluation tools. I recommend it to everyone in IT, whether developers, testers, or analysts, as well as to those who want to systematize knowledge in this fast-growing field. Excellent balance of theory and practice, high-quality presentation, and real applicability in work!