About this project
AI is the new electricity. Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don’t think AI will transform in the next several years. - Andrew Ng
This project seeks to transform container management through the integration of AI-driven operations,
aimed at enhancing monitoring and predictive analytics. Traditional tools struggle with the
complexity of modern applications, causing delays in problem resolution and diminishing system reliability.
By utilising machine learning models to analyse historical data, this project aims to forecast and prevent
system failures and performance issues. This proactive strategy not only boosts operational efficiency
but also optimises resource use. The objective is to develop an intelligent monitoring solution that
enhances system resilience and agility.
Angular, Docker, Electron, Ionic Framework, Kubernetes, NodeJS, Python
Key Features
The project is currently under development. Most of the major milestones have been achieved, and it should be ready by the deadline. However, please note that the project is still in the testing phase, and some features may not be fully functional.
Log Interpretation
The first part of our project focuses on making log data more accessible and understandable for developers who may lack a background in DevOps. By leveraging on-device machine learning models, our system translates complex log information into easy-to-understand narratives, helping developers quickly grasp the essence of logs and focus on critical issues in the application and infrastructure. This is especially helpful when you’re looking for a quick summary of the logs and don’t want to spend time reading through the entire log file. Log summaries are also available by using Generative AI models.
Event Extraction and Predictive Analytics
Building on the foundation of simplified log data, the second part of our project involves extracting significant events from these logs. Using advanced machine learning models, we analyse these events to predict potential anomalies and understand the root causes of system disruptions. Once an anomaly has been identified, the system gets better at predicting it the next time a similar event occurs. This proactive approach allows teams to address issues before they escalate, leading to smoother and more reliable performance.
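As a rough illustration of the extract-then-detect flow described above, here is a minimal sketch. It is not the project's actual pipeline: the log line format, the function names, and the per-minute z-score rule are all assumptions made for this example, and the z-score is a simple statistical stand-in for the machine learning models the project uses.

```python
import re
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical log format (an assumption for this sketch):
#   "<ISO timestamp> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)


def extract_events(lines):
    """Parse raw lines into event dicts, keeping only WARN and ERROR entries."""
    events = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if match["level"] in {"WARN", "ERROR"}:
            events.append({
                "time": datetime.fromisoformat(match["ts"]),
                "level": match["level"],
                "service": match["service"],
                "message": match["message"],
            })
    return events


def events_per_minute(events):
    """Count extracted events in one-minute buckets, ordered by time."""
    counts = Counter(e["time"].replace(second=0, microsecond=0) for e in events)
    return dict(sorted(counts.items()))


def flag_anomalies(counts, threshold=2.0):
    """Return the minutes whose event count has a z-score above the threshold."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every bucket is identical, so nothing stands out
    return [minute for minute, n in counts.items() if (n - mu) / sigma > threshold]
```

Calling `flag_anomalies(events_per_minute(extract_events(lines)))` on a list of log lines returns the minutes with an unusually high number of warnings and errors. A trained model would replace the fixed threshold with something learned from historical data.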
Proactive Anomaly Prevention and Recommendations
The third and most challenging aspect of our project is the provision of actionable recommendations to prevent detected anomalies and restore the system to a stable state. By identifying patterns and potential precursors to system disruptions, our AI system offers tailored advice to developers and IT operators. These recommendations help not only in quickly mitigating risks but also in optimising the development process and resource allocation, enhancing overall system resilience and operational efficiency.
Our solution aims to create an intelligent monitoring environment that empowers developers and IT operators to stay ahead of potential issues, fostering a more resilient and agile technological infrastructure. This integration of AI-driven insights and proactive management sets a new benchmark in container management, addressing the dual needs of operational efficiency and proactive issue resolution.
Technological Stack
Our project leverages a robust and diverse technological stack to deliver an intelligent, AI-driven container
management solution. Each technology plays a critical role in ensuring the efficiency, reliability, and
scalability of the system.
The application uses Electron to provide a cross-platform desktop interface. electron-packager creates
application binaries for Windows, macOS, and Linux, allowing developers and IT operators to run the
application on any of these platforms. Electron handles the application lifecycle, including window
management, menu creation, and system interactions.
The front-end of the application is built using Angular and Ionic Framework, which provide a responsive
and user-friendly interface. Angular enables the creation of frontend components, while Ionic Framework
ensures a consistent and intuitive design across different devices. The backend for the application is
written in FastAPI and exposes REST APIs, which the rest of the application reaches through an Express
server. The APIs handle data from the Docker and Kubernetes clusters, providing real-time insights and
analytics. The application uses NodeJS to manage the server-side logic and data processing, ensuring
smooth communication between the frontend and backend. The application also incorporates Python scripts
for data analysis and machine learning models, which provide predictive analytics and anomaly detection.
These models are trained on historical log data to identify patterns and predict potential system failures.
Application storage is managed by a Data Access Layer (DAL) that interacts with the databases so that data
is stored securely and efficiently. The DAL uses SQLite for local storage and MongoDB for cloud storage,
providing flexibility and scalability. The application also stores some data using electron-store, a simple
data persistence library for Electron applications.
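To make the idea of a Data Access Layer concrete, here is a minimal sketch of one storage interface with a SQLite-backed implementation. The source does not say which language the DAL is written in or what its schema looks like, so the class names, the `logs` table, and the method names below are all hypothetical. A MongoDB-backed class could implement the same interface for cloud storage; it is not shown here because it would need a third-party driver.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class LogStore(ABC):
    """Hypothetical storage interface that callers code against."""

    @abstractmethod
    def save(self, record: dict) -> int:
        """Persist one record and return its identifier."""

    @abstractmethod
    def find_by_service(self, service: str) -> list:
        """Return all records saved for the given service, oldest first."""


class SQLiteLogStore(LogStore):
    """Local storage backend using Python's built-in sqlite3 module."""

    def __init__(self, path=":memory:"):
        # ":memory:" keeps the example self-contained; pass a file path to persist.
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS logs ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "service TEXT NOT NULL, "
            "payload TEXT NOT NULL)"
        )

    def save(self, record):
        # The connection context manager commits on success, rolls back on error.
        with self._conn:
            cur = self._conn.execute(
                "INSERT INTO logs (service, payload) VALUES (?, ?)",
                (record["service"], json.dumps(record)),
            )
        return cur.lastrowid

    def find_by_service(self, service):
        rows = self._conn.execute(
            "SELECT payload FROM logs WHERE service = ? ORDER BY id", (service,)
        ).fetchall()
        return [json.loads(payload) for (payload,) in rows]
```

Because callers depend only on `LogStore`, the local and cloud backends can be swapped without touching the code that reads and writes records, which is the flexibility a DAL is meant to provide.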
Use Cases
Enhanced DevOps efficiency
In a large-scale enterprise environment, managing thousands of microservices can become incredibly complex.
This application can streamline DevOps operations by providing real-time insights and predictive analytics.
By translating complex log data into understandable narratives, even developers without extensive DevOps
backgrounds can quickly identify and address issues. This enhances overall efficiency, reduces downtime,
and ensures that development and operations teams are always on the same page.
Proactive Incident Management
For organisations that rely heavily on continuous uptime, such as e-commerce platforms or financial services,
the ability to predict and prevent system failures is crucial. By analysing historical log data and
identifying patterns that precede critical events, the application can predict potential failures and
performance bottlenecks. This allows IT teams to take pre-emptive action, avoiding costly downtime and
maintaining a seamless user experience.
Compliance and Security Monitoring
In industries with stringent compliance and security requirements, such as healthcare or finance,
continuous monitoring and anomaly detection are essential. This application can help ensure
compliance by monitoring logs and telemetry data for unusual patterns that might indicate security
breaches or non-compliant behaviour. By providing real-time alerts and recommendations for corrective
actions, it helps organisations maintain compliance and secure their systems against potential threats.
Open Source Notice
This project and its outputs are available as a permanent open-access research repository. Please reach out to
the project owner for more information on how to access the repository, along with your research credentials.
The source code for the application is open source and will be made available on GitHub for public use starting in 2025.
Generative AI Notice
The application incorporates Generative AI models to enhance its features. Please note that AI-generated outputs, wherever
available, may not be 100% accurate and should be used with caution. User discretion is advised when interpreting any such
outputs. Some data is collected and stored by the application and processed in accordance with our Data Privacy Statement.
Data Privacy Statement
The application collects and processes data like application usage, user interactions, and system logs to improve its features.
This data is stored securely and is not shared with any third parties. By using the application, you consent to the collection
and processing of this data for the purposes of improving the application’s functionality and user experience.