Security Data Ingestor (SDI)

1 Project Assignment

1.1 Context & Background

Guardian360 has developed a SaaS platform that performs daily information security scans and integrates the results of these scans into dashboards and reports. Until now, the entire technology stack has been managed by Guardian360, meaning they are fully responsible for operating the scanners, processing the data, and presenting the results.

However, due to recent developments, there is now also a need to process scan results and other information security insights from external sources within the platform. This involves scanners and other applications that are not managed by Guardian360 but whose results still need to be processed in a secure, scalable, and stable manner.

1.2 Problem Description

Guardian360 works with a variety of security insight tools that perform host and network scans and report the vulnerabilities and privacy issues they detect. Each scanner has its own reporting mechanism for the results of these scans. Moreover, not every scanner is capable of sharing data with a system on the internet, so pushing the automatically generated data directly to Guardian360’s centralized dashboard would raise errors and formatting issues.

1.3 Business Case

Being able to process different kinds of security insights enables us to respond quickly to market trends and to the wishes of our platform’s users. Offering more insights also adds value for those users, as it gives them a better overview of their security and increases the satisfaction of partners and customers, which in turn helps retain them when contracts are up for renewal. In addition, if scan load can be transferred to clients’ systems, we can limit the resource usage of our SaaS platform.

1.4 Project Goal

The goal is an application that ensures data from various external sources can be processed in a generic manner by Guardian360 Lighthouse (Guardian360’s centralized dashboard). This should be possible both from any REST API resource and through direct handling of JSON/YAML input. The first applications to be set up with this generic ingestor must be Trivy and InfoSec Agent.

Ingested data must be normalized so that it can be used for generic scan result aggregation. The solution should send the results back to the Guardian360 Lighthouse REST API. To extend future compatibility with our platform, it should be platform-agnostic and therefore work on Windows, macOS, and Linux.
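
The normalization requirement can be illustrated with a minimal Go sketch, assuming a key map that translates source-specific field names into one shared record. The Finding type, its fields, the KeyMap type, and the Normalize function are hypothetical names chosen for illustration, and the flat input object is a simplification rather than the real report layout of any scanner.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Finding is a hypothetical normalized record; the fields are illustrative
// and do not represent the actual Lighthouse schema.
type Finding struct {
	Source   string `json:"source"`
	Title    string `json:"title"`
	Severity string `json:"severity"`
	Target   string `json:"target"`
}

// KeyMap maps a normalized field name to the key used by one data source.
type KeyMap map[string]string

// Normalize decodes a single JSON object and copies the mapped string
// fields into a Finding. Unmapped or non-string fields are left empty.
func Normalize(source string, raw []byte, keys KeyMap) (Finding, error) {
	var obj map[string]any
	if err := json.Unmarshal(raw, &obj); err != nil {
		return Finding{}, fmt.Errorf("decode %s input: %w", source, err)
	}
	get := func(field string) string {
		key, ok := keys[field]
		if !ok {
			return ""
		}
		s, _ := obj[key].(string)
		return s
	}
	f := Finding{
		Source:   source,
		Title:    get("title"),
		Severity: strings.ToLower(get("severity")),
		Target:   get("target"),
	}
	if f.Title == "" {
		return Finding{}, fmt.Errorf("%s input: no title found", source)
	}
	return f, nil
}

func main() {
	// Assumed, simplified scanner output for demonstration only.
	raw := []byte(`{"Title":"openssl: buffer overflow","Severity":"HIGH","PkgName":"openssl"}`)
	keys := KeyMap{"title": "Title", "severity": "Severity", "target": "PkgName"}

	f, err := Normalize("trivy", raw, keys)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.Marshal(f)
	fmt.Println(string(out))
}
```

Keeping the mapping in data rather than in code means that supporting a new source would mostly be a matter of adding another key map, which is the property a generic ingestor needs.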

2 Conclusion

This research explored how a platform-agnostic, CLI-based modular tool can ingest third-party security insights in diverse formats and deliver normalized results to Guardian360’s Lighthouse system. By addressing the sub-questions step by step, the following conclusions were drawn:

Data Source Handling: The SDI successfully distinguishes structural and semantic differences between Trivy, InfoSec Agent, and generic JSON/YAML inputs. It reliably parses diverse sources using dynamic schema selection and fallback strategies.
Normalization Strategy: A standard schema, inspired by Nuclei and Lighthouse integration, was defined and implemented through configurable key maps (keys.yaml). Moreover, a dynamic schema mapping mechanism was built into the tool using NLP (Hugging Face Sentence Transformer). The system ensures consistent structure and supports fallback for unmapped fields when enabled.
Robust API Output: Output to Lighthouse uses secure and validated HTTP communication, with error messages formatted according to RFC 7807. Retry logic and content-type validation reduce transmission failures.
Platform Compatibility: The tool is portable across Windows, Linux, and macOS, designed with cross-platform Go binaries and lightweight Docker images. Kubernetes deployment was also successfully demonstrated.
CI/CD and Testing: GitHub Actions was used to integrate unit testing and Docker builds into the development lifecycle. Unit tests validate critical logic like entry parsing and configuration loading, supporting maintainability.
Security: The tool guards against data injection through strict input validation, MIME filtering, and logging. Rate limiting mitigates brute-force and DoS risks.

The main research question was effectively answered through both theoretical and practical research. The SDI tool can now act as a secure, scalable ingestion gateway for multiple scanners and custom data sources, supporting Guardian360’s platform growth.