Empowering Data Engineering Pipelines with Zero-Shot Learning for Seamless Automated Mapping

Authors

  • Yuvaraj Kavala

Keywords:

Zero-Shot Learning, Data Engineering, Automated Data Mapping, Semantic Embeddings, Schema Matching, Ontology Alignment, Data Integration, Unsupervised Learning

Abstract

Zero-shot learning (ZSL), which enables models to recognize classes for which no labeled examples were seen during training, has attracted significant interest in machine learning. Its application in data engineering remains underexplored, however, particularly for automating data mapping across heterogeneous sources. Data mapping, the alignment of data attributes between disparate systems, is traditionally labour-intensive and error-prone, which limits scalability in complex integration scenarios. This paper proposes a zero-shot learning framework designed to fully automate data mapping without requiring extensive labeled data. Leveraging semantic embeddings, natural language processing, and ontology alignment, the approach infers attribute mappings in an unsupervised manner from the semantic relationships between attributes and their domain context. Evaluations on real-world healthcare and financial datasets with diverse and evolving schemas show that the framework achieves over 90% mapping accuracy on unseen attribute pairs, outperforming baseline unsupervised and rule-based methods. Precision and recall metrics further confirm its robustness across heterogeneous data types. Qualitative feedback from domain experts highlights the interpretability and practical usefulness of the automated mapping explanations, which foster trust and simplify downstream validation. Compared with traditional supervised approaches, the zero-shot framework substantially reduces dependence on labeled data and manual effort, shortening deployment timelines by up to 40%. Case studies also show that the framework adapts to schema changes without retraining, underscoring its scalability and flexibility in dynamic data environments. Semantic ambiguities occasionally reduce mapping precision; future work will therefore focus on improved disambiguation mechanisms.
Overall, this study demonstrates the potential of integrating zero-shot learning into data engineering pipelines to transform data integration workflows and support intelligent, adaptable data ecosystems.
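To make the core idea concrete, the following is a minimal, hypothetical sketch of similarity-based attribute matching and of the precision and recall metrics mentioned above; it is not the author's implementation. It assumes that each attribute name can be turned into a vector and that a source attribute maps to the most similar unused target attribute above a threshold. As a loud simplification, the `embed` function uses character trigrams as a lexical stand-in for a real semantic embedding model (a pretrained sentence encoder would be needed to match, say, `dob` to `date_of_birth`), and the ontology-alignment step is omitted entirely. The function names, the greedy one-to-one assignment, and the 0.3 threshold are all illustrative assumptions.

```python
import math
from collections import Counter


def embed(name: str) -> Counter:
    """Hypothetical stand-in for a semantic embedding.

    Returns a bag of character trigrams over the normalized attribute name.
    This is lexical, not semantic; a real zero-shot system would call a
    pretrained embedding model here instead.
    """
    text = " " + name.lower().replace("_", " ") + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_schemas(source, target, threshold=0.3):
    """Greedy one-to-one mapping (assumed strategy): take the highest-scoring
    (source, target) pairs first, use each attribute at most once, and stop
    once scores fall below the threshold. No labeled mappings are needed."""
    scored = sorted(
        ((cosine(embed(s), embed(t)), s, t) for s in source for t in target),
        reverse=True,
    )
    used_source, used_target, mapping = set(), set(), {}
    for score, s, t in scored:
        if score < threshold:
            break
        if s not in used_source and t not in used_target:
            mapping[s] = t
            used_source.add(s)
            used_target.add(t)
    return mapping


def precision_recall(predicted: dict, gold: dict):
    """Precision: share of predicted mappings that are correct.
    Recall: share of gold-standard mappings that were recovered."""
    correct = sum(1 for s, t in predicted.items() if gold.get(s) == t)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Illustrative schemas only; not taken from the paper's datasets.
    src = ["patient_id", "birth_date", "zip_code"]
    tgt = ["PatientID", "date_of_birth", "postal_zip_code"]
    gold = {"patient_id": "PatientID", "birth_date": "date_of_birth",
            "zip_code": "postal_zip_code"}
    predicted = match_schemas(src, tgt)
    print(predicted)
    print(precision_recall(predicted, gold))
```

Because `match_schemas` only compares representations of attribute names, a new or renamed column can be matched without retraining, which is the property the abstract attributes to the zero-shot framework; the quality of the result depends almost entirely on how semantically meaningful the embedding is.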

Published

2025-02-17