Role Summary
Our client is looking for a strong Data Engineer with hands-on experience in data platforms, large-scale scraping, LLMs, and agentic AI solutions.
This person should be able to build reliable data pipelines and help design AI agents that use tools, query systems, process data, and automate complex workflows.
Key Responsibilities
- Build scalable data pipelines for batch and near-real-time ingestion.
- Ingest data from APIs, files, databases, event streams, and web sources.
- Design and operate large-scale scraping and data acquisition solutions.
- Clean, validate, transform, and model data for analytics, reporting, and AI use cases.
- Build LLM-powered workflows and agentic solutions using tool calling, structured outputs, RAG, and API/database integration.
- Support data lake, warehouse, and lakehouse architectures.
- Implement data quality checks, schema validation, monitoring, and observability.
- Work with engineering teams to build production-ready, secure, and maintainable solutions.
Required Experience
- Strong Python and/or Node.js/TypeScript experience.
- Strong SQL and relational database experience.
- Proven experience building production-grade data pipelines.
- Experience with AWS or similar cloud platforms.
- Experience with data lakes, warehouses, or lakehouse technologies.
- Practical experience with LLM APIs such as OpenAI, Anthropic, Gemini, Bedrock, or similar.
- Experience with agentic patterns such as tool use, task decomposition, retrieval, memory, and human-in-the-loop workflows.
- Experience with scraping tools such as Playwright, Puppeteer, Scrapy, or Selenium.
- Good understanding of APIs, CI/CD, containers, testing, and secure engineering practices.
Desirable Experience
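- Data platforms, query engines, and table formats such as Snowflake, Trino, Athena, Iceberg, Delta Lake, Databricks, or BigQuery.
- Streaming, messaging, and orchestration tools such as Kafka, Kinesis, SQS, SNS, Airflow, Dagster, Prefect, or Temporal.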
- Vector databases such as pgvector, Pinecone, Weaviate, Qdrant, or OpenSearch.
- LLM and agent frameworks or protocols such as LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, Semantic Kernel, or MCP.
- Experience with data governance, PII handling, encryption, auditability, and compliance.
Candidate Profile
- The ideal candidate is a practical builder who can turn ambiguous requirements into working systems.
- They should be comfortable across data engineering, scraping, automation, and applied AI, with a focus on reliable production solutions rather than demos.
Example Projects
- Build scalable scraping pipelines for public market and supplier data.
- Create AI agents that extract, validate, and structure data from approved sources.
- Build data pipelines into a lakehouse or warehouse.
- Develop RAG and LLM-powered assistants over business data.
- Automate manual research and data preparation workflows.
Success Measures
- Reliable data pipelines in production.
- High-quality structured data available for analytics and AI.
- Useful and controlled agentic workflows.
- Reduced manual data collection and preparation effort.
- Strong engineering standards across code, documentation, and operations.
Kindly regard your application as unsuccessful if you have not heard from the agency within 2 weeks.