Milos Nikolic
Senior AI/ML Engineer
Experience
Senior AI/ML Engineer
Meta AI
- Agentic AI Platform (V3 Architecture): Architected and implemented a production multi-agent system with 5-stage orchestration (Planner → Actioner → Executor → Feedback → Evaluator); achieved 85% task success rate (+20% vs V2) through modular pipeline design and self-correcting feedback loops.
- RAG Retrieval System: Built end-to-end retrieval pipeline with sliding-window chunking (50 lines + 10-line overlap), hybrid BM25+vector search, and parallel LLM summarization; improved retrieval precision by 85% and reduced hallucinations by 35% through semantic indexing.
- Enterprise Connectors: Developed Confluence and Jira integrations with full authentication, webhook support, and error handling; enabled real-time knowledge base updates and cross-platform data synchronization for agent context.
- LLM-as-Judge Evaluation Framework: Implemented automated evaluation system with golden test cases, tournament scoring, and regression testing; shifted from subjective 'looks good' assessments to quantitative evaluation with 15+ benchmark configurations across 6 task categories.
- Web-Scale Ranking Platform: Built TensorFlow/JAX/TFX pipelines on Vertex AI + Dataflow + BigQuery; improved CTR by 18% on 100M+ sessions/month; cut p95 latency by 35% (≈210 ms → 136 ms) via feature-store redesign and hard negatives.
- Privacy-preserving personalization: Launched federated learning with differential privacy (DP) for PT/ES/EN markets; ensured offline/online metric parity (AUC/PR, calibration) and automated drift alerts.
- Experimentation Suite: Standardized A/B & interleaving tests; reusable metrics/dashboards reduced time-to-decision from 2–3 weeks to <5 days.
- GenAI/RAG evaluation: Built offline evaluator and guardrails that reduced hallucinations by ~35% and improved answer F1 by 7 points while lowering p95 latency by 20%.
- ML Reliability Program: Ran Kubernetes/Docker microservices with model registry, shadow/canary deploys, and rollbacks; maintained 99.9% inference SLO and MTTR <10 min; mentored 6 data scientists and ML engineers and partnered with 4 product teams (BR/US).
Data Scientist
Databricks
- SageMaker Churn & propensity platform: Productionized models on SageMaker with model registry, CI/CD, and blue-green deployment, reducing churn by 22% across three pilot cohorts (around 45k users); included monitoring via MLflow and custom drift detectors.
- Real-time lakehouse: Designed an S3, Glue, Athena, and EMR data plane ingesting over 10 TB per day; implemented streaming features with Kafka and Spark to support Lambda/Fargate inference at about 1.8k QPS.
- LATAM Regulated Templates: Delivered reference architectures that reduced time-to-production from roughly 3 weeks to 6 hours and lowered infrastructure costs by 18% through improved observability.
- Model Governance: Implemented feature lineage, PII safeguards, and model calibration (ECE, Brier) to ensure consistent and auditable performance.
Principal ML Consultant
Capgemini Invent
- Enterprise labeling platform: Flask/React system to train/retrain CV/NLP models; reduced dataset turnaround by 50% (4 wks → 2 wks) for a tier-1 bank & public-sector client.
- Anti-spoofing & error monitoring: Deployed scikit-learn/PyTorch models and a centralized Flask/DB2 error API; reduced critical incidents by 23% QoQ.
- Serverless identity: Built Cloud Functions + Cloud SQL user-management; lowered access-ticket resolution time by 30% and simplified audits.
Summary
Senior AI/ML Engineer with over 10 years of experience delivering production-grade AI that drives measurable business impact.
Scope: Agentic AI, GenAI/RAG, ranking, NLP/CV, and large-scale experimentation; built calibrated, monitored, and drift-resilient ML systems (AUC/PR, ECE).
Platforms: Python; TensorFlow/PyTorch; AWS (SageMaker) & GCP (Vertex AI); Kubernetes/Airflow/MLflow; feature stores; 99.9% real-time inference SLO.
Results: +18% CTR at web scale, -35% p95 latency, -22% churn, -18% infra cost.
Core stack (years of experience): Python (10+), TensorFlow (6+), PyTorch (5+), Scikit-learn (9+), XGBoost/LightGBM (7+), Transformers/HuggingFace (5+), LangChain/RAG (3+), Vector Databases (FAISS, Pinecone, PostgreSQL) (3+), Airflow/MLflow/Kubernetes/Docker (6+), AWS (SageMaker, S3, Glue, Athena, EMR, Lambda, Fargate) (4+), GCP (Vertex AI, Dataflow, BigQuery) (3+), Spark/Kafka (5+), Feature Store/TFX (3+), SQL/Snowflake/BigQuery (7+), FastAPI/Flask (6+), REST/GraphQL (5+), CI/CD (Jenkins, GitHub Actions) (5+), NLP (8+), Computer Vision (7+), Federated Learning & Responsible AI (2+).
Skills
- Agentic AI & Orchestration: Multi-agent (Planner → Tools → Critic); Tool Routing; Short/Long-term Memory; Self-reflection; Guardrails; LLM-as-Judge
- RAG & Retrieval: Hybrid BM25 + Dense; Cross-encoder Reranking; Query-intent Routing; Semantic Chunking; Vector Stores (FAISS, Pinecone)
- Serving & Systems: Ray Serve; NVIDIA Triton; GPU Inference; FastAPI Microservices; Distributed Training; API Design; Kubernetes/Docker
- MLOps & Evaluation: MLflow; Model Registry; CI/CD; Feature Stores; Monitoring & Drift/Bias; Prometheus/OpenTelemetry
- Data & Streaming: Spark; Kafka; Redis; SQL; HL7/FHIR Pipelines
- Cloud & Infra: AWS (SageMaker, Lambda, S3); GCP (Vertex AI, BigQuery); Azure (Azure ML)
- Databases & Warehouses: PostgreSQL; Azure Cosmos DB; DynamoDB; Snowflake; BigQuery
- Languages & Frameworks: Python, C++, Java, JavaScript/TypeScript; React (TypeScript); FastAPI; Flask; REST/GraphQL
Education
Nanyang Technological University (NTU)
Master of Science in Computer Science · Computer Science · Singapore
Nanyang Technological University (NTU)
Bachelor of Science in Computer Science · Computer Science · Singapore