Principal Machine Learning Engineer - Production Systems
12 days ago
Bristol
Overview
SoftInWay UK Ltd. is seeking a highly experienced ML Systems Architect to design and implement a scalable, production-grade architecture for our machine learning solver. This role bridges research prototypes and commercial deployment, ensuring reliability, maintainability, and performance in a mixed technology stack.

Responsibilities
Architect the ML Solver Platform:
• Define a modular architecture for data preprocessing, model execution, and post-processing.
• Establish clear API contracts between Python/TensorFlow and C# services.
Productionize ML Workflows:
• Convert research code into robust, testable, and observable services.
• Implement CI/CD pipelines, automated testing, and reproducibility standards.
Integration & Interoperability:
• Design REST/gRPC endpoints for cross-language communication.
• Ensure compatibility with C#/.NET services.
Performance & Scalability:
• Optimize GPU/CPU utilization, batching strategies, and memory management.
• Plan for multi-model and multi-tenant scenarios.
MLOps & Lifecycle Management:
• Implement model versioning, artifact registries, and deployment workflows.
• Set up monitoring, logging, and alerting for solver performance.
Security & Compliance:
• Apply best practices for secrets management, dependency scanning, and secure artifact storage.
Required Skills & Experience
• ML Frameworks: Expert in TensorFlow (TF2/Keras); experience with ONNX Runtime for inference.
• Programming: Advanced Python for ML; strong understanding of packaging, type checking, and performance profiling.
• Architecture: Proven experience designing scalable ML systems for production.
• APIs: Proficiency in gRPC/Protobuf and REST for cross-language integration.
• MLOps: CI/CD pipelines, containerization (Docker/Kubernetes), model registries, reproducibility.
• Performance Optimization: GPU acceleration (CUDA/cuDNN), mixed precision, XLA, profiling.
• Observability: Metrics, tracing, structured logging, dashboards.
• Experience with ONNX Runtime Training, PyTorch, or hybrid ML architectures.
• Familiarity with distributed training strategies and multi-GPU setups.
• Knowledge of feature stores and data validation frameworks.
• Exposure to regulated environments and compliance frameworks.

Tools & Technologies
• ML: TensorFlow, ONNX Runtime, tf2onnx.
• APIs: FastAPI, gRPC.
• DevOps: GitLab CI/GitHub Actions, Docker, Kubernetes.
• Monitoring: Prometheus, Grafana, OpenTelemetry.
• Security: HashiCorp Vault, Sigstore.

Why Join Us?
• Work on cutting-edge ML solutions integrated into commercial engineering software.
• Define architecture that scales across global deployments.
• Collaborate with a team of experts in ML, software engineering, and UI development.
• Competitive salary and benefits.

To apply: Send your resume and a brief cover letter to