Deep Render is a Deep Tech startup founded to liberate the world from all bandwidth constraints by pioneering AI-Compression technology. Our compression codecs are based on a fundamental technology shift, representing over 100 years of progress in the industry.

We're leading the way in AI-Compression, with the world's first AI-Codec running in real-time on mobile devices. Our team brings 50+ years of combined research experience and has filed over 40 patents.

Deep Render has recently raised its Series A funding round from top-tier investors and is looking to double or triple its current 20-person team. We're in commercial engagements with some of the largest Big Tech companies in the world and expect hundreds of millions of people to use the Deep Render AI Codec by 2024.

Role:

We're looking for Performance Engineers to join a highly talented team bringing the next step-change in compression technology to billions of users, creating enormous global impact and value.

We're looking for engineers who will help us deliver AI-based compression to end-users by porting our codec from GPU-based systems to (primarily) mobile platforms with NPUs and (secondarily) mid-range GPU/CPU systems, taking it from research to production. You enjoy working with low-level code and are comfortable programming across multiple platforms.

The ideal candidate will have a deep understanding of optimisation methodologies to reduce runtime and memory footprint, preferably for neural networks, and/or experience implementing high-performance entropy coding algorithms such as Huffman Coding, Arithmetic Coding, Range Coding, or Asymmetric Numeral Systems. The ideal candidate will also have some experience taking algorithms from research to deployment.

Responsibilities:

Work in a team to port ML research algorithms to edge devices with an initial focus on smartphones (Android, iOS)
Profile various algorithms to analyse performance and identify any bottlenecks. Profiling includes data loading, data movement, data caching, operation count, execution chipset, warm-up latency and others
Implement solutions to the identified bottlenecks
Implement a high-performance entropy coding algorithm, e.g. Range Coding or Asymmetric Numeral Systems, across different hardware architectures
Optional: Write custom operations using the low-level APIs for Android (OpenGL ES) and iOS (Metal) systems
Optional: Apply standard neural network runtime optimisation methods such as pruning, low-bit quantisation, architecture tuning, batching and others

Must have:

At a minimum, a Bachelor's degree in computer science or related field (Mathematics, Physics, Engineering)
At a minimum, 3-5 years of experience in performance optimisation
Formal training could come through education, work experience and/or extensive private projects.
Expertise in C++
Some experience with optimisation techniques. Examples include SIMD (SSE, AVX), vectorisation, loop dependencies, multithreading, multi-processor usage, and tensor cores

Preferred skills:

One of the following: either some formal training in machine learning (understanding PyTorch and/or TensorFlow) or some formal training in entropy coding methods (understanding Range Coding or similar algorithms)
Significant experience with ML programming on either Android or iOS: Android Studio, Xcode, Google ML, Core ML. Knowledge of the development stack for Android and iOS
Experience with Android NNAPI and/or other Android-based NPU SDKs (Exynos, Hexagon, HiSilicon)

This job is no longer accepting applications
