Amazon
SDE Intern
May 2025 - August 2025
Integrated NVIDIA TensorRT-LLM into the MARS platform alongside vLLM, enabling multi-library LLM inference optimization and producing ready-to-deploy containers.
Refactored key platform components to support library-agnostic workflows and improve maintainability.
Automated end-to-end container build and deployment pipelines with AWS CDK, ensuring seamless integration with EKS, ECS, and SageMaker.
Benchmarked latency, throughput, and cost trade-offs between TensorRT-LLM and vLLM to guide performance tuning.