Efficient AI Computing,
Transforming the Future.

Guangxuan Xiao

Ph.D. (Graduated)

His research focuses on efficient algorithms and systems for deep learning, specifically large foundation models. His work has received over 8,000 stars on GitHub and has real-world impact: SmoothQuant has been integrated into NVIDIA's TensorRT-LLM and FasterTransformer and Intel's Neural Compressor, and is used in the LLMs of companies such as Amazon, Meta, and Hugging Face. StreamingLLM has been integrated into NVIDIA's TensorRT-LLM, Hugging Face's Transformers, and Intel's Extension for Transformers.

Awards

Team received the Best Paper Award of MLSys 2024.

Open source projects with over 1K GitHub stars

Efficient Streaming Language Models with Attention Sinks

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Tiny Machine Learning Projects

Projects

XAttention: Block Sparse Attention with Antidiagonal Scoring

LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention (International Journal of Computer Vision 2024)
