Vision transformers operating on high-resolution images can learn richer visual representations. However, this improved performance comes at the cost of substantially higher computational complexity. We therefore present SparseViT, which accelerates high-resolution visual processing by skipping less important regions during computation.
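To make the idea of skipping less important regions concrete, the following is a minimal, dependency-free sketch and not the SparseViT implementation. It assumes that the feature map is split into non-overlapping square windows, that a window's importance can be approximated by its mean squared activation, and that a fixed fraction of windows is kept. The function name `select_important_windows` and the parameters `window_size` and `keep_ratio` are hypothetical and introduced only for illustration.

```python
def select_important_windows(feature_map, window_size, keep_ratio):
    """Rank non-overlapping windows of a 2D feature map and keep the top ones.

    Assumption (not taken from the source): importance is the mean squared
    activation of a window. Returns (kept_indices, scores), where windows
    are indexed in row-major order.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % window_size or width % window_size:
        raise ValueError("feature map must be divisible by window_size")

    # Score every window by its mean squared activation.
    scores = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            energy = sum(
                feature_map[r][c] ** 2
                for r in range(top, top + window_size)
                for c in range(left, left + window_size)
            )
            scores.append(energy / window_size ** 2)

    # Keep the highest-scoring fraction of windows (at least one).
    num_keep = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_keep]), scores


# A 4x4 single-channel map with two active 2x2 windows.
demo_map = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [2.0, 2.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
]
kept, window_scores = select_important_windows(demo_map, window_size=2, keep_ratio=0.5)
print(kept)  # indices of the windows that would still be processed
```

In this sketch only the windows listed in `kept` would be passed to the subsequent attention computation, so the cost scales with the number of retained windows rather than with the full image resolution.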