Violence detection is important for public safety, but it is not an easy task: recognizing violence in surveillance video requires not only spatial information but also sufficient temporal information. To emphasize temporal information, we propose an efficient deep learning architecture for violence detection based on a temporal attention mechanism. The architecture combines a pre-trained MobileNetV3 backbone, a convolutional LSTM, and a temporal attention block called Temporal Adaptive (TA). The TA block further refines the temporal information carried by the spatial features that the backbone extracts. Experiments on three publicly available datasets (Hockey Fight, Movies, and RWF-2000) validate the proposed model.
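The abstract does not describe the internals of the TA block. As a rough illustration of what "temporal attention" generally means, the sketch below shows generic softmax attention pooling over time. Each of T per-frame feature vectors (standing in for flattened ConvLSTM outputs) is scored against a scoring vector `w`, the scores are normalized with a softmax, and the frames are averaged with those weights. This is a swapped-in textbook technique, not the authors' TA block. The function name `temporal_attention_pool` and the parameter `w` are hypothetical, and the MobileNetV3 and ConvLSTM stages are not reproduced.

```python
import math
from typing import List, Tuple


def temporal_attention_pool(
    frames: List[List[float]], w: List[float]
) -> Tuple[List[float], List[float]]:
    """Generic softmax temporal attention pooling (hypothetical sketch).

    frames: T per-frame feature vectors of dimension D (assumed stand-ins
            for flattened ConvLSTM outputs; not the paper's exact tensors).
    w:      D-dimensional scoring vector (assumed learned parameter).
    Returns the attention-weighted D-dim summary and the T weights.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # One relevance score per time step: dot product with the scoring vector.
    scores = [sum(f_d * w_d for f_d, w_d in zip(f, w)) for f in frames]
    # Softmax over time; subtracting the max keeps exp() numerically stable.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum over time, computed independently for each feature dimension.
    dim = len(frames[0])
    pooled = [sum(a * f[d] for a, f in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas


if __name__ == "__main__":
    clip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "frames"
    summary, weights = temporal_attention_pool(clip, [1.0, 1.0])
    print("weights:", weights)
    print("summary:", summary)
```

In this toy clip the third frame has the highest score, so it receives the largest weight. The pooled vector therefore leans toward it, which mirrors the intuition that an attention block emphasizes the most informative moments of a clip.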