Chinese speech to text. Output language: Include timestamps.