1 Why 3D virtual human lip sync needs phoneme information

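A phoneme is the smallest distinctive unit of speech. A viseme is the visual counterpart of a phoneme: it describes the position of the mouth and face while a person speaks, and each viseme captures the facial pose and lip shape shared by a particular group of phonemes. The mapping between phonemes and visemes is not one-to-one but many-to-one: several phonemes usually share one viseme, because their face and lip positions look alike when pronounced.
A typical 3D virtual human lip-sync pipeline takes the phoneme timestamps produced during speech synthesis, derives the start and end time of each phoneme, and from those computes the start and end time of each viseme. Visemes, in turn, can be represented with blendshapes. A speech synthesis service that outputs phoneme timestamps alongside the audio is therefore very useful for 3D virtual human lip sync.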
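The phoneme-to-viseme step described above can be sketched as follows. This is a minimal illustration, not a production mapping: the `PHONEME_TO_VISEME` table, the viseme labels, and the `Segment` type are all hypothetical and would need to be replaced by the phoneme set of the TTS service and the viseme set of the character rig.

```python
from dataclasses import dataclass

# Hypothetical many-to-one phoneme -> viseme table (illustrative only).
PHONEME_TO_VISEME = {
    "p": "PP", "b": "PP", "m": "PP",   # lips closed
    "f": "FF", "v": "FF",              # lower lip against upper teeth
    "ah": "AA", "aa": "AA",            # open mouth
    "ow": "OH",                        # rounded lips
    "sil": "SIL",                      # silence / neutral
}

@dataclass
class Segment:
    label: str
    start_ms: float
    end_ms: float

def phonemes_to_visemes(phonemes, table=PHONEME_TO_VISEME, default="SIL"):
    """Map each phoneme to its viseme and merge adjacent identical visemes."""
    visemes = []
    for ph in phonemes:
        label = table.get(ph.label, default)
        last = visemes[-1] if visemes else None
        if last and last.label == label and last.end_ms == ph.start_ms:
            last.end_ms = ph.end_ms  # extend the previous viseme
        else:
            visemes.append(Segment(label, ph.start_ms, ph.end_ms))
    return visemes

phonemes = [Segment("b", 0, 40), Segment("m", 40, 90), Segment("ah", 90, 200)]
timeline = phonemes_to_visemes(phonemes)
```

Because "b" and "m" share one viseme in this table, the three phonemes collapse into two viseme segments, which is the many-to-one relationship in practice.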

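2 TTS APIs that can output phoneme timestamps

2.1 Microsoft speech synthesis

The neural TTS in Microsoft's text-to-speech service converts input text or SSML (Speech Synthesis Markup Language) into speech, and can also output viseme IDs, 2D Scalable Vector Graphics (SVG) animation, or 3D blendshape weights along with the audio.
Documentation: https://learn.microsoft.com/en-us/azure/ai-services/speech-service/how-to-speech-synthesis-viseme?tabs=visemeid&pivots=programming-language-cpp

The viseme IDs and blendshape weights returned with the audio can then be used to drive the virtual human's lip sync. A blendshape result has the following form: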

{
    "FrameIndex":0,
    "BlendShapes":[
        [0.021,0.321,...,0.258],
        [0.045,0.234,...,0.288],
        ...
    ]
}
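To play such a result back, each frame needs a timestamp. The sketch below assumes that `FrameIndex` is the index of the first frame in the chunk and that frames are produced at a fixed 60 frames per second; both are assumptions to verify against the service documentation. The `frame_times` helper and the sample payload are hypothetical.

```python
import json

FPS = 60  # assumed frame rate; verify against the service documentation

def frame_times(animation_json, fps=FPS):
    """Yield (time_ms, weights) for every frame in one blendshape chunk."""
    chunk = json.loads(animation_json)
    first = chunk["FrameIndex"]  # assumed: index of the chunk's first frame
    for offset, weights in enumerate(chunk["BlendShapes"]):
        yield (first + offset) * 1000.0 / fps, weights

# Made-up chunk with two weights per frame, for illustration only.
sample = '{"FrameIndex": 6, "BlendShapes": [[0.0, 0.1], [0.2, 0.3]]}'
frames = list(frame_times(sample))
```

Each tuple pairs a playback time in milliseconds with the weight vector to apply to the rig at that time.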

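2.2 Alibaba speech synthesis

Alibaba speech synthesis can also output, along with the audio stream, the time position of each Chinese character or English word in the audio, that is, its timestamp. The timestamp feature is also called the character-level phoneme boundary interface. This timing information can be used to drive a virtual human's mouth shapes, to time subtitles for video dubbing, and so on.

Documentation: https://help.aliyun.com/zh/isi/developer-reference/timestamp-feature?spm=a2c4g.11186623.0.i3

Sample output: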

//今天
{
  "subtitles":[
  {"begin_index":0,"begin_time":0,"end_index":1,"end_time":100,
    "phoneme_list":[
      {"index":0, "begin_time":0, "end_time": 20, "phoneme":"hh", "tone":0},
      {"index":1, "begin_time":20, "end_time": 40, "phoneme":"ah", "tone":0},
      {"index":1, "begin_time":40, "end_time": 60, "phoneme":"l", "tone":1},
      {"index":1, "begin_time":60, "end_time": 100, "phoneme":"ow", "tone":1}
    ],
    "text":"hello", "phoneme":"hh ah l ow"},
  {"begin_index":1,"begin_time":100,"end_index":2,"end_time":200,
    "phoneme_list":[
      {"index":0, "begin_time":100, "end_time": 150, "phoneme":"j_c", "tone":1},
      {"index":1, "begin_time":150, "end_time": 200, "phoneme":"in_c", "tone":1}
    ],
    "text":"今", "phoneme":"j_c in_c"},
  {"begin_index":1,"begin_time":200,"end_index":2,"end_time":400,
    "phoneme_list":[
      {"index":0, "begin_time":200, "end_time": 300, "phoneme":"t_c", "tone":1},
      {"index":1, "begin_time":300, "end_time": 400, "phoneme":"ian_c", "tone":1}
    ],"text":"天", "phoneme":"t_c ian_c"}
  ]
}
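A result of this shape can be flattened into a single phoneme timeline before mapping to visemes. The sketch below relies only on the field names visible in the sample (`subtitles`, `phoneme_list`, `phoneme`, `begin_time`, `end_time`, `text`); the `flatten_phonemes` helper is hypothetical, and the times are assumed to be milliseconds.

```python
import json

def flatten_phonemes(payload):
    """Return (text, phoneme, begin, end) tuples in playback order."""
    rows = []
    for sub in json.loads(payload)["subtitles"]:
        # Tolerate entries that carry no phoneme list.
        for ph in sub.get("phoneme_list", []):
            rows.append((sub["text"], ph["phoneme"],
                         ph["begin_time"], ph["end_time"]))
    return rows

# One entry copied from the sample output above.
payload = json.dumps({"subtitles": [{
    "begin_index": 1, "begin_time": 100, "end_index": 2, "end_time": 200,
    "phoneme_list": [
        {"index": 0, "begin_time": 100, "end_time": 150, "phoneme": "j_c", "tone": 1},
        {"index": 1, "begin_time": 150, "end_time": 200, "phoneme": "in_c", "tone": 1},
    ],
    "text": "今", "phoneme": "j_c in_c"}]})
rows = flatten_phonemes(payload)
```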

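2.3 Tencent speech synthesis

Tencent speech synthesis also provides phoneme timestamps. Documentation: https://cloud.tencent.com/document/product/1073/57374

2.4 Volcano Engine speech synthesis

Documentation: https://www.volcengine.com/docs/6561/79823

Sample output: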

{
    "reqid": "reqid",
    "code": 3000,
    "operation": "query",
    "message": "Success",
    "sequence": -1,
    "data": "base64 encoded binary data",
    "addition": {
        "description": "...",
        "duration": "1960",
        "frontend": "{
            "words": [{
                "word": "字",
                "start_time": 0.025,
                "end_time": 0.185
            },
            ... 
            {
                "word": "。",
                "start_time": 1.85,
                "end_time": 1.955
            }],
            "phonemes": [{
                "phone": "C0z",
                "start_time": 0.025,
                "end_time": 0.105
            },
            ... 
            {
                "phone": "。",
                "start_time": 1.85,
                "end_time": 1.955
            }]
        }"
    }
}
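In the sample above, `frontend` is itself a JSON document embedded as a string, so it needs a second decoding pass. The sketch below assumes that the response has already been parsed into a dictionary and that `start_time` and `end_time` are in seconds, which is what the sample suggests when compared with the `duration` of 1960. The `parse_frontend` helper is hypothetical.

```python
import json

def parse_frontend(response):
    """Return (phone, start_ms, end_ms) tuples from the embedded frontend string."""
    frontend = json.loads(response["addition"]["frontend"])  # second decode
    return [
        (p["phone"], round(p["start_time"] * 1000), round(p["end_time"] * 1000))
        for p in frontend["phonemes"]
    ]

# Minimal response built from values in the sample output above.
response = {"addition": {"frontend": json.dumps({
    "phonemes": [{"phone": "C0z", "start_time": 0.025, "end_time": 0.105}]
})}}
phones = parse_frontend(response)
```

Converting to integer milliseconds here keeps the timeline in the same unit as the other services, which simplifies a shared viseme mapping step.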

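2.5 Baidu speech synthesis

Baidu speech synthesis provides timestamps only at the sentence and character level, not at the phoneme level. Documentation: https://ai.baidu.com/ai-doc/SPEECH/ulbxh8rbu

Sample output: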

{
    "log_id": 16739423288701914,
    "tasks_info": [
        {
            "task_status": "Success",
            "task_result": {
                "speech_url": "http://bj.bcebos.com/aipe-speech/text_to_speech/2023-01-17/63c6550e52064d000104da0d/speech/0.mp3?authorization=bce-auth-v1%2F8a6ca9b78c124d89bb6bca18c6fc5944%2F2023-01-17T07%3A58%3A12Z%2F259200%2F%2Fbb3f38b53425ced397a107aebe21d2e951ed0e27a964f39c2a350249ba07b47c",
                "speech_timestamp": {
                    "sentences": [
                        {    "paragraph_index": 0,
                            "sentence_texts": "今年上半年我国工业经济面临的内外部环境还是比较严峻复杂的",
                            "begin_time": 104,
                            "end_time": 5970,
                            "characters": [
                                {
                                    "character_text": "今",
                                    "begin_time": 106,
                                    "end_time": 313
                                },
                                {
                                    "character_text": "年",
                                    "begin_time": 316,
                                    "end_time": 522
                                },
                                {
                                    "character_text": "上",
                                    "begin_time": 525,
                                    "end_time": 732
                                },
                                {
                                    "character_text": "半",
                                    "begin_time": 735,
                                    "end_time": 941
                                },
                                {
                                    "character_text": "年",
                                    "begin_time": 944,
                                    "end_time": 1151
                                },
                                {
                                    "character_text": "我",
                                    "begin_time": 1154,
                                    "end_time": 1360
                                },
                                {
                                    "character_text": "国",
                                    "begin_time": 1363,
                                    "end_time": 1570
                                },
                                {
                                    "character_text": "工",
                                    "begin_time": 1573,
                                    "end_time": 1779
                                },
                                {
                                    "character_text": "业",
                                    "begin_time": 1782,
                                    "end_time": 1989
                                },
                                {
                                    "character_text": "经",
                                    "begin_time": 1992,
                                    "end_time": 2198
                                },
                                {
                                    "character_text": "济",
                                    "begin_time": 2201,
                                    "end_time": 2408
                                },
                                {
                                    "character_text": "面",
                                    "begin_time": 2411,
                                    "end_time": 2617
                                },
                                {
                                    "character_text": "临",
                                    "begin_time": 2620,
                                    "end_time": 2827
                                },
                                {
                                    "character_text": "的",
                                    "begin_time": 2830,
                                    "end_time": 3036
                                },
                                {
                                    "character_text": "内",
                                    "begin_time": 3039,
                                    "end_time": 3246
                                },
                                {
                                    "character_text": "外",
                                    "begin_time": 3249,
                                    "end_time": 3455
                                },
                                {
                                    "character_text": "部",
                                    "begin_time": 3458,
                                    "end_time": 3664
                                },
                                {
                                    "character_text": "环",
                                    "begin_time": 3667,
                                    "end_time": 3874
                                },
                                {
                                    "character_text": "境",
                                    "begin_time": 3877,
                                    "end_time": 4083
                                },
                                {
                                    "character_text": "还",
                                    "begin_time": 4086,
                                    "end_time": 4293
                                },
                                {
                                    "character_text": "是",
                                    "begin_time": 4296,
                                    "end_time": 4502
                                },
                                {
                                    "character_text": "比",
                                    "begin_time": 4505,
                                    "end_time": 4712
                                },
                                {
                                    "character_text": "较",
                                    "begin_time": 4715,
                                    "end_time": 4921
                                },
                                {
                                    "character_text": "严",
                                    "begin_time": 4924,
                                    "end_time": 5131
                                },
                                {
                                    "character_text": "峻",
                                    "begin_time": 5134,
                                    "end_time": 5340
                                },
                                {
                                    "character_text": "复",
                                    "begin_time": 5343,
                                    "end_time": 5550
                                },
                                {
                                    "character_text": "杂",
                                    "begin_time": 5553,
                                    "end_time": 5759
                                },
                                {
                                    "character_text": "的",
                                    "begin_time": 5762,
                                    "end_time": 5969
                                }
                            ]
                        }
                    ]
                }
            },
            "task_id": "63c6550e52064d000104da0d"
        }
    ]
}
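With only character-level timing, phoneme boundaries have to be estimated. One crude option, shown here purely as an assumption and not as anything the service provides, is to split each character's interval evenly among its phonemes. The `LEXICON` pronunciation table and the `split_evenly` helper are hypothetical.

```python
# Hypothetical pronunciation lookup: character -> phoneme list.
LEXICON = {"今": ["j", "in"], "年": ["n", "ian"]}

def split_evenly(characters, lexicon=LEXICON):
    """Estimate (phone, start, end) by dividing each character's span equally."""
    out = []
    for ch in characters:
        phones = lexicon.get(ch["character_text"])
        if not phones:
            continue  # unknown character: no estimate
        step = (ch["end_time"] - ch["begin_time"]) / len(phones)
        for i, phone in enumerate(phones):
            start = ch["begin_time"] + i * step
            out.append((phone, start, start + step))
    return out

# First character entry from the sample output above.
chars = [{"character_text": "今", "begin_time": 106, "end_time": 313}]
estimate = split_evenly(chars)
```

Real initials are usually much shorter than finals, so an even split is only a rough starting point for lip sync.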

2.6 Unisound speech synthesis

Documentation: https://ai.unisound.com/doc/ttslong/WebAPI.html

Sample output:

[{
        "sentence": "你好",
        "phoneme": [
            {
                "end": 208.98,
                "phone": "sil",
                "start": 0,
                "type": 8
            },
            {
                "end": 278.639,
                "phone": "n",
                "start": 208.98,
                "type": 1
            },
            {
                "end": 417.959,
                "phone": "i",
                "start": 278.639,
                "type": 4
            },
            {
                "end": 557.279,
                "phone": "h",
                "start": 417.959,
                "type": 1
            },
            {
                "end": 801.088,
                "phone": "ao",
                "start": 557.279,
                "type": 4
            },
            {
                "end": 1001.361,
                "phone": "sil",
                "start": 801.088,
                "type": 8
            }
        ]
    }]
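At render time, the animation loop needs to know which phoneme is active at the current playback position. A minimal lookup over a phoneme list of this shape might look like the following; the `phone_at` helper is hypothetical, and `start` and `end` are assumed to be milliseconds.

```python
from bisect import bisect_right

def phone_at(phonemes, t_ms):
    """Return the phone active at t_ms, or None if t_ms is outside the list."""
    starts = [p["start"] for p in phonemes]  # assumed sorted by start time
    i = bisect_right(starts, t_ms) - 1
    if i >= 0 and t_ms < phonemes[i]["end"]:
        return phonemes[i]["phone"]
    return None

# Phoneme list taken from the sample output above ("type" omitted).
phonemes = [
    {"phone": "sil", "start": 0, "end": 208.98},
    {"phone": "n", "start": 208.98, "end": 278.639},
    {"phone": "i", "start": 278.639, "end": 417.959},
    {"phone": "h", "start": 417.959, "end": 557.279},
    {"phone": "ao", "start": 557.279, "end": 801.088},
    {"phone": "sil", "start": 801.088, "end": 1001.361},
]
current = phone_at(phonemes, 300)
```

For long utterances the `starts` list can be built once and reused across frames instead of being rebuilt on every call.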