**FFmpeg 101 (2024)**

Original link: https://blogs.igalia.com/llepage/ffmpeg-101/

## FFmpeg Overview

FFmpeg is a powerful, open-source suite of tools and libraries for processing multimedia. It includes command-line tools such as `ffmpeg` (for conversion), `ffplay` (a simple player), and `ffprobe` (for stream analysis), as well as libraries for integration into custom projects.

The core libraries include `libavformat` (I/O and muxing/demuxing), `libavcodec` (encoding/decoding), and others for filtering, scaling, and resampling. A typical pipeline demuxes a multimedia stream into its audio and video components, then decodes those components into raw data using structures such as `AVFormatContext`, `AVStream`, `AVPacket`, and `AVFrame`.

The process starts by opening a file with `avformat_open_input`, finding stream information with `avformat_find_stream_info`, and iterating over the streams to extract details such as frame rate and codec type. The codec is identified with `avcodec_find_decoder`, and an `AVCodecContext` is created for decoding. Finally, packets are read with `av_read_frame`, sent to the decoder with `avcodec_send_packet`, and decoded frames are received with `avcodec_receive_frame`.

A complete example, available in the `ffmpeg-101` repository, demonstrates this process and can be built with `meson` and `ninja`.

A recent Hacker News discussion highlighted a new FFmpeg 101 resource (igalia.com), sparking interest in this powerful multimedia framework. While some found the commands in the original article simple, users enthusiastically recommended Leandro Moreira's in-depth tutorial ([https://github.com/leandromoreira/ffmpeg-libav-tutorial](https://github.com/leandromoreira/ffmpeg-libav-tutorial)) as the best way to truly understand FFmpeg and its underlying libav libraries. The conversation also reflected how quickly FFmpeg evolves ("FFmpeg 8 feels like yesterday!") and an appreciation of its capabilities, with one user calling it a "superpower" for assembling video. Another useful guide was also shared via a link to an earlier Hacker News discussion. Overall, the thread shows strong community interest in mastering this essential tool.

A high-level architecture overview to start with FFmpeg.

Code repository: ffmpeg-101

FFmpeg package content

FFmpeg is composed of a suite of tools and libraries.

FFmpeg tools

The tools can be used to encode/decode/transcode a multitude of different audio and video formats, and to stream the encoded media over networks.

  • ffmpeg: a command line tool to convert multimedia files between formats
  • ffplay: a simple media player based on SDL and the FFmpeg libraries
  • ffprobe: a simple multimedia stream analyzer
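As an illustration (the exact options are beyond the scope of this overview, but these are common invocations of the three tools, assuming an `input.mp4` file):

```shell
# Convert a file from one container/codec combination to another
ffmpeg -i input.mp4 output.webm

# Play a file with the bundled SDL-based player
ffplay input.mp4

# Inspect the streams contained in a file
ffprobe -show_streams input.mp4
```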

FFmpeg libraries

The libraries can be used to integrate those same features into your own product.

  • libavformat: I/O and muxing/demuxing
  • libavcodec: encoding/decoding
  • libavfilter: graph-based filters for raw media
  • libavdevice: input/output devices
  • libavutil: common multimedia utilities
  • libswresample: audio resampling, samples format conversion and audio mixing
  • libswscale: color conversion and image scaling
  • libpostproc: video post-processing (deblocking/noise filters)

FFmpeg simple player

A basic usage of FFmpeg is to demux a multimedia stream (obtained from a file or from the network) into its audio and video streams and then to decode those streams into raw audio and raw video data.

To manage the media streams, FFmpeg uses the following structures:

  • AVFormatContext: a high-level structure providing sync, metadata and muxing for the streams
  • AVStream: a continuous stream (audio or video)
  • AVCodec: defines how data are encoded and decoded
  • AVPacket: encoded data in the stream
  • AVFrame: decoded data (raw video frame or raw audio samples)

The process used to demux and decode follows this logic:

basic processing

Here is the basic code needed to read an encoded multimedia stream from a file, analyze its content and demux the audio and video streams. These features are provided by the libavformat library, which uses the AVFormatContext and AVStream structures to store the information.


// Allocate the demuxing context
AVFormatContext* format_context = avformat_alloc_context();

// Open the input file and read the stream headers
avformat_open_input(&format_context, filename, NULL, NULL);
printf("File: %s, format: %s\n", filename, format_context->iformat->name);

// Read the stream information (may decode a few frames for formats without headers)
avformat_find_stream_info(format_context, NULL);


for (unsigned int i = 0; i < format_context->nb_streams; ++i)
{
    AVStream* stream = format_context->streams[i];

    printf("---- Stream %02d\n", i);
    printf("  Time base: %d/%d\n", stream->time_base.num, stream->time_base.den);
    printf("  Framerate: %d/%d\n", stream->r_frame_rate.num, stream->r_frame_rate.den);
    printf("  Start time: %" PRId64 "\n", stream->start_time);
    printf("  Duration: %" PRId64 "\n", stream->duration);
    printf("  Type: %s\n", av_get_media_type_string(stream->codecpar->codec_type));

    uint32_t fourcc = stream->codecpar->codec_tag;
    printf("  FourCC: %c%c%c%c\n", fourcc & 0xff, (fourcc >> 8) & 0xff, (fourcc >> 16) & 0xff, (fourcc >> 24) & 0xff);
}


// Close the input and free the demuxing context
avformat_close_input(&format_context);

Once we’ve got the different streams from inside the multimedia file, we need to find the specific codecs to decode the streams to raw audio and raw video data. All codecs are statically included in libavcodec. You can create your own codec by creating an instance of the FFCodec structure and registering it as an extern const FFCodec in libavcodec/allcodecs.c, but that is a topic for another post.

To find the codec corresponding to the content of an AVStream, we can use the following code:


AVStream* stream = format_context->streams[i];


// Look up a decoder matching the stream's codec id
const AVCodec* codec = avcodec_find_decoder(stream->codecpar->codec_id);
if (!codec)
{
    fprintf(stderr, "Unsupported codec\n");
    continue;
}
printf("  Codec: %s, bitrate: %" PRId64 "\n", codec->name, stream->codecpar->bit_rate);

if (codec->type == AVMEDIA_TYPE_VIDEO)
{
    printf("  Video resolution: %dx%d\n", stream->codecpar->width, stream->codecpar->height);
}
else if (codec->type == AVMEDIA_TYPE_AUDIO)
{
    printf("  Audio: %d channels, sample rate: %d Hz\n",
        stream->codecpar->ch_layout.nb_channels,
        stream->codecpar->sample_rate);
}

With the right codec and codec parameters extracted from the AVStream information, we can now allocate the AVCodecContext structure that will be used to decode the corresponding stream. It is important to remember the index of the stream we want to decode in the stream list (format_context->streams), because this index will be used later to identify the demuxed packets extracted by the AVFormatContext.

In the following code we’re going to select the first video stream contained in the multimedia file.


int first_video_stream_index = ...;

AVStream* first_video_stream = format_context->streams[first_video_stream_index];
AVCodecParameters* first_video_stream_codec_params = first_video_stream->codecpar;
const AVCodec* first_video_stream_codec = avcodec_find_decoder(first_video_stream_codec_params->codec_id);


// Allocate a decoding context for the selected codec
AVCodecContext* codec_context = avcodec_alloc_context3(first_video_stream_codec);

// Fill the context with the codec parameters extracted from the stream
avcodec_parameters_to_context(codec_context, first_video_stream_codec_params);

// Initialize the decoder
avcodec_open2(codec_context, first_video_stream_codec, NULL);

Now that we have a running decoder, we can extract the demuxed packets using the AVFormatContext structure and decode them to raw video frames. For that we need two different structures:

  • AVPacket which contains the encoded packets extracted from the input multimedia file,
  • AVFrame which will contain the raw video frame after the AVCodecContext has decoded those packets.

// Will receive the encoded packets demuxed from the input
AVPacket* packet = av_packet_alloc();

// Will receive the decoded raw frames
AVFrame* frame = av_frame_alloc();


// Demux the next packet from the input
while (av_read_frame(format_context, packet) >= 0)
{
    // Show packet information
    printf("Packet received for stream %02d, pts: %" PRId64 "\n", packet->stream_index, packet->pts);

    // Only decode the packets belonging to the selected video stream
    if (packet->stream_index == first_video_stream_index)
    {
        // Send the encoded packet to the decoder
        int res = avcodec_send_packet(codec_context, packet);
        if (res < 0)
        {
            fprintf(stderr, "Cannot send packet to the decoder: %s\n", av_err2str(res));
            break;
        }

        // A single packet may produce zero, one or several frames,
        // so keep receiving frames until the decoder asks for more input.
        while (res >= 0)
        {
            // Fetch the next decoded frame, if any
            res = avcodec_receive_frame(codec_context, frame);
            if (res == AVERROR(EAGAIN) || res == AVERROR_EOF)
            {
                // The decoder needs more packets, or reached the end of the stream
                break;
            }
            else if (res < 0)
            {
                fprintf(stderr, "Error while receiving a frame from the decoder: %s\n", av_err2str(res));
                goto end;
            }

            // Show frame information
            printf("Frame %02" PRId64 ", type: %c, format: %d, pts: %03" PRId64 ", keyframe: %s\n",
                codec_context->frame_num, av_get_picture_type_char(frame->pict_type), frame->format, frame->pts,
                (frame->flags & AV_FRAME_FLAG_KEY) ? "true" : "false");

            // This is where the raw frame (frame->data) would be used
            // (rendering, scaling, re-encoding, ...)
        }
    }

    // Drop the packet payload now that it has been consumed
    av_packet_unref(packet);
}


end:
    // Release all the allocated resources
    av_packet_free(&packet);
    av_frame_free(&frame);
    avcodec_free_context(&codec_context);
    avformat_close_input(&format_context);

The behavior of the code above is summarized in the following diagram:

processing diagram

You can download the full code here or directly access the code repository.

To build the example you will need meson and ninja. If you have Python and pip installed, you can install them easily with pip3 install meson ninja. Then, once the example archive is extracted to a ffmpeg-101 folder, enter this folder and run meson setup build. It will automatically download the right version of FFmpeg if it is not already installed on your system. Then run ninja -C build to build the code, and ./build/ffmpeg-101 sample.mp4 to run it.
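The steps above, in order (assuming Python with pip is available):

```shell
pip3 install meson ninja        # install the build tools
meson setup build               # configure; fetches FFmpeg if not installed
ninja -C build                  # build the example
./build/ffmpeg-101 sample.mp4   # run it on a sample file
```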

You should obtain the following result:

File: sample.mp4, format: mov,mp4,m4a,3gp,3g2,mj2
---- Stream 00
  Time base: 1/3000
  Framerate: 30/1
  Start time: 0
  Duration: 30000
  Type: video
  FourCC: avc1
  Codec: h264, bitrate: 47094
  Video resolution: 206x80
---- Stream 01
  Time base: 1/44100
  Framerate: 0/0
  Start time: 0
  Duration: 440320
  Type: audio
  FourCC: mp4a
  Codec: aac, bitrate: 112000
  Audio: 2 channels, sample rate: 44100 Hz
Packet received for stream 00, pts: 0
Send video packet to decoder...
Frame 01, type: I, format: 0, pts: 000, keyframe: true
Packet received for stream 00, pts: 100
Send video packet to decoder...
Frame 02, type: P, format: 0, pts: 100, keyframe: false
Packet received for stream 00, pts: 200
Send video packet to decoder...
Frame 03, type: P, format: 0, pts: 200, keyframe: false
Packet received for stream 00, pts: 300
Send video packet to decoder...
Frame 04, type: P, format: 0, pts: 300, keyframe: false
Packet received for stream 00, pts: 400
Send video packet to decoder...
Frame 05, type: P, format: 0, pts: 400, keyframe: false
Packet received for stream 00, pts: 500
Send video packet to decoder...
Frame 06, type: P, format: 0, pts: 500, keyframe: false
Packet received for stream 00, pts: 600
Send video packet to decoder...
Frame 07, type: P, format: 0, pts: 600, keyframe: false
Packet received for stream 00, pts: 700
Send video packet to decoder...
Frame 08, type: P, format: 0, pts: 700, keyframe: false
Packet received for stream 01, pts: 0
Packet received for stream 01, pts: 1024
Packet received for stream 01, pts: 2048
Packet received for stream 01, pts: 3072
Packet received for stream 01, pts: 4096
Packet received for stream 01, pts: 5120
Packet received for stream 01, pts: 6144
Packet received for stream 01, pts: 7168
Packet received for stream 01, pts: 8192
Packet received for stream 01, pts: 9216
Packet received for stream 01, pts: 10240
Packet received for stream 01, pts: 11264
Packet received for stream 01, pts: 12288
Packet received for stream 01, pts: 13312
Packet received for stream 01, pts: 14336
Packet received for stream 01, pts: 15360
Packet received for stream 01, pts: 16384
Packet received for stream 01, pts: 17408
Packet received for stream 01, pts: 18432
Packet received for stream 01, pts: 19456
Packet received for stream 01, pts: 20480
Packet received for stream 01, pts: 21504
Packet received for stream 00, pts: 800
Send video packet to decoder...
Frame 09, type: P, format: 0, pts: 800, keyframe: false
Packet received for stream 00, pts: 900
Send video packet to decoder...
Frame 10, type: P, format: 0, pts: 900, keyframe: false