[Lan Xie Notes] Neural Adaptive Video Streaming with Pensieve

2017-10-10

Posted by 谢澜

Neural Adaptive Video Streaming with Pensieve, by Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh (MIT). This paper was accepted at SIGCOMM 2017.

Topic: adaptive bitrate (ABR) algorithm

Challenges:

  1. Network conditions fluctuate over time and are hard to predict. Therefore, ABR algorithms should also weigh more stable signals such as buffer occupancy.
  2. Conflicting QoE factors (e.g., bitrate, rebuffering, smoothness) must be balanced.
  3. A bitrate decision has cascading effects: it changes the buffer level and thereby constrains later decisions.
  4. There is a tradeoff between high quality and playback continuity.

In this paper, the authors propose a learning-based ABR algorithm that applies Reinforcement Learning (RL), named Pensieve. The goal of the learning agent is to maximize the expected QoE. Fig. 1 shows how RL is applied to ABR.

[Figure: applying RL to bitrate adaptation]
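As a concrete example of the reward the agent maximizes, a commonly used QoE formulation over N chunks balances a bitrate utility q(R_n), the rebuffering time T_n (weighted by \mu), and a smoothness penalty on bitrate switches:

```latex
QoE = \sum_{n=1}^{N} q(R_n) \;-\; \mu \sum_{n=1}^{N} T_n \;-\; \sum_{n=1}^{N-1} \big| q(R_{n+1}) - q(R_n) \big|
```

The exact choice of q(\cdot) and \mu varies; this is the general form, not a claim about a single fixed setting.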

Pensieve uses A3C as the training algorithm for RL, a state-of-the-art actor-critic method that involves training two neural networks. More specifically, the diagram of the network is:

[Figure: the actor-critic network architecture]

Inputs

As the figure shows, the model contains two parts: an actor network and a critic network. Both take the same input and have two hidden layers. The inputs are:

  • x: network throughput measurements for the past k chunks (k = 8)
  • \tau: download times of the past k chunks
  • n: the m available sizes of the next chunk (m = 6)
  • b: the current buffer level
  • c: the number of chunks remaining in the video
  • l: the last chunk’s bitrate
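To make the state concrete, here is a minimal Python sketch of the observation the agent receives before choosing the next chunk's bitrate. Field names and values are illustrative, not taken from the paper's code:

```python
# Hypothetical sketch of the agent's state s_t (names and values illustrative).
K = 8  # number of past chunks observed
M = 6  # number of available bitrate levels for the next chunk

state = {
    "throughput": [1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 0.95],   # x: Mbps over last K chunks
    "download_time": [2.1, 2.8, 2.3, 2.5, 3.0, 1.9, 2.2, 2.6],  # tau: seconds per chunk
    "next_chunk_sizes": [0.3, 0.6, 1.0, 1.5, 2.3, 4.3],         # n: MB, one per bitrate level
    "buffer": 12.5,        # b: seconds of video currently buffered
    "chunks_left": 42,     # c: chunks remaining in the video
    "last_bitrate": 1.85,  # l: Mbps of the previously selected chunk
}

assert len(state["throughput"]) == K
assert len(state["next_chunk_sizes"]) == M
```

The vector fields (throughput, download time, chunk sizes) feed the 1D-CNN layers; the scalars feed the network directly.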

Actor-Critic Network

The goal of the actor network is to learn a policy \pi(s, a), which gives the probability that action a is taken in state s.

The goal of the critic network is to learn an estimate of the value function v^{\pi}(s), the expected total reward obtained by starting at state s and following policy \pi.
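Conceptually, the two heads are two functions over shared state features: the actor outputs a softmax distribution \pi(s, a) over the m bitrates, and the critic outputs a single scalar value estimate. A toy linear sketch (not the paper's actual layers; weights and feature dimension are illustrative):

```python
import math
import random

M = 6  # number of bitrate actions

def softmax(logits):
    """Convert raw scores into action probabilities pi(s, a)."""
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor(features, w_actor):
    """Toy actor head: linear scores per action -> softmax distribution."""
    logits = [sum(f * w for f, w in zip(features, row)) for row in w_actor]
    return softmax(logits)

def critic(features, w_critic):
    """Toy critic head: a single linear unit estimating v_pi(s)."""
    return sum(f * w for f, w in zip(features, w_critic))

random.seed(0)
features = [random.random() for _ in range(10)]  # stand-in for the shared hidden features
w_actor = [[random.gauss(0, 0.1) for _ in range(10)] for _ in range(M)]
w_critic = [random.gauss(0, 0.1) for _ in range(10)]

pi = actor(features, w_actor)   # probability over the M bitrates
v = critic(features, w_critic)  # scalar value estimate
assert abs(sum(pi) - 1.0) < 1e-9 and all(p > 0 for p in pi)
```

The actor's distribution is what gets sampled for the next chunk's bitrate; the critic's scalar is only used during training to compute the advantage.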

The network parameters \theta are updated during training by gradient ascent on the expected cumulative reward. Generally, the policy gradient is:

\nabla_\theta \, \mathbb{E}_{\pi_\theta}\Big[\sum_t \gamma^t r_t\Big] = \mathbb{E}_{\pi_\theta}\big[\nabla_\theta \log \pi_\theta(s, a)\, A^{\pi_\theta}(s, a)\big]

where A^{\pi_\theta}(s, a) is the advantage of taking action a in state s relative to the policy's average.

In practice, the agent samples a trajectory of bitrate decisions and uses the empirically computed advantage A(s_t, a_t), giving the update:

\theta \leftarrow \theta + \alpha \sum_t \nabla_\theta \log \pi_\theta(s_t, a_t)\, A(s_t, a_t)

By adding an entropy regularization term (weighted by \beta) to encourage exploration, \theta is updated according to:

\theta \leftarrow \theta + \alpha \sum_t \Big( \nabla_\theta \log \pi_\theta(s_t, a_t)\, A(s_t, a_t) + \beta\, \nabla_\theta H\big(\pi_\theta(\cdot \mid s_t)\big) \Big)
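As a sketch of how one such policy-gradient step with entropy regularization can be computed for a simple linear softmax policy (all names and hyperparameters illustrative; A3C additionally runs many agents in parallel and trains a separate critic to supply the advantages):

```python
import math

def softmax(logits):
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor_step(theta, trajectory, alpha=0.05, beta=0.01):
    """One actor update for a linear softmax policy (illustrative sketch).

    theta: one weight row per action; trajectory: (features, action,
    advantage) tuples, with the advantage A(s_t, a_t) assumed precomputed."""
    for feats, a, adv in trajectory:
        logits = [sum(f * w for f, w in zip(feats, row)) for row in theta]
        pi = softmax(logits)
        entropy = -sum(p * math.log(p) for p in pi)
        for j, row in enumerate(theta):
            # d log pi(a|s) / d logit_j = 1[j == a] - pi_j
            pg = ((1.0 if j == a else 0.0) - pi[j]) * adv
            # d H(pi) / d logit_j = -pi_j * (log pi_j + H)
            eg = -pi[j] * (math.log(pi[j]) + entropy)
            for i, f in enumerate(feats):
                row[i] += (alpha * pg + beta * eg) * f
    return theta

# Tiny check: a positive advantage should raise the chosen action's probability.
theta = [[0.0, 0.0] for _ in range(3)]
feats = [1.0, 0.5]
before = softmax([sum(f * w for f, w in zip(feats, row)) for row in theta])[1]
actor_step(theta, [(feats, 1, 2.0)])
after = softmax([sum(f * w for f, w in zip(feats, row)) for row in theta])[1]
assert after > before
```

The entropy gradient is zero at a uniform policy and pushes probabilities back toward uniform otherwise, which is what keeps the agent exploring early in training.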

Some Details of Network Layers

The 1D-CNN layers process the vector inputs (the throughput, download-time, and chunk-size histories). Each uses 128 filters of size 4 with stride 1. The 1D-CNN outputs are then aggregated with the scalar inputs in a hidden layer of 128 neurons, and the actor's output layer applies the softmax function over the bitrate choices.
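To see the shapes involved: with stride 1, a filter of size 4 sliding over a length-8 throughput history produces 8 - 4 + 1 = 5 outputs per filter, so 128 filters yield a 128 x 5 feature map. A pure-Python sketch (filter weights are random and purely illustrative):

```python
import random

def conv1d(x, filters, stride=1):
    """Valid 1D convolution: each filter slides over x, producing
    len(x) - len(filter) + 1 outputs per filter when stride == 1."""
    out = []
    for f in filters:
        row = []
        for start in range(0, len(x) - len(f) + 1, stride):
            row.append(sum(xi * wi for xi, wi in zip(x[start:], f)))
        out.append(row)
    return out

random.seed(0)
throughput_history = [1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 0.95]  # k = 8 past chunks
filters = [[random.gauss(0, 0.1) for _ in range(4)] for _ in range(128)]  # 128 filters, size 4

fmap = conv1d(throughput_history, filters)
assert len(fmap) == 128 and len(fmap[0]) == 5  # 128 x (8 - 4 + 1) feature map
```

In the real network the convolution outputs pass through a nonlinearity before being flattened and merged with the scalar inputs; this sketch only illustrates the output shape.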