An Evaluation of Video Quality Assessment Metrics for Passive
Gaming Video Streaming
Nabajeet Barman
Kingston University
London, United Kingdom
n.barman@kingston.ac.uk
Steven Schmidt
Quality and Usability Lab, TU Berlin
Berlin, Germany
steven.schmidt@tu-berlin.de
Saman Zadtootaghaj
Deutsche Telekom AG
Berlin, Germany
saman.zadtootaghaj@telekom.de
Maria G. Martini
Kingston University
London, United Kingdom
m.martini@kingston.ac.uk
Sebastian Möller
Quality and Usability Lab, TU Berlin
Berlin, Germany
sebastian.moeller@tu-berlin.de
ABSTRACT
Video quality assessment is essential for estimating, and hence managing, the Quality of Experience (QoE)
delivered to the end-user in video streaming applications. Recent years have seen a tremendous advancement
in the field of objective video quality assessment (VQA) metrics,
with the development of models that can predict the quality of the
videos streamed over the Internet. However, no work so far has
attempted to study the performance of such quality assessment
metrics on gaming videos, which are artificial and synthetic and
have different streaming requirements than traditionally streamed
videos. Towards this end, we present in this paper a study of the performance of objective quality assessment metrics for gaming videos
considering passive streaming applications. Objective quality assessment considering eight widely used VQA metrics is performed
on a dataset of 24 reference videos and 576 compressed sequences
obtained by encoding them at 24 different resolution-bitrate pairs.
We present an evaluation of the performance behavior of the VQA
metrics. Our results indicate that VMAF predicts subjective video
quality ratings the best, while NIQE turns out to be a promising
alternative as a no-reference metric in some scenarios.
Gaming video streaming applications are becoming increasingly
popular. They can be divided into two different, but related, applications: interactive and passive services. Interactive gaming
video streaming applications are commonly known as cloud gaming, where the actual gameplay is performed on a cloud server. The
user receives the rendered gameplay video back on a client device
and then inputs corresponding game commands. Such applications
have received lots of attention, resulting in the rapid development
and acceptance of such services [1]. On the other hand, passive
gaming video streaming refers to applications such as Twitch.tv (footnote 1),
where viewers can watch the gameplay of other gamers. Such applications have received much less attention from both the gaming and
video community despite the fact that Twitch.tv, with its nine million subscribers and about 800 thousand active viewers at the same
time, is alone responsible for the 4th highest peak Internet traffic
in the USA [2]. With the increasing popularity of such services,
along with demand for other over-the-top services such as Netflix
and YouTube, the demand on network resources has also increased.
Therefore, to provide the end-user with a service at a reasonable
Quality of Experience (QoE) and satisfy the user expectation of
anytime, anyplace and any-content video service availability, it is
necessary to optimize the video delivery process.
For the assessment of video quality, typically subjective tests are
carried out. However, these tests are time-consuming and expensive.
Thus, numerous efforts are being made to predict the video quality
through video quality assessment (VQA) metrics. Depending on
the availability and the amount of reference information, objective
video quality assessment (VQA) algorithms can be categorized into
full-reference (FR), reduced-reference (RR), and no-reference (NR)
metrics. So far, these metrics have been developed and tested for
non-gaming videos, usually considering video on demand (VoD)
streaming applications. Also, some of the metrics such as NIQE
and BRISQUE are based on Natural Scene Statistics (for details
see Section 2). Gaming videos, on the other hand, are artificial
and synthetic in nature and have different streaming requirements (1-pass, Constant Bitrate (CBR) encoding); hence, the performance of these
VQA metrics on such content remains an open question. Our earlier study in [3]
found some differences in the performance of such metrics when
comparing gaming videos to non-gaming videos. Towards this end,
CCS CONCEPTS
•Information systems → Multimedia streaming;
KEYWORDS
Gaming Video Streaming, Quality Assessment, QoE
ACM Reference format:
Nabajeet Barman, Steven Schmidt, Saman Zadtootaghaj, Maria G. Martini,
and Sebastian Möller. 2018. An Evaluation of Video Quality Assessment
Metrics for Passive Gaming Video Streaming. In Proceedings of 23rd Packet
Video Workshop, Amsterdam, Netherlands, June 12–15, 2018 (Packet Video’18),
6 pages.
DOI: 10.1145/3210424.3210434
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
Packet Video’18, Amsterdam, Netherlands
© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
978-1-4503-5773-9/18/06. . . $15.00
1 INTRODUCTION
1 https://www.twitch.tv/
Packet Video’18, June 12–15, 2018, Amsterdam, Netherlands
N. Barman et al.
we present in this paper the evaluation and analysis of some of the
most widely used VQA metrics. Since FR and RR metrics cannot be
used in applications such as live video streaming, where reference
information is absent, we provide a more detailed
discussion of the performance of the NR metrics. We believe that
the insight gained from this study will help to improve or design
better performing VQA metrics. The remainder of the paper is
organized as follows: Section 2 presents a discussion about the eight
VQA metrics used in this work. Section 3 describes the dataset and
the evaluation methodology. The results and main observations are
presented in Section 4 and Section 5 finally concludes the paper.
considers only the spatial domain for its computation [9]. For both
these metrics, we used the default settings and implementation as
provided by the authors.
2 OVERVIEW OF VQA METRICS
We start with a brief introduction of the eight VQA metrics considered in this work. The primary focus of this work is to evaluate the
performance of the existing VQA metrics on gaming video content,
which has not been investigated so far.
2.3 NR Metrics
NR metrics try to predict the quality without using any source
information. Since for gaming applications, a high-quality reference
video is typically not available, the development of good performing
no-reference metrics is of very high importance. Blind/referenceless
image spatial quality evaluator (BRISQUE) [10] is an NR metric
which tries to quantify the possible loss of naturalness in an image
by using locally normalized luminance coefficients. Blind image
quality index (BIQI) is a modular NR metric based on distorted
image statistics derived from natural scene statistics (NSS)
[11]. Natural Image Quality Evaluator (NIQE) is a learning-based
NR quality estimation metric which uses statistical features based
on a spatial-domain NSS model [12].
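The "locally normalized luminance coefficients" mentioned above are commonly called mean-subtracted contrast-normalized (MSCN) coefficients. The following is a minimal sketch of that normalization step only, not the implementation used in this work: the 7-tap Gaussian window with sigma = 7/6 and the stabilizing constant c = 1 are assumed values, and the function names are hypothetical.

```python
import numpy as np


def _gaussian_kernel_1d(size=7, sigma=7.0 / 6.0):
    """Normalized 1-D Gaussian window (size and sigma are assumed values)."""
    x = np.arange(size) - size // 2
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def _blur(image, kernel):
    """Separable Gaussian filtering with reflective padding."""
    pad = len(kernel) // 2
    padded = np.pad(image, pad, mode="reflect")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, rows)


def mscn_coefficients(luma, c=1.0):
    """Return (I - local mean) / (local std + c) for a 2-D luminance array."""
    luma = np.asarray(luma, dtype=np.float64)
    kernel = _gaussian_kernel_1d()
    mu = _blur(luma, kernel)
    variance = _blur(luma * luma, kernel) - mu * mu
    sigma = np.sqrt(np.maximum(variance, 0.0))  # clip tiny negative values
    return (luma - mu) / (sigma + c)
```

NSS-based metrics then fit a statistical model to the distribution of these coefficients; that modelling step is not shown here.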
For the FR metrics, we use the results made available in the dataset.
For ST-RREDOpt, SpEED-QA and BIQI, we used the implementations
made available by the authors with their default settings. NIQE (footnote 2)
and BRISQUE (footnote 3) scores were computed using the built-in MATLAB
functions (version: R2017b).
2.1 FR Metrics
FR metrics refer to the VQA metrics which require the availability
of full reference information. We selected Peak Signal to Noise Ratio (PSNR), Structural Similarity Index Metric (SSIM) [4] and Video
Multi-Method Assessment Fusion (VMAF) [5] as the choice of our
three FR metrics. Due to its simplicity and ease of computation,
PSNR is one of the most widely used metrics for both image and
video quality assessment. SSIM, which computes the structural
similarity between the two images, was shown to correlate better
with subjective judgement and hence is also widely used for both
image and video quality assessment [4]. For video quality assessment, frame-level PSNR and SSIM are temporally pooled (usually
averaged) over the video duration to obtain a single score. VMAF is
a fusion-based metric which combines scores from three different
metrics to obtain a single score between 0 and 100, with a higher score
denoting higher quality. The choice of VMAF along with PSNR
and SSIM is motivated by our previous work, which showed these
metrics to have a very high correlation with subjective scores [3].
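As an illustration of the temporal pooling described above, the sketch below computes frame-level PSNR and averages it over a sequence. It assumes 8-bit frames (peak value 255) already loaded as NumPy arrays; the function names are hypothetical and this is not the code used to produce the results in the dataset.

```python
import numpy as np


def frame_psnr(ref, dist, peak=255.0):
    """PSNR in dB between one reference frame and one distorted frame."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    mse = np.mean((ref - dist) ** 2)
    if mse == 0.0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))


def video_psnr(ref_frames, dist_frames, peak=255.0):
    """Mean-pool frame-level PSNR over the video duration."""
    scores = [frame_psnr(r, d, peak) for r, d in zip(ref_frames, dist_frames)]
    if not scores:
        raise ValueError("no frames to compare")
    # Note: a single identical frame pair makes the mean infinite.
    return float(np.mean(scores))
```

Mean pooling is only one option; other pooling strategies can be substituted in `video_psnr` without changing the per-frame computation.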
3 EVALUATION DATASET AND METHODOLOGY
3.1 Evaluation Dataset
For this work, we use the GamingVideoSET, a public open source
dataset made available by the authors in [13]. We briefly describe
the dataset and the data used in this work and refer the reader to the
dataset and associated publication for further information. GamingVideoSET consists of a total of 24 gaming video sequences of 30
seconds duration obtained from two recorded video sequences from
each of the 12 games considered. The dataset also provides subjective test results for 90 gaming video sequences obtained by encoding
six gaming videos in 15 different resolution-bitrate pairs (three resolutions, 1080p, 720p and 480p) using the H.264/AVC compression
standard. In addition, a total of 576 encoded videos obtained by
encoding the 24 reference videos in 24 different resolution-bitrate
pairs (inclusive of the ones used for subjective assessment) are provided in MP4 format. The encoding mode used is 1-pass, Constant
Bitrate (CBR). In the rest of this paper, we refer to the part of the
dataset reporting the subjective results as the subjective dataset and the
whole dataset as the full dataset.
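To make the encoding setup more concrete, the sketch below builds an ffmpeg command line for one resolution-bitrate pair using H.264 with the bitrate, minimum rate and maximum rate set to the same value, which is one common way to approximate 1-pass CBR. The exact encoder settings of the dataset are not given here, so the buffer size, the frame rate argument and the helper name `cbr_encode_command` are assumptions for illustration only.

```python
def cbr_encode_command(src, dst, width, height, bitrate_kbps, fps=30):
    """Build (but do not run) an ffmpeg command for a 1-pass, CBR-like encode.

    Assumes `src` is a file ffmpeg can read directly and that a buffer of
    twice the target bitrate is acceptable (an arbitrary illustrative choice).
    """
    rate = f"{bitrate_kbps}k"
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", f"scale={width}:{height}",   # downscale to target resolution
        "-r", str(fps),
        "-c:v", "libx264",                  # H.264/AVC encoder
        "-b:v", rate, "-minrate", rate, "-maxrate", rate,
        "-bufsize", f"{2 * bitrate_kbps}k",
        dst,
    ]
```

Applying such a command to each of the 24 reference videos for each of the 24 resolution-bitrate pairs yields 24 x 24 = 576 encoded sequences, matching the size of the full dataset.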
2.2 RR Metrics
Reduced-reference metrics are used when only partial information
about the reference video is available. As such they are less accurate than FR metrics but are useful in applications where there
is limited source information available such as limited bandwidth
transmissions. We used Spatio-temporal-reduced reference entropic
differences (ST-RRED), an RR metric proposed by the authors in
[6], since it is one of the most widely used RR metrics with very
good performance on various VQA databases [7]. It measures the
amount of spatial and temporal information differences in terms of
wavelet coefficients of the frames and frame differences between
the reference and distorted videos. In this work, we use the recently
developed optimized version of ST-RRED known as ST-RREDOpt,
which calculates only the desired sub-band, resulting in almost the
same performance as ST-RRED while being almost ten times faster to compute [8]. In addition, we also use the recently proposed spatial
efficient entropic differencing for quality assessment (SpEED-QA)
model, which is almost 70 times faster than the original implementation of ST-RRED and seven times faster than ST-RREDOpt as it
3.2 Evaluation Methodology
The standard practice to evaluate how well a VQA metric performs is
to measure the correlation between the objective metric scores and the
subjective scores. In this work, we measure the performance of the
objective metrics in two phases. In the first phase, we compare the
performance of the VQA metrics with subjective scores considering
the subjective dataset. In the second phase, for a comprehensive
evaluation of the VQA metrics on the full dataset, we compare the
VQA metric performance with a benchmark VQA metric. Since the
2 https://de.mathworks.com/help/images/ref/niqe.html
3 https://de.mathworks.com/help/images/ref/brisque.html
An Evaluation of VQA Metrics for Passive Gaming Video Streaming
[Figure 1 panels: PSNR (dB), SSIM, VMAF, ST-RREDOpt, SpEEDQA, BRISQUE, BIQI and NIQE, each plotted against Bitrate (kbps).]
Figure 1: Quality vs. Bitrate plots for eight different quality metrics for 1080p resolution.
metrics (footnote 4). The results are reported separately for each resolution
and also for all three resolutions combined (all data).
It can be observed that VMAF results in the highest performance in
terms of both PLCC and SROCC values across all three resolutions
and all data. The two RR metrics have a similar performance in
terms of correlation values across all resolution-bitrate pairs and
over all data. Hence for applications where an increased speed
of computation is of high importance, SpEEDQA can be selected
as RR metric as it is almost seven times faster than ST-RREDOpt.
Among the NR metrics, BIQI performs the worst. BRISQUE and
NIQE result in almost the same performance for 1080p and 720p
resolutions, but for 480p resolution and all data, NIQE performs
better than BRISQUE.
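For reference, the two correlation measures used in this comparison can be computed as in the following sketch, which uses only the Python standard library. It is an illustration of the textbook definitions (with average ranks for ties), not the code used to produce Table 1; the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson Linear Correlation Coefficient (PLCC)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's Rank Correlation Coefficient (SROCC)."""
    return pearson(average_ranks(x), average_ranks(y))
```

For metrics where a lower score means better quality, both functions return negative values when the metric agrees with MOS, which is why several entries in the tables are negative.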
encoded videos available are in MP4 format, for FR and RR metric calculations,
we instead use the decoded, raw YUV videos obtained from the
encoded MP4 videos. (The videos at 480p and 720p resolution were
rescaled to 1080p YUV format using a bilinear scaling filter, as was
done by the authors of GamingVideoSET for the subjective quality
evaluation.) For NR metric calculations, we use the encoded
videos at their original resolution (without scaling the 480p and 720p
videos to 1080p) for the reasons discussed later in Section 4.6.
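The decoding and rescaling step described above can be sketched as an ffmpeg invocation that decodes an MP4 file and writes raw YUV at 1080p using the bilinear flag of the scale filter. The 4:2:0 pixel format and the helper name `decode_upscale_command` are assumptions; the exact command used for the dataset is not stated here.

```python
def decode_upscale_command(src_mp4, dst_yuv, width=1920, height=1080):
    """Build (but do not run) an ffmpeg command: decode, upscale, write raw YUV."""
    return [
        "ffmpeg", "-y", "-i", src_mp4,
        "-vf", f"scale={width}:{height}:flags=bilinear",  # bilinear rescaling
        "-pix_fmt", "yuv420p",                            # assumed chroma format
        "-f", "rawvideo",
        dst_yuv,
    ]
```

The resulting list can be passed to `subprocess.run` once the input and output paths exist on the local system.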
4 RESULTS
4.1 VQA Metrics Variation With Bitrates
Figure 1 shows the rate-distortion results for the eight VQA metrics
for all twenty-four videos considering different bitrates for the
1080p resolution. Similar results for 720p and 480p resolution videos
are also obtained but are not presented here due to lack of space. It
can be observed that, for the FR and RR metrics, the
quality gap between different contents (due to content complexity)
decreases at higher bitrates. Both RR metrics show almost identical behavior, with both
reaching saturation at higher bitrates. For the NR metrics, almost the
reverse trend is observed, with a larger quality gap at higher
bitrates than at lower bitrates.
4.3 Impact of resolution on VQA metrics
It can be observed that in general, the performance of the VQA
metrics varies across different resolutions. For the FR and NR
metrics, the performance decreases as one moves from higher
resolution to lower resolution videos. In contrast, both RR metrics
resulted in a higher correlation in terms of PLCC with MOS scores
for 720p resolution videos, followed by 1080p and 480p resolution
videos. Fisher’s Z-test (footnote 5), which assesses the significance of the difference
between two correlation coefficients, indicates that the difference
between 720p and 1080p is not statistically significant, while the
difference between 720p and 480p is significant (Z = 2.954, p < 0.01).
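A common form of Fisher's Z-test for comparing two correlation coefficients is sketched below. It assumes the two correlations come from independent samples of sizes n1 and n2 and reports a two-sided p-value; the sample sizes behind the Z value quoted above are not restated here, so the function is generic and its name is hypothetical.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Return (Z, two-sided p) for the difference between two correlations.

    Assumes independent samples and n1, n2 > 3.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)       # Fisher r-to-z transform
    std_err = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal tail
    return z, p
```

A call such as `fisher_z_test(r_720p, n_720p, r_480p, n_480p)` would then be compared against the chosen significance level, for example 0.01.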
For all eight VQA metrics, the performance for the 480p resolution
4.2 Comparison of VQA metrics with MOS
The performance of a VQA metric with respect to subjective rating is evaluated in terms of Pearson Linear Correlation Coefficient
(PLCC) and Spearman’s Rank Correlation Coefficient (SROCC) values. Negative PLCC and SROCC correlation values indicate that
higher values for the respective metric indicate lower quality and
vice versa. Table 1 shows the correlation values of the eight VQA
4 While the authors in [13] make available both raw MOS and MOS scores after outlier
detection, in this work we consider only the raw MOS scores, i.e., those obtained
without any processing of the subjective scores.
5 http://psych.unl.edu/psycrs/statpage/biv corr comp eg.pdf
Table 1: Comparison of the performance of the VQA metric scores with MOS ratings in terms of PLCC and SROCC values. All
Data refers to the combined data of all three resolution-bitrate pairs. The best performing metric is shown in bold.
Table 2: Comparison of the performance of the VQA metric scores with VMAF scores in terms of PLCC and SROCC values.
All Data refers to the combined data of all three resolution-bitrate pairs. The best performing metric is shown in bold.
                          480p            720p            1080p           All Data
Metrics                   PLCC   SROCC    PLCC   SROCC    PLCC   SROCC    PLCC   SROCC
FR Metrics  PSNR          0.62    0.60    0.79    0.77    0.91    0.92    0.87    0.87
            SSIM          0.56    0.56    0.68    0.70    0.80    0.83    0.70    0.74
RR Metrics  ST-RREDOpt   -0.66   -0.85   -0.74   -0.89   -0.77   -0.91   -0.53   -0.61
            SpEEDQA      -0.68   -0.88   -0.76   -0.92   -0.77   -0.93   -0.55   -0.63
NR Metrics  BRISQUE      -0.68   -0.68   -0.79   -0.79   -0.77   -0.78   -0.14   -0.14
            BIQI         -0.57   -0.54   -0.70   -0.71   -0.67   -0.68   -0.05   -0.05
            NIQE         -0.75   -0.77   -0.81   -0.81   -0.78   -0.76   -0.42   -0.42
(cf. Table 1) is considerably lower compared to the same VQA
metric performance for the 720p and 1080p resolutions. Also, the
decrease in performance for some metrics is higher than others. We
explain this observation using an example metric, PSNR, as shown
in Figure 2b. Based on the figure, it can be observed that PSNR
for different bitrates at 480p resolution is not able to capture the
variation in MOS (cf. Figure 2a), as its value for the 480p resolution
remains almost constant even at higher bitrates. VMAF, on the other
hand, as evident from Figure 2c, captures this variation quite well
and hence results in increased performance overall and also for
each individual resolution.
4.4 Comparison of VQA metrics with VMAF
In the previous section, we presented and evaluated the performance of the eight VQA metrics based on the subjective ratings
using six reference gaming video sequences and 15 resolution-bitrate pairs. It was found that across all conditions, VMAF resulted
in the highest performance in terms of both PLCC and SROCC
values. In the absence of subjective ratings for the full dataset,
and taking into account the fact that our previous results showed
superior performance of VMAF among all eight VQA metrics, we
consider VMAF values as the reference score. We then evaluate
the remaining seven VQA metrics on the full dataset (24 reference
video sequences and a total of 24 resolution-bitrate pairs, resulting in a total of 576 encoded video sequences). Table 2 shows the
PLCC and SROCC correlation values for the seven VQA metrics
with VMAF scores. It can be observed that, over all data, PSNR results in the
highest correlation followed by SSIM. Similar to the correlation
values with MOS as reported in Table 1, both RR metrics result
in similar performance. Also, it is observed that, similar to the results
reported in Table 1, for some metrics the correlation values vary
significantly over different resolutions. At 1080p, PSNR results in
the highest PLCC scores and SpEEDQA results in the highest SROCC
values. At 720p and 480p, NIQE results in the highest PLCC scores
and SpEEDQA results in the highest SROCC values. These results
indicate the high potential of RR and NR metrics
for quality evaluation in applications limited to a single resolution
and where full reference information is not available.
4.5 Comparative performance analysis of NR metrics
While the VQA metrics, in general, perform quite well, their performance decreases when considering multiple resolutions. Compared
to FR and RR metrics, the performance degradation of NR metrics
for all data is considerably higher. We investigate the reason behind
such performance degradation across multiple resolution-bitrate
pairs using Figure 3, which shows the scatter plots of BRISQUE, BIQI
and NIQE against VMAF scores considering all three resolutions. It
can be observed from Figure 3 that, when considering individual
resolutions, the NR metric values vary approximately linearly with the
VMAF values, which results in reasonable PLCC scores. When considering all
decrease for 480p (wider spread of the scores) and all data. BIQI
performs the worst among all three.
The difference in values per resolution can be attributed to the
fact that, while we used the rescaled YUV videos for the FR and RR metric
calculations, we used the compressed MP4 videos at their original (720p and 480p) resolution for the NR metric calculations.
This, along with the lack of proper training with videos of
different resolutions, as well as the absence of parameters in the
models which can capture the differences due to a change in resolution, results in lower correlation scores when considering all
resolution-bitrate pairs. We next discuss the results obtained for
NR metric performance evaluation when considering the upscaled
YUV videos, as was done for the FR and RR metric evaluation.
[Figure 2 panels: (a) MOS vs. Bitrate (kbps); (b) PSNR (dB) vs. Bitrate (kbps); (c) VMAF vs. Bitrate (kbps); one curve per resolution (480, 720, 1080).]
4.6 NR metric evaluation with rescaling
As mentioned before, the three NR metrics were evaluated on videos
without rescaling. We briefly present and discuss the results obtained with rescaled YUV videos and their limitations. Figure 4 shows the variation of the NIQE scores for one sample
gaming video (FIFA) over 24 different resolution-bitrate pairs. While
for 1080p resolution videos the NIQE values indicate higher quality
with increasing encoding bitrate (as one would expect), for 720p
resolution videos the estimated quality remains approximately the same
even at higher bitrates. For 480p, the trend actually
reverses, with NIQE estimating a poorer quality at higher bitrates.
A similar behavior is observed for BRISQUE and BIQI. A possible
reason behind such behavior could be that these NR metrics, which
are based on natural scene statistics, are not able to capture the
combined effect of quality loss due to compression and quality
loss due to rescaling, a common method used for resolution switching in adaptive streaming applications such as Dynamic Adaptive
Streaming over HTTP (DASH) and HTTP Live Streaming (HLS).
Hence, while the results for NR metrics when considering the
compressed, low resolution versions without upscaling (480p and
720p) are as expected, the metrics are not capable of estimating MOS
values when rescaled versions of the sequences are considered.
This indicates that they are unsuitable for applications such as
DASH and HLS, where quality is adapted using multiple
resolution-bitrate pairs and the videos are usually rescaled to the
native resolution (1080p in our case). Further investigation into the
design of these metrics can help to overcome this shortcoming and
perhaps also increase their performance. Training and evaluation
of these metrics considering rescaled, multiple resolution-bitrate
pairs can possibly lead to improved prediction accuracy.
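One simple way to check for the trend reversal discussed above is to measure, per resolution, how often the predicted quality improves when moving to the next higher bitrate. The sketch below is only an illustrative diagnostic with a hypothetical function name; it is not part of the evaluation reported in this paper.

```python
def fraction_improving(bitrates, scores, lower_is_better=True):
    """Fraction of consecutive bitrate steps where predicted quality improves.

    For metrics where a lower score means better quality, keep
    lower_is_better=True. A value near 1.0 means quality improves with
    bitrate as expected; a value near 0.0 indicates a reversed trend.
    """
    pairs = sorted(zip(bitrates, scores))  # order by increasing bitrate
    steps = list(zip(pairs, pairs[1:]))
    if not steps:
        raise ValueError("need at least two bitrate points")
    improved = 0
    for (_, before), (_, after) in steps:
        if (after < before) if lower_is_better else (after > before):
            improved += 1
    return improved / len(steps)
```

Running this separately on the scores for each resolution makes it easy to flag resolutions at which a metric stops tracking the encoding bitrate.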
Figure 2: MOS (with 95% confidence interval), PSNR and
VMAF values for the CSGO video sequence at different
resolution-bitrate pairs. A similar behavior is observed for
other video sequences (relevant results not reported here
due to lack of space).
5 CONCLUSION AND FUTURE WORK
In this paper, we presented an objective evaluation and analysis of
the performance of eight different VQA metrics on gaming video
considering a passive, live streaming scenario. First, on a subset
of GamingVideoSET consisting of 90 video sequences, we evaluated the performance of the VQA metrics against MOS scores. We
found that VMAF results in the highest correlation with subjective
scores followed by SSIM and NIQE. It was observed that many
metrics failed to capture the MOS variation at lower resolutions,
hence resulting in lower correlation values. Then we evaluated
the performance of the rest of the VQA metrics against VMAF on
resolution-bitrate pairs, however, the values no longer follow a single
linear trend, hence the lower correlation scores. Among the three NR
metrics, NIQE results in a much lower spread than BIQI and
BRISQUE, both for each individual resolution and when considering all data. Hence, NIQE results in a higher overall prediction quality
when using either MOS scores or VMAF scores as the benchmark.
BRISQUE, on the other hand, results in performance similar to
that of NIQE for 1080p and 720p resolutions, but the correlation values
[Figure 3 panels: (a) VMAF vs. BRISQUE; (b) VMAF vs. BIQI; (c) VMAF vs. NIQE; points grouped by resolution (480, 720, 1080).]
Figure 3: Scatter plot showing the variation of the NR metrics w.r.t. VMAF scores considering all three resolutions over the
whole dataset.
ACKNOWLEDGMENT
This work is part of a project that has received funding from the European Union’s Horizon 2020 research and innovation programme
under the Marie Skłodowska-Curie grant agreement No 643072 and
was supported by the German Research Foundation (DFG) within
project MO 1038/21-1.
REFERENCES
[1] S. Shirmohammadi, M. Abdallah, D. T. Ahmed, Y. Lu, and A. Snyatkov. Introduction to the special section on visual computing in the cloud: Cloud gaming and
virtualization. IEEE Transactions on Circuits and Systems for Video Technology,
25(12):1955–1959, 2015.
[2] D. Fitzgerald and D. Wakabayashi. Apple Quietly Builds New Networks.
https://www.wsj.com/articles/apple-quietly-builds-new-networks-1391474149, February 2014. [Online: accessed 27-February-2017].
[3] N. Barman, S. Zadtootaghaj, M. G. Martini, S. Möller, and S. Lee. A Comparative Quality Assessment Study for Gaming and Non-Gaming Videos. In Tenth
International Conference on Quality of Multimedia Experience (QoMEX), May 2018.
[4] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image
Processing, 13(4):600–612, 2004.
[5] Netflix. VMAF - Video Multi-Method Assessment Fusion. https://github.com/Netflix/vmaf.
[Online: accessed 12-Dec-2018].
[6] R. Soundararajan and A. C. Bovik. Video quality assessment by reduced reference
spatio-temporal entropic differencing. IEEE Transactions on Circuits and Systems
for Video Technology, 23(4):684–694, April 2013.
[7] A. C. Bovik, R. Soundararajan, and C. Bampis. On the Robust Performance
of the ST-RRED Video Quality Predictor. http://live.ece.utexas.edu/research/Quality/ST-RRED/.
[8] C. G. Bampis, P. Gupta, R. Soundararajan, and A. C. Bovik. Source code for
optimized Spatio-Temporal Reduced Reference Entropy Differencing Video Quality Prediction Model. http://live.ece.utexas.edu/research/Quality/STRRED opt
demo.zip, 2017.
[9] C. G. Bampis, P. Gupta, R. Soundararajan, and A. C. Bovik. SpEED-QA: Spatial Efficient Entropic Differencing for Image and Video Quality. IEEE Signal
Processing Letters, 24(9):1333–1337, Sept 2017.
[10] A. Mittal, A. K. Moorthy, and A. C. Bovik. No-reference image quality assessment
in the spatial domain. IEEE Transactions on Image Processing, 21(12):4695–4708,
Dec 2012.
[11] A. K. Moorthy and A. C. Bovik. A two-step framework for constructing blind
image quality indices. IEEE Signal Processing Letters, 17(5):513–516, May 2010.
[12] A. Mittal, R. Soundararajan, and A. C. Bovik. Making a “Completely Blind” Image
Quality Analyzer. IEEE Signal Processing Letters, 20(3):209–212, March 2013.
[13] N. Barman, S. Zadtootaghaj, S. Schmidt, M. G. Martini, and S. Möller. GamingVideoSET: A Dataset for Gaming Video Streaming Applications. In Network
and Systems Support for Games (NetGames), 16th Annual Workshop on, Amsterdam, Netherlands, June 2018.
[Figure 4 plot: NIQE vs. Bitrate (kbps) for FIFA17; one curve per resolution (480, 720, 1080).]
Figure 4: NIQE score variation for one sample gaming
video sequence (FIFA) considering the rescaled YUV videos
for 720p and 480p resolution. Similar patterns are observed
for other videos but are not presented here due to lack of space.
the full test dataset. The performance of the NR metrics decreased
when considering different resolution-bitrate pairs together. Also,
when considering rescaled videos, the NR metrics result in erroneous predictions. Possible reasons include the lack
of proper training and the nature of gaming video content, which we plan to
investigate in our future work.
We believe that the observations and discussions presented in
this work will be helpful to improve the prediction efficiency of
these existing metrics as well as develop better performing NR VQA
metrics with a focus on live gaming video streaming applications.
In addition to passive gaming service as discussed in this work, a
well-performing NR metric can also be used for predicting video
quality for interactive cloud gaming services. It should be noted
that our current subjective evaluation was limited in terms of the
number of videos considered. Also, the gaming videos used in this
work were limited to a 30 fps frame rate. As future work, we plan to
extend our subjective analysis using more videos and also to include
higher frame rate videos.