TY - GEN
T1 - Fast Generating A Large Number of Gumbel-Max Variables
AU - Qi, Yiyan
AU - Wang, Pinghui
AU - Zhang, Yuanming
AU - Zhao, Junzhou
AU - Tian, Guangjian
AU - Guan, Xiaohong
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/4/20
Y1 - 2020/4/20
N2 - The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or more generally a nonnegative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element i (or a Gumbel-Max variable i) in proportion to its positive weight vi, the Gumbel-Max Trick first computes a Gumbel random variable gi for each positive weight element i, and then samples the element i with the largest value of gi + ln vi. Recently, applications including similarity estimation and graph embedding require to generate k independent Gumbel-Max variables from high dimensional vectors. However, it is computationally expensive for a large k (e.g., hundreds or even thousands) when using the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, that reduces the time complexity from O(kn+) to O(kln k + n+), where n+ is the number of positive elements in the vector of interest. Instead of computing k independent Gumbel random variables directly, we find that there exists a technique to generate these variables in descending order. Using this technique, our method FastGM computes variables gi + ln vi for all positive elements i in descending order. As a result, FastGM significantly reduces the computation time because we can stop the procedure of Gumbel random variables computing for many elements especially for those with small weights. Experiments on a variety of real-world datasets show that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy and incurring additional expenses.
AB - The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or more generally a nonnegative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element i (or a Gumbel-Max variable i) in proportion to its positive weight vi, the Gumbel-Max Trick first computes a Gumbel random variable gi for each positive weight element i, and then samples the element i with the largest value of gi + ln vi. Recently, applications including similarity estimation and graph embedding require to generate k independent Gumbel-Max variables from high dimensional vectors. However, it is computationally expensive for a large k (e.g., hundreds or even thousands) when using the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, that reduces the time complexity from O(kn+) to O(kln k + n+), where n+ is the number of positive elements in the vector of interest. Instead of computing k independent Gumbel random variables directly, we find that there exists a technique to generate these variables in descending order. Using this technique, our method FastGM computes variables gi + ln vi for all positive elements i in descending order. As a result, FastGM significantly reduces the computation time because we can stop the procedure of Gumbel random variables computing for many elements especially for those with small weights. Experiments on a variety of real-world datasets show that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy and incurring additional expenses.
KW - Graph embedding
KW - Gumbel-Max Trick
KW - Sketching
UR - https://www.scopus.com/pages/publications/85086582052
U2 - 10.1145/3366423.3380160
DO - 10.1145/3366423.3380160
M3 - 会议稿件
AN - SCOPUS:85086582052
T3 - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
SP - 796
EP - 807
BT - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PB - Association for Computing Machinery, Inc
T2 - 29th International World Wide Web Conference, WWW 2020
Y2 - 20 April 2020 through 24 April 2020
ER -