
Weibo URL and Mid Conversion

Published: at 11:23 AM

A tool for converting between Sina Weibo URLs and mids

Background

On weibo.com, the URL of a post's detail page has the format:

https://weibo.com/{user_id}/{weibo_id}
ex: https://weibo.com/2034565060/Hd1N2qpta

On m.weibo.cn, the URL of a post's detail page has the format:

https://m.weibo.cn/detail/{mid}
ex: https://m.weibo.cn/detail/4331051486294436
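Converting between the two formats starts with pulling weibo_id out of the first kind of URL, or mid out of the second. A minimal sketch using Python's standard urllib.parse, assuming the URLs look exactly like the two examples above (host weibo.com or m.weibo.cn, no extra path segments). extract_weibo_id and extract_mid are hypothetical helper names, not part of the original script:

```python
from urllib.parse import urlparse


def extract_weibo_id(url: str) -> str:
    """Return {weibo_id} from https://weibo.com/{user_id}/{weibo_id}."""
    parsed = urlparse(url)
    # Path looks like "/2034565060/Hd1N2qpta"; drop the empty pieces around slashes.
    segments = [s for s in parsed.path.split("/") if s]
    # Assumption: only the bare weibo.com host and exactly two path segments.
    if parsed.netloc != "weibo.com" or len(segments) != 2:
        raise ValueError(f"not a weibo.com detail URL: {url}")
    return segments[1]


def extract_mid(url: str) -> int:
    """Return {mid} from https://m.weibo.cn/detail/{mid}."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    # Assumption: only the m.weibo.cn host with a /detail/{mid} path.
    if parsed.netloc != "m.weibo.cn" or len(segments) != 2 or segments[0] != "detail":
        raise ValueError(f"not an m.weibo.cn detail URL: {url}")
    return int(segments[1])


print(extract_weibo_id("https://weibo.com/2034565060/Hd1N2qpta"))  # Hd1N2qpta
print(extract_mid("https://m.weibo.cn/detail/4331051486294436"))   # 4331051486294436
```

Query strings such as ?type=comment are ignored because urlparse keeps them out of .path.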

How it works

url -> mid

1. Take the weibo_id string, here Hd1N2qpta.

2. Split it into groups of 4 characters, starting from the end, which gives these three groups:

H
d1N2
qpta

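**3. Each group is really a number written in base 62 (base62 encoding).**

4. The base-62 alphabet is 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ. Using this alphabet, convert each group to decimal, which gives three numbers: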

43
3105148
6294436

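5. Concatenate them to get the mid: 4331051486294436

(Note: every group except the first must be left-padded with zeros to 7 digits if its decimal value is shorter than that. For example, if the decimal values come out as 35, 33040 and 8906190, then 33040 must be written as 0033040, giving the mid 3500330408906190.)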
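Steps 1 to 5, together with the padding rule, fit in a few lines of Python. This is an illustrative re-implementation, not the author's script: split_from_right, decode62 and weibo_id_to_mid are hypothetical names, and the pad flag exists only to show what goes wrong when the 7-digit padding is skipped.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def split_from_right(s: str, size: int) -> list[str]:
    """Cut s into chunks of `size` characters, starting from the end."""
    # End positions run backwards; the leftmost chunk may be shorter than size.
    chunks = [s[max(end - size, 0):end] for end in range(len(s), 0, -size)]
    return chunks[::-1]


def decode62(chunk: str) -> int:
    """Read chunk as a base-62 number using ALPHABET."""
    value = 0
    for ch in chunk:
        value = value * 62 + ALPHABET.index(ch)
    return value


def weibo_id_to_mid(weibo_id: str, pad: bool = True) -> int:
    groups = split_from_right(weibo_id, 4)   # steps 1-2
    numbers = [decode62(g) for g in groups]  # steps 3-4
    parts = [str(numbers[0])]
    for n in numbers[1:]:
        # Every group after the first must be exactly 7 digits wide.
        parts.append(str(n).zfill(7) if pad else str(n))
    return int("".join(parts))               # step 5


print(split_from_right("Hd1N2qpta", 4))                         # ['H', 'd1N2', 'qpta']
print([decode62(g) for g in split_from_right("Hd1N2qpta", 4)])  # [43, 3105148, 6294436]
print(weibo_id_to_mid("Hd1N2qpta"))                             # 4331051486294436
print(weibo_id_to_mid("z08AUBmUe"))                             # 3500330408906190
print(weibo_id_to_mid("z08AUBmUe", pad=False))                  # 35330408906190 (wrong)
```

Without padding, z08AUBmUe yields the 14-digit 35330408906190 instead of the correct 16-digit mid, which is exactly the mistake the note warns about.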

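mid -> url

**Split the mid into groups of 7 digits, starting from the end, base62-encode each group, and concatenate the results. Again, every group except the first must be left-padded with 0 to 4 characters if its base-62 form is shorter than that.**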
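The reverse direction can be sketched the same way, under the same caveats (encode62 and mid_to_weibo_id are hypothetical names, not the author's). In 3500330408906190 the middle group is 0033040, whose base-62 form 8AU has only three characters and has to be padded back to 08AU:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode62(num: int) -> str:
    """Write a non-negative integer in base 62 using ALPHABET."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num:
        num, rem = divmod(num, 62)  # least significant digit first
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def mid_to_weibo_id(mid: int) -> str:
    s = str(mid)
    # Groups of 7 decimal digits cut from the right; the first may be shorter.
    groups = [s[max(end - 7, 0):end] for end in range(len(s), 0, -7)][::-1]
    parts = [encode62(int(groups[0]))]
    for g in groups[1:]:
        # int("0033040") == 33040; pad its base-62 form back to 4 characters.
        parts.append(encode62(int(g)).rjust(4, "0"))
    return "".join(parts)


print(encode62(33040))                    # 8AU (only 3 characters)
print(mid_to_weibo_id(3500330408906190))  # z08AUBmUe
print(mid_to_weibo_id(4331051486294436))  # Hd1N2qpta
```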

Implementation (Python)

weibo_url_mid_convert.py

"""
:author Jermic
:date 2019-02-13
:weibo https://weibo.com/Jermic/
"""ALPHABET ="0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"def base62_encode(num, alphabet=ALPHABET):
    num = int(num)
    if num == 0:
        return alphabet[0]
    arr = []
    base = len(alphabet)
    while num:
        rem = num % base
        num = num // base
        arr.append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)


def base62_decode(string, alphabet=ALPHABET):
    string = str(string)
    num = 0
    idx = 0
    for char in string:
        power = (len(string) - (idx + 1))
        num += alphabet.index(char) * (len(alphabet) ** power)
        idx += 1

    return num


def reverse_cut_to_length(content, code_func, cut_num=4, fill_num=7):
    content = str(content)
    cut_list = [content[i - cut_num if i>= cut_num else 0:i] for i in range(len(content), 0, (-1 * cut_num))]
    cut_list.reverse()
    result = []
    for i, item in enumerate(cut_list):
        s = str(code_func(item))
        if i > 0 and len(s) < fill_num:
            s = (fill_num - len(s)) * '0' + s
        result.append(s)
    return ''.join(result)


def url_to_mid(url: str):
    """>>> url_to_mid('z0JH2lOMb')
    3501756485200075
    >>> url_to_mid('z0IgABdSn')
    3501701648871479
    >>> url_to_mid('z08AUBmUe')
    3500330408906190
    >>> url_to_mid('z06qL6b28')
    3500247231472384
    >>> url_to_mid('yAt1n2xRa')
    3486913690606804
    """result = reverse_cut_to_length(url, base62_decode, 4, 7)
    return int(result)


def mid_to_url(mid_int: int):
    """>>> mid_to_url(3501756485200075)
    'z0JH2lOMb'
    >>> mid_to_url(3501701648871479)
    'z0IgABdSn'
    >>> mid_to_url(3500330408906190)
    'z08AUBmUe'
    >>> mid_to_url(3500247231472384)
    'z06qL6b28'
    >>> mid_to_url(3486913690606804)
    'yAt1n2xRa'
    """result = reverse_cut_to_length(mid_int, base62_encode, 7, 4)
    return result


if __name__ == "__main__":
    print(url_to_mid('Hd1N2qpta'))
    print(mid_to_url(4331051486294436))
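The padding rule is easier to see if the conversion is viewed as arithmetic: cutting 7 digits from the right of the mid is taking it modulo 10^7, and cutting 4 characters from the right of the weibo_id is taking its base-62 value modulo 62^4. The sketch below is an alternative formulation (not the author's code) that re-packs one set of groups into the other with divmod, so the zero padding falls out of positional notation automatically. It assumes every 4-character group of a valid weibo_id decodes to less than 10^7, which holds for any id produced from a mid; regroup, url_to_mid_arith and mid_to_url_arith are hypothetical names.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
CHUNK62 = 62 ** 4  # one 4-character base-62 group
CHUNK10 = 10 ** 7  # one 7-digit decimal group


def regroup(value: int, src: int, dst: int) -> int:
    """Re-pack value from groups of size src into groups of size dst.

    Assumes every group is smaller than dst (true for ids generated from a mid).
    """
    result, scale = 0, 1
    while value:
        value, group = divmod(value, src)  # lowest remaining group
        result += group * scale
        scale *= dst
    return result


def decode62(s: str) -> int:
    value = 0
    for ch in s:
        value = value * 62 + ALPHABET.index(ch)
    return value


def encode62(num: int) -> str:
    digits = ""
    while num:
        num, rem = divmod(num, 62)
        digits = ALPHABET[rem] + digits
    return digits or ALPHABET[0]


def url_to_mid_arith(weibo_id: str) -> int:
    return regroup(decode62(weibo_id), CHUNK62, CHUNK10)


def mid_to_url_arith(mid: int) -> str:
    return encode62(regroup(mid, CHUNK10, CHUNK62))


print(url_to_mid_arith("Hd1N2qpta"))         # 4331051486294436
print(mid_to_url_arith(3500330408906190))    # z08AUBmUe
```

Here 33040 lands in the second 10^7 "slot" of the mid, so it is printed as 0033040 without any explicit zfill.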