Sina Weibo URL and mid conversion tool
Background
On weibo.com, the detail page URL of a post has the format:
https://weibo.com/{user_id}/{weibo_id}
ex: https://weibo.com/2034565060/Hd1N2qpta
On m.weibo.cn, the detail page URL of a post has the format:
https://m.weibo.cn/detail/{mid}
ex: https://m.weibo.cn/detail/4331051486294436
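The two URL shapes above can be told apart by host and path. The following is a minimal sketch, assuming only the two formats listed here; the function name `extract_ids` and the returned dictionary keys are hypothetical and chosen for illustration.

```python
from urllib.parse import urlparse


def extract_ids(url):
    """Pull the identifiers out of a Weibo detail-page URL (hypothetical helper)."""
    parsed = urlparse(url)
    # Drop empty segments produced by leading or trailing slashes.
    segments = [s for s in parsed.path.split("/") if s]
    if parsed.netloc == "weibo.com" and len(segments) == 2:
        # https://weibo.com/{user_id}/{weibo_id}
        return {"user_id": segments[0], "weibo_id": segments[1]}
    if parsed.netloc == "m.weibo.cn" and len(segments) == 2 and segments[0] == "detail":
        # https://m.weibo.cn/detail/{mid}
        return {"mid": segments[1]}
    raise ValueError("unrecognized Weibo detail URL: " + url)


print(extract_ids("https://weibo.com/2034565060/Hd1N2qpta"))
print(extract_ids("https://m.weibo.cn/detail/4331051486294436"))
```

Other URL variants (query strings, different hosts) are not handled; the sketch raises `ValueError` for anything it does not recognize.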
How it works
url -> mid
1. The weibo_id string is Hd1N2qpta.
2. Split it into groups of 4 characters, working from the end toward the start. This gives three groups:
H
d1N2
qpta
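3. **Each of these three groups is a number written in base 62 (base62 encoding).**
4. The base-62 alphabet is 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ. Converting the three groups to decimal with this alphabet gives three numbers: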
43
3105148
6294436
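5. Concatenate them to get the mid: 4331051486294436
(Note: for every group except the first, if the decimal number has fewer than 7 digits, it must be left-padded with zeros to 7 digits. For example, if the decimal numbers are 35, 33040 and 8906190, two zeros must be prepended to 33040, giving 0033040.)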
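The steps above can be traced with a short, self-contained sketch. The helper names `split_from_end`, `decode_group` and `weibo_id_to_mid_sketch` are hypothetical and exist only for this illustration; only the alphabet and the group sizes come from the description.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def split_from_end(text, size):
    """Split text into chunks of `size` characters, counting from the right."""
    head = len(text) % size  # length of the (possibly shorter) first chunk
    chunks = [text[:head]] if head else []
    chunks += [text[i:i + size] for i in range(head, len(text), size)]
    return chunks


def decode_group(group):
    """Interpret one group as a base-62 number."""
    value = 0
    for ch in group:
        value = value * 62 + ALPHABET.index(ch)
    return value


def weibo_id_to_mid_sketch(weibo_id):
    values = [decode_group(g) for g in split_from_end(weibo_id, 4)]
    # Every group except the first is zero-padded to 7 decimal digits.
    parts = [str(values[0])] + [str(v).zfill(7) for v in values[1:]]
    return int("".join(parts))


print(split_from_end("Hd1N2qpta", 4))       # ['H', 'd1N2', 'qpta']
print(weibo_id_to_mid_sketch("Hd1N2qpta"))  # 4331051486294436
print(weibo_id_to_mid_sketch("z08AUBmUe"))  # 3500330408906190 (33040 padded to 0033040)
```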
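mid -> url
**Split the mid into groups of 7 digits, working from the end toward the start, encode each group in base62, and concatenate the results. As before, for every group except the first, if the base-62 result has fewer than 4 characters, it must be left-padded with 0 to 4 characters.**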
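The reverse direction can be sketched the same way. The names `encode_group` and `mid_to_weibo_id_sketch` are hypothetical; the sketch assumes the padding character is the first character of the alphabet, `0`, as the description states.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode_group(value):
    """Write a non-negative integer in base 62."""
    if value == 0:
        return ALPHABET[0]
    digits = []
    while value:
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def mid_to_weibo_id_sketch(mid):
    text = str(mid)
    head = len(text) % 7  # length of the (possibly shorter) first group
    groups = [text[:head]] if head else []
    groups += [text[i:i + 7] for i in range(head, len(text), 7)]
    encoded = [encode_group(int(g)) for g in groups]
    # Every group except the first is left-padded with '0' to 4 characters.
    return encoded[0] + "".join(e.rjust(4, "0") for e in encoded[1:])


print(mid_to_weibo_id_sketch(4331051486294436))  # Hd1N2qpta
print(mid_to_weibo_id_sketch(3500330408906190))  # z08AUBmUe ('8AU' padded to '08AU')
```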
Implementation (Python)
"""
:author Jermic
:date 2019-02-13
:weibo https://weibo.com/Jermic/
"""

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(num, alphabet=ALPHABET):
    """Encode a non-negative integer as a base-62 string."""
    num = int(num)
    if num == 0:
        return alphabet[0]
    arr = []
    base = len(alphabet)
    while num:
        rem = num % base
        num = num // base
        arr.append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)


def base62_decode(string, alphabet=ALPHABET):
    """Decode a base-62 string into an integer."""
    string = str(string)
    num = 0
    idx = 0
    for char in string:
        power = (len(string) - (idx + 1))
        num += alphabet.index(char) * (len(alphabet) ** power)
        idx += 1
    return num


def reverse_cut_to_length(content, code_func, cut_num=4, fill_num=7):
    """Cut content into groups of cut_num from the end, convert each group
    with code_func, and left-pad every group but the first to fill_num."""
    content = str(content)
    cut_list = [content[i - cut_num if i >= cut_num else 0:i]
                for i in range(len(content), 0, (-1 * cut_num))]
    cut_list.reverse()
    result = []
    for i, item in enumerate(cut_list):
        s = str(code_func(item))
        # The first group is never padded.
        if i > 0 and len(s) < fill_num:
            s = (fill_num - len(s)) * '0' + s
        result.append(s)
    return ''.join(result)


def url_to_mid(url: str):
    """
    >>> url_to_mid('z0JH2lOMb')
    3501756485200075
    >>> url_to_mid('z0IgABdSn')
    3501701648871479
    >>> url_to_mid('z08AUBmUe')
    3500330408906190
    >>> url_to_mid('z06qL6b28')
    3500247231472384
    >>> url_to_mid('yAt1n2xRa')
    3486913690606804
    """
    result = reverse_cut_to_length(url, base62_decode, 4, 7)
    return int(result)


def mid_to_url(mid_int: int):
    """
    >>> mid_to_url(3501756485200075)
    'z0JH2lOMb'
    >>> mid_to_url(3501701648871479)
    'z0IgABdSn'
    >>> mid_to_url(3500330408906190)
    'z08AUBmUe'
    >>> mid_to_url(3500247231472384)
    'z06qL6b28'
    >>> mid_to_url(3486913690606804)
    'yAt1n2xRa'
    """
    result = reverse_cut_to_length(mid_int, base62_encode, 7, 4)
    return result


if __name__ == "__main__":
    print(url_to_mid('Hd1N2qpta'))
    print(mid_to_url(4331051486294436))