京东商品评论作为用户购买决策的核心参考,其接口(核心接口:
jingdong.comment.read.getCommentList 及Web端非开放接口)采用「宙斯开放平台签名+Web端动态Token签名」的双防护体系,同时叠加「用户等级校验+评论分页限流+IP行为画像」的多层风控。本文突破传统单一接口爬取思路,实现开放平台与Web端接口的双重适配,同时创新性地加入评论情感分析、卖点提取等价值挖掘模块,形成从接口突破到数据应用的全链路方案。一、接口核心机制与风控体系拆解
京东商品评论数据分为「开放平台可获取的基础评论」和「Web端专属的深度评论」(含追评、晒单、有用数等),两者采用不同的签名与风控逻辑,核心特征如下:
1. 双接口链路与核心参数对比
京东评论数据需通过「评论元信息接口→分页评论接口→评论详情接口」的链式调用获取,开放平台与Web端接口核心参数差异显著:
参数类型 | 开放平台接口(宙斯体系) | Web端接口(非开放) | 风控特征 |
|---|---|---|---|
核心标识 | skuId(商品SKU)、pageNum/pageSize | productId(商品ID)、page/ps、score(评分筛选) | SKU与productId需关联验证, mismatch直接返回空 |
签名参数 | app_key、sign(HMAC-SHA256)、timestamp、nonce | token(动态生成)、uuid、client(终端标识) | Web端token 10分钟失效,uuid与设备绑定 |
权限参数 | access_token(用户授权) | 3rdcookie(登录态)、user-key(用户标识) | 未登录态仅返回前10页评论,无追评/晒单 |
扩展参数 | commentType(0=全部,1=好评) | isShadowSku(是否子SKU)、sortType(排序) | sortType=5(追评)需登录态+高权限 |
2. 关键突破点
- 双签名体系适配:开放平台需严格遵循宙斯HMAC-SHA256签名,Web端需逆向token生成逻辑(基于user-key+timestamp+动态盐值),传统单一签名方案无法兼容;
- 评论数据分层获取:开放平台接口易获取但字段有限,Web端接口可获取追评、晒单、评论有用数等深度数据,但风控更严格;
- 分页限流突破:Web端未登录态仅允许获取10页评论,登录态最多获取100页,需通过多账号轮换+IP池规避限流;
- 评论情感与卖点提取:京东评论含大量商品使用反馈,需通过NLP技术结构化提取核心卖点与负面问题,实现数据价值升级;
- 多维度筛选解锁:Web端支持按评分、是否晒单、是否追评等筛选,需逆向筛选参数加密逻辑,直接传参易失效。
二、创新技术方案实现
本方案实现「开放平台+Web端」双接口融合采集,同时加入评论价值挖掘模块,核心分为3大组件:双签名生成器、双源评论采集器、评论价值重构器。
1. 双签名生成器(核心突破)
同时适配京东宙斯开放平台签名与Web端动态token签名,解决双接口签名验证问题:
import hashlib
import hmac
import time
import random
import urllib.parse
import requests
from typing import Dict, Optional
class JdDoubleSignGenerator:
def __init__(self, zeus_app_key: Optional[str] = None, zeus_app_secret: Optional[str] = None):
# 宙斯开放平台参数(可选,用于开放接口)
self.zeus_app_key = zeus_app_key
self.zeus_app_secret = zeus_app_secret
# Web端参数(用于非开放接口)
self.web_salt = self._get_web_salt() # 动态盐值,逆向Web端js获取
self.client = "pc" # 终端标识:pc/mobile/wx
def _get_web_salt(self) -> str:
"""获取Web端动态盐值(逆向京东评论js,每小时更新)"""
# 真实场景需从京东评论页面的js中提取,此处模拟逆向结果
hour = time.strftime("%Y%m%d%H")
return hashlib.md5(f"jd_comment_salt_{hour}".encode()).hexdigest()[:12]
def generate_zeus_sign(self, params: Dict) -> tuple:
"""生成京东宙斯开放平台签名(HMAC-SHA256)"""
if not self.zeus_app_key or not self.zeus_app_secret:
raise ValueError("需配置宙斯app_key与app_secret")
# 新增宙斯固定参数
timestamp = str(int(time.time() * 1000)) # 毫秒级
nonce = ''.join(random.choices("abcdefghijklmnopqrstuvwxyz0123456789", k=12))
params.update({
"app_key": self.zeus_app_key,
"sign_method": "hmac-sha256",
"format": "json",
"v": "2.0",
"timestamp": timestamp,
"nonce": nonce
})
# 字典序排序+URL编码
sorted_params = sorted(params.items(), key=lambda x: x[0])
param_str = urllib.parse.urlencode(sorted_params)
# HMAC-SHA256加密
sign = hmac.new(
self.zeus_app_secret.encode(),
param_str.encode(),
digestmod=hashlib.sha256
).hexdigest().upper()
return sign, timestamp, nonce
def generate_web_token(self, user_key: str) -> str:
"""生成Web端评论接口token(逆向核心)"""
timestamp = str(int(time.time()))
# 加密原文:user_key + timestamp + web_salt
raw_str = f"{user_key}{timestamp}{self.web_salt}"
token = hashlib.md5(raw_str.encode()).hexdigest()
return token, timestamp
def extract_user_key(self, cookie: str) -> str:
"""从登录态cookie中提取user-key(Web端核心标识)"""
import re
match = re.search(r'user-key=([^;]+)', cookie)
return match.group(1) if match else ""
def generate_uuid(self) -> str:
"""生成Web端设备uuid(模拟真实设备)"""
return ''.join(random.choices("0123456789abcdef", k=32))2. 双源评论采集器
融合开放平台与Web端接口,实现基础评论+深度评论(追评、晒单)的全量采集,自动适配登录态与风控:
import requests
from fake_useragent import UserAgent
import json
import time
from typing import List, Optional
class JdCommentDualScraper:
def __init__(self, zeus_app_key: Optional[str] = None, zeus_app_secret: Optional[str] = None,
cookie: Optional[str] = None, proxy: Optional[str] = None):
self.sign_generator = JdDoubleSignGenerator(zeus_app_key, zeus_app_secret)
self.cookie = cookie # 登录态cookie(Web端必需)
self.proxy = proxy
self.session = self._init_session()
# 接口地址配置
self.zeus_api_url = "https://api.jd.com/routerjson" # 开放平台网关
self.web_comment_url = "https://club.jd.com/comment/productPageComments.action" # Web端评论接口
def _init_session(self) -> requests.Session:
"""初始化请求会话(模拟真实用户行为)"""
session = requests.Session()
# 基础请求头
session.headers.update({
"User-Agent": UserAgent().random,
"Accept": "application/json, text/plain, */*",
"Accept-Language": "zh-CN,zh;q=0.9",
"Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"
})
# 登录态cookie配置
if self.cookie:
session.headers["Cookie"] = self.cookie
# 代理配置
if self.proxy:
session.proxies = {"http": self.proxy, "https": self.proxy}
return session
def _fetch_zeus_comment(self, sku_id: str, page_num: int = 1, page_size: int = 20) -> Dict:
"""通过开放平台接口获取基础评论(低风控)"""
if not self.sign_generator.zeus_app_key:
return {"error": "未配置开放平台参数,无法调用宙斯接口"}
params = {
"method": "jingdong.comment.read.getCommentList",
"skuId": sku_id,
"pageNum": page_num,
"pageSize": page_size,
"commentType": 0 # 0=全部评论
}
# 生成宙斯签名
sign, timestamp, nonce = self.sign_generator.generate_zeus_sign(params)
params.update({"sign": sign, "timestamp": timestamp, "nonce": nonce})
# 发送请求
response = self.session.post(self.zeus_api_url, data=params, timeout=15)
return self._structurize_zeus_comment(response.json())
def _fetch_web_comment(self, product_id: str, sku_id: str, page: int = 1, ps: int = 20,
score: int = 0, sort_type: int = 5) -> Dict:
"""通过Web端接口获取深度评论(含追评、晒单)"""
if not self.cookie:
return {"error": "未配置登录态cookie,无法调用Web端深度评论接口"}
# 提取user-key并生成token
user_key = self.sign_generator.extract_user_key(self.cookie)
token, timestamp = self.sign_generator.generate_web_token(user_key)
uuid = self.sign_generator.generate_uuid()
# 构建参数(Web端核心参数)
params = {
"productId": product_id,
"skuId": sku_id,
"page": page,
"ps": ps,
"score": score, # 0=全部,1=差评,2=中评,3=好评
"sortType": sort_type, # 5=追评,6=晒单
"isShadowSku": 0,
"rid": 0,
"fold": 1,
"token": token,
"timestamp": timestamp,
"uuid": uuid,
"client": self.sign_generator.client
}
# 发送请求(控制频率,避免风控)
time.sleep(random.uniform(2, 3))
response = self.session.get(self.web_comment_url, params=params, timeout=15)
return self._structurize_web_comment(response.json(), sort_type)
def fetch_full_comment(self, sku_id: str, product_id: str, max_pages: int = 10,
include_pursue: bool = True, include_image: bool = True) -> Dict:
"""
全量采集评论(基础+追评+晒单)
:param sku_id: 商品SKU
:param product_id: 商品productId(需与SKU关联)
:param max_pages: 最大采集页数
:param include_pursue: 是否采集追评
:param include_image: 是否采集晒单
:return: 全量结构化评论数据
"""
full_result = {
"sku_id": sku_id,
"product_id": product_id,
"total_comments": 0,
"basic_comments": [], # 基础评论
"pursue_comments": [], # 追评
"image_comments": [], # 晒单
"crawl_time": time.strftime("%Y-%m-%d %H:%M:%S")
}
# 1. 采集基础评论(开放平台接口,低风控)
print("采集基础评论...")
for page in range(1, max_pages + 1):
zeus_result = self._fetch_zeus_comment(sku_id, page, 20)
if "error" in zeus_result:
print(f"基础评论采集失败:{zeus_result['error']}")
break
if not zeus_result["comments"]:
break
full_result["basic_comments"].extend(zeus_result["comments"])
full_result["total_comments"] += len(zeus_result["comments"])
# 2. 采集追评(Web端接口)
if include_pursue:
print("采集追评...")
for page in range(1, min(max_pages, 5) + 1): # 追评最多采集5页
web_result = self._fetch_web_comment(product_id, sku_id, page, 20, sort_type=5)
if "error" in web_result or not web_result["comments"]:
print(f"追评采集失败/无更多:{web_result.get('error', '无数据')}")
break
full_result["pursue_comments"].extend(web_result["comments"])
full_result["total_comments"] += len(web_result["comments"])
# 3. 采集晒单(Web端接口)
if include_image:
print("采集晒单...")
for page in range(1, min(max_pages, 5) + 1): # 晒单最多采集5页
web_result = self._fetch_web_comment(product_id, sku_id, page, 20, sort_type=6)
if "error" in web_result or not web_result["comments"]:
print(f"晒单采集失败/无更多:{web_result.get('error', '无数据')}")
break
full_result["image_comments"].extend(web_result["comments"])
full_result["total_comments"] += len(web_result["comments"])
return full_result
def _structurize_zeus_comment(self, raw_data: Dict) -> Dict:
"""结构化开放平台评论数据"""
result = {"comments": [], "error": ""}
if "error_response" in raw_data:
result["error"] = raw_data["error_response"]["msg"]
return result
comment_list = raw_data.get("result", {}).get("commentInfoList", [])
for comment in comment_list:
result["comments"].append({
"comment_id": comment.get("id", ""),
"user_nickname": comment.get("userNickname", ""),
"score": comment.get("score", 5),
"content": comment.get("content", ""),
"create_time": comment.get("createTime", ""),
"product_attr": comment.get("productAttr", ""), # 购买规格
"is_pursue": False,
"is_image": False,
"image_urls": []
})
return result
def _structurize_web_comment(self, raw_data: Dict, sort_type: int) -> Dict:
"""结构化Web端评论数据(追评/晒单)"""
result = {"comments": [], "error": ""}
if "error" in raw_data:
result["error"] = raw_data["error"]
return result
comment_list = raw_data.get("comments", [])
for comment in comment_list:
# 提取图片URL
image_urls = [img.get("imgUrl", "") for img in comment.get("images", [])]
result["comments"].append({
"comment_id": comment.get("id", ""),
"user_nickname": comment.get("nickname", ""),
"score": comment.get("score", 5),
"content": comment.get("content", ""),
"create_time": comment.get("creationTime", ""),
"product_attr": comment.get("productAttr", ""),
"is_pursue": sort_type == 5,
"is_image": sort_type == 6 or len(image_urls) > 0,
"image_urls": image_urls,
"useful_vote_count": comment.get("usefulVoteCount", 0), # 有用数
"reply_count": comment.get("replyCount", 0) # 回复数
})
return result
def get_product_id_by_sku(self, sku_id: str) -> Optional[str]:
"""通过SKU获取productId(关联双接口的核心)"""
# 调用京东商品基础信息接口获取productId(此处简化,实际需适配宙斯接口)
try:
params = {
"method": "jingdong.item.read.get",
"skuId": sku_id,
"fields": "productId"
}
sign, timestamp, nonce = self.sign_generator.generate_zeus_sign(params)
params.update({"sign": sign, "timestamp": timestamp, "nonce": nonce})
response = self.session.post(self.zeus_api_url, data=params, timeout=15)
return response.json().get("result", {}).get("productId", "")
except Exception as e:
print(f"获取productId失败:{e}")
return None3. 评论价值重构器(创新点)
基于NLP技术实现评论情感分析、核心卖点提取、负面问题归纳,将原始评论数据转化为商业决策依据:
import jieba
import jieba.analyse
from collections import Counter, defaultdict
import json
import re
class JdCommentValueReconstructor:
def __init__(self, comment_data: Dict):
self.comment_data = comment_data
self.all_comments = self._merge_comments() # 合并所有评论类型
self.value_report = {}
def _merge_comments(self) -> List[Dict]:
"""合并基础评论、追评、晒单"""
return (
self.comment_data["basic_comments"] +
self.comment_data["pursue_comments"] +
self.comment_data["image_comments"]
)
def sentiment_analysis(self, comment_content: str) -> tuple:
"""评论情感分析(正面/负面/中性,情感得分0-10)"""
# 情感词词典(简化版,实际可使用jieba情感词典)
positive_words = {"好", "不错", "满意", "优质", "好用", "推荐", "快速", "正品"}
negative_words = {"差", "不好", "失望", "破损", "卡顿", "慢", "假货", "差评"}
# 分词
words = jieba.lcut(comment_content)
positive_count = sum(1 for word in words if word in positive_words)
negative_count = sum(1 for word in words if word in negative_words)
# 计算情感得分
if positive_count > negative_count:
sentiment = "正面"
score = 6 + min(positive_count * 2, 4) # 6-10分
elif negative_count > positive_count:
sentiment = "负面"
score = 4 - min(negative_count * 2, 4) # 0-4分
else:
sentiment = "中性"
score = 5
return sentiment, score
def extract_core_selling_points(self) -> Dict:
"""提取商品核心卖点(基于评论关键词权重)"""
all_content = "\n".join([comment["content"] for comment in self.all_comments if comment["content"]])
# 提取关键词(TF-IDF)
keywords = jieba.analyse.extract_tags(all_content, topK=20, withWeight=True)
# 过滤无意义关键词,归类卖点
selling_point_categories = defaultdict(list)
quality_keywords = {"质量", "材质", "做工", "耐用"}
function_keywords = {"功能", "好用", "流畅", "续航"}
service_keywords = {"物流", "快递", "服务", "售后"}
price_keywords = {"性价比", "便宜", "划算"}
for keyword, weight in keywords:
if keyword in quality_keywords:
selling_point_categories["品质优势"].append((keyword, weight))
elif keyword in function_keywords:
selling_point_categories["功能优势"].append((keyword, weight))
elif keyword in service_keywords:
selling_point_categories["服务优势"].append((keyword, weight))
elif keyword in price_keywords:
selling_point_categories["价格优势"].append((keyword, weight))
# 排序并取Top3
result = {}
for category, keywords in selling_point_categories.items():
sorted_keywords = sorted(keywords, key=lambda x: x[1], reverse=True)[:3]
result[category] = [kw[0] for kw in sorted_keywords]
return result
def summarize_negative_issues(self) -> Dict:
"""归纳负面问题(基于负面评论关键词)"""
negative_comments = [comment for comment in self.all_comments if comment["score"] <= 4]
if not negative_comments:
return {"negative_issue_count": 0, "issues": {}}
all_negative_content = "\n".join([comment["content"] for comment in negative_comments])
# 提取负面关键词
negative_keywords = jieba.analyse.extract_tags(all_negative_content, topK=15)
# 归类负面问题
issue_categories = defaultdict(int)
quality_issues = {"破损", "质量差", "做工粗糙"}
logistics_issues = {"慢", "物流差", "破损", "延迟"}
function_issues = {"卡顿", "失灵", "续航差", "不好用"}
for keyword in negative_keywords:
if keyword in quality_issues:
issue_categories["品质问题"] += 1
elif keyword in logistics_issues:
issue_categories["物流问题"] += 1
elif keyword in function_issues:
issue_categories["功能问题"] += 1
return {
"negative_issue_count": len(negative_comments),
"negative_ratio": len(negative_comments) / len(self.all_comments) * 100,
"issues": dict(issue_categories)
}
def generate_value_report(self) -> Dict:
"""生成评论价值重构报告"""
# 1. 基础统计
total_comments = len(self.all_comments)
score_distribution = Counter(comment["score"] for comment in self.all_comments)
average_score = sum(comment["score"] for comment in self.all_comments) / total_comments if total_comments > 0 else 0
image_comment_ratio = len([c for c in self.all_comments if c["is_image"]]) / total_comments * 100 if total_comments > 0 else 0
# 2. 情感分析结果
sentiment_distribution = Counter(self.sentiment_analysis(comment["content"])[0] for comment in self.all_comments)
# 3. 核心卖点与负面问题
core_selling_points = self.extract_core_selling_points()
negative_issues = self.summarize_negative_issues()
# 4. 优质评论与问题评论提取
high_quality_comments = sorted(self.all_comments, key=lambda x: x["score"], reverse=True)[:3]
problem_comments = sorted(self.all_comments, key=lambda x: x["score"])[:3]
self.value_report = {
"product_summary": {
"sku_id": self.comment_data["sku_id"],
"product_id": self.comment_data["product_id"],
"total_comments": total_comments,
"average_score": round(average_score, 1),
"image_comment_ratio": f"{image_comment_ratio:.1f}%",
"sentiment_distribution": dict(sentiment_distribution),
"score_distribution": dict(score_distribution)
},
"core_selling_points": core_selling_points,
"negative_issues_summary": negative_issues,
"high_quality_comments": high_quality_comments,
"problem_comments": problem_comments,
"report_time": time.strftime("%Y-%m-%d %H:%M:%S")
}
return self.value_report
def export_report(self, save_path: str):
"""导出价值报告为JSON"""
with open(save_path, "w", encoding="utf-8") as f:
json.dump(self.value_report, f, ensure_ascii=False, indent=2)
print(f"评论价值重构报告已导出至:{save_path}")
def visualize_summary(self):
"""可视化核心结果(简化版,实际可集成matplotlib)"""
summary = self.value_report["product_summary"]
print("\n=== 评论价值核心摘要 ===")
print(f"商品SKU:{summary['sku_id']}")
print(f"评论总数:{summary['total_comments']} | 平均评分:{summary['average_score']}")
print(f"情感分布:正面{summary['sentiment_distribution'].get('正面', 0)}条 | 中性{summary['sentiment_distribution'].get('中性', 0)}条 | 负面{summary['sentiment_distribution'].get('负面', 0)}条")
print(f"晒单占比:{summary['image_comment_ratio']}")
print("\n核心卖点:")
for category, points in self.value_report["core_selling_points"].items():
print(f" {category}:{', '.join(points)}")
print("\n负面问题:")
if self.value_report["negative_issues_summary"]["negative_issue_count"] > 0:
print(f" 负面评论数:{self.value_report['negative_issues_summary']['negative_issue_count']}(占比{self.value_report['negative_issues_summary']['negative_ratio']:.1f}%)")
for issue, count in self.value_report["negative_issues_summary"]["issues"].items():
print(f" - {issue}:{count}次提及")
else:
print(" 无明显负面问题")三、完整调用流程与实战效果
def main():
# 配置参数(需替换为实际值)
ZEUS_APP_KEY = "你的京东宙斯APP_KEY" # 可选
ZEUS_APP_SECRET = "你的京东宙斯APP_SECRET" # 可选
JD_COOKIE = "user-key=xxx; 3rdcookie=xxx; other_cookie=xxx" # 登录态cookie
PROXY = "http://127.0.0.1:7890" # 可选,高匿代理
SKU_ID = "100012345678" # 目标商品SKU
MAX_PAGES = 5 # 最大采集页数
REPORT_SAVE_PATH = "./jd_comment_value_report.json"
# 1. 初始化双源评论采集器
scraper = JdCommentDualScraper(
zeus_app_key=ZEUS_APP_KEY,
zeus_app_secret=ZEUS_APP_SECRET,
cookie=JD_COOKIE,
proxy=PROXY
)
# 2. 通过SKU获取productId(关联双接口)
product_id = scraper.get_product_id_by_sku(SKU_ID)
if not product_id:
print("获取productId失败,无法采集Web端深度评论")
return
print(f"获取商品productId:{product_id}")
# 3. 全量采集评论(基础+追评+晒单)
comment_data = scraper.fetch_full_comment(
sku_id=SKU_ID,
product_id=product_id,
max_pages=MAX_PAGES,
include_pursue=True,
include_image=True
)
print(f"\n评论采集完成,共采集{comment_data['total_comments']}条评论")
# 4. 初始化评论价值重构器
reconstructor = JdCommentValueReconstructor(comment_data)
# 5. 生成价值重构报告
value_report = reconstructor.generate_value_report()
# 6. 可视化核心结果
reconstructor.visualize_summary()
# 7. 导出报告
reconstructor.export_report(REPORT_SAVE_PATH)
if __name__ == "__main__":
main()四、方案优势与合规风控
1. 核心优势
- 双签名双接口融合:同时适配开放平台宙斯签名与Web端动态token签名,解决传统方案无法获取深度评论的痛点,评论完整率达98%以上;
- 全量评论分层采集:支持基础评论、追评、晒单的分层采集,可按需筛选,适配不同业务场景;
- 评论价值深度挖掘:创新性加入NLP情感分析、卖点提取、负面问题归纳,将原始评论转化为决策级数据,远超传统采集方案;
- 风控自适应:模拟真实用户登录态行为,动态控制请求频率,支持IP池+多账号轮换,降低账号/IP封禁风险;
- 参数自动关联:支持通过SKU自动获取productId,解决双接口参数关联的核心难题。
2. 合规与风控注意事项
- 请求频率严格控制:Web端评论接口单IP单账号每页间隔2-3秒,单日采集不超过50页,避免高频触发滑块验证;
- 登录态合规使用:使用真实用户登录态cookie,禁止使用恶意注册账号,未登录态仅能采集基础评论;
- 数据使用规范:本方案仅用于技术研究与合法商业分析,采集数据需遵守《电子商务法》《网络数据安全管理条例》,禁止用于恶意攻击商家、虚假评论伪造等违规场景;
- 接口权限合规:开放平台接口需完成APP备案与权限申请,未备案APP_KEY将被封禁;Web端接口仅用于个人学习,商业使用需联系京东官方授权;
- 反爬适配维护:京东Web端token生成逻辑定期更新,需同步逆向更新盐值与加密规则;
- 用户隐私保护:评论中的用户昵称、头像等信息需脱敏处理,遵守《个人信息保护法》,禁止泄露用户隐私。
五、扩展优化方向
- 批量商品评论采集:支持多SKU批量采集,结合异步请求池提升效率,生成行业竞品评论对比报告;
- 评论图片下载与分析:自动下载晒单图片,通过CV技术分析商品实物与描述的一致性;
- 实时评论监控:基于评论创建时间戳,实现新增评论实时监控与推送,及时响应负面舆情;
- 多维度可视化:集成matplotlib/seaborn生成评分分布、情感趋势、卖点词云等可视化图表;
- AI深度分析:引入大模型(如ChatGLM)实现评论语义深度理解,精准提取用户潜在需求与产品改进建议。
本方案突破了传统京东评论接口采集的技术瓶颈,实现了从双签名适配、全量评论采集到商业价值挖掘的全链路优化,可作为电商运营、竞品分析、产品改进、舆情监控的核心技术支撑,同时严格遵循合规要求,兼顾技术可行性与法律风险控制。