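Vipshop (唯品会) is one of China's leading brand flash-sale e-commerce platforms, and its keyword search API (stable open-platform version: /api/search/keyword/v2.1) is the main entry point through which enterprise applications (product selection analysis, competitor monitoring, brand tracking, shopping-guide integration) fetch flash-sale product data in bulk. Unlike keyword search on general-purpose marketplaces, Vipshop's search API is tightly bound to its core model of brand flash sales, time-limited discounts, and tiered filtering, which brings its own pain points: unusual keyword matching rules, complex sale-time filtering, tiered brand permissions, pages that tend to repeat, and a high bar for avoiding risk control. Most existing write-ups stop at "capture a temporary endpoint with a packet sniffer, make a basic keyword call, paginate naively". They ignore the open platform's compliance constraints (unauthorized packet capture easily gets an IP banned or an account blacklisted) and leave the production problems unsolved: imprecise keyword matching, ineffective flash-sale filtering, duplicated pagination data, and risk control triggered by high-frequency calls. Compared with my earlier post on the Vipshop product detail API, this post drops the "time-sensitivity adaptation + SKU linkage" framework entirely and focuses on what the search scenario needs: an end-to-end approach covering compliant integration, search optimization, precise filtering, and high-availability fallbacks. The code is written with enterprise production use in mind and aims to balance compliance with business value.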
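I. Core Concepts: What Sets the Vipshop Keyword Search API Apart (Compared with Other Tutorials and My Earlier Posts)
The Vipshop keyword search API differs markedly from general marketplaces (Taobao, JD.com), from Vipshop's own non-search APIs (such as product detail), and from the APIs I have integrated before. Its design revolves entirely around brand flash-sale search, and four core characteristics determine how to integrate with it. Reusing generic search-integration experience, or a framework built for non-search scenarios, leads to compliance risk, poor search precision, and a weak business fit, which is the main blind spot of existing tutorials: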
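Strict compliance constraints, high risk from unauthorized calls: The Vipshop open platform grades access to the search API by developer type (individual vs. enterprise), and only enterprise developers receive the full search data (sale timing, stock, brand tier). The "temporary mobile endpoint obtained by packet capture" used in most existing tutorials (for example /mapi/vip.com/rest/search/keyword) is an internal platform API; calling it without authorization triggers IP bans and account blacklisting, and its address changes dynamically, so it cannot be used in production [1][4]. The approach here uses only the official open-platform API and follows the Vipshop Open Platform Terms of Service, which avoids that compliance risk.
Unusual keyword matching rules that call for optimization: Unlike the usual "exact match + fuzzy suggestion" logic of general marketplaces, Vipshop's search API favors brands, weights flash-sale tags heavily, and recognizes synonyms poorly. Searching for "运动鞋" (sneakers), for example, surfaces brand flash-sale items first rather than all sneakers, and "球鞋" is not automatically associated with "运动鞋", so results can go missing [4][6]. Existing tutorials pass the raw keyword straight to the API and cannot address imprecise terms or missing results.
Flash-sale-specific filters with complex parameter interplay: The search API supports dedicated filters such as sale timing, discount depth, brand tier, regional stock, and promotion type, and some parameters constrain each other (the "flash deal" filter, for example, must be combined with sale timing). Existing tutorials use only basic filters (price range), which does not meet flash-sale needs and yields cluttered results that include non-sale and expired items [2][4].
Pages tend to repeat, and risk control is hard to avoid: Search results overlap across pages and contain duplicates (flash-sale items are listed and delisted in real time and the ranking shifts dynamically), and rate limits are strict (30 calls/minute at the basic tier, 150 calls/minute at the advanced tier, enforced per appKey and per IP), so high-frequency bulk searches easily trigger risk control. Existing tutorials do not address page duplication and handle risk control with a simple delay, which is not enough for enterprise bulk search [2][6].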
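Key reminders: (1) Everything here is built on the official Vipshop open-platform API. You need to register an enterprise developer account, create an application, and pass review in advance to obtain an appKey and appSecret (individual developers receive only basic search results, without core fields such as flash-sale and stock data). (2) Calls must respect the open platform's rate limits to avoid triggering risk control. (3) Compared with my earlier post on the product detail API, no module is reused here; the focus is on five search-specific problems: compliant integration, search-term optimization, flash-sale filtering, pagination deduplication, and risk-control avoidance. (4) Request signing uses HMAC-SHA256, and the parameter ordering and encoding rules must be followed exactly or the signature fails. (5) Search keywords are limited to 2-50 characters, may contain Chinese, English, and some special symbols, and Chinese text must be UTF-8 encoded [2][6].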
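The per-minute quota described above (30 calls/minute at the basic tier) can also be enforced on the client side with a sliding window instead of a fixed delay between calls. The following is a minimal sketch under stated assumptions: the class name SlidingWindowLimiter, its parameters, and the injectable clock/sleep hooks are hypothetical and not part of any Vipshop SDK, and the sketch only limits calls made by one process (the platform also limits per IP).

```python
import time
from collections import deque
from threading import Lock
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `window`-second span (hypothetical helper)."""

    def __init__(self, max_calls: int = 30, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.max_calls = max_calls
        self.window = window
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of calls still inside the window
        self._lock = Lock()

    def acquire(self) -> float:
        """Block until a call is allowed; return the number of seconds spent waiting."""
        waited = 0.0
        with self._lock:
            while True:
                now = self._clock()
                # Drop timestamps that have left the window
                while self._calls and now - self._calls[0] >= self.window:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return waited
                # Window is full: wait until the oldest call expires
                delay = self.window - (now - self._calls[0])
                self._sleep(delay)
                waited += delay
```

Calling acquire() before each request lets short bursts through while keeping the count inside any 60-second span at or below the quota; a fixed delay of 60/max_calls seconds is simpler but slows down every call, including the first ones.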
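II. Implementation: Five Core Modules (All Search-Specific, Nothing Reused from Earlier Posts)
The solution is built on the open platform's V2.1 keyword search API and consists of a compliant signing client (adapted to the search API), a search-term optimizer, a flash-sale filter, a pagination deduplicator, and a bulk risk-control scheduler. The stack is mainly Python, and the design balances compliance, search precision, and availability around Vipshop's brand flash-sale search. Each module goes beyond what existing tutorials cover.
1. Compliant Signing Client: Adapted to the Search API, Fixing Signature Failures and Compliance Issues
This is the prerequisite for integrating with the Vipshop search API, and the step existing tutorials most often neglect. Most of them capture a temporary endpoint and skip signing altogether, which is unauthorized and unusable in production. The official search API's signing scheme (HMAC-SHA256) has several easily missed requirements: strict parameter ordering, a limit on timestamp drift, and keyword encoding rules. A mistake in any of them returns "401 invalid signature". The client below follows the search API's conventions and covers signature generation, request throttling, parameter validation, keyword encoding, and permission handling, so that calls stay compliant and stable and avoid signature failures and risk control [6]: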
    import hashlib
    import hmac
    import json
    import time
    from threading import Lock
    from typing import Any, Dict, Optional
    from urllib.parse import quote

    import requests


    class VipshopSearchComplianceClient:
        """Compliant signing client for the Vipshop keyword search API (open platform V2.1)."""

        def __init__(self, app_key: str, app_secret: str, timeout: int = 10,
                     max_calls_per_minute: int = 30):
            self.app_key = app_key  # appKey issued by the open platform
            self.app_secret = app_secret.encode("utf-8")  # secret, encoded to bytes for HMAC
            self.base_url = "https://api.vip.com/search/keyword/v2.1"  # official stable search endpoint
            self.timeout = timeout  # request timeout in seconds
            self.max_calls = max_calls_per_minute  # rate limit (basic 30/min, advanced 150/min) [2][6]
            self.last_request_time = 0.0  # time of the previous request, used for throttling
            self.request_lock = Lock()  # keeps throttling safe across threads
            # Public parameters required by V2.1 (these differ from the detail API)
            self.public_params = {
                "appKey": self.app_key,
                "format": "json",
                "v": "2.1",
                "signMethod": "HmacSHA256",
                "timestamp": "",  # filled in per request with a millisecond timestamp
                "region": "110000",  # default region code (Beijing), can be overridden
                "searchSource": "API",  # search source marker, required; omitting it returns 400 [6]
            }

        def _generate_sign(self, params: Dict) -> str:
            """Build the HMAC-SHA256 signature following the search API rules [6]."""
            # 1. Drop the sign field and sort by parameter name in ascending ASCII order
            #    (a wrong order makes the signature fail)
            sorted_params = sorted(
                ((k, v) for k, v in params.items() if k != "sign"),
                key=lambda item: item[0],
            )
            # 2. Join as "key=value&key=value"; Chinese values are percent-encoded as UTF-8
            sign_str = "&".join(f"{k}={self._encode_chinese(v)}" for k, v in sorted_params)
            # 3. HMAC-SHA256 with appSecret as the key, output as a lowercase hex string
            return hmac.new(self.app_secret, sign_str.encode("utf-8"), hashlib.sha256).hexdigest()

        def _encode_chinese(self, value: Any) -> str:
            """Percent-encode string values that contain Chinese characters [4][6]."""
            if isinstance(value, str) and any("\u4e00" <= char <= "\u9fa5" for char in value):
                # The whole value is percent-encoded, so spaces and punctuation are encoded too
                return quote(value, encoding="utf-8")
            return str(value)

        def _control_request_freq(self) -> None:
            """Throttle requests so that calls stay under the per-minute limit [2][6]."""
            with self.request_lock:
                # Minimum spacing between two requests to stay under the limit
                interval = 60 / self.max_calls
                time_diff = time.time() - self.last_request_time
                if time_diff < interval:
                    time.sleep(interval - time_diff)
                self.last_request_time = time.time()

        def _validate_params(self, keyword: str, page: int = 1, page_size: int = 20) -> None:
            """Validate search-specific parameters before calling the API."""
            # Keyword: 2-50 characters, must not be empty [2][6]
            if not keyword or len(keyword) < 2 or len(keyword) > 50:
                raise ValueError(
                    "Invalid keyword: length must be 2-50 characters "
                    "(Chinese, English, digits, and some special symbols)"
                )
            # Paging: page >= 1, page_size 10-50 (search API limit; the detail API allows 10-100) [2][4]
            if page < 1:
                raise ValueError("Invalid page: page must be >= 1")
            if page_size < 10 or page_size > 50:
                raise ValueError("Invalid page size: must be 10-50 (search API limit)")

        def search_raw(self, keyword: str, page: int = 1, page_size: int = 20,
                       region: Optional[str] = None,
                       custom_filters: Optional[Dict] = None) -> Dict:
            """
            Call the keyword search API and return the raw response (no field filtering).

            :param keyword: search keyword (required, 2-50 characters)
            :param page: page number (default 1, >= 1)
            :param page_size: items per page (default 20, range 10-50)
            :param region: region code (optional, affects regional stock filtering)
            :param custom_filters: extra filter parameters (optional, e.g. price range, discount)
            :return: raw response JSON as a dict
            """
            # 1. Validate parameters
            self._validate_params(keyword, page, page_size)
            # 2. Throttle
            self._control_request_freq()
            # 3. Assemble request parameters (public + business + custom filters)
            request_params = self.public_params.copy()
            request_params["timestamp"] = str(int(time.time() * 1000))  # ms timestamp, drift <= 3 minutes [6]
            request_params["keyword"] = keyword
            request_params["page"] = page
            request_params["pageSize"] = page_size  # the search API uses pageSize, unlike the detail API
            if region:
                if not (region.isdigit() and len(region) == 6):
                    raise ValueError("Invalid region code: must be 6 digits (e.g. Beijing 110000, Shanghai 310000)")
                request_params["region"] = region
            if custom_filters:
                # Only whitelisted filter names are forwarded; any other key is silently ignored
                valid_filters = ["priceMin", "priceMax", "discountMin", "discountMax",
                                 "brandIds", "promotionType"]
                for k, v in custom_filters.items():
                    if k in valid_filters:
                        request_params[k] = v
            # 4. Sign
            request_params["sign"] = self._generate_sign(request_params)
            try:
                # 5. Send a POST request (V2.1 requires POST; GET returns 405)
                response = requests.post(
                    url=self.base_url,
                    data=json.dumps(request_params),
                    headers={"Content-Type": "application/json;charset=utf-8"},
                    timeout=self.timeout,
                    verify=True,  # keep TLS certificate verification on
                )
                response.raise_for_status()  # raise on HTTP errors such as 401 or 429
                return response.json()
            except json.JSONDecodeError:
                # Checked first: in recent requests versions the JSON decode error is also
                # a RequestException and would otherwise be reported as a call failure
                return {"code": 500, "msg": "Failed to parse the response (not JSON)", "data": None}
            except requests.exceptions.RequestException as e:
                # Normalize request errors into a standard error payload
                return {"code": 500, "msg": f"API call failed: {e}", "data": None}


    # Example: compliant call that returns raw search data
    if __name__ == "__main__":
        # Replace with your own appKey and appSecret (enterprise developer account)
        CLIENT = VipshopSearchComplianceClient(
            app_key="YOUR_APP_KEY",
            app_secret="YOUR_APP_SECRET",
            max_calls_per_minute=30,  # basic tier, 30 calls/minute [2][6]
        )
        # Flash-sale-specific filter parameters
        custom_filters = {
            "priceMin": 50,      # lowest price
            "priceMax": 200,     # highest price
            "discountMin": 2,    # lowest discount (20% of list price)
            "discountMax": 5,    # highest discount (50% of list price)
            "promotionType": 1,  # promotion type: 1 = flash deal [4]
        }
        # Search for "连衣裙" (dresses) in the Shanghai region (310000)
        raw_response = CLIENT.search_raw(
            keyword="连衣裙",
            page=1,
            page_size=20,
            region="310000",
            custom_filters=custom_filters,
        )
        print(f"Call status: {'success' if raw_response.get('code') == 0 else 'failure'}")
        if raw_response.get("code") == 0:
            total = raw_response["data"].get("totalCount", 0)
            print(f"Total results: {total}")
            print(f"Products on this page: {len(raw_response['data'].get('products', []))}")
        else:
            print(f"Error: {raw_response.get('msg')}")
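Because search_raw reports failures as a dict with a non-zero code instead of raising, a caller can wrap it in a retry loop with exponential backoff rather than a single fixed delay. The sketch below is illustrative only: call_with_backoff and its parameters are hypothetical names, and treating every non-zero code as retryable is an assumption (in practice, signature or permission errors should probably not be retried).

```python
import random
import time
from typing import Callable, Dict, Optional


def call_with_backoff(fn: Callable[[], Dict], max_retries: int = 3,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep,
                      jitter: Optional[Callable[[], float]] = None) -> Dict:
    """Call fn until it returns code == 0 or the retries run out (hypothetical helper)."""
    jitter = jitter or random.random  # up to 1 second of random jitter by default
    result = fn()
    for attempt in range(max_retries):
        if result.get("code") == 0:
            break
        # Delay doubles on each attempt: base, 2*base, 4*base, ... capped at max_delay
        delay = min(max_delay, base_delay * (2 ** attempt)) + jitter()
        sleep(delay)
        result = fn()
    return result
```

A call would then look like call_with_backoff(lambda: CLIENT.search_raw(keyword="连衣裙")). Note that every retry still counts against the per-minute quota.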
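2. Search-Term Optimizer: Fixing Imprecise Matching and Missing Results
This module is one of the main differentiators of this post. Vipshop's search API favors brands, recognizes synonyms poorly, and weights flash-sale tags heavily, so passing a raw keyword such as "球鞋" or "女裙" leads to missing or imprecise results: "球鞋" does not match "运动鞋" products, and "女裙" does not prioritize flash-sale "连衣裙" items. The module applies typo correction, brand extraction, synonym expansion, and sale-tag injection to the raw term in order to improve precision and reduce missing results in flash-sale searches [4]: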
    import re
    from collections import defaultdict
    from typing import Dict, List, Optional, Tuple

    from vipshop_search_compliance_client import VipshopSearchComplianceClient


    class VipshopSearchWordOptimizer:
        """Search-term optimizer: typo correction, synonym expansion, and sale-tag injection."""

        def __init__(self, client: VipshopSearchComplianceClient):
            self.client = client
            # 1. Synonym map for frequent Vipshop searches (compiled for flash-sale scenarios)
            self.synonym_map = defaultdict(list, {
                "球鞋": ["运动鞋", "篮球鞋", "跑步鞋"],
                "女裙": ["连衣裙", "半身裙", "长裙", "短裙"],
                "卫衣": ["套头卫衣", "连帽卫衣", "宽松卫衣"],
                "牛仔裤": ["直筒牛仔裤", "紧身牛仔裤", "阔腿牛仔裤"],
                "面霜": ["保湿面霜", "抗老面霜", "补水面霜"],
            })
            # 2. Sale tag words (appended to the keyword to raise the weight of sale items)
            self.sale_tag_words = ["特卖", "限时", "折扣", "秒杀", "清仓"]
            # 3. Common typo corrections. The original no-op entries ("卫衣" -> "卫衣",
            #    "面霜" -> "面霜") were dropped; "特买" was added so the demo keyword is corrected.
            self.error_correct_map = {
                "连衣群": "连衣裙",
                "运动些": "运动鞋",
                "牛子裤": "牛仔裤",
                "特买": "特卖",
            }
            # 4. Brand lexicon (brands are extracted and tracked separately)
            self.brand_words = self._load_brand_words()

        def _load_brand_words(self) -> List[str]:
            """Load frequent Vipshop brand names (extend as needed) [4][6]."""
            # Hard-coded here for illustration; load from a config file or database in practice
            return [
                "Nike", "Adidas", "李宁", "安踏", "雅诗兰黛", "兰蔻",
                "资生堂", "优衣库", "ZARA", "ONLY", "VERO MODA", "波司登",
            ]

        def _correct_keyword(self, keyword: str) -> str:
            """Fix common typos so they do not cause missing results [4]."""
            for error, correct in self.error_correct_map.items():
                if error in keyword:
                    keyword = keyword.replace(error, correct)
            return keyword

        def _expand_synonym(self, keyword: str) -> List[str]:
            """Expand the keyword with synonyms to reduce missing results [4]."""
            expanded_words = [keyword]
            for core_word, synonyms in self.synonym_map.items():
                if core_word in keyword:
                    for synonym in synonyms:
                        if synonym not in expanded_words:
                            expanded_words.append(synonym)
            return expanded_words

        def _extract_brand(self, keyword: str) -> Tuple[str, List[str]]:
            """Pull brand names out of the keyword and return (core keyword, brands) [4][6]."""
            brand_list = []
            for brand in self.brand_words:
                if brand.lower() in keyword.lower():
                    brand_list.append(brand)
                    # Remove the brand and keep the core product term; re.escape guards
                    # against brand names that contain regex metacharacters
                    keyword = re.sub(re.escape(brand), "", keyword, flags=re.IGNORECASE).strip()
            # Collapse repeated whitespace
            keyword = re.sub(r"\s+", " ", keyword).strip()
            return keyword, brand_list

        def _add_sale_tag(self, keyword: str, brand_list: List[str]) -> str:
            """Append a sale tag to raise the weight of flash-sale items [4]."""
            # Skip if the keyword already contains a sale-related word
            if any(tag in keyword for tag in self.sale_tag_words):
                return keyword
            sale_tag = "特卖"  # "特卖" is preferred to match Vipshop's search weighting
            # With brands: "brand product 特卖" (e.g. "Nike 运动鞋 特卖")
            if brand_list:
                brand_str = "、".join(brand_list)
                return f"{brand_str} {keyword} {sale_tag}"
            # Without brands: "product 特卖" (e.g. "连衣裙 特卖")
            return f"{keyword} {sale_tag}"

        def optimize_keyword(self, keyword: str) -> Dict:
            """
            Full pipeline: correct typos -> extract brands -> expand synonyms -> add sale tag.

            :param keyword: raw search keyword
            :return: optimization info (optimal keyword, synonyms, brands)
            """
            # 1. Typo correction
            corrected_keyword = self._correct_keyword(keyword)
            if not corrected_keyword:
                raise ValueError("Keyword is empty after correction; check the original keyword")
            # 2. Extract brands and reduce to the core product term
            core_keyword, brand_list = self._extract_brand(corrected_keyword)
            # 3. Synonym expansion
            expanded_words = self._expand_synonym(core_keyword)
            # 4. Add the sale tag to produce the term actually sent to the API
            optimal_keyword = self._add_sale_tag(core_keyword, brand_list)
            return {
                "original_keyword": keyword,
                "corrected_keyword": corrected_keyword,
                "core_keyword": core_keyword,
                "optimal_keyword": optimal_keyword,
                "brand_list": brand_list,
                "expanded_synonyms": expanded_words,
                "optimize_note": "Keyword adapted to Vipshop search rules to favor flash-sale items",
            }

        def search_with_optimization(self, keyword: str, page: int = 1, page_size: int = 20,
                                     region: Optional[str] = None,
                                     custom_filters: Optional[Dict] = None) -> Dict:
            """
            Search with the optimized keyword.

            :param keyword: raw search keyword
            :param page: page number
            :param page_size: items per page
            :param region: region code
            :param custom_filters: extra filter parameters
            :return: search result with the keyword optimization info attached
            """
            # 1. Optimize the keyword
            optimize_info = self.optimize_keyword(keyword)
            # 2. Call the search API with the optimal keyword
            raw_search_result = self.client.search_raw(
                keyword=optimize_info["optimal_keyword"],
                page=page,
                page_size=page_size,
                region=region,
                custom_filters=custom_filters,
            )
            # 3. Attach the optimization info on success
            if raw_search_result.get("code") == 0:
                raw_search_result["data"]["keyword_optimize_info"] = optimize_info
            return raw_search_result


    # Example: keyword optimization + search
    if __name__ == "__main__":
        CLIENT = VipshopSearchComplianceClient(
            app_key="YOUR_APP_KEY",
            app_secret="YOUR_APP_SECRET",
        )
        OPTIMIZER = VipshopSearchWordOptimizer(client=CLIENT)
        # Raw keyword containing typos
        original_keyword = "连衣群 特买"
        # Optimized search (Shanghai region, 310000)
        optimized_search_result = OPTIMIZER.search_with_optimization(
            keyword=original_keyword,
            page=1,
            page_size=20,
            region="310000",
            custom_filters={"discountMin": 2, "discountMax": 5},
        )
        if optimized_search_result["code"] == 0:
            optimize_info = optimized_search_result["data"]["keyword_optimize_info"]
            print("=== Keyword optimization ===")
            print(f"Original keyword: {optimize_info['original_keyword']}")
            print(f"Corrected keyword: {optimize_info['corrected_keyword']}")
            print(f"Optimal keyword: {optimize_info['optimal_keyword']}")
            print(f"Brands: {optimize_info['brand_list'] if optimize_info['brand_list'] else 'none'}")
            print(f"Synonyms: {optimize_info['expanded_synonyms']}")
            print("\n=== Search results ===")
            total = optimized_search_result["data"].get("totalCount", 0)
            print(f"Total results: {total}")
            print(f"Products on this page: {len(optimized_search_result['data'].get('products', []))}")
            # Print the first 3 products
            for i, product in enumerate(optimized_search_result["data"]["products"][:3]):
                print(f"Product {i + 1}: {product['productName']} "
                      f"(price: {product['currentPrice']} CNY, discount: {product['discount']})")
        else:
            print(f"API call failed: {optimized_search_result['msg']}")
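The optimizer above returns expanded_synonyms, but search_with_optimization only queries the single optimal keyword. One way to actually use the expansion is to run one search per term and merge the results by productId. The sketch below assumes a hypothetical helper named search_all_synonyms and takes the search call as a plain function, so it does not depend on any particular client; the matched_term field is added purely for illustration.

```python
from typing import Callable, Dict, List


def search_all_synonyms(terms: List[str],
                        search_fn: Callable[[str], Dict]) -> List[Dict]:
    """Run one search per term and merge products, keeping the first hit per productId."""
    merged: Dict[str, Dict] = {}
    for term in terms:
        result = search_fn(term)
        if result.get("code") != 0:
            continue  # skip failed terms instead of aborting the whole fan-out
        for product in (result.get("data") or {}).get("products", []):
            product_id = str(product.get("productId", ""))
            if product_id and product_id not in merged:
                # Record which term surfaced the product (illustrative extra field)
                merged[product_id] = {**product, "matched_term": term}
    return list(merged.values())
```

With the classes above, search_fn could be lambda term: CLIENT.search_raw(keyword=term). Each term costs one API call, so the fan-out should stay within the rate limit.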
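3. Flash-Sale Filter: Fixing Ineffective Filtering and Cluttered Results
This module addresses a core need of flash-sale search that existing tutorials do not cover. The search API supports several flash-sale-specific filter parameters, but some of them constrain each other (for example, "promotion type = flash deal" must be combined with sale timing), and others (brand tier, stock status) need special parsing. Tutorials that rely on a basic price filter return cluttered results that include non-sale, expired, and out-of-stock items. The filter below validates filter parameters, links promotion type to sale timing, filters by brand tier and regional stock, and removes invalid products, so that the output can feed product selection and competitor monitoring directly [2][4]: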
    from datetime import datetime
    from typing import Dict, List, Optional

    from vipshop_search_compliance_client import VipshopSearchComplianceClient
    from vipshop_search_word_optimizer import VipshopSearchWordOptimizer


    class VipshopSaleSearchFilter:
        """Flash-sale filter: links filter parameters and removes invalid products."""

        def __init__(self, optimizer: VipshopSearchWordOptimizer):
            self.optimizer = optimizer
            # Maps snake_case filter names to the API parameter names
            self.filter_param_map = {
                "price_min": "priceMin",
                "price_max": "priceMax",
                "discount_min": "discountMin",
                "discount_max": "discountMax",
                "brand_level": "brandLevel",        # 1 = international, 2 = well-known domestic, 3 = niche [4]
                "promotion_type": "promotionType",  # 1 = flash deal, 2 = spend-and-save, 3 = member-only [4]
                "has_stock": "hasStock",            # in stock: true/false
                "sale_time_type": "saleTimeType",   # 1 = ongoing, 2 = upcoming, 3 = ended [4]
            }
            # Promotion type -> allowed sale-time types
            self.promotion_sale_time_link = {
                1: [1],  # flash deal -> ongoing only
                2: [1],  # spend-and-save -> ongoing only
                3: [1],  # member-only -> ongoing only
                4: [2],  # presale -> upcoming only
            }

        def _validate_filter_params(self, custom_filters: Dict) -> Dict:
            """Validate and normalize filter parameters; invalid ones are skipped [2][4]."""
            validated_filters = {}
            for param_name, param_value in custom_filters.items():
                # Normalize the name (e.g. price_min -> priceMin)
                standard_param = self.filter_param_map.get(param_name, param_name)
                if standard_param in ["priceMin", "priceMax"]:
                    # Price: must be non-negative, and priceMin <= priceMax
                    value = float(param_value)
                    if value < 0:
                        continue
                    if (standard_param == "priceMax" and "priceMin" in validated_filters
                            and value < validated_filters["priceMin"]):
                        continue  # max below min, skip
                    validated_filters[standard_param] = value
                elif standard_param in ["discountMin", "discountMax"]:
                    # Discount: 1-10 (10%-100% of list price), and discountMin <= discountMax
                    value = float(param_value)
                    if value < 1 or value > 10:
                        continue
                    if (standard_param == "discountMax" and "discountMin" in validated_filters
                            and value < validated_filters["discountMin"]):
                        continue  # max below min, skip
                    validated_filters[standard_param] = value
                elif standard_param == "brandLevel":
                    # Brand tier: integer 1-3
                    value = int(param_value)
                    if value in [1, 2, 3]:
                        validated_filters[standard_param] = value
                elif standard_param == "promotionType":
                    # Promotion type: integer 1-4
                    value = int(param_value)
                    if value in [1, 2, 3, 4]:
                        validated_filters[standard_param] = value
                        # Link the sale-time filter (e.g. flash deal -> ongoing items only)
                        sale_time_types = self.promotion_sale_time_link.get(value, [1])
                        validated_filters["saleTimeType"] = sale_time_types[0]
                elif standard_param == "hasStock":
                    # Accept booleans and common string forms; bool("false") would be True
                    if isinstance(param_value, str):
                        validated_filters[standard_param] = param_value.strip().lower() in ("true", "1", "yes")
                    else:
                        validated_filters[standard_param] = bool(param_value)
                elif standard_param == "saleTimeType":
                    # Sale-time type: integer 1-3
                    value = int(param_value)
                    if value in [1, 2, 3]:
                        validated_filters[standard_param] = value
            return validated_filters

        def _filter_sale_time(self, products: List[Dict], sale_time_type: int = 1) -> List[Dict]:
            """Keep products whose sale window matches the requested sale-time type [4]."""
            filtered_products = []
            current_time = datetime.now().timestamp() * 1000  # milliseconds
            for product in products:
                sale_start_time = product.get("saleStartTime", 0)  # sale start (ms)
                sale_end_time = product.get("saleEndTime", 0)      # sale end (ms)
                if sale_time_type == 1:
                    # Ongoing: now is between start and end
                    if sale_start_time <= current_time <= sale_end_time:
                        filtered_products.append(product)
                elif sale_time_type == 2:
                    # Upcoming: now is before start
                    if current_time < sale_start_time:
                        filtered_products.append(product)
                elif sale_time_type == 3:
                    # Ended: now is after end
                    if current_time > sale_end_time:
                        filtered_products.append(product)
            return filtered_products

        def _filter_brand_level(self, products: List[Dict],
                                brand_level: Optional[int] = None) -> List[Dict]:
            """Keep products whose brand tier matches [4]."""
            if not brand_level:
                return products
            return [
                product for product in products
                if product.get("brandInfo", {}).get("brandLevel", 0) == brand_level
            ]

        def _filter_region_stock(self, products: List[Dict], region: str = "110000") -> List[Dict]:
            """Keep products that have stock in the given region [3][6]."""
            filtered_products = []
            for product in products:
                region_stock = product.get("stockInfo", {}).get("regionStock", {})
                if region_stock.get(region, False):
                    filtered_products.append(product)
            return filtered_products

        def _filter_invalid_products(self, products: List[Dict]) -> List[Dict]:
            """Remove delisted products and products without a price or main image."""
            filtered_products = []
            for product in products:
                is_valid = (
                    product.get("productStatus", 0) == 1  # 1 = active, 0 = delisted
                    and product.get("currentPrice", 0.0) > 0
                    and product.get("mainImage", "") != ""
                )
                if is_valid:
                    filtered_products.append(product)
            return filtered_products

        def filtered_search(self, keyword: str, page: int = 1, page_size: int = 20,
                            region: Optional[str] = None,
                            custom_filters: Optional[Dict] = None) -> Dict:
            """
            Full pipeline: optimize keyword -> validate filters -> multi-step filtering.

            :param keyword: raw search keyword
            :param page: page number
            :param page_size: items per page
            :param region: region code
            :param custom_filters: flash-sale-specific filter parameters
            :return: search result with the filtered product list attached
            """
            # 1. Defaults
            custom_filters = custom_filters or {}
            region = region or "110000"
            # 2. Validate and normalize filters
            validated_filters = self._validate_filter_params(custom_filters)
            # 3. Optimize the keyword and call the search API
            optimized_search_result = self.optimizer.search_with_optimization(
                keyword=keyword,
                page=page,
                page_size=page_size,
                region=region,
                custom_filters=validated_filters,
            )
            if optimized_search_result.get("code") != 0 or not optimized_search_result.get("data"):
                return optimized_search_result
            # 4. Extract the raw product list
            products = optimized_search_result["data"].get("products", [])
            if not products:
                optimized_search_result["data"]["filtered_products"] = []
                optimized_search_result["data"]["filter_note"] = "No matching products"
                return optimized_search_result
            original_count = len(products)  # captured before any client-side filtering
            # 5. Filter in order: sale time -> brand tier -> regional stock -> invalid products
            sale_time_type = validated_filters.get("saleTimeType", 1)
            products = self._filter_sale_time(products, sale_time_type)
            brand_level = validated_filters.get("brandLevel")
            products = self._filter_brand_level(products, brand_level)
            products = self._filter_region_stock(products, region)
            filtered_products = self._filter_invalid_products(products)
            # 6. Attach filter info
            optimized_search_result["data"]["filtered_products"] = filtered_products
            optimized_search_result["data"]["filter_info"] = {
                "validated_filters": validated_filters,
                "original_filter_count": original_count,
                "filtered_count": len(filtered_products),
                "filter_steps": ["sale time", "brand tier", "regional stock", "invalid products"],
            }
            return optimized_search_result


    # Example: flash-sale filtered search
    if __name__ == "__main__":
        CLIENT = VipshopSearchComplianceClient(
            app_key="YOUR_APP_KEY",
            app_secret="YOUR_APP_SECRET",
        )
        OPTIMIZER = VipshopSearchWordOptimizer(client=CLIENT)
        FILTER = VipshopSaleSearchFilter(optimizer=OPTIMIZER)
        original_keyword = "运动鞋"
        # Combined flash-sale filters
        custom_filters = {
            "price_min": 80,
            "price_max": 300,
            "discount_min": 2,
            "discount_max": 6,
            "brand_level": 1,     # international brands
            "promotion_type": 1,  # flash deal (links to ongoing sales only)
            "has_stock": True,    # in stock
        }
        # Filtered search (Shanghai region, 310000)
        filtered_result = FILTER.filtered_search(
            keyword=original_keyword,
            page=1,
            page_size=20,
            region="310000",
            custom_filters=custom_filters,
        )
        if filtered_result["code"] == 0:
            filter_info = filtered_result["data"]["filter_info"]
            optimize_info = filtered_result["data"]["keyword_optimize_info"]
            print("=== Filter info ===")
            print(f"Normalized filters: {filter_info['validated_filters']}")
            print(f"Products before filtering: {filter_info['original_filter_count']}")
            print(f"Products after filtering: {filter_info['filtered_count']}")
            print("\n=== Optimal keyword ===")
            print(optimize_info["optimal_keyword"])
            print("\n=== Filtered products (first 5) ===")
            for i, product in enumerate(filtered_result["data"]["filtered_products"][:5]):
                brand_name = product["brandInfo"]["brandName"]
                sale_end = datetime.fromtimestamp(product["saleEndTime"] / 1000).strftime("%Y-%m-%d %H:%M:%S")
                print(f"Product {i + 1}: {brand_name} {product['productName']} "
                      f"(price: {product['currentPrice']} CNY, discount: {product['discount']}, "
                      f"sale ends: {sale_end})")
        else:
            print(f"API call failed: {filtered_result['msg']}")
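The sale-time check in _filter_sale_time reads the clock internally and defaults missing timestamps to 0, which means a product without saleStartTime/saleEndTime is treated as "ended". A small pure function with an injected "now" makes the same logic easier to test and lets missing or inverted windows be reported separately. This is a sketch: classify_sale_time and the three constants are hypothetical names, and returning None for unusable windows is a design choice made here, not platform behavior.

```python
from typing import Optional

# Same numbering as the saleTimeType filter: 1 = ongoing, 2 = upcoming, 3 = ended
ONGOING, UPCOMING, ENDED = 1, 2, 3


def classify_sale_time(start_ms: int, end_ms: int, now_ms: int) -> Optional[int]:
    """Classify a sale window relative to now_ms; return None if the window is unusable."""
    # Missing (0 or negative) or inverted windows cannot be classified reliably
    if start_ms <= 0 or end_ms <= 0 or end_ms < start_ms:
        return None
    if now_ms < start_ms:
        return UPCOMING
    if now_ms > end_ms:
        return ENDED
    return ONGOING  # start_ms <= now_ms <= end_ms, boundaries included
```

A filter step can then keep only products whose classification equals the requested saleTimeType and log the ones that return None instead of silently counting them as ended.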
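4. Pagination Deduplicator: Fixing Overlapping Pages and Duplicate Fetches
This is the main pain point of bulk calls to the Vipshop search API, and the largest gap in existing tutorials. Because flash-sale items are listed and delisted in real time and the ranking shifts dynamically, fetching several pages of the same search produces overlapping pages: the same product can appear on both page 1 and page 2, which makes bulk data redundant and statistics inaccurate. The processor below tracks product IDs, deduplicates across pages, and supports resuming from a checkpoint (continuing from the last page after an interruption), which suits enterprise bulk search and removes the redundancy [2][4]: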
    import hashlib
    from typing import Dict, List, Optional, Set

    import redis

    from vipshop_search_compliance_client import VipshopSearchComplianceClient
    from vipshop_search_word_optimizer import VipshopSearchWordOptimizer
    from vipshop_sale_search_filter import VipshopSaleSearchFilter


    class VipshopSearchPageDeduplicator:
        """Pagination deduplicator: product ID tracking, cross-page dedup, and checkpoint resume."""

        def __init__(self, filter: VipshopSaleSearchFilter, redis_host: str = "localhost",
                     redis_db: int = 0):
            self.filter = filter
            # Redis stores the seen product IDs (works across processes and survives restarts)
            self.redis_client = redis.Redis(host=redis_host, port=6379, db=redis_db,
                                            decode_responses=True)
            # Key prefix that separates search tasks
            self.redis_key_prefix = "vipshop:search:dedup:"

        def _get_redis_key(self, keyword: str) -> str:
            """Build the Redis key for a search task from the keyword."""
            # MD5 of the keyword keeps the key short and free of special characters
            keyword_md5 = hashlib.md5(keyword.encode("utf-8")).hexdigest()
            return f"{self.redis_key_prefix}{keyword_md5}"

        def _get_existing_product_ids(self, keyword: str) -> Set[str]:
            """Read the set of product IDs already fetched for this keyword."""
            return set(self.redis_client.smembers(self._get_redis_key(keyword)))

        def _save_product_ids(self, keyword: str, product_ids: List[str]) -> None:
            """Add newly fetched product IDs to the Redis set."""
            if not product_ids:
                return
            redis_key = self._get_redis_key(keyword)
            # Expire after 24 hours to avoid unbounded growth
            self.redis_client.sadd(redis_key, *product_ids)
            self.redis_client.expire(redis_key, 86400)

        def _deduplicate_page(self, keyword: str, products: List[Dict]) -> List[Dict]:
            """Drop products that were already fetched and return only the new ones."""
            existing_ids = self._get_existing_product_ids(keyword)
            new_products = []
            new_product_ids = []
            for product in products:
                product_id = str(product.get("productId", ""))
                if product_id and product_id not in existing_ids:
                    existing_ids.add(product_id)  # also catches repeats within the same page
                    new_products.append(product)
                    new_product_ids.append(product_id)
            self._save_product_ids(keyword, new_product_ids)
            return new_products

        def _get_last_page(self, keyword: str) -> int:
            """Read the checkpoint page (the next page to fetch); defaults to 1."""
            last_page = self.redis_client.get(self._get_redis_key(keyword) + ":last_page")
            return int(last_page) if last_page else 1

        def _save_last_page(self, keyword: str, page: int) -> None:
            """Store the checkpoint page for resuming later."""
            self.redis_client.set(self._get_redis_key(keyword) + ":last_page", page, ex=86400)

        def batch_search_with_deduplicate(self, keyword: str, total_pages: int = 5,
                                          page_size: int = 20, region: Optional[str] = None,
                                          custom_filters: Optional[Dict] = None,
                                          resume: bool = True) -> Dict:
            """
            Fetch several pages with deduplication and checkpoint resume.

            :param keyword: raw search keyword
            :param total_pages: number of pages to fetch
            :param page_size: items per page
            :param region: region code
            :param custom_filters: extra filter parameters
            :param resume: whether to resume from the stored checkpoint (default True)
            :return: deduplicated batch result
            """
            # 1. Initialize
            region = region or "110000"
            custom_filters = custom_filters or {}
            # The optimal keyword is used as the Redis key for this task
            optimize_info = self.filter.optimizer.optimize_keyword(keyword)
            optimal_keyword = optimize_info["optimal_keyword"]
            # Start from the checkpoint when resuming
            start_page = self._get_last_page(optimal_keyword) if resume else 1
            if start_page > total_pages:
                return {
                    "code": 0,
                    "msg": "Checkpoint page is beyond total_pages; nothing left to fetch",
                    "data": {
                        "keyword_optimize_info": optimize_info,
                        "total_pages": total_pages,
                        "start_page": start_page,
                        "end_page": total_pages,
                        "total_products": len(self._get_existing_product_ids(optimal_keyword)),
                        "batch_products": [],  # same key as the normal return path
                    },
                }
            # 2. Fetch pages and deduplicate
            batch_products = []
            end_page = start_page - 1
            completed = True
            for page in range(start_page, total_pages + 1):
                print(f"Fetching page {page}/{total_pages}, keyword: {optimal_keyword}")
                page_result = self.filter.filtered_search(
                    keyword=keyword,
                    page=page,
                    page_size=page_size,
                    region=region,
                    custom_filters=custom_filters,
                )
                if page_result.get("code") != 0:
                    print(f"Page {page} failed: {page_result.get('msg')}")
                    # Stop here and keep the checkpoint on this page, so a resumed run
                    # retries it instead of skipping it
                    completed = False
                    break
                end_page = page
                page_products = page_result["data"].get("filtered_products", [])
                if not page_products:
                    print(f"Page {page} has no matching products")
                    self._save_last_page(optimal_keyword, page + 1)
                    continue
                # Drop products already seen, then collect the new ones
                new_page_products = self._deduplicate_page(optimal_keyword, page_products)
                batch_products.extend(new_page_products)
                # Advance the checkpoint
                self._save_last_page(optimal_keyword, page + 1)
                print(f"Page {page}: {len(new_page_products)} new products after dedup")
            # 3. Summarize
            total_existing_ids = self._get_existing_product_ids(optimal_keyword)
            if completed:
                # All pages fetched: reset the checkpoint
                self._save_last_page(optimal_keyword, 1)
            return {
                "code": 0,
                "msg": "Batch search with dedup finished" if completed
                       else "Batch search stopped early; rerun with resume=True to continue",
                "data": {
                    "keyword_optimize_info": optimize_info,
                    "filter_info": self.filter._validate_filter_params(custom_filters),
                    "total_pages": total_pages,
                    "start_page": start_page,
                    "end_page": end_page,
                    "completed": completed,
                    "total_products": len(total_existing_ids),
                    "batch_products": batch_products,
                    "dedup_note": "Duplicates across pages were removed; checkpoint resume is supported",
                },
            }


    # Example: batch search with dedup and checkpoint resume
    if __name__ == "__main__":
        CLIENT = VipshopSearchComplianceClient(
            app_key="YOUR_APP_KEY",
            app_secret="YOUR_APP_SECRET",
        )
        OPTIMIZER = VipshopSearchWordOptimizer(client=CLIENT)
        FILTER = VipshopSaleSearchFilter(optimizer=OPTIMIZER)
        DEDUPLICATOR = VipshopSearchPageDeduplicator(filter=FILTER, redis_host="localhost")
        original_keyword = "连衣裙"
        # Flash-sale-specific filters
        custom_filters = {
            "price_min": 60,
            "price_max": 200,
            "discount_min": 2,
            "discount_max": 5,
            "promotion_type": 1,  # flash deal
            "has_stock": True,
        }
        # Fetch 5 pages of 20 items, Shanghai region, with resume enabled
        batch_result = DEDUPLICATOR.batch_search_with_deduplicate(
            keyword=original_keyword,
            total_pages=5,
            page_size=20,
            region="310000",
            custom_filters=custom_filters,
            resume=True,
        )
        if batch_result["code"] == 0:
            data = batch_result["data"]
            print("\n=== Batch search summary ===")
            print(f"Original keyword: {data['keyword_optimize_info']['original_keyword']}")
            print(f"Optimal keyword: {data['keyword_optimize_info']['optimal_keyword']}")
            print(f"Pages: {data['start_page']}-{data['end_page']} (of {data['total_pages']})")
            print(f"Total unique products: {data['total_products']}")
            print(f"New products in this run: {len(data['batch_products'])}")
        else:
            print(f"Batch search failed: {batch_result['msg']}")
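For unit tests or small one-off jobs where Redis is not available, the same cross-page deduplication idea can be expressed with an in-memory set. The helper below is a minimal sketch; dedupe_pages is a hypothetical name, and unlike the Redis-backed version it keeps no state between runs, so it cannot support checkpoint resume.

```python
from typing import Dict, Iterable, List, Set


def dedupe_pages(pages: Iterable[List[Dict]]) -> List[Dict]:
    """Merge pages in order, keeping only the first occurrence of each productId."""
    seen: Set[str] = set()
    unique: List[Dict] = []
    for page in pages:
        for product in page:
            product_id = str(product.get("productId", ""))
            if not product_id or product_id in seen:
                continue  # skip products without an ID and products already collected
            seen.add(product_id)
            unique.append(product)
    return unique


# Two overlapping pages: product 3 moved from page 1 to page 2 between requests
page_1 = [{"productId": 1}, {"productId": 2}, {"productId": 3}]
page_2 = [{"productId": 3}, {"productId": 4}]
print([p["productId"] for p in dedupe_pages([page_1, page_2])])
```

Deduplication only removes repeats; a product that shifts to an earlier, already-fetched page between two requests can still be missed entirely, which is why the total should be treated as a lower bound.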