jungwoo choi 070032006e feat: Implement async queue-based news pipeline with microservices

Major architectural transformation from synchronous to asynchronous processing:

## Pipeline Services (8 microservices)
- pipeline-scheduler: APScheduler for 30-minute periodic job triggers
- pipeline-rss-collector: RSS feed collection with deduplication (7-day TTL)
- pipeline-google-search: Content enrichment via Google Search API
- pipeline-ai-summarizer: AI summarization using Claude API (claude-sonnet-4-20250514)
- pipeline-translator: Translation using DeepL Pro API
- pipeline-image-generator: Image generation with Replicate API (Stable Diffusion)
- pipeline-article-assembly: Final article assembly and MongoDB storage
- pipeline-monitor: Real-time monitoring dashboard (port 8100)

## Key Features
- Redis-based job queue with deduplication
- Asynchronous processing with Python asyncio
- Shared models and queue manager for inter-service communication
- Docker containerization for all services
- Container names standardized with site11_ prefix

## Removed Services
- Moved to backup: google-search, rss-feed, news-aggregator, ai-writer

## Configuration
- DeepL Pro API: 3abbc796-2515-44a8-972d-22dcf27ab54a
- Claude Model: claude-sonnet-4-20250514
- Redis Queue TTL: 7 days for deduplication

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

2025-09-13 19:22:14 +09:00

backend

README.md

Google Search Service

A service that searches Google for a keyword and returns the results.

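Key Features

1. Multiple search backends

Google Custom Search API: official Google API (recommended)
SerpAPI: alternative search API
Web scraping: fallback option (limited)

2. Search options

Up to 20 search results per request
Search by language and country
Date-based filtering and sorting
Full content retrieval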

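API Endpoints

Basic search

GET /api/search?q=keyword&num=20&lang=ko&country=kr

Parameters:

q: search keyword (required)
num: number of results (1-20, default: 10)
lang: language code (ko, en, etc.)
country: country code (kr, us, etc.)
date_restrict: date restriction
- d7: within the past week
- m1: within the past month
- m3: within the past 3 months
- y1: within the past year
sort_by_date: sort newest first (true/false)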
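The parameter list above can be exercised from a small client. The following is a minimal sketch, not part of the service itself: the helper name `build_search_url` is hypothetical, and the host and port are assumed from the curl examples in this README. It validates the documented `num` range and lets `urlencode` handle percent-encoding of non-ASCII keywords.

```python
from urllib.parse import urlencode

# Assumption: host and port as used in the curl examples in this README.
BASE_URL = "http://localhost:8016"


def build_search_url(q, num=10, lang=None, country=None,
                     date_restrict=None, sort_by_date=None):
    """Build a GET /api/search URL from the documented parameters.

    Hypothetical client-side helper; the service does not ship it.
    """
    if not q:
        raise ValueError("q is required")
    if not 1 <= num <= 20:
        raise ValueError("num must be between 1 and 20")

    params = {"q": q, "num": num}
    if lang:
        params["lang"] = lang
    if country:
        params["country"] = country
    if date_restrict:
        params["date_restrict"] = date_restrict  # e.g. d7, m1, m3, y1
    if sort_by_date is not None:
        params["sort_by_date"] = "true" if sort_by_date else "false"

    # urlencode percent-encodes the values as UTF-8.
    return f"{BASE_URL}/api/search?{urlencode(params)}"


print(build_search_url("인공지능", num=20, lang="ko", country="kr",
                       sort_by_date=True))
```

Optional parameters are omitted from the URL when not set, so the service's own defaults (such as `num=10`) apply.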

Full content search

GET /api/search/full?q=keyword&num=5

Fetches the full content of each search result page (this can take a long time).

Real-time trending

GET /api/trending?country=kr

Usage Examples

1. Korean search (newest first)

curl "http://localhost:8016/api/search?q=인공지능&num=20&lang=ko&country=kr&sort_by_date=true"

2. English search (US)

curl "http://localhost:8016/api/search?q=artificial%20intelligence&num=10&lang=en&country=us"

3. Results from the past week only

curl "http://localhost:8016/api/search?q=뉴스&date_restrict=d7&lang=ko"

4. Fetch full content

curl "http://localhost:8016/api/search/full?q=python%20tutorial&num=3"

Configuration

Required API keys

Google Custom Search API
- Issue an API key in the Google Cloud Console
- Create a search engine ID in Programmable Search Engine
SerpAPI (optional)
- Issue an API key from SerpAPI

.env file

# Google Custom Search API
GOOGLE_API_KEY=your_api_key_here
GOOGLE_SEARCH_ENGINE_ID=your_search_engine_id_here

# SerpAPI (optional)
SERPAPI_KEY=your_serpapi_key_here

# Redis cache
REDIS_HOST=redis
REDIS_PORT=6379
REDIS_DB=2

# Defaults
DEFAULT_LANGUAGE=ko
DEFAULT_COUNTRY=kr
CACHE_TTL=3600
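As an illustration of how these variables might be consumed, here is a minimal sketch that reads them into a typed settings object. The `Settings` class and `load_settings` function are hypothetical names, and the fallback defaults simply mirror the sample values above; how the service actually loads its configuration is not shown in this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Typed view of the .env variables listed above (hypothetical)."""
    google_api_key: str
    google_search_engine_id: str
    serpapi_key: str
    redis_host: str
    redis_port: int
    redis_db: int
    default_language: str
    default_country: str
    cache_ttl: int


def load_settings(env=None):
    """Read settings from a mapping, defaulting to os.environ.

    Fallback values are assumptions copied from the sample .env file.
    """
    env = os.environ if env is None else env
    return Settings(
        google_api_key=env.get("GOOGLE_API_KEY", ""),
        google_search_engine_id=env.get("GOOGLE_SEARCH_ENGINE_ID", ""),
        serpapi_key=env.get("SERPAPI_KEY", ""),  # optional
        redis_host=env.get("REDIS_HOST", "redis"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_db=int(env.get("REDIS_DB", "2")),
        default_language=env.get("DEFAULT_LANGUAGE", "ko"),
        default_country=env.get("DEFAULT_COUNTRY", "kr"),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the loader easy to test.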

Running with Docker

# Build and run
docker-compose build google-search-backend
docker-compose up -d google-search-backend

# View logs
docker-compose logs -f google-search-backend

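Limitations

Google Custom Search API

Free tier: 100 queries per day
Up to 100 results per query (at most 10 per request, so larger result sets require pagination)
Snippet length is limited by the server (cannot be changed)

Workarounds

More than 20 results needed: use pagination
Longer content needed: use the /api/search/full endpoint
API limit reached: automatic fallback to SerpAPI or web scraping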
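To make the pagination workaround concrete, the sketch below computes the request windows needed to collect a given number of results. It assumes the upstream Google Custom Search JSON API conventions (a 1-based `start` index, at most 10 results per request, 100 per query); this README does not document a paging parameter on this service's own endpoints, so the function name `page_requests` and its use are illustrative only.

```python
def page_requests(total, page_size=10, max_results=100):
    """Yield (start, num) pairs that together cover `total` results.

    `start` is 1-based. The defaults reflect the assumed upstream
    limits: 10 results per request and 100 results per query.
    """
    total = min(total, max_results)
    start = 1
    while start <= total:
        num = min(page_size, total - start + 1)
        yield start, num
        start += num


# 25 results need three requests: two full pages and one partial page.
print(list(page_requests(25)))  # [(1, 10), (11, 10), (21, 5)]
```

Each request counts against the daily query quota, so collecting 100 results consumes 10 of the 100 free daily queries.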

Cache Management

Search results are cached in Redis:

Default TTL: 3600 seconds (1 hour)
Clear the cache: POST /api/clear-cache
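The caching behaviour described above can be sketched without a Redis server. In the example below, an in-memory dictionary stands in for Redis, and both the key format (`search:<sha256>`) and the names `cache_key` and `TTLCache` are assumptions made for illustration; the service's real key scheme is not documented here.

```python
import hashlib
import json
import time


def cache_key(endpoint, params):
    """Derive a stable key from the endpoint and its query parameters.

    Hypothetical scheme: sorting the keys makes the result independent
    of parameter order.
    """
    canonical = json.dumps(params, sort_keys=True, ensure_ascii=False)
    digest = hashlib.sha256(f"{endpoint}|{canonical}".encode("utf-8")).hexdigest()
    return f"search:{digest}"


class TTLCache:
    """In-memory stand-in for Redis with a per-entry expiry time."""

    def __init__(self, ttl=3600, clock=time.monotonic):
        self.ttl = ttl          # seconds, matching the default CACHE_TTL
        self.clock = clock      # injectable so expiry can be tested
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # drop the expired entry
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

    def clear(self):
        """Counterpart of POST /api/clear-cache in this sketch."""
        self._store.clear()
```

With a real Redis client, the expiry would normally be delegated to the server (for example, by setting a key with a TTL) instead of being checked on read.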

Health Check

curl http://localhost:8016/health

Troubleshooting

1. Korean search terms do not work

URL-encode the query:

# "인공지능" → %EC%9D%B8%EA%B3%B5%EC%A7%80%EB%8A%A5
curl "http://localhost:8016/api/search?q=%EC%9D%B8%EA%B3%B5%EC%A7%80%EB%8A%A5"
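The encoded form shown above can be produced with Python's standard library. This is a small sketch using `urllib.parse.quote`, which percent-encodes the UTF-8 bytes of the keyword:

```python
from urllib.parse import quote, unquote

keyword = "인공지능"

# quote() percent-encodes each UTF-8 byte of the keyword.
encoded = quote(keyword)
print(encoded)  # %EC%9D%B8%EA%B3%B5%EC%A7%80%EB%8A%A5

# unquote() reverses the encoding.
assert unquote(encoded) == keyword
```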

2. API limit errors

Check the Google API daily quota
Set a SerpAPI key as an alternative
Rely on the automatic web-scraping fallback
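The fallback order described above (Google Custom Search, then SerpAPI, then web scraping) can be pictured as a simple chain. The sketch below is an assumption about the general shape only: `QuotaExceeded`, `search_with_fallback`, and the stub backends are hypothetical names, and the service's actual fallback logic is not shown in this README.

```python
class QuotaExceeded(Exception):
    """Raised by a backend when its API limit is reached (hypothetical)."""


def search_with_fallback(query, backends):
    """Try each (name, callable) backend in order.

    Returns (name, results) from the first backend that succeeds and
    raises RuntimeError if every backend fails.
    """
    errors = []
    for name, backend in backends:
        try:
            return name, backend(query)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all search backends failed: " + "; ".join(errors))


# Stub backends standing in for the real API clients.
def google_stub(query):
    raise QuotaExceeded("daily limit reached")


def serpapi_stub(query):
    return [f"result for {query}"]


name, results = search_with_fallback(
    "python tutorial",
    [("google", google_stub), ("serpapi", serpapi_stub)],
)
print(name, results)  # serpapi ['result for python tutorial']
```

Collecting the per-backend error messages makes it easier to see why each step was skipped when all of them fail.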

3. Slow response times

Check that the Redis cache is enabled
Reduce the number of results
Use basic search instead of full content search