nju-power-watch

NJU Electricity Data Pipeline

Automated daily electricity data collection and analysis for Nanjing University dormitories.

Overview

This project automates the collection, processing, and aggregation of electricity consumption data from the NJU epay system.

Features

✅ Automated daily data collection at 2 AM UTC
✅ Auto-login with captcha recognition, so no manual cookie updates are needed
✅ Atomic batch operations with rollback on failure
✅ Cookie-based authentication with validation
✅ Monthly data archiving (tar.gz)
✅ Pre-computed statistics for frontend
✅ File-based JSON storage
✅ GitHub Actions automation
✅ Data persistence via the Git repository, so data survives between workflow runs
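
The "atomic batch operations with rollback" feature can be pictured with a small sketch. This is not the code in scripts/rollback_failed_run.py; the `atomic_batch` helper and its `write_json` callback are hypothetical names, and the sketch assumes a batch only creates new files (it does not restore overwritten ones).

```python
import json
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def atomic_batch():
    """Track files written in a batch and delete them all if the batch fails."""
    written = []  # paths this batch has started to create

    def write_json(path, payload):
        path = Path(path)
        path.parent.mkdir(parents=True, exist_ok=True)
        written.append(path)  # record first, so a half-written file is cleaned up too
        path.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")

    try:
        yield write_json
    except Exception:
        # Roll back: remove every file this batch created, then re-raise.
        for path in reversed(written):
            path.unlink(missing_ok=True)
        raise
```

Used as `with atomic_batch() as write_json: ...`, any exception inside the block removes the files written so far, so a failed run leaves no partial daily data behind.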

Important: Data Persistence

How does data persist between GitHub Actions runs?

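Every GitHub Actions run starts in a fresh environment, so data is persisted as follows:

  1. Commit only aggregated data: raw data is not committed; only database/summaries/ is.
  2. Merge historical data: each run loads the existing summaries, merges them with the new data, and commits the result.
  3. Keep the full history: each room's JSON contains every queried date and its balance (no time limit).
  4. Save space: summaries are about 97% smaller than the raw data (~5.5 MB/year vs ~182 MB/year).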
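
The 97% figure follows directly from the two yearly estimates above. A quick check, where the function name is only illustrative:

```python
RAW_MB_PER_YEAR = 182.0     # raw data, from the estimate above
SUMMARY_MB_PER_YEAR = 5.5   # summaries, from the estimate above


def reduction_percent(raw_mb, summary_mb):
    """Return how much smaller the summaries are, as a percentage."""
    return (1 - summary_mb / raw_mb) * 100


print(f"{reduction_percent(RAW_MB_PER_YEAR, SUMMARY_MB_PER_YEAR):.0f}%")  # prints 97%
```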

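Data flow:

Run starts → check out the repository (including the full summary history)
       ↓
Query new data → write to raw database/ (temporary)
       ↓
Merge data → load old summaries + new data → generate new summaries
       ↓
Commit and push → commit summaries/ only (raw data is discarded)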
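
The merge step in the flow above can be sketched as follows. The actual schema of a room summary is not shown here, so this assumes a plain mapping from date strings (YYYYMMDD, matching the daily file names) to balances; `merge_room_history` is a hypothetical helper, not a function from scripts/aggregate_data.py.

```python
def merge_room_history(old_history, new_readings):
    """Merge two {date: balance} mappings; new readings win for the same date."""
    merged = dict(old_history)
    merged.update(new_readings)
    # YYYYMMDD strings sort chronologically, which keeps the committed JSON stable.
    return dict(sorted(merged.items()))


old = {"20260501": 30.0, "20260502": 28.5}   # loaded from the checked-out summary
new = {"20260502": 28.0, "20260503": 26.1}   # produced by today's query
print(merge_room_history(old, new))
```

Because the old summary is always loaded before merging, dropping the raw files after each run does not lose any history.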

Space estimate (500 rooms):

See docs/data-persistence.md for details.

Key configuration:

Quick Start

Test Frontend UI 🎨

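Quickly verify that the frontend displays data correctly:

# Start the local server
python serve_frontend.py

# Open in a browser
# http://localhost:8000/frontend/

Frontend feature demo

See frontend/README.md for details.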

Prerequisites

Setup

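GitHub Actions Auto-Login (Recommended)

The workflow logs in automatically before each query to obtain a cookie, so no manual updates are needed.

  1. Configure GitHub Secrets:
    • Go to repository Settings → Secrets → Actions
    • Add the following 3 secrets:

    Secret Name     Description                                Example
    NJU_USERNAME    Your student ID                            201250000
    NJU_PASSWORD    Unified identity authentication password   your_password
    YUNMA_TOKEN     Yunma (云码) API token                     TA6djdhm0NC...

    Getting a Yunma token: register at zhuce.jfbym.com → User Center → Token

  2. Configure the room list (optional):

    # Edit config/room_ids.txt
    echo "53463" > config/room_ids.txt
    echo "53464" >> config/room_ids.txt
    
  3. Trigger a manual test run:
    • Go to Actions → Manual Electricity Query → Run workflow
    • Check the run log to confirm that auto-login succeeded

Cost: Yunma captcha recognition costs about 0.01-0.03 CNY per call, which comes to under 1 CNY per month.

See docs/github-actions-setup.md for details.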
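
The room list from step 2 is a plain text file with one room ID per line. A minimal sketch of reading it is shown below; `load_room_ids` is a hypothetical helper, and skipping blank lines and `#` comments is an assumption rather than documented behavior of the query script.

```python
from pathlib import Path


def load_room_ids(path="config/room_ids.txt"):
    """Read one room ID per line, skipping blank lines and '#' comments."""
    ids = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids
```

IDs are kept as strings so they can be used directly in file names and URLs.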


Local Manual Login (Fallback):

If you need to obtain a cookie manually:

# Install dependencies
pip install -r requirements.txt

# Configure login credentials
echo "your_username" > /tmp/username
echo "your_password" > /tmp/password
echo "your_yunma_token" > /tmp/token

# Log in automatically
python scripts/nju_auto_login.py

# The cookie is saved to /tmp/cookie.json

Project Structure

.
├── .github/workflows/      # GitHub Actions automation
│   ├── daily-query.yml     # Scheduled daily collection
│   ├── manual-query.yml    # Manual trigger workflow
│   └── data-cleanup.yml    # Monthly cleanup/archival
│
├── scripts/                # Processing scripts
│   ├── validate_cookie.py  # Cookie validation
│   ├── rollback_failed_run.py  # Rollback on failure
│   ├── cleanup_archives.py # Archive management
│   └── aggregate_data.py   # Summary generation
│
├── config/
│   └── room_ids.txt        # List of room IDs to query
│
├── database/               # Data storage (raw data git-ignored; summaries/ committed)
│   ├── [campus]/[building]/[room-id]/[date].json  # Daily data
│   ├── archives/           # Monthly archives
│   └── summaries/          # Hierarchical aggregated summaries
│       ├── overview.json   # All campuses overview
│       └── campuses/       # Campus → Building → Room hierarchy
│
├── logs/
│   └── query_runs/         # Workflow execution logs
│
├── tests/                  # Test suite
│   ├── unit/               # Unit tests
│   └── integration/        # Integration tests
│
├── nju_electric_query.py   # Existing query script (unchanged)
└── list_room_ids.py        # Existing room ID script (unchanged)

Data Access

View Today’s Data

# Find today's file
find database -name "$(date +%Y%m%d).json"

# View data
jq . "database/仙林校区/19幢/19栋第16层1613-53463/$(date +%Y%m%d).json"
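
The same lookup can be done from Python, for example inside a test or a notebook. This is a sketch under the directory layout shown in Project Structure; `find_daily_files` is a hypothetical helper name.

```python
from datetime import date
from pathlib import Path


def find_daily_files(root="database", day=None):
    """Return every <YYYYMMDD>.json file under root for the given day (default: today)."""
    day = day or date.today()
    name = f"{day:%Y%m%d}.json"
    # rglob walks all campus/building/room directories, like `find -name`.
    return sorted(Path(root).rglob(name))


for path in find_daily_files():
    print(path)
```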

View Summary

# View overview (all campuses)
jq . database/summaries/overview.json

# View specific campus
jq . database/summaries/campuses/仙林校区/summary.json

# View specific building
jq . database/summaries/campuses/仙林校区/buildings/19幢/summary.json

# View specific room
jq . database/summaries/campuses/仙林校区/buildings/19幢/rooms/53463.json
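
The four paths above follow one pattern, which a small helper can capture. `summary_path` and `load_summary` are hypothetical names for illustration; only the directory layout is taken from the commands above.

```python
import json
from pathlib import Path

SUMMARY_ROOT = Path("database/summaries")


def summary_path(campus=None, building=None, room_id=None, root=SUMMARY_ROOT):
    """Build the summary file path for the overview, a campus, a building, or a room."""
    root = Path(root)
    if campus is None:
        return root / "overview.json"
    campus_dir = root / "campuses" / campus
    if building is None:
        return campus_dir / "summary.json"
    building_dir = campus_dir / "buildings" / building
    if room_id is None:
        return building_dir / "summary.json"
    return building_dir / "rooms" / f"{room_id}.json"


def load_summary(**kwargs):
    """Load and parse the summary JSON selected by the same keyword arguments."""
    return json.loads(summary_path(**kwargs).read_text(encoding="utf-8"))
```

For example, `summary_path("仙林校区", "19幢", 53463)` yields the room file shown in the last command.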

Extract Archives

# Extract specific month
cd database/archives
tar -xzf 2026-05.tar.gz
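
A Python equivalent using the standard `tarfile` module is sketched below. The `extract_month` helper is hypothetical, the internal layout of the archives is not assumed, and the path check is an extra safety measure rather than part of the project's scripts. It needs Python 3.9+ for `Path.is_relative_to`.

```python
import tarfile
from pathlib import Path


def extract_month(month, archive_dir="database/archives", dest=None):
    """Extract <archive_dir>/<month>.tar.gz and return the destination directory."""
    archive = Path(archive_dir) / f"{month}.tar.gz"
    dest = Path(dest) if dest is not None else Path(archive_dir)
    dest_root = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            # Refuse entries that would land outside the destination directory.
            target = (dest_root / member.name).resolve()
            if not target.is_relative_to(dest_root):
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    return dest
```

Calling `extract_month("2026-05")` mirrors the shell commands above and unpacks next to the archive.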

Troubleshooting

See docs/troubleshooting.md for common issues and solutions.

Development

Run Tests

# Install dev dependencies
pip install -r requirements.txt

# Run all tests
pytest tests/

# Run specific test file
pytest tests/unit/test_validate_cookie.py -v

Code Style

# Format code
black scripts/

# Lint code
ruff check scripts/

Architecture

This project follows the Data-Business Separation principle:

  1. Data Acquisition: nju_electric_query.py (unchanged)
  2. Data Processing: scripts/aggregate_data.py, scripts/cleanup_archives.py
  3. Presentation: Static frontend consumes hierarchical summaries (future)

Data Flow:

Daily Query → Raw JSON Files → Hierarchical Aggregation
                                     ↓
                            database/summaries/
                            ├── overview.json (all campuses)
                            └── campuses/
                                └── {campus}/
                                    ├── summary.json
                                    └── buildings/
                                        └── {building}/
                                            ├── summary.json
                                            └── rooms/{id}.json

See docs/hierarchical-aggregation.md for detailed usage.
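
To make the roll-up concrete, here is a minimal sketch of aggregating room readings into building, campus, and overview totals. The field names `room_count` and `total_balance` and the input tuple shape are assumptions for illustration; the real summary schema is described in docs/hierarchical-aggregation.md.

```python
from collections import defaultdict


def new_bucket():
    """Create an empty aggregate for one level of the hierarchy."""
    return {"room_count": 0, "total_balance": 0.0}


def aggregate(readings):
    """Roll (campus, building, room_id, balance) readings up the hierarchy."""
    buildings = defaultdict(new_bucket)   # keyed by (campus, building)
    campuses = defaultdict(new_bucket)    # keyed by campus
    overview = new_bucket()
    for campus, building, _room_id, balance in readings:
        # Each reading contributes once to every level above the room.
        for bucket in (buildings[(campus, building)], campuses[campus], overview):
            bucket["room_count"] += 1
            bucket["total_balance"] += balance
    return overview, dict(campuses), dict(buildings)
```

Each level is computed in a single pass, so a frontend can read one small file per level instead of scanning every room.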

Monitoring

Maintenance

  1. Log in to https://epay.nju.edu.cn
  2. Export cookies as JSON
  3. Update EPAY_COOKIE secret in GitHub
  4. Verify with manual workflow trigger

Archive Management

License

MIT

Credits

Built with: