Running Tests

Prerequisites

Installing Test Dependencies

Before running tests, ensure you have the required dependencies installed:

# Install pytest and testing dependencies
pip install pytest pytest-cov pytest-mock jsonschema

# For development with additional testing tools
pip install pytest-xdist pytest-benchmark pytest-html

Environment Setup

  1. Clone the repository and navigate to the project root
  2. Install dependencies as shown above
  3. Set up test environment variables if needed (a conftest.py sketch for reading them follows this list):
    export AGENT_API_URL=http://localhost:5001/execute_task
    
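The AGENT_API_URL value can be surfaced to tests through a fixture. A minimal conftest.py sketch, assuming tests want to read the endpoint from the environment (the fixture name and fallback URL are illustrative):

# conftest.py -- illustrative fixture; the name and default are assumptions
import os

import pytest

@pytest.fixture
def agent_api_url():
    """Return the agent endpoint under test, falling back to the local default."""
    return os.environ.get("AGENT_API_URL", "http://localhost:5001/execute_task")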

Running the Full Test Suite

Basic Test Execution

Run all tests in the project:

# Run all tests
pytest

# Run with verbose output
pytest -v

# Run with more detailed output
pytest -vv

Test Discovery

Pytest automatically discovers tests based on naming conventions:

  - Files named test_*.py or *_test.py
  - Functions named test_*
  - Classes named Test*
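
For example, a file that follows these conventions is collected without any extra configuration (the file, function, and class names below are illustrative):

# tests/test_example.py -- discovered automatically by pytest
def test_addition_is_commutative():
    assert 1 + 2 == 2 + 1

class TestStringHelpers:
    def test_strip_removes_whitespace(self):
        assert "  hi  ".strip() == "hi"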

Running Specific Test Categories

Unit Tests

Run only unit tests (fast execution):

# Run specific test file
pytest tests/test_cli.py

# Run specific test function
pytest tests/test_cli.py::test_scenario_loading

# Run tests matching a pattern
pytest -k "scenario"

Scenario Compliance Tests

Run the compliance and schema validation tests, which validate all scenario files against the AES spec:

# Run scenario compliance tests
pytest tests/test_scenario_compliance.py

# Run with detailed output
pytest tests/test_scenario_compliance.py -v
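
These tests build on jsonschema (installed above). A minimal sketch of such a check, assuming scenario files live under scenarios/ and the AES spec schema sits at schema/aes_schema.json (both paths are illustrative, not the project's actual layout):

# Illustrative sketch only -- the real checks live in tests/test_scenario_compliance.py
import json
from pathlib import Path

import pytest
from jsonschema import validate

SCENARIO_DIR = Path("scenarios")               # assumed scenario location
SCHEMA_PATH = Path("schema/aes_schema.json")   # assumed schema location

@pytest.mark.parametrize("scenario_path", sorted(SCENARIO_DIR.glob("*.json")))
def test_scenario_matches_schema(scenario_path):
    schema = json.loads(SCHEMA_PATH.read_text())
    scenario = json.loads(scenario_path.read_text())
    validate(instance=scenario, schema=schema)  # raises ValidationError on mismatch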

Stability & Hardening Tests (v1.1+)

Run the dedicated suite for verifying core stability fixes (dot-notation, judge guarding, etc.):

pytest tests/test_stability.py
Integration Tests

Run integration tests (may require external services):

# Run integration tests (if marked with @pytest.mark.integration)
pytest -m integration

# Skip tests marked as slow
pytest -m "not slow"

Coverage Reporting

Basic Coverage

Generate coverage reports:

# Run tests with coverage
pytest --cov=eval_runner

# Generate HTML coverage report
pytest --cov=eval_runner --cov-report=html

# Generate XML coverage report (for CI/CD)
pytest --cov=eval_runner --cov-report=xml

Coverage Configuration

Create a .coveragerc file for custom coverage settings:

[run]
source = eval_runner
omit = 
    */tests/*
    */__pycache__/*
    setup.py

[report]
exclude_lines =
    pragma: no cover
    def __repr__
    raise AssertionError
    raise NotImplementedError

Coverage Analysis

Analyze coverage results:

# Show coverage summary
pytest --cov=eval_runner --cov-report=term-missing

# Generate detailed HTML report
pytest --cov=eval_runner --cov-report=html
# Open htmlcov/index.html in your browser

Performance Testing

Basic Performance Testing

Run performance benchmarks:

# Install pytest-benchmark
pip install pytest-benchmark

# Run performance tests
pytest --benchmark-only

# Run with performance comparison
pytest --benchmark-compare

Performance Test Examples

# Example performance test (the benchmark fixture comes from pytest-benchmark)
def test_evaluation_performance(benchmark):
    def run_evaluation():
        # Your evaluation code here
        pass

    # benchmark() returns run_evaluation's return value; timing stats live on the fixture
    benchmark(run_evaluation)
    assert benchmark.stats.stats.mean < 1.0  # Should complete in under 1 second

Test Output and Reporting

Console Output

Different output formats for different needs:

# Minimal output
pytest -q

# Verbose output
pytest -v

# Very verbose output
pytest -vv

# Show local variables on failures
pytest -l

HTML Reports

Generate HTML test reports:

# Install pytest-html
pip install pytest-html

# Generate HTML report
pytest --html=report.html --self-contained-html

JUnit XML Reports

Generate XML reports for CI/CD integration:

# Generate JUnit XML report
pytest --junitxml=test-results.xml

Continuous Integration

GitHub Actions Example

name: Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python
      uses: actions/setup-python@v2
      with:
        python-version: 3.9
    - name: Install dependencies
      run: |
        pip install pytest pytest-cov pytest-mock jsonschema
    - name: Run tests
      run: |
        pytest --cov=eval_runner --cov-report=xml
    - name: Upload coverage
      uses: codecov/codecov-action@v1

Local CI Simulation

Simulate CI environment locally:

# Enforce that all markers are registered (catches typos before CI does)
python -m pytest --strict-markers

# Run with coverage and fail if coverage is low
pytest --cov=eval_runner --cov-fail-under=80

Troubleshooting

Common Issues

  1. Import Errors: Ensure you're running from the project root (a sys.path workaround is sketched below)
  2. Missing Dependencies: Install all required packages
  3. Schema Validation Failures: Check scenario file formats
  4. Slow Tests: Use pytest -m "not slow" to skip slow tests
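
If import errors persist even when running from the project root, one common workaround is a root-level conftest.py that puts the repository on sys.path (a sketch, assuming the eval_runner package sits directly in the repository root):

# conftest.py at the project root -- illustrative sys.path workaround
import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).resolve().parent))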

Debug Mode

Run tests in debug mode:

# Run with debug output
pytest --tb=long

# Run with pdb on failures
pytest --pdb

# Use IPython's debugger instead of the built-in pdb
pytest --pdb --pdbcls=IPython.terminal.debugger:TerminalPdb

Test Isolation

Ensure tests don't interfere with each other:

# Run tests in random order (requires pytest-random-order)
pytest --random-order

# Run tests in parallel (requires pytest-xdist, installed above)
pytest -n auto
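
Built-in fixtures such as tmp_path and monkeypatch also help keep tests independent by scoping filesystem and environment changes to a single test. A small sketch (the file name and environment value are illustrative):

# Illustrative: changes made here are undone automatically after the test
def test_writes_report_to_temp_dir(tmp_path, monkeypatch):
    monkeypatch.setenv("AGENT_API_URL", "http://localhost:5001/execute_task")
    report = tmp_path / "report.json"
    report.write_text("{}")
    assert report.exists()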

Best Practices

Running Tests Efficiently

  1. Use markers to categorize tests: @pytest.mark.unit, @pytest.mark.integration
  2. Run specific test categories during development
  3. Use coverage reports to identify untested code
  4. Run full suite before committing changes

Test Environment

  1. Use virtual environments to isolate dependencies
  2. Mock external services to avoid dependencies (see the sketch after this list)
  3. Use test databases for integration tests
  4. Clean up resources after tests complete
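
For mocking external services (item 2), pytest-mock's mocker fixture (installed above) can stand in for the agent endpoint. A sketch, assuming a hypothetical eval_runner.client.post_task function; the module path and payload are illustrative, not the project's real API:

# Illustrative sketch -- the patched target and return value are assumptions
def test_evaluation_without_live_agent(mocker):
    fake_post = mocker.patch(
        "eval_runner.client.post_task",               # hypothetical function
        return_value={"status": "ok", "score": 1.0},  # hypothetical payload
    )
    result = fake_post({"task": "noop"})
    assert result["status"] == "ok"
    fake_post.assert_called_once()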