Testing Strategy for Repository Chat App

Context: iMessage-like chat interface for executing bash commands in git repositories. Terminal-only development environment (SSH to EC2, no GUI during dev).

Problem: How do we test interactive chat functionality (click contact → type command → see output) when developing via SSH without a GUI?


Requirements

  1. Terminal-only execution - Must run via SSH, no GUI required
  2. Fast feedback - Quick smoke tests for rapid iteration
  3. Interactive validation - Verify chat UI works (send message, receive response)
  4. Regression detection - Catch when changes break existing flows
  5. API validation - Ensure backend correctly executes commands in repos
  6. Minimal dependencies - Keep it simple, vanilla approach

Current State

What we have:

What we're missing:

Gap: Our smoke tests validate structure but not user interactions.


Layered Testing Strategy

Layer 1: Smoke Tests (Current - Fast Validation)

Tool: Bash script with curl
Runtime: ~5 seconds
When: After every change, in pre-commit hook
Coverage: HTTP, HTML structure, API endpoints

What it catches:

What it misses:

File: smoke-test.sh (23 tests)


Layer 2: Browser Integration Tests (Needed)

Tool: Puppeteer (headless Chrome)
Runtime: ~15-30 seconds
When: Before committing feature changes
Coverage: Full user flows, JavaScript execution, UI interactions
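
Because development happens over SSH with no display, Chrome has to be launched headless. Below is a minimal sketch of that setup, assuming Puppeteer v21 as pinned in package.json. The helper names `launchOptions` and `withPage` and the `noSandbox` switch are illustrative and not part of the existing code.

```javascript
// Hypothetical helpers for browser-test.js; names are illustrative.

// Build Puppeteer launch options for a machine without a display.
// noSandbox is an assumption: some EC2/container setups (for example when
// running as root) cannot start Chrome's sandbox and need it disabled.
function launchOptions({ noSandbox = false } = {}) {
  return {
    headless: 'new', // Puppeteer v21 value selecting the new headless mode
    args: noSandbox ? ['--no-sandbox', '--disable-setuid-sandbox'] : [],
  };
}

// Open a fresh page, run fn(page), and always close the browser afterwards.
async function withPage(fn, options = launchOptions()) {
  // Required lazily so the helpers above load even before `npm install`.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(options);
  try {
    const page = await browser.newPage();
    return await fn(page);
  } finally {
    await browser.close();
  }
}
```

A flow could then be wrapped as `await withPage(async page => { await page.goto('http://localhost:8000/contacts.html'); })`, which keeps browser cleanup in one place.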

Critical flows to test:

Flow 1: Contact List → Chat

test('navigate from contacts to chat', async () => {
  await page.goto('http://localhost:8000/contacts.html');

  // Wait for repos to load
  await page.waitForSelector('.contact-item');

  // Click first contact
  await page.click('.contact-item');

  // Should navigate to chat
  await page.waitForSelector('.message-input');

  // Should show repo name in header
  const repoName = await page.$eval('.contact-name', el => el.textContent);
  expect(repoName).toBeTruthy();
});

Flow 2: Execute Command → See Output

test('execute bash command and see output', async () => {
  // Navigate to a specific repo chat
  const contactId = 'L2hvbWUvdWJ1bnR1L3dvcmtwbGFjZS9BaGFpYUFwcC9pZGU=';
  await page.goto(`http://localhost:8000/index.html?contact=${contactId}`);

  // Wait for chat to load
  await page.waitForSelector('.message-input');

  // Type a command
  await page.type('.message-input', 'ls');

  // Press Enter
  await page.keyboard.press('Enter');

  // Should show typing indicator
  await page.waitForSelector('.typing-indicator[style*="flex"]', { timeout: 1000 });

  // Wait for response message
  await page.waitForSelector('.message-group.received .message-bubble', { timeout: 5000 });

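  // Response should contain output. Pick the last received bubble explicitly:
  // :last-of-type matches by element type, not by class, so it can miss.
  const output = await page.$$eval(
    '.message-group.received .message-bubble',
    els => els[els.length - 1].textContent
  );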
  expect(output).toContain('package.json'); // Should list files
});
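
The hard-coded `contactId` in this flow decodes, as Base64, to `/home/ubuntu/workplace/AhaiaApp/ide`, so contact IDs appear to be Base64-encoded repository paths. Assuming that holds for the whole app (an inference from this one value, not something the backend confirms here), tests could derive the ID rather than paste it. `contactIdFor` and `repoPathFor` are hypothetical helper names.

```javascript
// Assumption: a contact ID is the standard Base64 encoding of the repo's
// absolute path. This is inferred from the ID used in the tests.
function contactIdFor(repoPath) {
  return Buffer.from(repoPath, 'utf8').toString('base64');
}

// Inverse helper, handy for readable failure messages.
function repoPathFor(contactId) {
  return Buffer.from(contactId, 'base64').toString('utf8');
}

const id = contactIdFor('/home/ubuntu/workplace/AhaiaApp/ide');
console.log(id);              // L2hvbWUvdWJ1bnR1L3dvcmtwbGFjZS9BaGFpYUFwcC9pZGU=
console.log(repoPathFor(id)); // /home/ubuntu/workplace/AhaiaApp/ide
```

Standard Base64 output can contain `+` and `/`, so IDs for other paths may need `encodeURIComponent` before being placed in the query string.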

Flow 3: Back Button Navigation

test('back button returns to contacts list', async () => {
  const contactId = 'L2hvbWUvdWJ1bnR1L3dvcmtwbGFjZS9BaGFpYUFwcC9pZGU=';
  await page.goto(`http://localhost:8000/index.html?contact=${contactId}`);

  // Click back button
  await page.click('.back-button');

  // Should navigate to contacts list
  await page.waitForSelector('.contacts-list');
  expect(page.url()).toContain('contacts.html');
});

Flow 4: Error Handling

test('show stderr in red bubble', async () => {
  const contactId = 'L2hvbWUvdWJ1bnR1L3dvcmtwbGFjZS9BaGFpYUFwcC9pZGU=';
  await page.goto(`http://localhost:8000/index.html?contact=${contactId}`);

  await page.waitForSelector('.message-input');

  // Type invalid command
  await page.type('.message-input', 'nonexistentcommand123');
  await page.keyboard.press('Enter');

  // Wait for error response
  await page.waitForSelector('.error-bubble', { timeout: 5000 });

  // Error bubble should be red
  const bgColor = await page.$eval('.error-bubble', el =>
    window.getComputedStyle(el).backgroundColor
  );
  expect(bgColor).toContain('255, 59, 48'); // iOS red
});

Flow 5: Multiple Commands

test('execute multiple commands in sequence', async () => {
  const contactId = 'L2hvbWUvdWJ1bnR1L3dvcmtwbGFjZS9BaGFpYUFwcC9pZGU=';
  await page.goto(`http://localhost:8000/index.html?contact=${contactId}`);

  await page.waitForSelector('.message-input');

  // First command
  await page.type('.message-input', 'pwd');
  await page.keyboard.press('Enter');
  await page.waitForSelector('.message-group.received', { timeout: 5000 });

  // Second command
  await page.type('.message-input', 'git status');
  await page.keyboard.press('Enter');
  await page.waitForFunction(() =>
    document.querySelectorAll('.message-group.received').length >= 2,
    { timeout: 5000 }
  );

  // Should have both responses
  const messageCount = await page.$$eval('.message-group.received', els => els.length);
  expect(messageCount).toBeGreaterThanOrEqual(2);
});
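
Waiting for `.message-group.received` to exist only proves that at least one reply is on screen, which is why the second command above switches to counting. The same counting idea can be wrapped in a helper so that every command waits for its own reply. This is a sketch that assumes the selectors used in the flows above; `sendCommand` is a hypothetical name.

```javascript
// Type a command, press Enter, and wait until more received message groups
// exist than before. Resolves to the new count of received groups.
async function sendCommand(page, command, { timeout = 5000 } = {}) {
  const received = '.message-group.received';
  // $$eval resolves to 0 here when no replies exist yet.
  const before = await page.$$eval(received, els => els.length);

  await page.type('.message-input', command);
  await page.keyboard.press('Enter');

  // Arguments after the options object are passed to the page function.
  await page.waitForFunction(
    (selector, count) => document.querySelectorAll(selector).length > count,
    { timeout },
    received,
    before
  );
  return page.$$eval(received, els => els.length);
}
```

With this helper, Flow 5 reduces to two calls, `await sendCommand(page, 'pwd')` and `await sendCommand(page, 'git status')`, and it no longer depends on the chat starting empty.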

File: browser-test.js (to be created)


Layer 3: Manual Testing

Tool: iPhone Safari / Desktop browser
When: After deploying or for major UI changes
Purpose: Visual validation, native feel, iOS-specific issues

Checklist:


Implementation Plan

File Structure

/home/ubuntu/workplace/AhaiaApp/ide/
├── smoke-test.sh          # Layer 1: Curl tests (23 tests) ✅
├── browser-test.js        # Layer 2: Puppeteer tests (5 flows) ❌ TO ADD
├── package.json           # npm scripts
├── screenshots/           # Generated screenshots for review
└── TESTING.md            # Testing guide for developers
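
The `screenshots/` directory suggests captures are kept for review. One plausible use is saving a screenshot when a flow fails, so the failure can be inspected later from the SSH session (for example by copying the file to a machine with a display). The sketch below works under that assumption. `screenshotPath` and `captureOnFailure` are hypothetical names; `page.screenshot({ path, fullPage })` is Puppeteer's standard call.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Turn a test name into a safe file name inside the screenshots directory.
function screenshotPath(testName, dir = 'screenshots') {
  const slug = testName
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric
    .replace(/^-|-$/g, '');      // trim a leading or trailing dash
  return path.join(dir, `${slug}.png`);
}

// Run a flow; on failure, save a screenshot and rethrow the original error.
async function captureOnFailure(page, testName, fn) {
  try {
    return await fn();
  } catch (err) {
    const file = screenshotPath(testName);
    fs.mkdirSync(path.dirname(file), { recursive: true });
    await page.screenshot({ path: file, fullPage: true });
    throw err;
  }
}

console.log(screenshotPath('show stderr in red bubble'));
// On Linux: screenshots/show-stderr-in-red-bubble.png
```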

Scripts to Add

package.json:

{
  "scripts": {
    "test": "npm run test:smoke && npm run test:browser",
    "test:smoke": "./smoke-test.sh",
    "test:browser": "node browser-test.js",
    "test:watch": "nodemon --exec \"npm run test:browser\""
  },
  "devDependencies": {
    "nodemon": "^3.0.0",
    "puppeteer": "^21.0.0"
  }
}

Comparison: Chat App vs Yap Blog

| Aspect | Yap Blog | Chat App | Difference |
|---|---|---|---|
| Complexity | Static markdown rendering | Interactive chat UI + backend | Higher |
| JavaScript | Minimal (mostly server-side) | Significant (chat logic, API calls) | More critical |
| User Flows | View doc, click link, go back | Click contact, type command, see output | More interactive |
| Backend | Simple Express file server | API with command execution | More complex |
| Testing Gap | Smoke tests catch most issues | Smoke tests miss UI interactions | Needs Layer 2 |

Conclusion: The chat app needs Layer 2 (browser tests) more than the Yap blog because of its:

  1. More JavaScript-dependent
  2. Interactive forms (message input)
  3. Real-time API calls
  4. State management (current contact, messages)
  5. Navigation between views

Phase 1: Add Browser Tests (Now)

Priority flows:

  1. ✅ Contact list loads
  2. ✅ Click contact → Opens chat
  3. ✅ Type command + Enter → See output
  4. ✅ Back button navigation
  5. ✅ Error handling (stderr in red)

Why now:

Phase 2: Expand Coverage (Later)

Additional tests:


Workflow

During Development:

# 1. Make changes
vim app-repo.js

# 2. Quick validation (5 sec)
npm run test:smoke

# 3. If UI changes, run browser tests (30 sec)
npm run test:browser

# 4. Iterate until tests pass

Before Committing:

# 1. Run full test suite
npm test

# 2. Review screenshots if generated
ls screenshots/

# 3. Commit if tests pass
git add .
git commit -m "Add feature"
# → Pre-commit hook runs smoke tests automatically

Before Deploying to iPhone:

# 1. Ensure all tests pass
npm test

# 2. Deploy (systemd restart happens automatically)

# 3. Test on iPhone
# - Open Safari to http://YOUR_IP:8000/contacts.html
# - Click through flows manually
# - Check dark mode, PWA install

Trade-offs

What we're doing:

What we're NOT doing (and why):

Philosophy: Vanilla approach extends to testing. Use simple, proven tools (bash + Puppeteer) rather than heavy frameworks.


Success Metrics

We'll know this works if:

  1. ✅ Smoke tests run in < 5 seconds
  2. ✅ Browser tests run in < 30 seconds
  3. ✅ We catch broken interactions before pushing
  4. ✅ Tests don't slow down iteration
  5. ✅ We actually run them (not too annoying/complex)
  6. ✅ Confidence to refactor without breaking things

Next Steps

  1. Install Puppeteer: npm install --save-dev puppeteer
  2. Create browser-test.js with 5 critical flows
  3. Update package.json with test scripts
  4. Run tests and fix any failures
  5. Update pre-commit hook to run both layers
  6. Document in TESTING.md for future reference

Why This Matters

Current risk without Layer 2:

With Layer 2:


Last updated: November 2025
Approach based on proven strategy from ~/yap/design-doc.md