cherry-studio/plan.md
Vaayne 8ab26e4e45 feat: implement secure AgentExecutionService for controlled agent.py execution
- Create new AgentExecutionService.ts with secure agent.py script execution
- Replace arbitrary shell command execution with controlled Python script calls
- Add claude_session_id field to session types for conversation continuity
- Update shared types between main and renderer processes
- Implement proper argument validation and sanitization
- Add comprehensive error handling and logging
- Export service through agent service index

Security improvements:
- Only executes predefined agent.py script (no arbitrary commands)
- Uses direct process spawning instead of shell execution
- Validates all arguments before execution
- Prevents command injection vulnerabilities

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-03 17:52:01 +08:00

Agent Service Refactoring Plan

Objective

The goal is to completely rewrite the agent execution flow for both the backend (src/main/services/agent/) and the frontend (src/renderer/src/pages/cherry-agent/). We will move from a model that can run arbitrary shell commands to a more secure, specialized model that only executes the agent.py script to process user prompts. This ensures that user input is always treated as data for the agent, never as a command to be executed by a shell.

@agent.py is the agent script; @agent.log is an example of the output produced by an agent run.

High-Level Plan

The complete rewrite will involve these key areas:

  1. Introduce a dedicated AgentExecutionService: This new service on the main process will be the single point of control for running the Python agent.
  2. Secure the Command Executor: We will modify the existing commandExecutor.ts to prevent shell injection vulnerabilities by no longer using a shell to wrap the command.
  3. Update Session Management: The database schema and logic will be updated to handle the session_id generated by agent.py, allowing for conversation continuity.
  4. Rewrite Frontend Components: All UI components will be updated to work with the new prompt-based flow instead of command execution.
  5. Adapt IPC & Communication: The communication between the renderer and the main process will be updated to pass prompts instead of raw commands.

Detailed Implementation Steps

1. Backend Refactoring (src/main/services/agent)

A. Create AgentExecutionService.ts

This new service will orchestrate the agent's execution.

  • File: src/main/services/agent/AgentExecutionService.ts
  • Purpose: To bridge the gap between incoming user prompts and the execution of the agent.py script.
  • Key Method: public async runAgent(sessionId: string, prompt: string): Promise<void>
    • This method will use AgentService to fetch the session and its associated agent details (instructions, working directory, etc.).
    • It will determine the path to the python executable and the agent.py script. The path to agent.py should be a constant relative to the application root to prevent security issues.
    • It will construct the argument list for agent.py based on the fetched data:
      • --prompt: The user's input prompt.
      • --system-prompt: The agent's instructions.
      • --cwd: The session's accessible_paths[0].
      • --session-id: The claude_session_id stored in our session record (see "Database and Data Model Changes" below). On the first turn, this argument is omitted.
    • It will then call the refactored pocCommandExecutor to run the script.
    • It will be responsible for parsing the stdout of the script on the first run to capture the newly created claude_session_id and update the database.
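
The argument construction described above could be sketched as follows. This is a minimal illustration, not the service's actual code: the AgentRunInput shape and the buildAgentArgs name are assumptions made for the example, and only the four flags listed in this plan are used.

```typescript
// Hypothetical input shape: only the fields this sketch needs.
interface AgentRunInput {
  prompt: string;
  instructions: string;      // the agent's instructions (system prompt)
  accessiblePaths: string[]; // the session's accessible_paths
  claudeSessionId?: string;  // absent on the first turn
}

// Builds the argv array for agent.py. Every value is its own array
// element, so no value is ever re-parsed as part of a command line.
function buildAgentArgs(input: AgentRunInput): string[] {
  const prompt = input.prompt.trim();
  if (prompt.length === 0) {
    throw new Error("prompt must not be empty");
  }
  const cwd = input.accessiblePaths[0];
  if (!cwd) {
    throw new Error("session has no accessible path to use as --cwd");
  }

  const args = [
    "--prompt", prompt,
    "--system-prompt", input.instructions,
    "--cwd", cwd,
  ];
  // --session-id is only passed once a claude_session_id has been stored.
  if (input.claudeSessionId) {
    args.push("--session-id", input.claudeSessionId);
  }
  return args;
}
```

Keeping this as a pure function means the flag logic, including the first-turn case, can be unit tested without spawning a process.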

B. Refactor commandExecutor.ts

To enhance security, we will change how commands are executed.

  • File: src/main/services/agent/commandExecutor.ts
  • Change: Modify executeCommand to avoid using a shell (bash -c, cmd /c).
  • New Signature (suggestion): executeCommand(id: string, executable: string, args: string[], workingDirectory: string)
  • Implementation:
    • The spawn function from child_process will be called directly with the executable and its arguments: spawn(executable, args, { cwd: workingDirectory, ... }).
    • This bypasses the shell entirely, so shell metacharacters in the arguments are never interpreted, which removes shell command injection through them. The getShellCommand method will no longer be needed for this workflow.
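
A minimal sketch of shell-free execution, assuming Node's child_process.spawn with its default shell: false behaviour. The runWithoutShell name and the RunResult shape are hypothetical; the real executeCommand would also take an id and stream output over IPC rather than buffering it.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result shape for this sketch.
interface RunResult {
  code: number | null;
  stdout: string;
  stderr: string;
}

function runWithoutShell(
  executable: string,
  args: string[],
  workingDirectory: string
): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    // shell: false is already the default; it is spelled out here to make
    // the intent explicit. args are passed to the process as-is.
    const child = spawn(executable, args, { cwd: workingDirectory, shell: false });

    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });

    // "error" fires when the process cannot be spawned (e.g. missing executable).
    child.on("error", reject);
    child.on("close", (code) => resolve({ code, stdout, stderr }));
  });
}
```

With this approach an argument such as "a; echo injected" reaches the child process as one literal string instead of being split into two commands.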

C. Update IPC Handling (src/main/index.ts)

Communication from the frontend needs to be adapted.

  • Action: Create a new, dedicated IPC channel, for example, IpcChannel.Agent_Run.
  • Payload: This channel will accept a structured object: { sessionId: string, prompt: string }.
  • Handler: The main process handler for this channel will simply call agentExecutionService.runAgent(sessionId, prompt). The existing IpcChannel.Poc_CommandOutput can be reused to stream the log output back to the UI.
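
Because the payload crosses a process boundary, the handler would likely want to check its shape before calling runAgent. The sketch below shows one way to do that; parseAgentRunPayload is a hypothetical helper, and the ipcMain.handle wiring is shown only in a comment since it needs a running Electron main process.

```typescript
interface AgentRunPayload {
  sessionId: string;
  prompt: string;
}

// Narrows an untrusted IPC payload to { sessionId, prompt } or throws.
function parseAgentRunPayload(raw: unknown): AgentRunPayload {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("payload must be an object");
  }
  const { sessionId, prompt } = raw as Record<string, unknown>;
  if (typeof sessionId !== "string" || sessionId.length === 0) {
    throw new TypeError("sessionId must be a non-empty string");
  }
  if (typeof prompt !== "string" || prompt.trim().length === 0) {
    throw new TypeError("prompt must be a non-empty string");
  }
  return { sessionId, prompt };
}

// In the main process this could be wired up roughly as (not executed here):
//   ipcMain.handle(IpcChannel.Agent_Run, (_event, raw) => {
//     const { sessionId, prompt } = parseAgentRunPayload(raw);
//     return agentExecutionService.runAgent(sessionId, prompt);
//   });
```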

2. Database and Data Model Changes

To manage the lifecycle of agent conversations, we need to track the session ID from agent.py.

  • File: src/main/services/agent/queries.ts

    • Action: Add a new nullable field claude_session_id TEXT to the sessions table schema.
  • File: src/main/services/agent/types.ts

    • Action: Add the optional claude_session_id?: string field to the SessionEntity and SessionResponse interfaces.
  • File: src/main/services/agent/AgentService.ts

    • Action: Update the createSession, updateSession, and getSessionById methods to handle the new claude_session_id field.
    • Add a new method like updateSessionClaudeId(sessionId: string, claudeSessionId: string) to be called by the AgentExecutionService.

3. Frontend Refactoring (src/renderer)

Finally, we'll update the UI to send prompts instead of commands.

  • File: src/renderer/src/hooks/usePocCommand.ts (to be renamed/refactored as useAgentCommand.ts)

    • Action: Complete rewrite of the command execution logic. Instead of sending a command string, it will now invoke the new IPC channel: window.api.agent.run(sessionId, prompt).
    • New Interface: The hook will expose methods for prompt submission rather than command execution.
  • File: src/renderer/src/pages/cherry-agent/CherryAgentPage.tsx

    • Action: Rewrite the main page component to work with prompt-based flow.
    • The text from the command input will now be treated as the prompt.
    • The submit handler will call the refactored hook with the current session ID and the prompt: agentCommandHook.run(agentManagement.currentSession.id, prompt).
    • The workingDirectory will no longer be passed from the frontend, as it's now part of the session data managed by the backend.
  • Component Updates: All components in src/renderer/src/pages/cherry-agent/components/ will need updates:

    • EnhancedCommandInput.tsx: Rename to EnhancedPromptInput.tsx and update to handle prompt submission instead of command execution.
    • PocMessageBubble.tsx and PocMessageList.tsx: Update to display prompt/response pairs instead of command/output pairs.
    • Session management components: Update to work with new session schema including claude_session_id.
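
The core of the refactored hook could be as small as the sketch below. It is written as a plain factory rather than a React hook so it stays self-contained; AgentApi stands in for window.api.agent, and createPromptSubmitter is a hypothetical name.

```typescript
// Stand-in for window.api.agent as described in this plan.
interface AgentApi {
  run(sessionId: string, prompt: string): Promise<void>;
}

// Returns a submit function that forwards a prompt for the given session.
// Note that no working directory is passed: the backend derives it
// from the session record.
function createPromptSubmitter(api: AgentApi) {
  return async function submitPrompt(
    sessionId: string | undefined,
    input: string
  ): Promise<boolean> {
    const prompt = input.trim();
    // Nothing to send without an active session or with an empty prompt.
    if (!sessionId || prompt.length === 0) {
      return false;
    }
    await api.run(sessionId, prompt);
    return true;
  };
}
```

Inside useAgentCommand.ts, the returned function would typically be wrapped in useCallback and called with agentManagement.currentSession?.id.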

New Data Flow

The execution flow will be transformed as follows:

  • Before: UI Input -> (command string) -> IPC -> ShellCommandExecutor -> Spawns Shell -> Executes Command

  • After: UI Input -> (prompt string) -> IPC({sessionId, prompt}) -> AgentExecutionService -> Constructs Args -> commandExecutor -> Spawns 'python' with args -> Executes agent.py

Security & Error Handling Improvements

Security Enhancements

  • Path validation: Ensure agent.py path is validated and cannot be manipulated
  • Argument sanitization: Validate all arguments passed to agent.py to prevent injection
  • No shell execution: Direct process spawning eliminates shell injection vulnerabilities
  • Resource limits: Consider implementing timeout and resource constraints for agent processes
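
One way to approach the path validation item is a containment check on resolved paths, sketched below. isInsideDirectory is a hypothetical helper; it does not resolve symlinks, so a production check would likely also compare real paths (for example via fs.realpath).

```typescript
import * as path from "node:path";

// Returns true only if candidate resolves to a location strictly inside root.
function isInsideDirectory(root: string, candidate: string): boolean {
  const rel = path.relative(path.resolve(root), path.resolve(candidate));
  if (rel === "") return false;              // candidate is root itself
  if (path.isAbsolute(rel)) return false;    // e.g. a different drive on Windows
  // Reject ".." and "../x" but allow names that merely start with "..".
  return rel !== ".." && !rel.startsWith(".." + path.sep);
}
```

The same check could be reused to confirm that the --cwd value lies within one of the session's accessible paths before the agent is started.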

Error Handling & Recovery

  • Agent script validation: Verify agent.py exists and is accessible before execution
  • Process monitoring: Handle agent crashes, timeouts, and unexpected terminations
  • Session recovery: Graceful handling of orphaned sessions and Claude session mismatches
  • Structured error responses: Clear error messaging for different failure scenarios
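
To make the failure scenarios above distinguishable in the UI, the process result could be mapped to a small discriminated union. This is a sketch under the assumption that the executor reports the exit code and signal from Node's "close" event plus a flag set by its own timeout timer; AgentRunOutcome and classifyExit are hypothetical names.

```typescript
type AgentRunOutcome =
  | { kind: "ok" }
  | { kind: "timeout"; message: string }
  | { kind: "killed"; signal: string; message: string }
  | { kind: "failed"; exitCode: number; message: string };

// code and signal mirror the arguments of the child process "close" event.
function classifyExit(
  code: number | null,
  signal: string | null,
  timedOut: boolean
): AgentRunOutcome {
  // Check the timeout first: a timed-out process is usually killed by us,
  // so it would otherwise be reported as a plain signal termination.
  if (timedOut) {
    return { kind: "timeout", message: "agent.py did not finish in time" };
  }
  if (signal !== null) {
    return { kind: "killed", signal, message: `agent.py was terminated by ${signal}` };
  }
  if (code === 0) {
    return { kind: "ok" };
  }
  const exitCode = code ?? -1;
  return { kind: "failed", exitCode, message: `agent.py exited with code ${exitCode}` };
}
```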

Observability

  • Structured logging: Comprehensive logging throughout the agent execution pipeline
  • Performance tracking: Monitor agent execution times and resource usage
  • Health checks: Periodic validation of agent system functionality

Migration Strategy

Backward Compatibility

  • Database migration: Handle existing sessions without claude_session_id
  • Component migration: Gradual update of UI components to new prompt-based interface
  • Testing strategy: Comprehensive testing of both old and new flows during transition
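
For the database migration, an idempotent check keeps startup safe for both new and existing installs. The sketch assumes a SQLite database, where PRAGMA table_info(sessions) returns one row per column with a name field; pendingSessionMigrations is a hypothetical helper that only decides which statements to run.

```typescript
// Subset of a PRAGMA table_info(...) row that this sketch relies on.
interface ColumnInfo {
  name: string;
}

// Returns the statements still needed to bring the sessions table up to date.
function pendingSessionMigrations(existingColumns: ColumnInfo[]): string[] {
  const hasColumn = existingColumns.some((c) => c.name === "claude_session_id");
  // The column is nullable with no default, so existing rows simply hold
  // NULL and behave like sessions that have not had a first turn yet.
  return hasColumn ? [] : ["ALTER TABLE sessions ADD COLUMN claude_session_id TEXT"];
}
```

Existing sessions then need no data backfill: their first run after the upgrade omits --session-id, exactly like a brand-new session.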

Rollout Plan

  1. Backend first: Implement new AgentExecutionService with feature flag
  2. Database schema: Add claude_session_id field with migration
  3. Frontend components: Update components one by one
  4. IPC integration: Connect new frontend to new backend
  5. Cleanup: Remove old command execution code once migration is complete