cherry-studio/plan.md
Vaayne 8ab26e4e45 feat: implement secure AgentExecutionService for controlled agent.py execution
- Create new AgentExecutionService.ts with secure agent.py script execution
- Replace arbitrary shell command execution with controlled Python script calls
- Add claude_session_id field to session types for conversation continuity
- Update shared types between main and renderer processes
- Implement proper argument validation and sanitization
- Add comprehensive error handling and logging
- Export service through agent service index

Security improvements:
- Only executes predefined agent.py script (no arbitrary commands)
- Uses direct process spawning instead of shell execution
- Validates all arguments before execution
- Prevents command injection vulnerabilities

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-08-03 17:52:01 +08:00


# Agent Service Refactoring Plan
## Objective
The goal is to completely rewrite the agent execution flow for both backend (`src/main/services/agent/`) and frontend (`src/renderer/src/pages/cherry-agent/`). We will move from a model that can run any arbitrary shell command to a more secure and specialized model that **only** executes the `agent.py` script to process user prompts. This ensures that user input is always treated as data for the agent, not as a command to be executed by the shell.
@agent.py is the agent script file.
@agent.log is an example of the output of one agent execution.
## High-Level Plan
The complete rewrite will involve these key areas:
1. **Introduce a dedicated `AgentExecutionService`:** This new service on the main process will be the single point of control for running the Python agent.
2. **Secure the Command Executor:** We will modify the existing `commandExecutor.ts` to prevent shell injection vulnerabilities by no longer using a shell to wrap the command.
3. **Update Session Management:** The database schema and logic will be updated to handle the `session_id` generated by `agent.py`, allowing for conversation continuity.
4. **Rewrite Frontend Components:** All UI components will be updated to work with the new prompt-based flow instead of command execution.
5. **Adapt IPC & Communication:** The communication between the renderer and the main process will be updated to pass prompts instead of raw commands.
---
## Detailed Implementation Steps
### 1. Backend Refactoring (`src/main/services/agent`)
#### A. Create `AgentExecutionService.ts`
This new service will orchestrate the agent's execution.
- **File:** `src/main/services/agent/AgentExecutionService.ts`
- **Purpose:** To bridge the gap between incoming user prompts and the execution of the `agent.py` script.
- **Key Method:** `public async runAgent(sessionId: string, prompt: string): Promise<void>`
  - This method will use `AgentService` to fetch the session and its associated agent details (instructions, working directory, etc.).
  - It will determine the path to the `python` executable and the `agent.py` script. The path to `agent.py` should be a constant relative to the application root, so that it can never be derived from user input.
  - It will construct the argument list for `agent.py` based on the fetched data:
    - `--prompt`: The user's input `prompt`.
    - `--system-prompt`: The agent's `instructions`.
    - `--cwd`: The session's `accessible_paths[0]`.
    - `--session-id`: The `claude_session_id` stored in our session record (more on this in section 2, "Database and Data Model Changes"). If it's the first turn, this argument is omitted.
  - It will then call the refactored `pocCommandExecutor` to run the script.
  - It will be responsible for parsing the `stdout` of the script on the first run to capture the newly created `claude_session_id` and update the database.
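The argument construction described above can be sketched as a small pure function. This is a minimal illustration, not the actual service code: the `SessionForRun` and `AgentForRun` shapes and the `buildAgentArgs` name are hypothetical, and only the four flags come from the plan.

```typescript
// Hypothetical, trimmed shapes: only the fields this sketch needs.
interface SessionForRun {
  accessiblePaths: string[]; // stands in for the session's `accessible_paths`
  claudeSessionId?: string; // stands in for `claude_session_id`; absent on the first turn
}

interface AgentForRun {
  instructions: string;
}

// Build the argv for agent.py. Every value stays a separate array element,
// so nothing is ever concatenated into a single command string.
function buildAgentArgs(
  session: SessionForRun,
  agent: AgentForRun,
  prompt: string
): string[] {
  if (prompt.trim().length === 0) {
    throw new Error("prompt must not be empty");
  }
  const cwd = session.accessiblePaths[0];
  if (cwd === undefined) {
    throw new Error("session has no accessible path to use as --cwd");
  }

  const args = [
    "--prompt", prompt,
    "--system-prompt", agent.instructions,
    "--cwd", cwd,
  ];

  // The session ID is only passed on follow-up turns.
  if (session.claudeSessionId) {
    args.push("--session-id", session.claudeSessionId);
  }
  return args;
}
```

Keeping this step free of I/O makes it easy to unit-test the first-turn and follow-up cases separately from process spawning.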
#### B. Refactor `commandExecutor.ts`
To enhance security, we will change how commands are executed.
- **File:** `src/main/services/agent/commandExecutor.ts`
- **Change:** Modify `executeCommand` to avoid using a shell (`bash -c`, `cmd /c`).
- **New Signature (suggestion):** `executeCommand(id: string, executable: string, args: string[], workingDirectory: string)`
- **Implementation:**
  - The `spawn` function from `child_process` will be called directly with the executable and its arguments: `spawn(executable, args, { cwd: workingDirectory, ... })`.
  - Because no shell parses the arguments, shell metacharacters in them reach the process as literal text, which removes the shell-injection risk. The `getShellCommand` method will no longer be needed for this workflow.
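A minimal sketch of the suggested signature, assuming the caller wants the collected output once the process exits. The `CommandResult` shape and the `buildSpawnOptions` helper are hypothetical; the real executor streams output to the UI and may look quite different.

```typescript
import { spawn } from "child_process";
import type { SpawnOptions } from "child_process";

// Hypothetical result shape for this sketch.
interface CommandResult {
  id: string;
  exitCode: number | null;
  stdout: string;
  stderr: string;
}

// Options for a direct, shell-less spawn. `shell: false` is already Node's
// default; stating it makes the intent explicit.
function buildSpawnOptions(workingDirectory: string): SpawnOptions {
  return {
    cwd: workingDirectory,
    shell: false,
    stdio: ["ignore", "pipe", "pipe"],
  };
}

function executeCommand(
  id: string,
  executable: string,
  args: string[],
  workingDirectory: string
): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    // The executable and each argument are passed as-is to the OS;
    // no `bash -c` or `cmd /c` wrapper is involved.
    const child = spawn(executable, args, buildSpawnOptions(workingDirectory));

    let stdout = "";
    let stderr = "";
    child.stdout?.on("data", (chunk: Buffer) => {
      stdout += chunk.toString("utf8");
    });
    child.stderr?.on("data", (chunk: Buffer) => {
      stderr += chunk.toString("utf8");
    });

    // "error" fires when the process could not be started at all.
    child.on("error", reject);
    child.on("close", (exitCode: number | null) => {
      resolve({ id, exitCode, stdout, stderr });
    });
  });
}
```

With this shape, a prompt such as `"hello; rm -rf /"` arrives in `agent.py` as a single literal argument value instead of being interpreted as two commands.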
#### C. Update IPC Handling (`src/main/index.ts`)
Communication from the frontend needs to be adapted.
- **Action:** Create a new, dedicated IPC channel, for example, `IpcChannel.Agent_Run`.
- **Payload:** This channel will accept a structured object: `{ sessionId: string, prompt: string }`.
- **Handler:** The main process handler for this channel will simply call `agentExecutionService.runAgent(sessionId, prompt)`. The existing `IpcChannel.Poc_CommandOutput` can be reused to stream the log output back to the UI.
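Since the payload crosses the process boundary, the handler may want to validate it before calling the service. The sketch below shows one way to do that; `parseAgentRunPayload` is a hypothetical helper, and the wiring in the trailing comment assumes Electron's `ipcMain.handle` together with the `IpcChannel.Agent_Run` channel proposed above.

```typescript
interface AgentRunPayload {
  sessionId: string;
  prompt: string;
}

// Validate the untyped IPC payload and return a typed copy.
function parseAgentRunPayload(raw: unknown): AgentRunPayload {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("payload must be an object");
  }
  const { sessionId, prompt } = raw as Record<string, unknown>;
  if (typeof sessionId !== "string" || sessionId.length === 0) {
    throw new TypeError("sessionId must be a non-empty string");
  }
  if (typeof prompt !== "string" || prompt.trim().length === 0) {
    throw new TypeError("prompt must be a non-empty string");
  }
  // Only the two expected fields are copied; extra keys are dropped.
  return { sessionId, prompt };
}

// Wiring sketch (not executed here):
//   ipcMain.handle(IpcChannel.Agent_Run, (_event, raw) => {
//     const { sessionId, prompt } = parseAgentRunPayload(raw);
//     return agentExecutionService.runAgent(sessionId, prompt);
//   });
```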
### 2. Database and Data Model Changes
To manage the lifecycle of agent conversations, we need to track the session ID from `agent.py`.
- **File:** `src/main/services/agent/queries.ts`
- **Action:** Add a new nullable field `claude_session_id TEXT` to the `sessions` table schema.
- **File:** `src/main/services/agent/types.ts`
- **Action:** Add the optional `claude_session_id?: string` field to the `SessionEntity` and `SessionResponse` interfaces.
- **File:** `src/main/services/agent/AgentService.ts`
- **Action:** Update the `createSession`, `updateSession`, and `getSessionById` methods to handle the new `claude_session_id` field.
  - Add a new method like `updateSessionClaudeId(sessionId: string, claudeSessionId: string)` to be called by the `AgentExecutionService`.
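As a rough sketch of the schema and type changes, assuming a SQLite-style `sessions` table: the helper names and the trimmed `SessionRow` and `SessionEntitySlice` types below are hypothetical and only cover the new field.

```typescript
// Assumes SQLite syntax; the column is nullable so existing rows stay valid.
const ADD_CLAUDE_SESSION_ID_SQL =
  "ALTER TABLE sessions ADD COLUMN claude_session_id TEXT";

// Given the column names already present (for SQLite these could come from
// `PRAGMA table_info(sessions)`), return the statements still to be run.
function planSessionMigration(existingColumns: string[]): string[] {
  return existingColumns.indexOf("claude_session_id") === -1
    ? [ADD_CLAUDE_SESSION_ID_SQL]
    : [];
}

// Trimmed, hypothetical views of a database row and of `SessionEntity`.
interface SessionRow {
  id: string;
  claude_session_id: string | null; // NULL for sessions created before the migration
}

interface SessionEntitySlice {
  id: string;
  claude_session_id?: string;
}

// Map a row to the entity, turning NULL into an absent optional field.
function rowToSession(row: SessionRow): SessionEntitySlice {
  const entity: SessionEntitySlice = { id: row.id };
  if (row.claude_session_id !== null) {
    entity.claude_session_id = row.claude_session_id;
  }
  return entity;
}
```

Checking the existing columns first keeps the migration safe to run more than once.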
### 3. Frontend Refactoring (`src/renderer`)
Finally, we'll update the UI to send prompts instead of commands.
- **File:** `src/renderer/src/hooks/usePocCommand.ts` (to be renamed/refactored as `useAgentCommand.ts`)
- **Action:** Complete rewrite of the command execution logic. Instead of sending a command string, it will now invoke the new IPC channel: `window.api.agent.run(sessionId, prompt)`.
- **New Interface:** The hook will expose methods for prompt submission rather than command execution.
- **File:** `src/renderer/src/pages/cherry-agent/CherryAgentPage.tsx`
- **Action:** Rewrite the main page component to work with prompt-based flow.
  - The text from the command input will now be treated as the `prompt`.
  - The submit handler will call the refactored hook with the current session ID and the prompt: `agentCommandHook.run(agentManagement.currentSession.id, prompt)`.
  - The `workingDirectory` will no longer be passed from the frontend, as it's now part of the session data managed by the backend.
- **Component Updates:** All components in `src/renderer/src/pages/cherry-agent/components/` will need updates:
  - **`EnhancedCommandInput.tsx`:** Rename to `EnhancedPromptInput.tsx` and update to handle prompt submission instead of command execution.
  - **`PocMessageBubble.tsx` and `PocMessageList.tsx`:** Update to display prompt/response pairs instead of command/output pairs.
  - **Session management components:** Update to work with new session schema including `claude_session_id`.
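The core of the refactored hook could reduce to a function like the one below. It is a sketch under assumptions: the `AgentApi` interface is a hypothetical slice of the preload API (the plan only names `window.api.agent.run(sessionId, prompt)`), and the API is injected so the logic can be exercised without Electron or React.

```typescript
// Hypothetical slice of the preload API.
interface AgentApi {
  run(sessionId: string, prompt: string): Promise<void>;
}

// Send the input as a prompt for the current session.
// Resolves to false when there is nothing to send.
async function submitPrompt(
  api: AgentApi,
  sessionId: string | undefined,
  input: string
): Promise<boolean> {
  const prompt = input.trim();
  if (!sessionId || prompt.length === 0) {
    return false;
  }
  // No working directory is passed: the backend reads it from the session.
  await api.run(sessionId, prompt);
  return true;
}
```

Inside the hook, a call might look like `submitPrompt(window.api.agent, currentSession?.id, inputText)`, where `currentSession` and `inputText` are placeholder names for the component's state.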
## New Data Flow
The execution flow will be transformed as follows:
- **Before:**
`UI Input -> (command string) -> IPC -> ShellCommandExecutor -> Spawns Shell -> Executes Command`
- **After:**
`UI Input -> (prompt string) -> IPC({sessionId, prompt}) -> AgentExecutionService -> Constructs Args -> commandExecutor -> Spawns 'python' with args -> Executes agent.py`
## Security & Error Handling Improvements
### Security Enhancements
- **Path validation**: Ensure `agent.py` path is validated and cannot be manipulated
- **Argument sanitization**: Validate all arguments passed to `agent.py` to prevent injection
- **No shell execution**: Direct process spawning eliminates shell injection vulnerabilities
- **Resource limits**: Consider implementing timeout and resource constraints for agent processes
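For the path validation item, one possible check is to resolve the script path and confirm it stays under the application root. The `resources/agent.py` location and both function names below are assumptions made for illustration only.

```typescript
import * as path from "path";

// True when `candidate` resolves to `root` itself or to a location below it.
function isPathInside(root: string, candidate: string): boolean {
  const relative = path.relative(path.resolve(root), path.resolve(candidate));
  if (relative === "") {
    return true; // same directory
  }
  // An absolute result (for example a different drive on Windows) or a
  // leading ".." segment means the candidate escapes the root.
  return !path.isAbsolute(relative) && relative.split(path.sep)[0] !== "..";
}

// Hypothetical fixed location of the script relative to the app root.
const AGENT_SCRIPT_RELATIVE_PATH = path.join("resources", "agent.py");

function resolveAgentScript(appRoot: string): string {
  const scriptPath = path.resolve(appRoot, AGENT_SCRIPT_RELATIVE_PATH);
  if (!isPathInside(appRoot, scriptPath)) {
    throw new Error("agent script path escapes the application root");
  }
  return scriptPath;
}
```

The same `isPathInside` check could also be applied to the `--cwd` value before spawning. Note that it compares paths lexically and does not resolve symbolic links.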
### Error Handling & Recovery
- **Agent script validation**: Verify `agent.py` exists and is accessible before execution
- **Process monitoring**: Handle agent crashes, timeouts, and unexpected terminations
- **Session recovery**: Graceful handling of orphaned sessions and Claude session mismatches
- **Structured error responses**: Clear error messaging for different failure scenarios
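One way to make these failure scenarios explicit is a discriminated union that the main process maps exit information onto. The type names, the `kind` values, and the message wording below are all hypothetical.

```typescript
// Hypothetical failure categories for an agent run.
type AgentRunFailure =
  | { kind: "script_missing"; scriptPath: string }
  | { kind: "timeout"; timeoutMs: number }
  | { kind: "killed"; signal: string }
  | { kind: "nonzero_exit"; exitCode: number };

interface ExitInfo {
  exitCode: number | null;
  signal: string | null;
  timedOut: boolean; // set by the caller when its own timer fired
  timeoutMs: number;
}

// Map raw exit information to a failure, or null when the run succeeded.
function classifyExit(info: ExitInfo): AgentRunFailure | null {
  if (info.timedOut) {
    return { kind: "timeout", timeoutMs: info.timeoutMs };
  }
  if (info.signal !== null) {
    return { kind: "killed", signal: info.signal };
  }
  if (info.exitCode !== null && info.exitCode !== 0) {
    return { kind: "nonzero_exit", exitCode: info.exitCode };
  }
  return null;
}

// Turn a failure into a message suitable for the UI.
function describeFailure(failure: AgentRunFailure): string {
  switch (failure.kind) {
    case "script_missing":
      return "Agent script not found at " + failure.scriptPath;
    case "timeout":
      return "Agent did not finish within " + failure.timeoutMs + " ms";
    case "killed":
      return "Agent was terminated by signal " + failure.signal;
    case "nonzero_exit":
      return "Agent exited with code " + failure.exitCode;
  }
}
```

Because the union is closed, adding a new failure kind makes the compiler flag any `switch` that does not yet handle it.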
### Observability
- **Structured logging**: Comprehensive logging throughout the agent execution pipeline
- **Performance tracking**: Monitor agent execution times and resource usage
- **Health checks**: Periodic validation of agent system functionality
## Migration Strategy
### Backward Compatibility
- **Database migration**: Handle existing sessions without `claude_session_id`
- **Component migration**: Gradual update of UI components to new prompt-based interface
- **Testing strategy**: Comprehensive testing of both old and new flows during transition
### Rollout Plan
1. **Backend first**: Implement new `AgentExecutionService` with feature flag
2. **Database schema**: Add `claude_session_id` field with migration
3. **Frontend components**: Update components one by one
4. **IPC integration**: Connect new frontend to new backend
5. **Cleanup**: Remove old command execution code once migration is complete