
# Agent Service Refactoring Plan

## Objective

The goal is to completely rewrite the agent execution flow for both the backend (`src/main/services/agent/`) and the frontend (`src/renderer/src/pages/cherry-agent/`). We will move from a model that can run arbitrary shell commands to a more secure, specialized model that **only** executes the `agent.py` script to process user prompts. This ensures that user input is always treated as data for the agent, never as a command to be executed by the shell.

@agent.py is the agent script.
@agent.log is example output from a run of the agent.

## High-Level Plan

The complete rewrite will involve these key areas:

1.  **Introduce a dedicated `AgentExecutionService`:** This new service on the main process will be the single point of control for running the Python agent.
2.  **Secure the Command Executor:** We will modify the existing `commandExecutor.ts` to prevent shell injection vulnerabilities by no longer using a shell to wrap the command.
3.  **Update Session Management:** The database schema and logic will be updated to handle the `session_id` generated by `agent.py`, allowing for conversation continuity.
4.  **Rewrite Frontend Components:** All UI components will be updated to work with the new prompt-based flow instead of command execution.
5.  **Adapt IPC & Communication:** The communication between the renderer and the main process will be updated to pass prompts instead of raw commands.

---

## Detailed Implementation Steps

### 1. Backend Refactoring (`src/main/services/agent`)

#### A. Create `AgentExecutionService.ts`

This new service will orchestrate the agent's execution.

-   **File:** `src/main/services/agent/AgentExecutionService.ts`
-   **Purpose:** To bridge the gap between incoming user prompts and the execution of the `agent.py` script.
-   **Key Method:** `public async runAgent(sessionId: string, prompt: string): Promise<void>`
    -   This method will use `AgentService` to fetch the session and its associated agent details (instructions, working directory, etc.).
    -   It will determine the path to the `python` executable and the `agent.py` script. The path to `agent.py` should be a constant relative to the application root, so that it can never be influenced by user input.
    -   It will construct the argument list for `agent.py` based on the fetched data:
        -   `--prompt`: The user's input `prompt`.
        -   `--system-prompt`: The agent's `instructions`.
        -   `--cwd`: The session's `accessible_paths[0]`.
        -   `--session-id`: The `claude_session_id` stored in our session record (see "Database and Data Model Changes" below). On the first turn, this argument is omitted.
    -   It will then call the refactored `pocCommandExecutor` to run the script.
    -   It will be responsible for parsing the `stdout` of the script on the first run to capture the newly created `claude_session_id` and update the database.
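
The argument construction described above can be sketched as follows. This is a minimal illustration, not the final implementation: `AgentRunContext` and `buildAgentArgs` are hypothetical names, and the context fields are assumed to map onto the session and agent records fetched through `AgentService`.

```typescript
// Hypothetical shape: only the fields this sketch needs from the session and agent records.
interface AgentRunContext {
  instructions: string
  accessiblePaths: string[]
  claudeSessionId?: string | null
}

// Builds the argv that follows the `python` executable. Each value is its own
// array element, so spaces, quotes, or `;` inside the prompt stay plain data.
function buildAgentArgs(scriptPath: string, prompt: string, ctx: AgentRunContext): string[] {
  const cwd = ctx.accessiblePaths[0]
  if (cwd === undefined) {
    throw new Error('Session has no accessible path to use as --cwd')
  }
  const args = [scriptPath, '--prompt', prompt, '--system-prompt', ctx.instructions, '--cwd', cwd]
  // First turn: no Claude session exists yet, so --session-id is omitted.
  if (ctx.claudeSessionId) {
    args.push('--session-id', ctx.claudeSessionId)
  }
  return args
}
```

Keeping this as a pure function makes the first-turn and follow-up cases easy to unit test without spawning a process.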

#### B. Refactor `commandExecutor.ts`

To enhance security, we will change how commands are executed.

-   **File:** `src/main/services/agent/commandExecutor.ts`
-   **Change:** Modify `executeCommand` to avoid using a shell (`bash -c`, `cmd /c`).
-   **New Signature (suggestion):** `executeCommand(id: string, executable: string, args: string[], workingDirectory: string)`
-   **Implementation:**
    -   The `spawn` function from `child_process` will be called directly with the executable and its arguments: `spawn(executable, args, { cwd: workingDirectory, ... })`.
    -   This bypasses the shell entirely, so shell metacharacters in the arguments (`;`, `|`, `$(...)`, backticks) are never interpreted and shell injection through the arguments is no longer possible. The `getShellCommand` method will no longer be needed for this workflow.
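
A minimal sketch of the shell-free executor, following the suggested signature. The `CommandResult` type and the buffering of output into strings are assumptions made for illustration; the real executor would presumably stream chunks back over IPC instead.

```typescript
import { spawn } from 'node:child_process'

// Hypothetical result type for this sketch.
interface CommandResult {
  exitCode: number | null
  stdout: string
  stderr: string
}

function executeCommand(
  id: string,
  executable: string,
  args: string[],
  workingDirectory: string
): Promise<CommandResult> {
  return new Promise((resolve, reject) => {
    // `shell: false` is already the default; it is spelled out because it is the point of the change.
    const child = spawn(executable, args, { cwd: workingDirectory, shell: false })
    let stdout = ''
    let stderr = ''
    child.stdout.on('data', (chunk: Buffer) => {
      stdout += chunk.toString()
    })
    child.stderr.on('data', (chunk: Buffer) => {
      stderr += chunk.toString()
    })
    // 'error' fires when the process cannot be started (for example, executable not found).
    child.on('error', (err) => reject(new Error(`[${id}] failed to start ${executable}: ${err.message}`)))
    // 'close' fires after the process has ended and its stdio streams are closed.
    child.on('close', (exitCode) => resolve({ exitCode, stdout, stderr }))
  })
}
```

Because `args` is handed to the operating system as an array, a prompt such as `hello; rm -rf /` reaches the child process as one literal argument.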

#### C. Update IPC Handling (`src/main/index.ts`)

Communication from the frontend needs to be adapted.

-   **Action:** Create a new, dedicated IPC channel, for example, `IpcChannel.Agent_Run`.
-   **Payload:** This channel will accept a structured object: `{ sessionId: string, prompt: string }`.
-   **Handler:** The main process handler for this channel will simply call `agentExecutionService.runAgent(sessionId, prompt)`. The existing `IpcChannel.Poc_CommandOutput` can be reused to stream the log output back to the UI.
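
Since the payload crosses a process boundary, the handler should not trust its shape. The sketch below shows one way to validate it before calling `runAgent`; `parseAgentRunPayload` is a hypothetical helper, and the rule that an empty prompt is rejected is an assumption.

```typescript
interface AgentRunPayload {
  sessionId: string
  prompt: string
}

// Narrows an untrusted IPC payload to AgentRunPayload or throws a TypeError.
function parseAgentRunPayload(raw: unknown): AgentRunPayload {
  if (typeof raw !== 'object' || raw === null) {
    throw new TypeError('Agent_Run payload must be an object')
  }
  const { sessionId, prompt } = raw as Record<string, unknown>
  if (typeof sessionId !== 'string' || sessionId.length === 0) {
    throw new TypeError('Agent_Run payload requires a non-empty string sessionId')
  }
  if (typeof prompt !== 'string' || prompt.trim().length === 0) {
    throw new TypeError('Agent_Run payload requires a non-empty string prompt')
  }
  // Return a fresh object so unexpected extra fields are dropped.
  return { sessionId, prompt }
}
```

The handler body would then reduce to parsing the payload and forwarding the two fields to `agentExecutionService.runAgent`.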

### 2. Database and Data Model Changes

To manage the lifecycle of agent conversations, we need to track the session ID from `agent.py`.

-   **File:** `src/main/services/agent/queries.ts`
    -   **Action:** Add a new nullable field `claude_session_id TEXT` to the `sessions` table schema.

-   **File:** `src/main/services/agent/types.ts`
    -   **Action:** Add the optional `claude_session_id?: string` field to the `SessionEntity` and `SessionResponse` interfaces.

-   **File:** `src/main/services/agent/AgentService.ts`
    -   **Action:** Update the `createSession`, `updateSession`, and `getSessionById` methods to handle the new `claude_session_id` field.
    -   Add a new method like `updateSessionClaudeId(sessionId: string, claudeSessionId: string)` to be called by the `AgentExecutionService`.
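
The schema change can be made idempotent so that it is safe to run on every startup. The sketch below assumes the store is SQLite (suggested by the `TEXT` column type) and that the `sessions` primary key column is named `id`; both are assumptions, as are the helper and constant names.

```typescript
// `existingColumns` is the list of column names, e.g. as read from
// `PRAGMA table_info(sessions)` in SQLite.
function pendingSessionMigrations(existingColumns: string[]): string[] {
  const statements: string[] = []
  if (existingColumns.indexOf('claude_session_id') === -1) {
    // Nullable with no default: existing rows simply get NULL.
    statements.push('ALTER TABLE sessions ADD COLUMN claude_session_id TEXT')
  }
  return statements
}

// Parameterized statement that `updateSessionClaudeId` could execute.
// Bind order: claudeSessionId, then sessionId.
const UPDATE_CLAUDE_SESSION_ID_SQL = 'UPDATE sessions SET claude_session_id = ? WHERE id = ?'
```

Existing sessions keep a `NULL` value and are therefore treated like a first turn the next time they are used.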

### 3. Frontend Refactoring (`src/renderer`)

Finally, we'll update the UI to send prompts instead of commands.

-   **File:** `src/renderer/src/hooks/usePocCommand.ts` (to be renamed/refactored as `useAgentCommand.ts`)
    -   **Action:** Complete rewrite of the command execution logic. Instead of sending a command string, it will now invoke the new IPC channel: `window.api.agent.run(sessionId, prompt)`.
    -   **New Interface:** The hook will expose methods for prompt submission rather than command execution.

-   **File:** `src/renderer/src/pages/cherry-agent/CherryAgentPage.tsx`
    -   **Action:** Rewrite the main page component to work with prompt-based flow.
    -   The text from the command input will now be treated as the `prompt`.
    -   The submit handler will call the refactored hook with the current session ID and the prompt: `agentCommandHook.run(agentManagement.currentSession.id, prompt)`.
    -   The `workingDirectory` will no longer be passed from the frontend, as it's now part of the session data managed by the backend.

-   **Component Updates:** All components in `src/renderer/src/pages/cherry-agent/components/` will need updates:
    -   **`EnhancedCommandInput.tsx`:** Rename to `EnhancedPromptInput.tsx` and update to handle prompt submission instead of command execution.
    -   **`PocMessageBubble.tsx` and `PocMessageList.tsx`:** Update to display prompt/response pairs instead of command/output pairs.
    -   **Session management components:** Update to work with new session schema including `claude_session_id`.
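
The submit path in the page component can be kept framework-free and tested in isolation. In this sketch, `submitPrompt` and the `RunAgent` type are hypothetical; `run` stands in for whatever the refactored hook exposes (for example, a wrapper around `window.api.agent.run`).

```typescript
// Assumed signature of the function exposed by the refactored hook.
type RunAgent = (sessionId: string, prompt: string) => Promise<void>

// Returns true if the prompt was dispatched, false if there was nothing to send.
function submitPrompt(run: RunAgent, sessionId: string | undefined, input: string): boolean {
  const prompt = input.trim()
  if (!sessionId || prompt.length === 0) {
    return false
  }
  // No working directory is passed: the backend reads it from the session record.
  void run(sessionId, prompt)
  return true
}
```

A component would call this from its submit handler and clear the input only when it returns `true`.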

## New Data Flow

The execution flow will be transformed as follows:

-   **Before:**
    `UI Input -> (command string) -> IPC -> ShellCommandExecutor -> Spawns Shell -> Executes Command`

-   **After:**
    `UI Input -> (prompt string) -> IPC({sessionId, prompt}) -> AgentExecutionService -> Constructs Args -> commandExecutor -> Spawns 'python' with args -> Executes agent.py`

## Security & Error Handling Improvements

### Security Enhancements
- **Path validation**: Ensure `agent.py` path is validated and cannot be manipulated
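- **Argument sanitization**: Validate all arguments passed to `agent.py` (type, length, and that the working directory is an allowed path) so that a value cannot be mistaken for an option or point outside the session's accessible paths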
- **No shell execution**: Direct process spawning eliminates shell injection vulnerabilities
- **Resource limits**: Consider implementing timeout and resource constraints for agent processes
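
One way to approach the path validation item is to resolve the script location against the application root and reject anything that escapes it. `resolveInsideRoot` is a hypothetical helper, and the sketch does not follow symlinks, which a production check may also need to do.

```typescript
import * as path from 'node:path'

// Resolves `relativePath` under `appRoot` and throws if the result is the root
// itself or lies outside it. Symlinks are NOT resolved in this sketch.
function resolveInsideRoot(appRoot: string, relativePath: string): string {
  const root = path.resolve(appRoot)
  const resolved = path.resolve(root, relativePath)
  const rel = path.relative(root, resolved)
  const escapes = rel === '..' || rel.startsWith('..' + path.sep) || path.isAbsolute(rel)
  if (rel === '' || escapes) {
    throw new Error(`Path escapes application root: ${relativePath}`)
  }
  return resolved
}
```

The same helper could be reused to check that `--cwd` stays inside one of the session's accessible paths.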

### Error Handling & Recovery
- **Agent script validation**: Verify `agent.py` exists and is accessible before execution
- **Process monitoring**: Handle agent crashes, timeouts, and unexpected terminations
- **Session recovery**: Graceful handling of orphaned sessions and Claude session mismatches
- **Structured error responses**: Clear error messaging for different failure scenarios
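
Structured error responses could be modeled as a discriminated union derived from how the child process ended. The type and function names below are hypothetical, and the message wording is only a placeholder; the `exitCode` and `signal` inputs correspond to the values Node.js passes to a child process's `close` event.

```typescript
type AgentRunFailure =
  | { kind: 'timeout'; message: string }
  | { kind: 'killed'; signal: string; message: string }
  | { kind: 'nonzero_exit'; exitCode: number; message: string }

// Maps the way a process ended to a structured failure, or null on success.
// `timedOut` is assumed to be set by whatever timer killed the process.
function classifyExit(exitCode: number | null, signal: string | null, timedOut: boolean): AgentRunFailure | null {
  if (timedOut) {
    return { kind: 'timeout', message: 'The agent did not finish within the time limit.' }
  }
  if (signal !== null) {
    return { kind: 'killed', signal, message: `The agent process was terminated by ${signal}.` }
  }
  if (exitCode !== null && exitCode !== 0) {
    return { kind: 'nonzero_exit', exitCode, message: `The agent exited with code ${exitCode}.` }
  }
  return null
}
```

The renderer can then switch on `kind` to decide whether to offer a retry, rather than parsing free-form error strings.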

### Observability
- **Structured logging**: Comprehensive logging throughout the agent execution pipeline
- **Performance tracking**: Monitor agent execution times and resource usage
- **Health checks**: Periodic validation of agent system functionality

## Migration Strategy

### Backward Compatibility
- **Database migration**: Handle existing sessions without `claude_session_id`
- **Component migration**: Gradual update of UI components to new prompt-based interface
- **Testing strategy**: Comprehensive testing of both old and new flows during transition

### Rollout Plan
1. **Backend first**: Implement new `AgentExecutionService` with feature flag
2. **Database schema**: Add `claude_session_id` field with migration
3. **Frontend components**: Update components one by one
4. **IPC integration**: Connect new frontend to new backend
5. **Cleanup**: Remove old command execution code once migration is complete