Release 1.1.0: Add Import Audio Files feature

- New Import tab with drag-and-drop support for audio files
- Support for 8 formats: MP3, MP4, WAV, M4A, FLAC, OGG, AAC, WMA
- File metadata display (duration, size, format)
- Editable meeting titles
- Progress tracking with visual indicators
- Smart template selection
- Auto-navigation to Transcription view
- Updated README with BlackHole requirement and Teams config
- Added get_audio_metadata Rust command
- Version bump to 1.1.0
This commit is contained in:
michael.borak
2026-01-21 09:08:56 +01:00
parent 79f509951c
commit a06e473e85
12 changed files with 1041 additions and 171 deletions

View File

@@ -7,6 +7,7 @@
## ✨ Features ## ✨ Features
* **🎙️ Dual-Channel Recording**: seamlessly capture your voice and meeting audio from apps like Microsoft Teams, Zoom, or Google Meet. * **🎙️ Dual-Channel Recording**: seamlessly capture your voice and meeting audio from apps like Microsoft Teams, Zoom, or Google Meet.
* **📁 Import Audio Files**: Upload existing recordings (MP3, MP4, WAV, M4A, FLAC, OGG, AAC, WMA) for transcription and summarization.
* **📅 Microsoft 365 Integration**: * **📅 Microsoft 365 Integration**:
* **Upcoming Meetings**: View your daily schedule and join with **one click**. * **Upcoming Meetings**: View your daily schedule and join with **one click**.
* **Meeting Details**: View full agenda and **invited attendee status** (Accepted/Declined). * **Meeting Details**: View full agenda and **invited attendee status** (Accepted/Declined).
@@ -24,6 +25,7 @@
### 1. Prerequisites ### 1. Prerequisites
* **macOS** (Apple Silicon or Intel). * **macOS** (Apple Silicon or Intel).
* **BlackHole 2ch Driver** (Mandatory): Download from [existential.audio](https://existential.audio/blackhole/) or run `brew install blackhole-2ch`.
* **Infomaniak AI Account**: You need an API Key and Product ID from the [Infomaniak Developer Portal](https://manager.infomaniak.com/). * **Infomaniak AI Account**: You need an API Key and Product ID from the [Infomaniak Developer Portal](https://manager.infomaniak.com/).
### 2. Installation ### 2. Installation
@@ -35,15 +37,21 @@
## 🎧 Recording System Audio (Teams, Zoom, etc.) ## 🎧 Recording System Audio (Teams, Zoom, etc.)
We've made this easy! Hearbit AI includes a built-in helper to set up your audio devices. We've made this easy! **Note: You must have the BlackHole driver installed.**
1. **Open Audio MIDI Setup**: Click the "Open Audio MIDI Setup" button in the recorder view. 1. **Create "Hearbit Audio" Device**:
2. **Create "Hearbit Audio" Device**: * Open the app and select **Meeting** mode.
* If you don't have a virtual device, click **"🪄 Create Hearbit Audio Device"** in the app (appears in Meeting mode if no device is found). * If you don't have the device yet, click the **"🪄 Create Hearbit Audio Device"** button.
* This will automatically configure a Multi-Output Device so you can record and hear at the same time. * This creates a specialized "Multi-Output Device" that routes audio to both your headphones/speakers AND the app.
3. **Select "Hearbit Audio" in Teams/Zoom**:
* In your meeting app settings (Teams/Zoom), set your **Speaker** to **Hearbit Audio**. 2. **Configure Teams / Zoom / Webex**:
* In Hearbit AI, select **Hearbit Audio** (or BlackHole) as your input. * **Speaker / Output**: Change this to **Hearbit Audio**.
* *Why?* This ensures the audio goes to the recording app *and* your ears.
* **Microphone / Input**: Leave this as your normal microphone (e.g., MacBook Pro Mic).
* *Note:* Do **not** select Hearbit Audio as your microphone in Teams.
3. **Start Recording**:
* In Hearbit AI, ensure **Hearbit Audio** is selected as the input.
--- ---

81
RELEASE_NOTES_1.1.0.md Normal file
View File

@@ -0,0 +1,81 @@
# Release Notes - Version 1.1.0
**Release Date**: January 21, 2026
## 🎉 What's New
### Import Audio Files Feature
We've added a powerful new **Import** tab that allows you to upload and process existing audio/video files!
**Key Features:**
- **Drag-and-Drop Upload**: Simply drag your audio files into the app
- **8 Format Support**: MP3, MP4, WAV, M4A, FLAC, OGG, AAC, WMA
- **Smart Metadata Display**: See file duration, size, and format before processing
- **Editable Meeting Titles**: Customize the name (defaults to filename)
- **Progress Tracking**: Visual indicators for each stage (Validating → Transcribing → Summarizing)
- **Same AI Power**: Uses the same AI templates and Smart Select as live recordings
- **Auto-Navigation**: Seamlessly transition to Transcription view when complete
**Use Cases:**
- Process pre-recorded meetings you forgot to record live
- Batch process voice memos
- Import recordings from other devices
- Archive and transcribe old meeting recordings
---
## 📝 Documentation Updates
### README Enhancements
- Added mandatory **BlackHole 2ch Driver** requirement to Prerequisites
- Clarified **Teams/Zoom configuration** (Speaker vs. Microphone settings)
- Added detailed setup instructions for meeting audio capture
---
## 🔧 Technical Improvements
- Added `get_audio_metadata` Rust command for file metadata extraction
- Improved tab navigation with new Import tab
- Enhanced error handling for file validation
- Code optimizations and cleanup
---
## 📦 Installation
Download the DMG file:
```
Hearbit_AI_1.1.0_aarch64.dmg
```
**Location**: `src-tauri/target/release/bundle/dmg/`
### First-time Installation
If you see "Hearbit AI is damaged and can't be opened":
```bash
sudo xattr -cr /Applications/Hearbit\ AI.app
```
---
## 🐛 Known Issues
None reported for this release.
---
## 🙏 Credits
Built with ❤️ by the Livtec team using Tauri, React, and TypeScript.
---
## What's Next?
Potential future enhancements:
- Meeting auto-stop when meeting ends (via M365 API)
- Batch file import
- Audio preview player
- More audio format conversions

66
src-tauri/Cargo.lock generated
View File

@@ -1739,7 +1739,7 @@ checksum = "841d1cc9bed7f9236f321df977030373f4a4163ae1a7dbfe1a51a2c1a51d9100"
[[package]] [[package]]
name = "hearbit-ai" name = "hearbit-ai"
version = "1.1.0" version = "0.1.2"
dependencies = [ dependencies = [
"chrono", "chrono",
"cpal", "cpal",
@@ -1757,6 +1757,7 @@ dependencies = [
"tauri-plugin-log", "tauri-plugin-log",
"tauri-plugin-oauth", "tauri-plugin-oauth",
"tauri-plugin-opener", "tauri-plugin-opener",
"tauri-plugin-shell",
"tokio", "tokio",
"url", "url",
"voice_activity_detector", "voice_activity_detector",
@@ -3089,6 +3090,16 @@ dependencies = [
"ureq", "ureq",
] ]
[[package]]
name = "os_pipe"
version = "1.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7d8fae84b431384b68627d0f9b3b1245fcf9f46f6c0e3dc902e9dce64edd1967"
dependencies = [
"libc",
"windows-sys 0.61.2",
]
[[package]] [[package]]
name = "pango" name = "pango"
version = "0.18.3" version = "0.18.3"
@@ -4361,12 +4372,44 @@ dependencies = [
"digest", "digest",
] ]
[[package]]
name = "shared_child"
version = "1.1.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1e362d9935bc50f019969e2f9ecd66786612daae13e8f277be7bfb66e8bed3f7"
dependencies = [
"libc",
"sigchld",
"windows-sys 0.60.2",
]
[[package]] [[package]]
name = "shlex" name = "shlex"
version = "1.3.0" version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index" source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64" checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
[[package]]
name = "sigchld"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "47106eded3c154e70176fc83df9737335c94ce22f821c32d17ed1db1f83badb1"
dependencies = [
"libc",
"os_pipe",
"signal-hook",
]
[[package]]
name = "signal-hook"
version = "0.3.18"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d881a16cf4426aa584979d30bd82cb33429027e42122b169753d6ef1085ed6e2"
dependencies = [
"libc",
"signal-hook-registry",
]
[[package]] [[package]]
name = "signal-hook-registry" name = "signal-hook-registry"
version = "1.4.8" version = "1.4.8"
@@ -4951,6 +4994,27 @@ dependencies = [
"zbus", "zbus",
] ]
[[package]]
name = "tauri-plugin-shell"
version = "2.3.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "39b76f884a3937e04b631ffdc3be506088fa979369d25147361352f2f352e5ed"
dependencies = [
"encoding_rs",
"log",
"open",
"os_pipe",
"regex",
"schemars 0.8.22",
"serde",
"serde_json",
"shared_child",
"tauri",
"tauri-plugin",
"thiserror 2.0.18",
"tokio",
]
[[package]] [[package]]
name = "tauri-runtime" name = "tauri-runtime"
version = "2.9.2" version = "2.9.2"

View File

@@ -1,6 +1,6 @@
[package] [package]
name = "hearbit-ai" name = "hearbit-ai"
version = "1.1.0" version = "0.1.2"
description = "A Tauri App" description = "A Tauri App"
authors = ["you"] authors = ["you"]
edition = "2021" edition = "2021"
@@ -18,7 +18,7 @@ crate-type = ["staticlib", "cdylib", "rlib"]
tauri-build = { version = "2", features = [] } tauri-build = { version = "2", features = [] }
[dependencies] [dependencies]
tauri = { version = "2", features = [] } tauri = { version = "2", features = ["tray-icon"] }
tauri-plugin-opener = "2" tauri-plugin-opener = "2"
tauri-plugin-dialog = "2" tauri-plugin-dialog = "2"
serde = { version = "1", features = ["derive"] } serde = { version = "1", features = ["derive"] }
@@ -36,3 +36,4 @@ oauth2 = "4.4"
url = "2.5" url = "2.5"
lettre = { version = "0.11", features = ["tokio1", "tokio1-native-tls", "builder"] } lettre = { version = "0.11", features = ["tokio1", "tokio1-native-tls", "builder"] }
tauri-plugin-log = "2.0.0" tauri-plugin-log = "2.0.0"
tauri-plugin-shell = "2.3.4"

View File

@@ -110,6 +110,9 @@ func createAggregateDevice() {
} }
print("Found BlackHole 2ch (ID: \(blackHoleID))") print("Found BlackHole 2ch (ID: \(blackHoleID))")
// --- PART 1: Hearbit Audio (Input: Mic + BlackHole) ---
print("\n--- Creating 'Hearbit Audio' (Input) ---")
// Default Input // Default Input
var defaultInputID: AudioObjectID = 0 var defaultInputID: AudioObjectID = 0
var size = UInt32(MemoryLayout<AudioObjectID>.size) var size = UInt32(MemoryLayout<AudioObjectID>.size)
@@ -125,19 +128,14 @@ func createAggregateDevice() {
} }
print("Found Default Input (ID: \(defaultInputID))") print("Found Default Input (ID: \(defaultInputID))")
// Check for existing "Hearbit Audio" by UID // Check for existing "Hearbit Audio"
let targetUID = "hearbit_audio_aggregate_v1" let inputUID = "hearbit_audio_aggregate_v1"
if let existingID = findDeviceByUID(targetUID) { if let existingID = findDeviceByUID(inputUID) {
print("Found existing Hearbit Audio device (ID: \(existingID)). Destroying to recreate...") print("Found existing Hearbit Audio (ID: \(existingID)). Destroying...")
if AudioHardwareDestroyAggregateDevice(existingID) != noErr { AudioHardwareDestroyAggregateDevice(existingID)
print("Warning: Failed to destroy existing device.")
} else {
print("Existing device destroyed.")
}
Thread.sleep(forTimeInterval: 0.5) Thread.sleep(forTimeInterval: 0.5)
} }
// Build SubDevice List
guard let bhUID = getStringProperty(objectID: blackHoleID, selector: kAudioDevicePropertyDeviceUID) else { guard let bhUID = getStringProperty(objectID: blackHoleID, selector: kAudioDevicePropertyDeviceUID) else {
print("Error: Could not get BlackHole UID.") print("Error: Could not get BlackHole UID.")
exit(1) exit(1)
@@ -147,36 +145,47 @@ func createAggregateDevice() {
exit(1) exit(1)
} }
// Dedup: if Mic IS BlackHole (user set BlackHole as default), don't duplicate
var subDevicesUIDs = [bhUID] var subDevicesUIDs = [bhUID]
if micUID != bhUID { if micUID != bhUID {
subDevicesUIDs.append(micUID) subDevicesUIDs.append(micUID)
} }
let subDevicesArray = subDevicesUIDs.map { let subDevicesArray = subDevicesUIDs.map { [kAudioSubDeviceUIDKey: $0] }
[kAudioSubDeviceUIDKey: $0] let inputDesc: [String: Any] = [
}
let desc: [String: Any] = [
kAudioAggregateDeviceNameKey: "Hearbit Audio", kAudioAggregateDeviceNameKey: "Hearbit Audio",
kAudioAggregateDeviceUIDKey: targetUID, kAudioAggregateDeviceUIDKey: inputUID,
kAudioAggregateDeviceIsPrivateKey: Int(0), kAudioAggregateDeviceIsPrivateKey: Int(0),
kAudioAggregateDeviceIsStackedKey: Int(0), kAudioAggregateDeviceIsStackedKey: Int(0),
kAudioAggregateDeviceSubDeviceListKey: subDevicesArray kAudioAggregateDeviceSubDeviceListKey: subDevicesArray
] ]
print("Creating Aggregate Device with UIDs: \(subDevicesUIDs)") var outInputID: AudioObjectID = 0
let errIn = AudioHardwareCreateAggregateDevice(inputDesc as CFDictionary, &outInputID)
var outID: AudioObjectID = 0 if errIn == noErr {
let err = AudioHardwareCreateAggregateDevice(desc as CFDictionary, &outID) print("Success! Created 'Hearbit Audio' with ID: \(outInputID)")
if err == noErr {
print("Success! Created 'Hearbit Audio' with ID: \(outID)")
exit(0)
} else { } else {
print("Failed to create device. Error code: \(err) (\(err.fourCC))") print("Failed to create 'Hearbit Audio'. Error: \(errIn)")
exit(1)
} }
// --- PART 2: Cleanup Unstable "Hearbit Speakers" ---
// The previous "Hearbit Speakers" device caused MS Teams to crash.
// We strictly remove it here to restore stability.
print("\n--- Cleaning up Unstable Devices ---")
let stopOutputUID = "hearbit_speakers_aggregate_v1"
if let existingOutID = findDeviceByUID(stopOutputUID) {
print("Found unstable 'Hearbit Speakers' (ID: \(existingOutID)). Removing to fix Teams crash...")
let errDist = AudioHardwareDestroyAggregateDevice(existingOutID)
if errDist == noErr {
print("Successfully removed unstable device.")
} else {
print("Warning: Failed to remove device. Error: \(errDist)")
}
} else {
print("No unstable 'Hearbit Speakers' found. System is clean.")
}
exit(0)
} }
createAggregateDevice() createAggregateDevice()

View File

@@ -11,6 +11,9 @@ pub struct AudioProcessor {
vad_chunk_size: usize, vad_chunk_size: usize,
vad_buffer: Vec<f32>, vad_buffer: Vec<f32>,
// Audio Config
channel_count: u16,
// Resampler // Resampler
resampler: FastFixedIn<f32>, resampler: FastFixedIn<f32>,
resample_input_buffer: Vec<f32>, resample_input_buffer: Vec<f32>,
@@ -21,6 +24,9 @@ pub struct AudioProcessor {
last_speech_time: u64, // In samples or frames last_speech_time: u64, // In samples or frames
hangover_samples: u64, hangover_samples: u64,
// Waiting Mode
waiting_for_speech: bool,
// Ring Buffer (for pre-roll) // Ring Buffer (for pre-roll)
ring_buffer: Vec<f32>, ring_buffer: Vec<f32>,
ring_pos: usize, ring_pos: usize,
@@ -37,12 +43,14 @@ pub struct AudioProcessor {
impl AudioProcessor { impl AudioProcessor {
pub fn new( pub fn new(
sample_rate: u32, sample_rate: u32,
channel_count: u16,
writer: Arc<Mutex<WavWriter<std::io::BufWriter<std::fs::File>>>>, writer: Arc<Mutex<WavWriter<std::io::BufWriter<std::fs::File>>>>,
app_handle: AppHandle app_handle: AppHandle,
wait_for_speech: bool
) -> Result<Self, String> { ) -> Result<Self, String> {
let vad_sample_rate = 16000; let vad_sample_rate = 16000;
let vad_chunk_size = 512; // Silero usually likes ~30ms which is 512 at 16k? No 16000 * 0.032 = 512. let vad_chunk_size = 512;
// Initialize VAD // Initialize VAD
let vad = VoiceActivityDetector::builder() let vad = VoiceActivityDetector::builder()
@@ -51,8 +59,7 @@ impl AudioProcessor {
.build() .build()
.map_err(|e| format!("Failed to init VAD: {:?}", e))?; .map_err(|e| format!("Failed to init VAD: {:?}", e))?;
// Initialize Resampler (Input Rate -> 16000) using FastFixedIn for speed/simplicity // Initialize Resampler (Input Rate -> 16000)
// new(f_ratio, max_resample_ratio_relative, polyn_deg, chunk_size, channels)
let resampler = FastFixedIn::<f32>::new( let resampler = FastFixedIn::<f32>::new(
16000.0 / sample_rate as f64, 16000.0 / sample_rate as f64,
1.0, 1.0,
@@ -61,20 +68,26 @@ impl AudioProcessor {
1 1
).map_err(|e| format!("Failed to init Resampler: {:?}", e))?; ).map_err(|e| format!("Failed to init Resampler: {:?}", e))?;
// Pre-roll buffer (e.g. 0.5 seconds of high quality audio) // Pre-roll buffer (1.0 seconds) * Channels (interleaved store)
let ring_curr_seconds = 1.0; let ring_curr_seconds = 1.0;
let ring_size = (sample_rate as f32 * ring_curr_seconds) as usize; // WavWriter writes interleaved, so we store interleaved.
let ring_size = (sample_rate as f32 * ring_curr_seconds) as usize * channel_count as usize;
Ok(Self { Ok(Self {
vad, vad,
vad_chunk_size, vad_chunk_size,
vad_buffer: Vec::new(), vad_buffer: Vec::new(),
channel_count,
resampler, resampler,
resample_input_buffer: Vec::new(), resample_input_buffer: Vec::new(),
resample_output_buffer: Vec::new(), resample_output_buffer: Vec::new(),
is_speech_active: false, is_speech_active: false,
last_speech_time: 0, last_speech_time: 0,
hangover_samples: (sample_rate as f32 * 1.5) as u64, // 1.5s hangover // Hangover counts "processed samples" which are actually frames * channels in current logic?
// Actually total_processed_samples usually counts FRAMES in audio terminology, but here we count elements.
// Let's stick to elements to match existing logic logic.
hangover_samples: (sample_rate as f32 * 1.5 * channel_count as f32) as u64,
waiting_for_speech: wait_for_speech,
ring_buffer: vec![0.0; ring_size], ring_buffer: vec![0.0; ring_size],
ring_pos: 0, ring_pos: 0,
ring_size, ring_size,
@@ -87,30 +100,39 @@ impl AudioProcessor {
} }
pub fn process(&mut self, data: &[f32]) { pub fn process(&mut self, data: &[f32]) {
// 1. Add to Ring Buffer (always, for pre-roll) // 1. Add to Ring Buffer (Interleaved data - Record EVERYTHING)
for &sample in data { for &sample in data {
self.ring_buffer[self.ring_pos] = sample; self.ring_buffer[self.ring_pos] = sample;
self.ring_pos = (self.ring_pos + 1) % self.ring_size; self.ring_pos = (self.ring_pos + 1) % self.ring_size;
} }
// 2. Resample for VAD // 2. Prepare VAD Signal (Mono Mixdown)
// We append new data to input buffer for resampler // FRESH START LOGIC (v0.2.0):
self.resample_input_buffer.extend_from_slice(data); // We expect standard Stereo Input (BlackHole 2ch).
// No magic 3-channel aggregate.
// Process in chunks compatible with resampler let channels = self.channel_count as usize;
// Actually rubato process_into_buffer needs waves of input. let frame_count = data.len() / channels;
// Simplified: SincFixedIn wants a fixed number of input frames? let mut vad_input_chunk = Vec::with_capacity(frame_count);
// Docs: "retrieve result... input buffer must contain needed number of frames"
// SincFixedIn: "input buffer used for resampling... must receive a fixed number of frames" for i in 0..frame_count {
// Wait, SincFixedIn is fixed INPUT size. SincFixedOut is fixed OUTPUT size. let frame_start = i * channels;
// We want to feed whatever we get.
// For simplicity, let's use a simpler resampling strategy or accept rubato's constraints. let mix_sample = if channels >= 2 {
// Rubato SincFixedIn: we must provide `input_frames_next` frames. // Stereo -> Average L + R
(data[frame_start] + data[frame_start + 1]) / 2.0
} else {
// Mono -> Take as is
data[frame_start]
};
vad_input_chunk.push(mix_sample);
}
// 3. Resample for VAD
self.resample_input_buffer.extend_from_slice(&vad_input_chunk);
// Let's defer strict resampling and just use decimation if sample rate is multiple?
// No, user devices vary.
// Handling Resampling properly:
let needed = self.resampler.input_frames_next(); let needed = self.resampler.input_frames_next();
while self.resample_input_buffer.len() >= needed { while self.resample_input_buffer.len() >= needed {
let chunk: Vec<f32> = self.resample_input_buffer.drain(0..needed).collect(); let chunk: Vec<f32> = self.resample_input_buffer.drain(0..needed).collect();
@@ -127,63 +149,87 @@ impl AudioProcessor {
// Update output buffer usage... logic is tricky with drain. // Update output buffer usage... logic is tricky with drain.
} }
// 3. Process VAD // 4. Process VAD
while self.vad_buffer.len() >= self.vad_chunk_size { while self.vad_buffer.len() >= self.vad_chunk_size {
let vad_chunk: Vec<f32> = self.vad_buffer.drain(0..self.vad_chunk_size).collect(); let vad_chunk: Vec<f32> = self.vad_buffer.drain(0..self.vad_chunk_size).collect();
// Run Detection // Run Detection
// Run Detection
let probability = self.vad.predict(vad_chunk.clone()); let probability = self.vad.predict(vad_chunk.clone());
// Calculate RMS for this chunk to use as fallback/hybrid detection // Calculate RMS for this chunk to use as fallback/hybrid detection
let sq_sum: f32 = vad_chunk.iter().map(|x| x * x).sum(); let sq_sum: f32 = vad_chunk.iter().map(|x| x * x).sum();
let rms = (sq_sum / vad_chunk.len() as f32).sqrt(); let rms = (sq_sum / vad_chunk.len() as f32).sqrt();
// Hybrid VAD: Probability > 0.4 OR RMS > 0.005 (approx -46dB) // Hybrid VAD: Probability > 0.8 OR RMS > 0.015
let is_speech = probability > 0.4 || rms > 0.005; // INCREASED THRESHOLDS (v1.9.0):
// Now that routing works, we must filter out system notifications (beeps) and noise floor.
let is_speech = probability > 0.8 || rms > 0.015;
if is_speech { if is_speech {
self.is_speech_active = true; self.is_speech_active = true;
self.last_speech_time = self.total_processed_samples; self.last_speech_time = self.total_processed_samples;
} }
// Emit VAD event periodically (every 500ms) // Emit VAD event periodically (every 500ms is enough for non-diagnostic mode)
if self.last_event_time.elapsed().as_millis() > 500 { if self.last_event_time.elapsed().as_millis() > 500 {
// Calculate simple RMS of the current chunk for debugging
let sq_sum: f32 = vad_chunk.iter().map(|x| x * x).sum();
let rms = (sq_sum / vad_chunk.len() as f32).sqrt();
// Print debug info to stdout (viewable in terminal)
println!("VAD Debug: Prob={:.4}, RMS={:.6}, Speech={}", probability, rms, is_speech);
if let Some(app) = &self.app_handle { if let Some(app) = &self.app_handle {
// Just sending probability is enough for now #[derive(Clone, serde::Serialize)]
#[derive(serde::Serialize, Clone)]
struct VadEvent { struct VadEvent {
probability: f32,
is_speech: bool, is_speech: bool,
probability: f32,
} }
let _ = app.emit("vad-event", VadEvent { probability, is_speech }); let _ = app.emit("vad-event", VadEvent {
probability,
is_speech: self.is_speech_active,
});
} }
self.last_event_time = std::time::Instant::now(); self.last_event_time = std::time::Instant::now();
// IMPORTANT: We reset is_speech_active after emitting,
// so we don't latch it forever if the user stops talking.
// However, the main loop sets it to true if current chunk is speech.
// This logic is a bit of a "latch for X ms".
self.is_speech_active = false;
} }
} }
// 4. Update Hangover and Check Write condition // 4. Update Hangover and Check Write condition
if self.waiting_for_speech {
if self.is_speech_active {
// Trigger Detected!
println!("Auto-Start: Speech detected. Flushing pre-roll...");
self.waiting_for_speech = false;
// Flush Ring Buffer (Orderly: from ring_pos to end, then 0 to ring_pos)
let mut guard = self.writer.lock().unwrap();
let amplitude = i16::MAX as f32;
// Part 1: ring_pos to end
for i in self.ring_pos..self.ring_size {
let sample = self.ring_buffer[i];
guard.write_sample((sample * amplitude) as i16).ok();
}
// Part 2: 0 to ring_pos
for i in 0..self.ring_pos {
let sample = self.ring_buffer[i];
guard.write_sample((sample * amplitude) as i16).ok();
}
// Emit event to notify frontend that "real" recording started
if let Some(app) = &self.app_handle {
let _ = app.emit("auto-recording-triggered", ());
}
} else {
// Still waiting, do not write to file.
return;
}
}
// Standard Recording Logic (Active or Hangover)
let time_since_speech = self.total_processed_samples.saturating_sub(self.last_speech_time); let time_since_speech = self.total_processed_samples.saturating_sub(self.last_speech_time);
if self.is_speech_active || time_since_speech < self.hangover_samples { if self.is_speech_active || time_since_speech < self.hangover_samples {
// We are recording!
// Check if we just started (transition)
// Ideally we dump the ring buffer here if we just switched state.
// Implementing perfect ring buffer dump is complex (need to track state changes better).
// MVP: Just Write Current Data if in state.
// Improvement: If we are in hangover, we just write.
// If we just detected speech (was not speech?), dump ring buffer?
// We'd need to know if we 'wrote' the ring buffer already.
// Simple Logic: just write all incoming data if (Now - LastSpeech < Hangover)
let mut guard = self.writer.lock().unwrap(); let mut guard = self.writer.lock().unwrap();
for &sample in data { for &sample in data {
let amplitude = i16::MAX as f32; let amplitude = i16::MAX as f32;

View File

@@ -1,4 +1,9 @@
use tauri::{AppHandle, Manager, State, Emitter}; use tauri::{
AppHandle, Manager, State, Emitter,
menu::{Menu, MenuItem},
tray::{TrayIconBuilder, TrayIconEvent},
WindowEvent
};
use std::sync::{Arc, Mutex}; use std::sync::{Arc, Mutex};
use std::process::Command; use std::process::Command;
use cpal::traits::{DeviceTrait, HostTrait, StreamTrait}; use cpal::traits::{DeviceTrait, HostTrait, StreamTrait};
@@ -65,7 +70,7 @@ fn get_input_devices() -> Result<Vec<AudioDevice>, String> {
#[tauri::command] #[tauri::command]
fn start_recording(app: AppHandle, state: State<'_, AppState>, device_id: String, save_path: Option<String>, custom_filename: Option<String>) -> Result<(), String> { fn start_recording(app: AppHandle, state: State<'_, AppState>, device_id: String, save_path: Option<String>, custom_filename: Option<String>, wait_for_speech: Option<bool>) -> Result<(), String> {
emit_log(&app, "INFO", &format!("Starting recording on device: {}", device_id)); emit_log(&app, "INFO", &format!("Starting recording on device: {}", device_id));
let host = cpal::default_host(); let host = cpal::default_host();
@@ -77,16 +82,17 @@ fn start_recording(app: AppHandle, state: State<'_, AppState>, device_id: String
.or_else(|| host.default_input_device()) .or_else(|| host.default_input_device())
.ok_or("No input device found")?; .ok_or("No input device found")?;
let config = device.default_input_config().map_err(|e| e.to_string())?; // Select the configuration with the MAXIMUM number of channels
// This is crucial for "Hearbit Audio" (Aggregate) which lists 3 channels but might default to 2.
// VAD requires 16Hz or 8kHz, typically. Silero likes 16k. // We want the raw 3 channels to separate Mic (Ch0) from System (Ch1+2).
// We might need to resample or just check if the device supports it. let supported_configs = device.supported_input_configs().map_err(|e| e.to_string())?;
// For MVP VAD, we will try to stick to standard rates. let config = supported_configs
// Actually, simple energy VAD is easier to start with if Silero is too heavy or requires ONNX runtime. .max_by_key(|c| c.channels())
// Let's check the crate docs or usage first. .map(|c| c.with_max_sample_rate())
// Wait, the user wants to IGNORE music. Energy VAD will fail on music. .ok_or("No supported input configurations found")?;
// voice_activity_detector crate usually uses Silero or similar.
emit_log(&app, "INFO", &format!("Selected Audio Config: {} Channels, {} Hz", config.channels(), config.sample_rate()));
let spec = hound::WavSpec { let spec = hound::WavSpec {
channels: config.channels(), channels: config.channels(),
sample_rate: config.sample_rate(), sample_rate: config.sample_rate(),
@@ -122,7 +128,12 @@ fn start_recording(app: AppHandle, state: State<'_, AppState>, device_id: String
// Initialize AudioProcessor (VAD) // Initialize AudioProcessor (VAD)
// We pass the writer to it. // We pass the writer to it.
let processor = AudioProcessor::new(config.sample_rate(), writer.clone(), app.clone()) let should_wait = wait_for_speech.unwrap_or(false);
if should_wait {
emit_log(&app, "INFO", "Recording started in WAITING mode (buffer-only until speech).");
}
let processor = AudioProcessor::new(config.sample_rate(), config.channels(), writer.clone(), app.clone(), should_wait)
.map_err(|e| format!("Failed to create AudioProcessor: {}", e))?; .map_err(|e| format!("Failed to create AudioProcessor: {}", e))?;
// Wrap processor in Arc<Mutex> so we can share/move it into callback // Wrap processor in Arc<Mutex> so we can share/move it into callback
@@ -560,6 +571,62 @@ async fn summarize_text(app: AppHandle, text: String, api_key: String, product_i
} }
} }
#[derive(serde::Serialize)]
struct AudioMetadata {
duration: f64,
size: u64,
format: String,
}
#[tauri::command]
fn get_audio_metadata(app: AppHandle, file_path: String) -> Result<AudioMetadata, String> {
emit_log(&app, "INFO", &format!("Getting metadata for: {}", file_path));
// Get file size
let metadata = std::fs::metadata(&file_path).map_err(|e| e.to_string())?;
let size = metadata.len();
// Extract format from extension
let path = std::path::Path::new(&file_path);
let format = path.extension()
.and_then(|e| e.to_str())
.unwrap_or("unknown")
.to_string();
// Get duration using ffprobe (requires ffmpeg to be installed)
let duration = match Command::new("ffprobe")
.args([
"-v", "error",
"-show_entries", "format=duration",
"-of", "default=noprint_wrappers=1:nokey=1",
&file_path
])
.output()
{
Ok(output) => {
if output.status.success() {
let duration_str = String::from_utf8_lossy(&output.stdout);
duration_str.trim().parse::<f64>().unwrap_or(0.0)
} else {
emit_log(&app, "WARN", "ffprobe failed, duration = 0");
0.0
}
},
Err(_) => {
emit_log(&app, "WARN", "ffprobe not found, duration = 0");
0.0
}
};
emit_log(&app, "SUCCESS", &format!("Metadata: {}s, {} bytes", duration, size));
Ok(AudioMetadata {
duration,
size,
format,
})
}
#[tauri::command] #[tauri::command]
fn open_audio_midi_setup() -> Result<(), String> { fn open_audio_midi_setup() -> Result<(), String> {
Command::new("open") Command::new("open")
@@ -640,6 +707,49 @@ async fn read_log_file(app: AppHandle) -> Result<String, String> {
#[cfg_attr(mobile, tauri::mobile_entry_point)] #[cfg_attr(mobile, tauri::mobile_entry_point)]
pub fn run() { pub fn run() {
tauri::Builder::default() tauri::Builder::default()
.setup(|app| {
// Setup Tray Icon
let quit_i = MenuItem::with_id(app, "quit", "Quit Hearbit AI", true, None::<&str>).unwrap();
let show_i = MenuItem::with_id(app, "show", "Show Window", true, None::<&str>).unwrap();
let menu = Menu::with_items(app, &[&show_i, &quit_i]).unwrap();
let _tray = TrayIconBuilder::new()
.icon(app.default_window_icon().unwrap().clone())
.menu(&menu)
.show_menu_on_left_click(true)
.on_menu_event(|app, event| {
match event.id.as_ref() {
"quit" => app.exit(0),
"show" => {
if let Some(window) = app.get_webview_window("main") {
let _ = window.show();
let _ = window.set_focus();
}
}
_ => {}
}
})
.on_tray_icon_event(|tray, event| {
if let TrayIconEvent::Click { .. } = event {
let app = tray.app_handle();
if let Some(window) = app.get_webview_window("main") {
let _ = window.show();
let _ = window.set_focus();
}
}
})
.build(app)?;
Ok(())
})
.on_window_event(|window, event| {
if let WindowEvent::CloseRequested { api, .. } = event {
// Prevent window from closing, just hide it
window.hide().unwrap();
api.prevent_close();
}
})
.plugin(tauri_plugin_shell::init())
.plugin(tauri_plugin_log::Builder::default() .plugin(tauri_plugin_log::Builder::default()
.targets([ .targets([
tauri_plugin_log::Target::new(tauri_plugin_log::TargetKind::Stdout), tauri_plugin_log::Target::new(tauri_plugin_log::TargetKind::Stdout),
@@ -670,6 +780,7 @@ pub fn run() {
auth::get_calendar_events, auth::get_calendar_events,
save_text_file, save_text_file,
read_log_file, read_log_file,
get_audio_metadata,
email::send_smtp_email email::send_smtp_email
]) ])
.run(tauri::generate_context!()) .run(tauri::generate_context!())

View File

@@ -7,6 +7,7 @@ import TranscriptionView from "./components/TranscriptionView";
import Tabs from "./components/Tabs"; import Tabs from "./components/Tabs";
import MeetingsView from "./components/MeetingsView"; import MeetingsView from "./components/MeetingsView";
import HistoryView from "./components/HistoryView"; import HistoryView from "./components/HistoryView";
import Import from "./components/Import";
import ToastContainer, { ToastMessage, ToastType } from "./components/ui/Toast"; import ToastContainer, { ToastMessage, ToastType } from "./components/ui/Toast";
export interface PromptTemplate { export interface PromptTemplate {
@@ -24,8 +25,8 @@ export interface EmailTemplate {
} }
function App() { function App() {
const [view, setView] = useState<'recorder' | 'settings' | 'transcription' | 'meetings' | 'history'>('recorder'); const [view, setView] = useState<'recorder' | 'settings' | 'transcription' | 'meetings' | 'history' | 'import'>('recorder');
const [lastTab, setLastTab] = useState<'recorder' | 'transcription' | 'meetings' | 'history'>('recorder'); const [lastTab, setLastTab] = useState<'recorder' | 'transcription' | 'meetings' | 'history' | 'import'>('recorder');
// Auto-start recording state to handle "Join & Record" transition // Auto-start recording state to handle "Join & Record" transition
@@ -311,6 +312,14 @@ Thanks!`
} }
}; };
const handleRenameHistory = (id: string, newSubject: string) => {
const newHistory = history.map(item =>
item.id === id ? { ...item, subject: newSubject } : item
);
setHistory(newHistory);
localStorage.setItem('infomaniak_history', JSON.stringify(newHistory));
};
const handleDeleteHistory = (id: string) => { const handleDeleteHistory = (id: string) => {
const newHistory = history.filter(item => item.id !== id); const newHistory = history.filter(item => item.id !== id);
setHistory(newHistory); setHistory(newHistory);
@@ -343,7 +352,7 @@ Thanks!`
</div> </div>
<Tabs <Tabs
currentTab={view as 'recorder' | 'transcription' | 'meetings' | 'history'} currentTab={view as 'recorder' | 'transcription' | 'meetings' | 'history' | 'import'}
onTabChange={(t) => setView(t)} onTabChange={(t) => setView(t)}
/> />
</div> </div>
@@ -410,6 +419,7 @@ Thanks!`
history={history} history={history}
onLoad={handleLoadHistory} onLoad={handleLoadHistory}
onDelete={handleDeleteHistory} onDelete={handleDeleteHistory}
onRename={handleRenameHistory}
/> />
)} )}
@@ -429,6 +439,23 @@ Thanks!`
/> />
)} )}
{view === 'import' && (
<Import
apiKey={apiKey}
productId={productId}
prompts={prompts}
selectedModel={selectedModel}
onSaveToHistory={handleSaveToHistory}
onComplete={() => setView('transcription')}
addToast={addToast}
setTranscription={setTranscription}
setSummary={setSummary}
/>
)}
{view === 'settings' && ( {view === 'settings' && (

View File

@@ -1,4 +1,5 @@
import { FileText, Trash2, Calendar } from 'lucide-react'; import { FileText, Trash2, Calendar, Pencil, Check, X } from 'lucide-react';
import { useState } from 'react';
interface HistoryItem { interface HistoryItem {
id: string; id: string;
@@ -13,9 +14,30 @@ interface HistoryViewProps {
history: HistoryItem[]; history: HistoryItem[];
onLoad: (item: HistoryItem) => void; onLoad: (item: HistoryItem) => void;
onDelete: (id: string) => void; onDelete: (id: string) => void;
onRename: (id: string, newSubject: string) => void;
} }
export default function HistoryView({ history, onLoad, onDelete }: HistoryViewProps) { export default function HistoryView({ history, onLoad, onDelete, onRename }: HistoryViewProps) {
const [editingId, setEditingId] = useState<string | null>(null);
const [editValue, setEditValue] = useState("");
const startEditing = (item: HistoryItem) => {
setEditingId(item.id);
setEditValue(item.subject || "Untitled Recording");
};
const saveEdit = () => {
if (editingId && editValue.trim()) {
onRename(editingId, editValue.trim());
setEditingId(null);
}
};
const cancelEdit = () => {
setEditingId(null);
setEditValue("");
};
return ( return (
<div className="flex flex-col w-full h-full bg-background p-6"> <div className="flex flex-col w-full h-full bg-background p-6">
<h1 className="text-2xl font-bold mb-6 flex items-center gap-2"> <h1 className="text-2xl font-bold mb-6 flex items-center gap-2">
@@ -33,26 +55,58 @@ export default function HistoryView({ history, onLoad, onDelete }: HistoryViewPr
{history.map(item => ( {history.map(item => (
<div key={item.id} className="bg-card border border-border rounded-xl p-4 hover:shadow-md transition-all group"> <div key={item.id} className="bg-card border border-border rounded-xl p-4 hover:shadow-md transition-all group">
<div className="flex justify-between items-start"> <div className="flex justify-between items-start">
<div <div className="flex-1">
className="flex-1 cursor-pointer" {editingId === item.id ? (
onClick={() => onLoad(item)} <div className="flex items-center gap-2 mb-2" onClick={(e) => e.stopPropagation()}>
> <input
<h3 className="text-lg font-semibold group-hover:text-primary transition-colors mb-1"> autoFocus
{item.subject || "Untitled Recording"} type="text"
</h3> className="flex-1 bg-background border border-input rounded px-2 py-1 text-sm font-semibold focus:outline-none focus:ring-1 focus:ring-ring"
<div className="flex items-center gap-2 text-xs text-muted-foreground mb-2"> value={editValue}
onChange={(e) => setEditValue(e.target.value)}
onKeyDown={(e) => {
if (e.key === 'Enter') saveEdit();
if (e.key === 'Escape') cancelEdit();
}}
/>
<button onClick={saveEdit} className="p-1 text-green-500 hover:bg-green-500/10 rounded">
<Check size={16} />
</button>
<button onClick={cancelEdit} className="p-1 text-muted-foreground hover:bg-muted rounded">
<X size={16} />
</button>
</div>
) : (
<div
className="cursor-pointer"
onClick={() => onLoad(item)}
>
<h3 className="text-lg font-semibold group-hover:text-primary transition-colors mb-1 flex items-center gap-2">
{item.subject || "Untitled Recording"}
<button
onClick={(e) => { e.stopPropagation(); startEditing(item); }}
className="opacity-0 group-hover:opacity-100 text-muted-foreground hover:text-foreground p-1 rounded hover:bg-muted transition-all"
title="Rename"
>
<Pencil size={14} />
</button>
</h3>
</div>
)}
<div className="flex items-center gap-2 text-xs text-muted-foreground mb-2" onClick={() => !editingId && onLoad(item)}>
<Calendar size={12} /> <Calendar size={12} />
{item.date} {item.date}
{item.filename && <span className="bg-secondary px-1.5 py-0.5 rounded text-[10px] font-mono">{item.filename}</span>} {item.filename && <span className="bg-secondary px-1.5 py-0.5 rounded text-[10px] font-mono">{item.filename}</span>}
</div> </div>
<p className="text-sm text-foreground/70 line-clamp-2"> <p className="text-sm text-foreground/70 line-clamp-2 cursor-pointer" onClick={() => !editingId && onLoad(item)}>
{item.summary ? item.summary.substring(0, 150) + "..." : "No summary available."} {item.summary ? item.summary.substring(0, 150) + "..." : "No summary available."}
</p> </p>
</div> </div>
<button <button
onClick={(e) => { e.stopPropagation(); onDelete(item.id); }} onClick={(e) => { e.stopPropagation(); onDelete(item.id); }}
className="text-muted-foreground hover:text-destructive p-2 rounded-lg hover:bg-destructive/10 transition-colors opacity-0 group-hover:opacity-100" className="text-muted-foreground hover:text-destructive p-2 rounded-lg hover:bg-destructive/10 transition-colors opacity-0 group-hover:opacity-100 ml-2"
title="Delete" title="Delete"
> >
<Trash2 size={18} /> <Trash2 size={18} />

385
src/components/Import.tsx Normal file
View File

@@ -0,0 +1,385 @@
import React, { useState, useCallback } from 'react';
import { Upload, FileAudio, X, Check, Loader2 } from 'lucide-react';
import { invoke } from "@tauri-apps/api/core";
import { open } from '@tauri-apps/plugin-dialog';
import logo from '../assets/logo.png';
interface PromptTemplate {
id: string;
name: string;
content: string;
keywords?: string[];
}
interface ImportProps {
apiKey: string;
productId: string;
prompts: PromptTemplate[];
selectedModel: string;
onSaveToHistory: (transcription: string, summary: string) => void;
onComplete: () => void; // Navigate to Transcription view
addToast: (msg: string, type: 'success' | 'error' | 'info', duration?: number) => void;
setTranscription: (text: string) => void;
setSummary: (text: string) => void;
}
interface AudioMetadata {
duration: number;
size: number;
format: string;
}
type ProcessingStage = 'idle' | 'validating' | 'transcribing' | 'summarizing' | 'complete';
const SUPPORTED_FORMATS = ['mp3', 'mp4', 'm4a', 'wav', 'flac', 'ogg', 'aac', 'wma'];
const Import: React.FC<ImportProps> = ({
apiKey,
productId,
prompts,
selectedModel,
onSaveToHistory,
onComplete,
addToast,
setTranscription,
setSummary
}) => {
const [isDragging, setIsDragging] = useState(false);
const [selectedFile, setSelectedFile] = useState<string | null>(null);
const [metadata, setMetadata] = useState<AudioMetadata | null>(null);
const [meetingTitle, setMeetingTitle] = useState('');
const [stage, setStage] = useState<ProcessingStage>('idle');
const [selectedPromptId, setSelectedPromptId] = useState<string>('');
// Set default prompt
React.useEffect(() => {
if (prompts.length > 0 && !selectedPromptId) {
setSelectedPromptId(prompts[0].id);
}
}, [prompts, selectedPromptId]);
const validateFile = (filePath: string): boolean => {
const extension = filePath.split('.').pop()?.toLowerCase();
if (!extension || !SUPPORTED_FORMATS.includes(extension)) {
addToast(`Unsupported format. Supported: ${SUPPORTED_FORMATS.join(', ').toUpperCase()}`, 'error', 5000);
return false;
}
return true;
};
const extractFilename = (path: string): string => {
const parts = path.split(/[/\\]/);
const filename = parts[parts.length - 1];
return filename.replace(/\.[^/.]+$/, ''); // Remove extension
};
const formatDuration = (seconds: number): string => {
const mins = Math.floor(seconds / 60);
const secs = Math.floor(seconds % 60);
return `${mins}:${secs.toString().padStart(2, '0')}`;
};
const formatSize = (bytes: number): string => {
if (bytes < 1024 * 1024) {
return `${(bytes / 1024).toFixed(1)} KB`;
}
return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
};
const handleFileSelect = async (filePath: string) => {
if (!validateFile(filePath)) return;
setStage('validating');
setSelectedFile(filePath);
setMeetingTitle(extractFilename(filePath));
try {
// Get metadata (we'll need to implement this in Rust backend)
const meta = await invoke<AudioMetadata>('get_audio_metadata', { filePath });
setMetadata(meta);
setStage('idle');
addToast('File loaded successfully', 'success', 2000);
} catch (e) {
console.error('Metadata error:', e);
// Even if metadata fails, allow processing
setMetadata(null);
setStage('idle');
}
};
const handleDrop = useCallback((e: React.DragEvent) => {
e.preventDefault();
setIsDragging(false);
const files = Array.from(e.dataTransfer.files);
if (files.length > 0) {
// @ts-ignore - Tauri provides path on File object
const filePath = files[0].path;
if (filePath) {
handleFileSelect(filePath);
} else {
addToast('Failed to get file path', 'error');
}
}
}, []);
const handleManualSelect = async () => {
try {
const selected = await open({
multiple: false,
filters: [{
name: 'Audio/Video',
extensions: SUPPORTED_FORMATS
}]
});
if (selected && typeof selected === 'string') {
handleFileSelect(selected);
}
} catch (e) {
console.error('File picker error:', e);
addToast('Failed to open file picker', 'error');
}
};
const handleProcess = async () => {
if (!selectedFile) return;
if (!apiKey || !productId) {
addToast('Please configure API key in Settings', 'error');
return;
}
try {
setStage('transcribing');
const transText = await invoke<string>('transcribe_audio', {
filePath: selectedFile,
apiKey,
productId
});
setTranscription(transText);
if (!transText || transText.trim().length === 0) {
addToast('No speech detected in file', 'error');
setStage('idle');
return;
}
// Smart prompt selection (copied from Recorder.tsx)
let activePrompt = prompts.find(p => p.id === selectedPromptId);
const lowerText = transText.toLowerCase();
let bestMatchId = selectedPromptId;
let maxMatches = 0;
for (const p of prompts) {
if (!p.keywords) continue;
let matches = 0;
for (const kw of p.keywords) {
if (lowerText.includes(kw.toLowerCase())) {
matches++;
}
}
if (matches > maxMatches) {
maxMatches = matches;
bestMatchId = p.id;
}
}
if (bestMatchId !== selectedPromptId) {
const newPrompt = prompts.find(p => p.id === bestMatchId);
if (newPrompt) {
addToast(`Smart Select: Switched to "${newPrompt.name}"`, 'info', 3000);
activePrompt = newPrompt;
}
}
const promptContent = activePrompt ? activePrompt.content : "Summarize this.";
setStage('summarizing');
const sumText = await invoke<string>('summarize_text', {
text: transText,
apiKey,
productId,
prompt: promptContent,
model: selectedModel
});
setSummary(sumText);
// Save to history
onSaveToHistory(transText, sumText);
setStage('complete');
addToast('Import complete!', 'success', 3000);
// Navigate to Transcription view
setTimeout(() => {
onComplete();
}, 1000);
} catch (e) {
console.error('Processing error:', e);
addToast(`Error: ${e}`, 'error');
setStage('idle');
}
};
const handleReset = () => {
setSelectedFile(null);
setMetadata(null);
setMeetingTitle('');
setStage('idle');
};
const getStageInfo = () => {
switch (stage) {
case 'validating': return { icon: Loader2, text: 'Validating file...', color: 'text-blue-500' };
case 'transcribing': return { icon: Loader2, text: 'Transcribing audio...', color: 'text-purple-500' };
case 'summarizing': return { icon: Loader2, text: 'Generating summary...', color: 'text-green-500' };
case 'complete': return { icon: Check, text: 'Complete!', color: 'text-green-500' };
default: return null;
}
};
const stageInfo = getStageInfo();
const isProcessing = stage !== 'idle' && stage !== 'complete';
return (
<div className="flex flex-col w-full h-full bg-background relative">
{/* Header */}
<div className="w-full flex justify-center items-center p-4 shrink-0">
<img src={logo} alt="Logo" className="h-10 object-contain" />
</div>
{/* Main Content */}
<div className="flex-1 overflow-y-auto px-6 pb-6 flex flex-col items-center">
<h1 className="text-xl font-bold mb-2 text-foreground">Import Audio File</h1>
<p className="text-muted-foreground mb-6 text-center text-sm">
Upload a recording for transcription and summarization
</p>
{/* Drag & Drop Zone */}
<div
onDragOver={(e) => { e.preventDefault(); setIsDragging(true); }}
onDragLeave={() => setIsDragging(false)}
onDrop={handleDrop}
className={`w-full max-w-md border-2 border-dashed rounded-lg p-8 mb-6 transition-all ${isDragging
? 'border-primary bg-primary/5 scale-105'
: selectedFile
? 'border-green-500 bg-green-500/5'
: 'border-border bg-secondary/30 hover:border-primary/50'
}`}
>
<div className="flex flex-col items-center justify-center gap-4">
{selectedFile ? (
<>
<FileAudio size={48} className="text-green-500" />
<div className="text-center">
<p className="font-semibold text-foreground">{meetingTitle}</p>
{metadata && (
<p className="text-xs text-muted-foreground mt-1">
{formatDuration(metadata.duration)} {formatSize(metadata.size)} {metadata.format.toUpperCase()}
</p>
)}
</div>
<button
onClick={handleReset}
className="text-xs text-muted-foreground hover:text-foreground flex items-center gap-1"
>
<X size={14} /> Change file
</button>
</>
) : (
<>
<Upload size={48} className="text-muted-foreground" />
<div className="text-center">
<p className="font-semibold text-foreground">Drag & Drop audio file</p>
<p className="text-xs text-muted-foreground mt-1">
or click below to browse
</p>
</div>
<button
onClick={handleManualSelect}
disabled={isProcessing}
className="px-4 py-2 bg-primary text-primary-foreground rounded-lg hover:bg-primary/90 disabled:opacity-50 text-sm font-semibold transition-all"
>
Select File
</button>
<p className="text-xs text-muted-foreground">
Supported: MP3, MP4, WAV, M4A, FLAC, OGG, AAC, WMA
</p>
</>
)}
</div>
</div>
{/* Configuration Section */}
{selectedFile && (
<div className="w-full max-w-md space-y-4">
{/* Meeting Title */}
<div>
<label className="text-xs font-semibold text-muted-foreground uppercase tracking-wider block mb-1">
Meeting Title
</label>
<input
type="text"
value={meetingTitle}
onChange={(e) => setMeetingTitle(e.target.value)}
disabled={isProcessing}
className="w-full p-2 text-sm bg-secondary rounded border border-border outline-none focus:ring-2 focus:ring-primary disabled:opacity-50"
placeholder="Enter meeting title..."
/>
</div>
{/* AI Template */}
<div>
<label className="text-xs font-semibold text-muted-foreground uppercase tracking-wider block mb-1">
AI Template
</label>
<select
value={selectedPromptId}
onChange={(e) => setSelectedPromptId(e.target.value)}
disabled={isProcessing || prompts.length === 0}
className="w-full p-2 text-sm bg-secondary rounded border border-border outline-none focus:ring-2 focus:ring-primary disabled:opacity-50"
>
{prompts.map(p => (
<option key={p.id} value={p.id}>{p.name}</option>
))}
{prompts.length === 0 && <option value="">No templates</option>}
</select>
</div>
{/* Process Button */}
<button
onClick={handleProcess}
disabled={!selectedFile || isProcessing || !apiKey}
className="w-full py-3 text-base font-semibold bg-primary text-primary-foreground rounded-lg hover:bg-primary/90 disabled:opacity-50 disabled:cursor-not-allowed transition-all shadow-md hover:shadow-lg flex items-center justify-center gap-2"
>
{isProcessing ? (
<>
<Loader2 size={20} className="animate-spin" />
Processing...
</>
) : (
<>
<Upload size={20} />
Transcribe & Summarize
</>
)}
</button>
{/* Progress Indicator */}
{stageInfo && (
<div className="flex items-center justify-center gap-2 p-3 bg-secondary/50 rounded-lg border border-border">
<stageInfo.icon size={16} className={`${stageInfo.color} ${stage !== 'complete' ? 'animate-spin' : ''}`} />
<span className={`text-sm font-medium ${stageInfo.color}`}>
{stageInfo.text}
</span>
</div>
)}
</div>
)}
</div>
</div>
);
};
export default Import;

View File

@@ -1,4 +1,4 @@
import React, { useState, useEffect } from 'react'; import React, { useState, useEffect, useRef } from 'react';
import { Mic, Square, Users, Headphones } from 'lucide-react'; import { Mic, Square, Users, Headphones } from 'lucide-react';
import { invoke } from "@tauri-apps/api/core"; import { invoke } from "@tauri-apps/api/core";
import { listen } from '@tauri-apps/api/event'; import { listen } from '@tauri-apps/api/event';
@@ -58,6 +58,10 @@ const Recorder: React.FC<RecorderProps> = ({
const [isRecording, setIsRecording] = useState(false); const [isRecording, setIsRecording] = useState(false);
const [isStopping, setIsStopping] = useState(false); // New lock state const [isStopping, setIsStopping] = useState(false); // New lock state
const [isPaused, setIsPaused] = useState(false); const [isPaused, setIsPaused] = useState(false);
const [isWaiting, setIsWaiting] = useState(false); // New state for Auto-Start
const [autoStartEnabled, setAutoStartEnabled] = useState(false); // Toggle state
const [status, setStatus] = useState<string>('Ready to record'); const [status, setStatus] = useState<string>('Ready to record');
const [selectedDevice, setSelectedDevice] = useState<string>(''); const [selectedDevice, setSelectedDevice] = useState<string>('');
const [selectedPromptId, setSelectedPromptId] = useState<string>(''); const [selectedPromptId, setSelectedPromptId] = useState<string>('');
@@ -149,19 +153,33 @@ const Recorder: React.FC<RecorderProps> = ({
const startRecording = async (deviceIdOverride?: string) => { const startRecording = async (deviceIdOverride?: string) => {
try { try {
setStatus('Starting...');
setStatus('Starting...'); setStatus('Starting...');
// Check override or state // Check override or state
const targetDeviceId = deviceIdOverride || selectedDevice; const targetDeviceId = deviceIdOverride || selectedDevice;
// Pass customFilename (camelCase key maps to snake_case in Rust automatically or we need to check Tauri mapping, usually it maps camel to camel? Rust expects snake. Let's use snake_case in invoke args to be safe) // Pass customFilename (camelCase key maps to snake_case in Rust automatically or we need to check Tauri mapping, usually it maps camel to camel? Rust expects snake. Let's use snake_case in invoke args to be safe)
await invoke('start_recording', { deviceId: targetDeviceId, savePath: savePath || null, customFilename: props.recordingSubject || null }); await invoke('start_recording', {
deviceId: targetDeviceId,
savePath: savePath || null,
customFilename: props.recordingSubject || null,
waitForSpeech: autoStartEnabled // Pass the toggle state
});
setIsRecording(true); setIsRecording(true);
setIsPaused(false); setIsPaused(false);
setTranscription(''); setTranscription('');
setSummary(''); setSummary('');
setStatus('Recording...');
addToast('Recording started', 'success', 2000); if (autoStartEnabled) {
setIsWaiting(true);
setStatus('Waiting for audio...');
addToast('Standing by for audio...', 'info', 3000);
} else {
setIsWaiting(false);
setStatus('Recording...');
addToast('Recording started', 'success', 2000);
}
} catch (e) { } catch (e) {
console.error(e); console.error(e);
setStatus(`Error: ${e}`); setStatus(`Error: ${e}`);
@@ -170,43 +188,83 @@ const Recorder: React.FC<RecorderProps> = ({
} }
}; };
// VAD & Auto-Stop Logic // Refs for interval access to avoid dependency cycles
useEffect(() => { const lastSpeechTimeRef = useRef<number>(Date.now());
let unlisten: () => void; const isStoppingRef = useRef(false);
const setupListener = async () => { // Update refs when state changes
unlisten = await listen<{ is_speech: boolean, probability: number }>('vad-event', (event) => { useEffect(() => {
lastSpeechTimeRef.current = lastSpeechTime;
}, [lastSpeechTime]);
useEffect(() => {
isStoppingRef.current = isStopping;
}, [isStopping]);
// 1. Event Listeners Effect (Run ONCE when recording starts)
useEffect(() => {
let unlistenVAD: () => void;
let unlistenTrigger: () => void;
const setupListeners = async () => {
if (!isRecording) return;
console.log("Setting up VAD listeners...");
// VAD Event Listener
unlistenVAD = await listen<{ is_speech: boolean, probability: number }>('vad-event', (event) => {
if (event.payload.is_speech) { if (event.payload.is_speech) {
setLastSpeechTime(Date.now()); setLastSpeechTime(Date.now());
lastSpeechTimeRef.current = Date.now(); // Update ref immediately
setSilenceDuration(0); setSilenceDuration(0);
} }
}); });
// Auto-Start Trigger Listener
unlistenTrigger = await listen('auto-recording-triggered', () => {
console.log("Auto-Start Triggered from Backend!");
// Only trigger if we are actually waiting
setIsWaiting((prev) => {
if (prev) {
addToast("Audio detected! Recording started.", 'success', 4000);
return false;
}
return prev;
});
setStatus('Recording (Auto-Started)...');
setLastSpeechTime(Date.now());
});
}; };
if (isRecording && !isPaused) { if (isRecording) {
setupListener(); setupListeners();
setLastSpeechTime(Date.now()); // Reset on start
} }
const interval = setInterval(() => { return () => {
if (isRecording && !isPaused) { // Cleanup listeners
const diff = (Date.now() - lastSpeechTime) / 1000; if (unlistenVAD) unlistenVAD();
setSilenceDuration(diff); if (unlistenTrigger) unlistenTrigger();
};
}, [isRecording, addToast]); // Dependencies for listener setup
// Auto-stop after 30 seconds of silence // Auto-Stop Interval Effect
if (diff > 30 && !isStopping) { // Check lock useEffect(() => {
console.log("Auto-stopping due to silence"); if (!isRecording || isPaused || isWaiting) return;
addToast("Auto-stopping (Silence detected)", "info", 3000);
stopRecording(); const interval = setInterval(() => {
} const now = Date.now();
const diff = (now - lastSpeechTimeRef.current) / 1000;
setSilenceDuration(diff);
// Auto-stop after 30 seconds of silence
if (diff > 30 && !isStoppingRef.current) {
console.log("Auto-stopping due to silence");
addToast("Auto-stopping (Silence detected)", "info", 3000);
stopRecording();
} }
}, 1000); }, 1000);
return () => { return () => clearInterval(interval);
if (unlisten) unlisten(); }, [isRecording, isPaused, isWaiting, addToast]); // Dependencies for interval lifecycle
clearInterval(interval);
};
}, [isRecording, isPaused, lastSpeechTime]);
// Handle Auto Start Prop // Handle Auto Start Prop
useEffect(() => { useEffect(() => {
@@ -273,6 +331,7 @@ const Recorder: React.FC<RecorderProps> = ({
try { try {
setIsRecording(false); setIsRecording(false);
setIsPaused(false); setIsPaused(false);
setIsWaiting(false); // Reset waiting state
setStatus('Processing...'); setStatus('Processing...');
const filePath = await invoke<string>('stop_recording'); const filePath = await invoke<string>('stop_recording');
@@ -357,6 +416,8 @@ const Recorder: React.FC<RecorderProps> = ({
} }
}; };
return ( return (
<div className="flex flex-col w-full h-full bg-background relative"> <div className="flex flex-col w-full h-full bg-background relative">
{/* Fixed Header - Reduced padding */} {/* Fixed Header - Reduced padding */}
@@ -367,9 +428,9 @@ const Recorder: React.FC<RecorderProps> = ({
{/* Scrollable Content - Reduced spacing */} {/* Scrollable Content - Reduced spacing */}
<div className="flex-1 overflow-y-auto px-6 pb-6 flex flex-col items-center"> <div className="flex-1 overflow-y-auto px-6 pb-6 flex flex-col items-center">
<div className="mb-4 relative shrink-0"> <div className="mb-4 relative shrink-0">
<div className={`w-24 h-24 rounded-full flex items-center justify-center transition-all duration-300 ${isRecording ? (isPaused ? 'bg-yellow-500/10' : 'bg-red-500/10 animate-pulse') : 'bg-secondary'}`}> <div className={`w-24 h-24 rounded-full flex items-center justify-center transition-all duration-300 ${isRecording ? (isWaiting ? 'bg-blue-500/20' : isPaused ? 'bg-yellow-500/10' : 'bg-red-500/10 animate-pulse') : 'bg-secondary'}`}>
{isRecording ? ( {isRecording ? (
<div className={`w-16 h-16 rounded-full flex items-center justify-center shadow-[0_0_20px_rgba(239,68,68,0.5)] ${isPaused ? 'bg-yellow-500' : 'bg-red-500'}`}> <div className={`w-16 h-16 rounded-full flex items-center justify-center shadow-[0_0_20px_rgba(239,68,68,0.5)] ${isWaiting ? 'bg-blue-500 animate-pulse' : isPaused ? 'bg-yellow-500' : 'bg-red-500'}`}>
<Mic size={32} className="text-white animate-bounce" /> <Mic size={32} className="text-white animate-bounce" />
</div> </div>
) : ( ) : (
@@ -381,12 +442,12 @@ const Recorder: React.FC<RecorderProps> = ({
</div> </div>
<h1 className="text-xl font-bold mb-1 text-foreground"> <h1 className="text-xl font-bold mb-1 text-foreground">
{isRecording ? (isPaused ? 'Paused' : 'Listening...') : 'Ready to Record'} {isRecording ? (isWaiting ? 'Waiting for Audio...' : isPaused ? 'Paused' : 'Listening...') : 'Ready to Record'}
</h1> </h1>
<p className="text-muted-foreground mb-4 text-center text-xs h-5"> <p className="text-muted-foreground mb-4 text-center text-xs h-5">
{status} {status}
{isRecording && !isPaused && silenceDuration > 10 && ( {isRecording && !isPaused && !isWaiting && silenceDuration > 10 && (
<span className="block text-xs text-yellow-500 mt-0.5 opacity-80"> <span className="block text-xs text-yellow-500 mt-0.5 opacity-80">
Silence detected: {Math.floor(silenceDuration)}s Silence detected: {Math.floor(silenceDuration)}s
</span> </span>
@@ -395,30 +456,46 @@ const Recorder: React.FC<RecorderProps> = ({
<div className="w-full max-w-sm space-y-3 mb-4 shrink-0"> <div className="w-full max-w-sm space-y-3 mb-4 shrink-0">
{!isRecording ? ( {!isRecording ? (
<button <>
onClick={() => startRecording()} <button
disabled={!apiKey || !productId} onClick={() => startRecording()}
className="w-full py-3 text-base font-semibold bg-primary text-primary-foreground rounded-lg hover:bg-primary/90 disabled:opacity-50 disabled:cursor-not-allowed transition-all shadow-md hover:shadow-lg" disabled={!apiKey || !productId}
> className="w-full py-3 text-base font-semibold bg-primary text-primary-foreground rounded-lg hover:bg-primary/90 disabled:opacity-50 disabled:cursor-not-allowed transition-all shadow-md hover:shadow-lg"
{!apiKey ? 'Configure API Key First' : 'Start Recording'} >
</button> {!apiKey ? 'Configure API Key First' : (autoStartEnabled ? 'Standby (Auto-Start)' : 'Start Recording')}
</button>
<div className="flex items-center justify-center gap-2 mt-2">
<label className="flex items-center gap-2 cursor-pointer select-none">
<input
type="checkbox"
checked={autoStartEnabled}
onChange={(e) => setAutoStartEnabled(e.target.checked)}
className="w-4 h-4 accent-primary rounded cursor-pointer"
/>
<span className="text-xs text-muted-foreground font-medium">Auto-start when audio detected</span>
</label>
</div>
</>
) : ( ) : (
<div className="flex gap-2 w-full"> <div className="flex gap-2 w-full">
<button {/* In Waiting mode, we can only Stop (Cancel) */}
onClick={togglePause} {!isWaiting && (
className={`flex-1 py-4 text-lg font-semibold rounded-lg transition-all shadow-md hover:shadow-lg flex items-center justify-center gap-2 ${isPaused <button
? 'bg-blue-600 text-white hover:bg-blue-700' onClick={togglePause}
: 'bg-yellow-500 text-white hover:bg-yellow-600' className={`flex-1 py-4 text-lg font-semibold rounded-lg transition-all shadow-md hover:shadow-lg flex items-center justify-center gap-2 ${isPaused
}`} ? 'bg-blue-600 text-white hover:bg-blue-700'
> : 'bg-yellow-500 text-white hover:bg-yellow-600'
{isPaused ? 'Resume' : 'Pause'} }`}
</button> >
{isPaused ? 'Resume' : 'Pause'}
</button>
)}
<button <button
onClick={stopRecording} onClick={stopRecording}
className="flex-1 py-4 text-lg font-semibold bg-destructive text-destructive-foreground rounded-lg hover:bg-destructive/90 transition-all shadow-md hover:shadow-lg flex items-center justify-center gap-2" className="flex-1 py-4 text-lg font-semibold bg-destructive text-destructive-foreground rounded-lg hover:bg-destructive/90 transition-all shadow-md hover:shadow-lg flex items-center justify-center gap-2"
> >
<Square size={20} fill="currentColor" /> <Square size={20} fill="currentColor" />
Stop {isWaiting ? 'Cancel' : 'Stop'}
</button> </button>
</div> </div>
)} )}

View File

@@ -1,9 +1,9 @@
import React from 'react'; import React from 'react';
import { Mic, FileText, Calendar } from 'lucide-react'; import { Mic, FileText, Calendar, Upload } from 'lucide-react';
interface TabsProps { interface TabsProps {
currentTab: 'recorder' | 'transcription' | 'settings' | 'meetings' | 'history'; currentTab: 'recorder' | 'transcription' | 'settings' | 'meetings' | 'history' | 'import';
onTabChange: (tab: 'recorder' | 'transcription' | 'settings' | 'meetings' | 'history') => void; onTabChange: (tab: 'recorder' | 'transcription' | 'settings' | 'meetings' | 'history' | 'import') => void;
} }
const Tabs: React.FC<TabsProps> = ({ currentTab, onTabChange }) => { const Tabs: React.FC<TabsProps> = ({ currentTab, onTabChange }) => {
@@ -16,6 +16,13 @@ const Tabs: React.FC<TabsProps> = ({ currentTab, onTabChange }) => {
<Mic size={16} /> <Mic size={16} />
Recording Recording
</button> </button>
<button
onClick={() => onTabChange('import')}
className={`flex items-center gap-2 px-4 py-2 rounded-lg text-sm font-medium transition-colors ${currentTab === 'import' ? 'bg-secondary text-foreground' : 'text-muted-foreground hover:text-foreground hover:bg-secondary/50'}`}
>
<Upload size={16} />
Import
</button>
<button <button
onClick={() => onTabChange('transcription')} onClick={() => onTabChange('transcription')}
className={`flex items-center gap-2 px-4 py-2 rounded-lg text-sm font-medium transition-colors ${currentTab === 'transcription' ? 'bg-secondary text-foreground' : 'text-muted-foreground hover:text-foreground hover:bg-secondary/50'}`} className={`flex items-center gap-2 px-4 py-2 rounded-lg text-sm font-medium transition-colors ${currentTab === 'transcription' ? 'bg-secondary text-foreground' : 'text-muted-foreground hover:text-foreground hover:bg-secondary/50'}`}