VOCV2 FBase Calculator Logic

1. Goal

This is the core logic for computing F_base. The system combines several difficulty factors to produce the final F_base. The AI does not return F_base directly; it scores the component factors, and the engine computes the result from them.

Current agreed direction:

The AI scores every factor except the length-based ones (len_score, wordCount_score), which the engine derives itself
but the AI also receives 2 fields from data entry as input context:
- Part of speech
- Academic level
the engine remains the side that computes the final formula

This file keeps only what a dev needs to know:

which factors are needed for the calculation
what input the AI receives
what the AI returns
how the formula is computed
which additional state must be synced on apply

2. Overall calculation flow

term + selected_part_of_speech + selected_academic_level
-> determine whether the term is a word or a phrase
-> AI scores the factors
-> engine computes F_base
-> apply F_base to the word state
-> recalculate d0 and dependent values
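
The document does not say how the word/phrase decision is made. As a minimal sketch, assuming a term counts as a phrase when it contains more than one whitespace-separated token (the function name `detect_mode` and that rule are illustrative assumptions, not the engine's confirmed behavior):

```python
def detect_mode(term: str) -> str:
    """Return "phrase" for multi-token terms, otherwise "word".

    Assumption: tokens are separated by whitespace. The real engine may
    also consult selected_part_of_speech to confirm the phrase flow.
    """
    tokens = term.split()
    if not tokens:
        raise ValueError("term must not be empty")
    return "phrase" if len(tokens) > 1 else "word"
```

A hyphenated term such as "well-known" would be treated as a single word under this rule, which may or may not match the intended behavior.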

3. Ground rules

Part of speech and Academic level are structured inputs from data entry.
The AI must use these 2 fields as context when scoring.
The AI should not disregard these 2 fields and relabel the item by its own judgment.
The engine remains the source of truth for the F_base formula.
Key names may change with the implementation, as long as the logical meaning is preserved.

4. Factors that make up `F_base`

4.1 Word mode

Factors that feed into F_base:

len_score: term length score
pos_score: part-of-speech score
level_score: advancedness / academic level score
abs: abstractness
cog: familiarity to Vietnamese learners (0 = very familiar; higher = less familiar)
poly: polysemy

Where:

data entry provides: selected_part_of_speech, selected_academic_level
AI returns: pos_score, level_score, abs, cog, poly
engine derives on its own: len_score

4.2 Phrase mode

Factors that feed into F_base:

wordCount_score: score based on the number of words in the phrase
lexical_count: number of difficult / academic / obscure words
idiom: idiomaticity
syntax: syntactic complexity
cog: familiarity to Vietnamese learners (0 = very familiar; higher = less familiar)
poly: context-dependent polysemy

Where:

data entry still provides: selected_part_of_speech, selected_academic_level
AI returns: lexical_count, idiom, syntax, cog, poly
engine derives on its own: wordCount_score

Notes:

in phrase mode, selected_part_of_speech mainly serves to confirm that the item goes through the phrase flow
selected_academic_level is context that helps the AI score lexical_count and overall difficulty more consistently

5. Full prompts to use

5.1 Prompt for single words

Role: Linguistic Data Engine for a Vietnamese English Learner.
Task: Analyze the word "{word}" in its most common modern usage.

Input context:
- selected_part_of_speech: "{selected_part_of_speech}"
- selected_academic_level: "{selected_academic_level}"

Rules:
- Use the provided input context as the primary reference when assigning scores.
- Do not ignore or replace the provided labels unless they are empty or clearly invalid.
- Return exactly one strictly valid JSON object.
- No code blocks, no explanation.
- Include all 5 keys.
- Use integers only.
- If ambiguous, choose the most common learner-facing interpretation.

1. "pos_score" (0-10: score the provided part of speech by learner difficulty. 0=very easy lexical class, 5=medium, 10=hard functional/abstract class.)
2. "level_score" (0-20: score the provided academic level by overall learner difficulty. 0=very basic, 10=intermediate, 20=advanced/academic.)
3. "abs" (Abstractness 0-20: 0=Tangible object (apple), 20=Philosophical concept (existentialism).)
4. "cog" (Cognate/Familiarity for Vietnamese 0-15: 0=Loanword/Very familiar (internet, coffee), 15=Alien/Unknown root.)
5. "poly" (Polysemy 0-10: 0=Monosemous, 10=Highly polysemous (set, run, get).)

Output Example: {"pos_score": 0, "level_score": 3, "abs": 5, "cog": 0, "poly": 5}

5.2 Prompt for phrases

Role: Linguistic Data Engine for a Vietnamese English Learner.
Task: Analyze the phrase "{phrase}" in its most common modern usage.

Input context:
- selected_part_of_speech: "{selected_part_of_speech}"
- selected_academic_level: "{selected_academic_level}"

Rules:
- Use the provided input context as the primary reference when judging difficulty.
- Return exactly one strictly valid JSON object.
- No code blocks, no explanation.
- Include all 5 keys.
- Use integers only.
- If ambiguous, choose the most common learner-facing interpretation.

1. "lexical_count" (Count difficult content words only: C1-C2 / Academic / Obscure. Ignore simple function words. 0 to 5.)
2. "idiom" (Idiomaticity Score 0-20: 0=Transparent/Literal, 10=Semi-idiomatic, 20=Opaque/Figurative.)
3. "syntax" (Syntactic Complexity 0-10: 0=Simple phrase, 5=Moderately complex phrase/clause, 10=Complex structure.)
4. "cog" (Cognate/Familiarity for Vietnamese 0-15: 0=Literal translation works / very familiar, 15=Completely different structure.)
5. "poly" (Polysemy 0-10: 0=Specific meaning, 10=Multiple context meanings.)

Output Example: {"lexical_count": 1, "idiom": 10, "syntax": 5, "cog": 5, "poly": 4}

6. What the prompts mean

The AI does not compute F_base directly
The AI only returns a score for each factor
the 2 data-entry fields are included in the prompt so the AI scores more consistently
the engine uses those scores to compute F_base itself
if an older implementation still uses legacy keys such as pos / cefr, they can be mapped to pos_score / level_score
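
The legacy-key mapping could look roughly like the sketch below. `LEGACY_KEY_MAP` and `normalize_keys` are hypothetical names, and the sketch assumes legacy values already use the same scales as pos_score and level_score, which the document does not confirm.

```python
# Hypothetical mapping from legacy keys to the current key names.
LEGACY_KEY_MAP = {"pos": "pos_score", "cefr": "level_score"}


def normalize_keys(scores: dict) -> dict:
    """Rename legacy keys; if both spellings are present, the new-style key wins."""
    normalized = {}
    for key, value in scores.items():
        new_key = LEGACY_KEY_MAP.get(key, key)
        # A legacy key never overwrites a value already stored under the new name.
        if key in LEGACY_KEY_MAP and new_key in normalized:
            continue
        normalized[new_key] = value
    return normalized
```

Letting the new-style key win is a design choice for this sketch; an implementation could equally reject payloads that contain both spellings.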

7. Formula for `F_base`

7.1 Word mode

staticScore = len_score + pos_score + level_score
aiScore = abs + cog + poly
deduction = 30 + staticScore + aiScore
F_base = clamp(100 - deduction, 10, 70)

Simplified, since 100 - 30 = 70:

F_base = clamp(70 - (len_score + pos_score + level_score + abs + cog + poly), 10, 70)
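
The word-mode formula above translates directly into a small function. This is a sketch rather than the engine's actual code: the names `clamp` and `f_base_word` are illustrative, and the `abs` factor is passed as `abs_score` to avoid shadowing Python's built-in `abs`.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def f_base_word(len_score, pos_score, level_score, abs_score, cog, poly):
    """Word-mode F_base, following the formula in section 7.1."""
    static_score = len_score + pos_score + level_score
    ai_score = abs_score + cog + poly
    deduction = 30 + static_score + ai_score
    return clamp(100 - deduction, 10, 70)
```

For instance, taking the word prompt's example output (pos_score 0, level_score 3, abs 5, cog 0, poly 5) with a len_score of 5, the deduction is 30 + 8 + 10 = 48, so F_base = 52. When every factor is at its maximum the sum is 90, and the lower clamp keeps F_base at 10.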

7.2 Phrase mode

staticScore = wordCount_score + min(30, lexical_count * 8)
aiScore = idiom + syntax + cog + poly
deduction = 30 + staticScore + aiScore
F_base = clamp(100 - deduction, 10, 70)
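
A matching sketch for phrase mode, under the same caveats (function names are illustrative, not the engine's):

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def f_base_phrase(word_count_score, lexical_count, idiom, syntax, cog, poly):
    """Phrase-mode F_base, following the formula in section 7.2."""
    # Each difficult word costs 8 points, capped at 30 in total.
    static_score = word_count_score + min(30, lexical_count * 8)
    ai_score = idiom + syntax + cog + poly
    deduction = 30 + static_score + ai_score
    return clamp(100 - deduction, 10, 70)
```

With the phrase prompt's example output (lexical_count 1, idiom 10, syntax 5, cog 5, poly 4) and a wordCount_score of 0, the deduction is 30 + 8 + 24 = 62, so F_base = 38. Note that the cap of 30 already applies at lexical_count = 4 (4 * 8 = 32), so lexical_count values of 4 and 5 contribute the same amount.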

8. Scales currently in use

8.1 Engine auto score

len_score
- <= 4 characters -> 0
- 5-8 characters -> 5
- 9-12 characters -> 10
- > 12 characters -> 15
wordCount_score
- 1-3 words -> 0
- 4-5 words -> 5
- 6-7 words -> 10
- > 7 words -> 20
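
The two engine-side scales can be sketched as lookup functions. The thresholds come from the table above; how the engine actually counts characters and words is not specified, so this sketch assumes `len()` on the trimmed term and whitespace splitting.

```python
def len_score(term: str) -> int:
    """Length score for word mode (assumes character count of the trimmed term)."""
    n = len(term.strip())
    if n <= 4:
        return 0
    if n <= 8:
        return 5
    if n <= 12:
        return 10
    return 15


def word_count_score(phrase: str) -> int:
    """Word-count score for phrase mode (assumes whitespace-separated words)."""
    n = len(phrase.split())
    if n <= 3:
        return 0
    if n <= 5:
        return 5
    if n <= 7:
        return 10
    return 20
```

For example, "apple" (5 characters) scores 5 and "existentialism" (14 characters) scores 15, while a three-word phrase such as "break the ice" scores 0.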

8.2 AI score

pos_score -> 0..10
level_score -> 0..20
abs -> 0..20
cog -> 0..15
poly -> 0..10
lexical_count -> 0..5
idiom -> 0..20
syntax -> 0..10
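
Because the AI is asked for integers within these ranges but nothing guarantees it complies, a range check before the scores reach the formula may be useful. The sketch below is one possible shape; `WORD_RANGES`, `PHRASE_RANGES`, and `validate_scores` are hypothetical names, and whether the real engine rejects or clamps out-of-range values is not stated in this document.

```python
# Inclusive (low, high) ranges taken from section 8.2.
WORD_RANGES = {
    "pos_score": (0, 10), "level_score": (0, 20),
    "abs": (0, 20), "cog": (0, 15), "poly": (0, 10),
}
PHRASE_RANGES = {
    "lexical_count": (0, 5), "idiom": (0, 20),
    "syntax": (0, 10), "cog": (0, 15), "poly": (0, 10),
}


def validate_scores(scores: dict, ranges: dict) -> list:
    """Return a list of problems; an empty list means the scores are usable."""
    errors = []
    for key, (low, high) in ranges.items():
        value = scores.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            errors.append(f"{key}: expected an integer, got {value!r}")
        elif not low <= value <= high:
            errors.append(f"{key}: {value} is outside {low}..{high}")
    return errors
```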

9. Parsing and applying the result

Once the JSON comes back from the AI:

parse the text into an object
map the scores into calculator state
call the engine to recompute F_base
apply F_base to the word state
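
The first two steps (parse, then map into calculator state) might be sketched as follows. The function name `parse_ai_scores` and the choice to raise `ValueError` on bad input are assumptions for illustration; the required keys are the ones listed in the prompts.

```python
import json

WORD_KEYS = ("pos_score", "level_score", "abs", "cog", "poly")
PHRASE_KEYS = ("lexical_count", "idiom", "syntax", "cog", "poly")


def parse_ai_scores(text: str, mode: str) -> dict:
    """Parse the AI reply and keep only the keys the given mode needs."""
    required = WORD_KEYS if mode == "word" else PHRASE_KEYS
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on invalid JSON.
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a single JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # Extra keys are dropped so they cannot leak into calculator state.
    return {key: data[key] for key in required}
```

Failing loudly on a missing key, rather than defaulting it to 0, avoids silently inflating F_base when the AI reply is incomplete.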

When applying to real state:

update word.F_base
recalculate word.d0 = computeD0(F_base)
if D_min / D_max are still tracking the old defaults, update them from the new d0
if word.d already has a value, clamp it to the new range
save the current genetics

This matters because updating F_base without syncing d0 leaves the state half old and half new.
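
The sync steps above can be sketched as a single function. Several things here are assumptions: `computeD0` is not defined in this document, so it is passed in as `compute_d0`; the rule for deriving the default D_min / D_max from d0 is likewise passed in as a hypothetical `default_range` callable; "tracking the old defaults" is read as "equal to the defaults derived from the old d0"; and persisting genetics is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class WordState:
    """Minimal stand-in for the real word state (field names follow the document)."""
    F_base: float
    d0: float
    D_min: float
    D_max: float
    d: Optional[float] = None


def apply_f_base(
    word: WordState,
    new_f_base: float,
    compute_d0: Callable[[float], float],
    default_range: Callable[[float], Tuple[float, float]],
) -> WordState:
    """Apply a new F_base and keep d0, the D range, and d consistent with it."""
    # Check before mutating: is the range still the default derived from the old d0?
    # Exact equality is used here; a real implementation may want a tolerance.
    tracks_default = (word.D_min, word.D_max) == default_range(word.d0)

    word.F_base = new_f_base
    word.d0 = compute_d0(new_f_base)

    if tracks_default:
        word.D_min, word.D_max = default_range(word.d0)
    if word.d is not None:
        # Clamp the current d into the (possibly updated) range.
        word.d = max(word.D_min, min(word.D_max, word.d))
    return word
```

Doing all of this in one function means no caller can update F_base alone and leave d0 or the range stale, which is the half-old, half-new state the note above warns about.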

10. Design rules for quick recall

The AI scores the factors and never returns F_base
Part of speech and Academic level must be sent in the prompt as input context
the engine remains where the final formula is computed
word and phrase use 2 separate formulas
applying F_base must also sync d0 and the decay range