Frequently asked questions
How does your app evaluate questions automatically?
First, the question is given to the language model, together with instructions to answer briefly, since some models tend to produce long answers otherwise. Then the model's answer is evaluated in a two-step process: first by string comparison, then by LLM comparison.
In the first step, the answer string is simply compared to the correct answer string provided by the user. If the two are identical, the answer is evaluated as correct. Otherwise, the model's answer and the correct answer are given to another LLM (GPT-3.5-Turbo). This evaluation model is asked to determine whether the two answers are semantically equivalent. If it judges them equivalent, the answer is evaluated as correct; otherwise, the answer is evaluated as incorrect.
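The two-step procedure above can be sketched as follows. This is a minimal illustration, not the app's actual implementation: `judge` stands in for the call to the GPT-3.5-Turbo evaluation model and is a hypothetical placeholder.

```python
def evaluate_answer(model_answer: str, correct_answer: str, judge) -> bool:
    """Two-step evaluation: exact string match first, then an LLM-based check."""
    # Step 1: plain string comparison against the user-provided correct answer
    if model_answer == correct_answer:
        return True
    # Step 2: ask an evaluation LLM (GPT-3.5-Turbo in the app) whether the two
    # answers are semantically equivalent; `judge` returns True or False
    return judge(model_answer, correct_answer)
```

In the real app, `judge` would send both answers to GPT-3.5-Turbo and parse its verdict; here it is just a callable so the control flow is visible.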
This entire procedure is preliminary; if you have any feedback, please reach out!
I am getting a 502 Bad Gateway Error, what should I do?
Sometimes the API requests to the language models take too long and our hosting server times out. Just reload the page (this resubmits the same data) and the evaluation should continue where it left off. A more permanent solution is in the works.
What model configuration are you using?
All models are used with LangChain's default parameters, except that the temperature is set to 0. For models hosted by Replicate (llama-2, falcon), the temperature is set to 0.01 instead.
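As a rough sketch of that configuration, assuming a classic LangChain setup (import paths and parameter names vary across LangChain versions, and the Replicate model reference below is a hypothetical placeholder, not one the app necessarily uses):

```python
from langchain.chat_models import ChatOpenAI
from langchain.llms import Replicate

# OpenAI-hosted models: LangChain defaults, with temperature set to 0
chat_model = ChatOpenAI(temperature=0)

# Replicate-hosted models (llama-2, falcon): temperature set to 0.01
replicate_model = Replicate(
    model="owner/model-name:version",  # placeholder Replicate reference
    model_kwargs={"temperature": 0.01},
)
```

Both constructors otherwise keep LangChain's defaults, matching the description above.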
I need some support or want to share feedback, where can I reach you?
Just write a message to email@example.com and we will try to get back to you as soon as possible! Thanks a lot for your help!