In these last couple of weeks, I made so many new friends and really got to explore the character of San Francisco. Now that my internship, along with the summer, has come to an end, I’m so grateful for the time that I got to spend there. At times it was hard and tedious scripting inside when I knew that the weather outside was so nice, but the sense of accomplishment when I finished a project was more than enough to fuel my progress.
I would say that I’ve met my learning goals because I have learned so much about information extraction, from working with sources in all sorts of formats and different languages, and about source analysis, especially since the projects that I was working on were part of a much larger collective project to collect and document linguistic information. Unfortunately, I didn’t get to learn as much about the translational algorithms that we use as I would have liked because of the time constraint, but it was still interesting to debate and research semantic ambiguity and sense disambiguation in order to provide the best translations through our database. But I think that I learned the most by absorbing information from the collective experiences of the wonderful staff that I worked with.
I think that this summer has made it clear that I am capable of data extraction work, but I also learned that if the sources are too similar to each other, the work eventually becomes tedious because at that point, you aren’t writing code but rather changing variables and conditional statements. I tried to combat that by switching the types of sources that I was working on as well as the language that I was processing, so that the challenges that I faced would be different. This internship has shown me that I am still very interested in how a computer understands languages, but I would rather process information that is not as regular as the dictionaries, webinaries, and sources that I have been working on over the summer. I’ve also learned that I’m very much into researching different ways to tackle a problem and debating with someone the pros and cons of implementing them within a system.
My advice to anyone who would be interested in working at PanLex is to be really interested in the work that you are doing, and to take the initiative to research and bring up projects that you would like to do with the staff. The staff is very open to different views and ideas as long as you can support why your approach would be more beneficial than the current way. Furthermore, take advantage of all the resources and opportunities that come with working for a branch of a larger parent organization, and of the fact that you are in San Francisco. I went to talks that were held by the Long Now Foundation, including one on Quantum Computing and the Rosetta Project, and went to different conferences, such as IMUG, with PanLex. As for the field, at times the work will be tedious, and at others you will be trying to debug a problem for hours without making progress. Take it one step at a time, and try to set mini goals for yourself. Don’t be afraid to ask questions or ask someone to look over your code, and most of all, don’t be afraid to take breaks. Sometimes, it’s a matter of being in a different mindset and looking at the problem with fresh eyes.
I think that the projects that I’m most proud of are the ones that focused on lesser-known, endangered, or extinct languages because I feel that by adding them to our database, we are doing our part in trying to fight against language death and providing a resource for languages that usually don’t get funding for translational programs such as Google Translate. My favorite moments included when our database could translate something that Google translated as question marks, and when I added linguistic data to our database for a language that was not supported by Google.
Sooyoung ’18