Wildcards Used to Help Avoid Too Many Near Duplicates


  • These are "wildcards" or "tokens" used to avoid submitting too many near duplicate sentences.
  • "Near duplicates" are simple transformations or similar sentences that any moderately competent second-language learner can create on their own. For example: I'm happy. => I am happy. / Tom's happy. => He's happy. => She's happy.
  • Of course, sometimes near duplicates are unavoidable.

The Wildcards

Tom, Mary, John, Alice
Use these names in this order, unless the sentence wouldn't make sense with them. If a sentence sounds more natural with a female name, use Mary first.
These were the top four names used in sentences on the Tatoeba Project when they were chosen, so they seemed the logical choice.
This helps avoid near duplicates such as:
  • Tom likes Mary. Jane likes Fred. Mr. Jones likes Ms. Smith.
  • Tom went shopping. Ted went shopping. He went shopping. She went shopping.
  • Tom asked Mary to help John. John asked Tom to help Mary.
If a sentence sounds natural using the above names, I don't usually contribute versions of it with pronouns.
This helps avoid near duplicates such as:
  • Tom swims. He swims. She swims.
  • Tom and Mary swim. They swim.
  • Give Tom this. Give him this. Give her this.
Jackson (If you need a 2nd family name in the same sentence, use Smith.)
Default family name (surname): Mr. Jackson, Dr. Jackson, Mr. and Mrs. Jackson, or Tom Jackson when a full name is needed.
If you need a city name, use Boston whenever you can. Of course, sometimes a specific city needs to be mentioned (for example, "___ is the largest city in Australia."). If another city name is needed in the same sentence, use Chicago.
Default country name.
Default nationality.
Default day of the week.
Default month.
October 20th.
Default date.
Default age when it doesn't matter. However, sometimes a younger age makes sense (13 or 3), or an older one, as in "Tom retired at 65."
Three, 13, 2013
When different numbers seem more appropriate, use one with a 3, if possible.
Default time. (Use 6:30 for early morning or early evening.)
French, English
Default languages, used in this order.
I study French and English.
Default university name.
Default pet name. dog, cat, hamster, ...
Park Street
Default street name. It is the first non-numbered street name on the list of high-frequency street names in the USA. It was already being used in the Tatoeba Corpus.
If other street names are needed, use them in this order: Park Street, Main Street, Oak Street, Pine Street, Maple Street (the frequency order found by the U.S. Census Bureau).
Use contractions whenever they sound more natural. This helps make the audio files more natural-sounding.
For example, "I don't like to jog," instead of "I do not like to jog."
Of course, for sentences that would primarily be used as written language, contractions may not sound natural.
I don't use exclamation marks (!) when periods (.) will do. For out-of-context sentence examples, this is more natural, I think.
Note: Many non-native English speakers tend to overuse exclamation marks. Sometimes, I have to submit a "near duplicate" in order to have the more natural example sentence for use in my projects.
I put punctuation inside quotation marks when that's the standard American usage.
Still incomplete: CK, 2013-02-21, updated 2013-08-10, 2013-12-30, 2014-01-26, 2015-12-27


  • Wildcard Demo #1
    2013 - This shows various simple English-only substitutions.

  • Wildcard Demo #2
    2017 - This one shows bilingual English-Japanese substitutions. There are many subpages, showing various things.

  • Wildcard Demo #3
    2017 - This is similar to Demo #1, but just quickly shows the name "Tom" being changed to the name "Fadil".
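
The kind of substitution these demos describe can be sketched in a few lines of Python. This is only an illustration, not the code behind the demos: the function name substitute_wildcards and the sample mapping are hypothetical, and it assumes the wildcard names appear as whole words in the sentence.

```python
import re


def substitute_wildcards(sentence, mapping):
    """Replace whole-word wildcard names in one pass.

    `mapping` is a hypothetical dict such as {"Tom": "Fadil"}.
    A single pass means that swapping two names (Tom <-> Mary)
    does not replace a name twice.
    """
    if not mapping:
        return sentence
    # \b keeps "Tom" from matching inside words like "Tomorrow".
    names = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"\b(" + names + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], sentence)


# Sample sentences taken from the lists on this page.
sentences = ["Tom likes Mary.", "Tom went shopping.", "Tom's happy."]

for sentence in sentences:
    print(substitute_wildcards(sentence, {"Tom": "Fadil"}))
```

Because the default names are used consistently, one stored sentence can stand in for many variants, and a learner or a tool can generate the others when needed.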