Searchable Karnataka State, District, and Taluk Gazetteers

First of all, what is a Gazetteer?

Here is the definition from the Karnataka Government Gazetteer website from which I have downloaded all the gazetteers:

Earlier, a Gazetteer signified a geographical index or geographical dictionary or guidebook of important places and people. But with the passage of time its range has vastly widened and it had come to mean a veritable mine of knowledge about the numerous aspects of life of the people and of the country or region they inhabit.

Till I looked up the definition for this article, I thought all of these were gazettes. But as mentioned on the Gazetteer website, “Gazetteers are distinctly reference volumes of lasting value while the Gazettes are official newspapers or bulletins.”

Benefits

Gazetteers, as mentioned earlier, give extensive details of a region. Because they come from the government, they are a well-researched, credible, and authoritative source.

The historical insights, right from the story behind the name of an area to the chronology of events and important people who shaped history, help us appreciate the rich past and its journey.

Population and its composition, languages, occupations, cultural beliefs, tools and techniques, demographics, etc., provide a wealth of information to those interested in knowing about their heritage and diversity.

Administrative details, government policies, divisions and sub-divisions of a region, maps, public services, etc., help in knowing the evolution of boundaries, infrastructure, and decision-making process of the bygone era.

Agriculture and irrigation methods give us information on what worked earlier and why.

Town planning, revenue streams, transportation and communication, health, etc., give us vital geographical details of an area and help in making informed decisions about its future.

Researchers and students will find this information useful for their projects.

Treasure trove for quizzers.

Outcome

Total PDFs downloaded: 1200
Total non-converted files: 127
Total searchable files: 1073

I have shared all 1200 files on my OneDrive.

Notes on non-converted files are available in respective folders. Some gazetteers are made searchable by the government itself. I have included them here for easy access.

The rest is my story about this project. If you are interested only in the files, you can stop reading here.

Why did I take up this task?

I was curious about freedom fighters from my hometown, Pavagada, and the role my native villages played in the freedom struggle. Around the same time, I had lost my book on Hoysala temples. So, I started looking for sources that would provide me with reliable information on these. Whenever I need such reliable or authentic sources, I generally look for official documents. In this case, it was a gazetteer.

(Why was I curious about freedom fighters from Pavagada? That is for another day, another story.)

Soon, I realized I had to read through several pages in multiple PDFs to get even a small piece of information. This was the trigger. I had to make them searchable. I had the right tools to do that. But time was a challenge. So, I set myself a very generous deadline for completing this project by the end of the year. My passion for accessibility and ensuring that all information is readily and easily available to everyone added to the cause.

Another, rather minor, reason is that I don’t want these documents to disappear when government websites migrate or update. (See the challenges later in this article for an example.)

Process

The process is simple but monotonous.

  1. Access the Gazetteer website.
  2. Click the links.
  3. Download PDFs.
  4. Verify all PDFs are downloaded.
  5. Run Scan and OCR in Adobe Acrobat Pro.
  6. Do random checks in converted PDFs by searching for text.
  7. Make a note about the ones that are not converted, along with the reason(s).

That’s it.

Challenges

There are hundreds of PDFs, 1200 to be precise. Each PDF has to be downloaded separately. No bulk download options. (My techie friends might have easier methods like scraping to download all PDFs from a webpage in one go.)
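For the techies, here is a minimal sketch of what the scraping idea could look like in Python, using only the standard library. It is not the method used for this project (every file here was downloaded by hand), and the page address and sample HTML below are made-up placeholders, not the real Gazetteer website. The sketch collects every link on a page that ends in .pdf; the hypothetical download_all helper would then fetch them, and is defined but deliberately not called here.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlretrieve


class PdfLinkParser(HTMLParser):
    """Collects the targets of <a href="..."> links that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Turn relative links into full addresses.
        absolute = urljoin(self.base_url, href)
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        # Skip duplicates so each file is listed only once.
        if is_pdf and absolute not in self.links:
            self.links.append(absolute)


def extract_pdf_links(html, base_url):
    """Return all unique PDF links found in the given HTML text."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


def download_all(links, target_dir):
    """Hypothetical helper: save every linked PDF into target_dir.

    Not called in this example because it needs network access.
    """
    os.makedirs(target_dir, exist_ok=True)
    for url in links:
        filename = os.path.basename(urlparse(url).path)
        urlretrieve(url, os.path.join(target_dir, filename))


# Made-up page content standing in for a district's chapter listing.
sample_html = """
<a href="/docs/chapter1.pdf">Chapter 1</a>
<a href="chapter2.PDF">Chapter 2</a>
<a href="/about.html">About</a>
<a href="/docs/chapter1.pdf">Chapter 1 again</a>
"""

for link in extract_pdf_links(sample_html, "https://example.org/gazetteer/index.html"):
    print(link)
```

A real run would first fetch each district or taluk page, pass its HTML to extract_pdf_links, and then hand the result to download_all. Broken links would still fail, so the verification step stays necessary.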

Some links are broken on the website. So, PDFs are missing.

Some links are broken in the Kannada version of the website but work in the English version.

Accessing each district/ taluk/ publication year, clicking the chapter’s PDF link to download, and running the OCR in Acrobat Pro is excruciatingly time-consuming. I started in mid-2023 and completed it on 18th Feb 2024. There were days I downloaded or converted 100+ PDFs. There were months without a single download/ conversion.

Fonts are a nightmare for any document digitization project. These gazetteers use a whole lot of Kannada fonts – Nudi non-Unicode fonts, Baraha fonts, and some unrecognizable ones. Generally, old Kannada documents use Shree Lipi fonts for Kannada. But even the official Shree Lipi to Unicode converter didn’t recognize these. No clue what those fonts are.

But what kept me going? It was helping me, so I had to do it. Also, I had spoken about this project with too many people. I couldn’t embarrass myself after that. 🙂

Next steps (or pending tasks)

OCR is not perfect. While I have randomly checked the conversion accuracy, I know there will be some misread content. So, all these PDFs must be proofread.

About 100 PDFs use non-Unicode Nudi or Baraha fonts. Currently, their content can be searched only after converting to Word format. But I guess these will be searchable PDFs if the font is converted to Unicode. And, the only tool I know for this job is Aravinda’s Sanka Unicode Converter. Converting this massive quantity of text is a mini-project in itself.

I have not downloaded/ converted special editions that are available on the Gazetteer website. They were not in my scope when I started. But they are precious and must be preserved.

Got any suggestions or feedback? Let me know in the comments. (Comments are moderated to avoid spam.)

There is no U in “technology”

Do you have apps on your mobile that, as per the user manual or reviews, seem to do so many things, yet you are unable to find even half of those features?

Have you seen a presentation that seems to have useful information but is barely readable even if you are just 3 feet away from the projected screen?

Have you come across a poster of your favorite director’s or actor’s movie in which you can’t even read the title?

Heck, why go so far… do you know the purpose of all the buttons, fields, etc., (UI elements for techies) on the homepage of your most used desktop/ mobile application? Even if you know, did you really figure it out as soon as you installed the app?

In this section, I will make an attempt to write about the importance of “sense” in technology.

While I have been curious (sometimes even frustrated) about why designers, developers, and organizations in general focus too much on making their products look good while losing out on making them easier to use, Luke Wroblewski inspired me to dig into it seriously. Ever since I came across his website, I started reading more about user experience and asking “Is it easy to use by the intended audience?” of many things I see or use, both tech and non-tech.

Well, “u” is not there in “technology”, but it ends with a “y”, doesn’t it?

And one more thing

(From Steve Jobs of course.)

If you are looking for theories and principles of UI/ UX, you are in the wrong place. I don’t like/ believe in those and you won’t find any here. I will just pick up some product and share my opinion about it.


How I learned that Adobe Captivate is more than a demo capturing tool

Recently, I was honored to be rewarded with the Most Valuable Participant (MVP) badge for my contributions to the Adobe community forums. This made me look back at how I ended up getting here from where I was about 2-3 years ago.

Till Adobe Captivate 6, I used the tool primarily to capture transactions and generate simulations in demo, try-me, and assess-me modes. I just added introduction and closing slides before the simulation, and I was done with Captivate.

But one day, while looking for some solution on the Adobe community forums, I came across several posts where users were asking questions about features that I didn’t even know about. More importantly, these were features I could use so well in my own courses. Soon after that, I ended up spending hours on these forums browsing through posts, some as old as a month.

In this exercise, I discovered the real power of Captivate. The biggest influencer for me has been Lieve Weymeis, or Lilybiri as she is popularly known in Adobe circles. She is one of the few experts who selflessly help fellow users on the community every single day. I got a reference to her blog in one of her replies to a user query and since then she has become my Captivate guru. This is the only blog which primarily focuses on advanced features of Captivate. Honestly, I am not aware of any other website/ blog which offers such in-depth information for free.

Apart from Lilybiri, I am also learning from Jim Leichliter (Captivate-Javascript), Rick Stone a.k.a Captiv8r, and Rod Ward.

As I started getting more hands-on with the tool, I felt a better way to learn is to take up real-life examples. For this, along with working on use cases of my own, I started to reproduce the issues users posted on the forums and share the solution whenever I had one. If somebody else posted a solution, I practiced that as well. If the solution didn’t work for me, I asked for more info. I spent many evenings and weekends doing this. Gradually, I was ready to offer immediate solutions for many issues.

This helped me a lot even professionally. My newfound knowledge helped me come up with some great interactivities, reusable components, and game-based quizzes for our own courses, making them more engaging than ever.

I am aware that I have only scratched the surface of Captivate and there are many more things to uncover. Even today, I try to practice the same way. However, I must admit the frequency is not as high as I would like. I have my reasons, but then they are probably excuses. But one thing is for sure: it will not stop anytime soon.

CES 2015

Every year, I look forward to the CES (Consumer Electronics Show) and MWC (Mobile World Congress) conferences. This year’s CES was mostly about the Internet of Things. So much so, in fact, that some said it was about the Internet of Everything, and I could not agree more.

Here is a glimpse.

Aren’t these cool?

Parrot Pot: The first question my mom asks me whenever she comes back from a day’s outing is whether I watered the plants. My usual answer: “Oh! I forgot.” If you are anything like me, you will love Parrot Pot. Why? Coz it automates your job. Check out this video.
https://www.youtube.com/watch?v=vwbsNkBbnAY

AmpStrip: There are many healthcare wearables now at various stages of development. But this one caught my eye because of its size and ease. Just peel it and stick it. It does all the hard work of monitoring the most critical part of your body, the heart. What else does it do? Check it out here.
https://www.youtube.com/watch?v=9i4iF7mNulI

MakerBot’s composite metal 3D printer: I have always been fascinated by the realistic output of 3D printers. Now, with the composite metal option, I can only see their benefits growing.

Sony’s Smart Eye: Yayy!!! Finally I have some hopes of getting a “Glass”. 
http://www.cnet.com/news/sony-teases-new-head-wearables-at-ces-2015-smarteyeglass-attach-and-the-smart-b-trainer/

Connected cars: Wow!!! Cars that charge, drive, and park on their own, cars that assess your driving skills and send a report to your insurance company, gesture-controlled cars, watch-controlled cars…

Do we really need this tech?

mamaRoo’s baby rocker: Putting your kid back to sleep when it suddenly wakes up from a nap is no easy task. But is it really worth “automating”? 
http://coolmomtech.com/2015/01/4moms-mamaroo-infant-seat-and-swing/

Baby Gigl: An inclinometer in this bottle tells you how to hold it while feeding your baby.
http://mashable.com/2015/01/04/smart-baby-bottle-baby-gigl/

Smart pacifier Pacif-i: GPS tracker for your toddler. Well, among other things.
http://www.popsci.com/ces-2015-smart-pacifier-peace-mind-blue-maestro

Sensoria’s smart socks: Socks that track your exercise and send the data to your mobile. Just curious. Do they stink as well? 
http://www.cnet.com/products/sensoria-fitness-smart-sock/

Digitsole insoles: What if you don’t like wearing socks? Don’t worry, your insoles can track your exercise instead.
http://www.digitsole.com/#video

Sidewing:

You get into the car to go to office. But it stops by your favorite coffee shop on the way. The guy at the takeaway counter hands over your preferred coffee. By the time you finish your coffee, you have arrived at your office. You get down, wrap up your work, and your car is at the door to take you to the dinner party your friends have organized for you. Once you are out of it, you are taken to the airport just in time to receive your son who is coming home for vacations. Your car drops you at your house, goes to the 24/7 service center to get some minor repairs done, and is back in the garage fully geared up for another day’s grind.

Your car knows everything. It is connected. And, this is just a sample.