eBay Product Scraping, Manta Data Scraping, Website Screen Scraping, Website Scraper, Scraping Data from Websites, Website Information Scraping, Web Scraping Services

Wednesday, 20 August 2014

Web Scraping data from different sites


I am looking for a few ideas on how I can solve a design problem I'm going to face when building a web scraper to scrape multiple sites. Writing the scraper(s) is not the problem; matching the data from different sites (which may have small differences) is.

For the sake of being generic assume that I am scraping something like this from two or more different sites:

    public class Data {
        public int id;
        public String firstname;
        public String surname;
        ....
    }

If I scrape this from two different sites, I could end up with the following:

Site A: id=100, firstname=William, surname=Doe

Site B: id=1974, firstname=Bill, surname=Doe

Essentially, I would like to consider these two sets of data the same (they are the same person but with their name slightly different on each site). I am looking for possible design solutions that can handle this.

The only idea I've come up with is scraping the data from a third location and using it as a reference list. Then, when I scrape site A or B, I can build up a list of failures over time and store them in a list for each scraper, so that it knows the mapping (if I find id=100, then I know that the firstname will be William, etc.). I can't help but feel this is a rubbish idea!
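One possible alternative to a per-site failure list, sketched here in Python under stated assumptions: normalise each record to a site-independent key before comparing. The `Person` type, the `NICKNAMES` table and the function names below are all hypothetical, and a real nickname table would need to be much larger. This is only an illustration of the idea, not a complete matching strategy.

```python
from dataclasses import dataclass

# Hypothetical nickname table mapping short forms to a canonical first name.
# A real table would be far larger and probably loaded from a file.
NICKNAMES = {"bill": "william", "will": "william", "bob": "robert"}


@dataclass(frozen=True)
class Person:
    site: str
    id: int
    firstname: str
    surname: str


def match_key(person):
    """Build a site-independent key: canonical first name plus surname."""
    first = person.firstname.strip().lower()
    first = NICKNAMES.get(first, first)
    return (first, person.surname.strip().lower())


def group_records(records):
    """Group records from any number of sites by their match key."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record)
    return groups


site_a = Person("A", 100, "William", "Doe")
site_b = Person("B", 1974, "Bill", "Doe")
groups = group_records([site_a, site_b])
print(len(groups))  # both records share one key
```

The site-specific ids are kept on each record but never used for matching, since they differ between sites. Two different people with the same normalised name would still collide, so more fields would be needed in the key in practice.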

If you need any more info, or if you think my description is a bit naff, let me know!

Thanks,

DMcB


Source: http://stackoverflow.com/questions/23970057/web-scraping-data-from-different-sites

Tuesday, 19 August 2014

Scrape Data Point Using Python


I am looking to scrape a data point using Python from the URL http://www.cavirtex.com/orderbook.

The data point I am looking to scrape is the lowest bid offer, which at the current moment looks like this:

<tr>
 <td><b>Jan. 19, 2014, 2:37 a.m.</b></td>
 <td><b>0.0775/0.1146</b></td>
 <td><b>860.00000</b></td>
 <td><b>66.65 CAD</b></td>
</tr>

The relevant point is the 860.00. I am looking to build this into a script which can send me an email to alert me of certain price differentials compared to other exchanges.

I'm quite a newbie, so if in your explanations you could offer your thought process on why you've done certain things, it would be very much appreciated.

Thank you in advance!

Edit: This is what I have so far. It returns the page title correctly, but I'm having trouble grabbing the table data.

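import urllib.request
from bs4 import BeautifulSoup

site = "http://cavirtex.com/orderbook"
# Send a browser-like User-Agent; some sites reject the default one.
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request(site, headers=hdr)
page = urllib.request.urlopen(req)
# Name the parser explicitly so the result does not depend on what is installed.
soup = BeautifulSoup(page, "html.parser")
print(soup.title)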



Here is the code for scraping the lowest bid from the 'Buying BTC' table:

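from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get('http://www.cavirtex.com/orderbook')

lowest_bid = float('inf')
# Every cell of the 'Buying BTC' table; only the price cells parse as floats.
elements = browser.find_elements(By.XPATH, '//div[@id="orderbook_buy"]/table/tbody/tr/td')

for element in elements:
    # Remove the surrounding <b> tags explicitly instead of relying on str.strip,
    # which strips a set of characters rather than a substring.
    text = element.get_attribute('innerHTML').replace('<b>', '').replace('</b>', '').strip()
    try:
        bid = float(text)
        if lowest_bid > bid:
            lowest_bid = bid
    except ValueError:
        # Cells that are not plain numbers (dates, "0.0775/0.1146", "66.65 CAD") are skipped.
        pass

browser.quit()
print(lowest_bid)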

In order to install Selenium for Python on your Windows-PC, run from a command line:

pip install selenium (or pip install selenium --upgrade if you already have it).

If you want the 'Selling BTC' table instead, then change "orderbook_buy" to "orderbook_sell".

If you want the 'Last Trades' table instead, then change "orderbook_buy" to "orderbook_trades".

Note:

If performance is critical, you can fetch the page with a plain HTTP request instead of Selenium, and your program will run much faster. However, your code will probably end up a lot "messier", because of the tedious HTML parsing you will then have to do yourself...
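To give an idea of what that parsing could look like, here is a minimal sketch using only Python 3's standard library. It assumes the table rows are present in the raw HTML response (not rendered by JavaScript) and works on the sample row from the question. `CellCollector`, `lowest_numeric_cell` and `SAMPLE_ROW` are names made up for this illustration.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text content of every <td> cell in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still arrives here.
        if self._in_cell:
            self.cells[-1] += data


def lowest_numeric_cell(html):
    """Return the smallest cell value that parses as a float, or None."""
    parser = CellCollector()
    parser.feed(html)
    values = []
    for cell in parser.cells:
        try:
            values.append(float(cell.strip()))
        except ValueError:
            continue  # dates, "0.0775/0.1146", "66.65 CAD" and similar cells
    return min(values) if values else None


# The table row quoted in the question.
SAMPLE_ROW = """
<tr>
 <td><b>Jan. 19, 2014, 2:37 a.m.</b></td>
 <td><b>0.0775/0.1146</b></td>
 <td><b>860.00000</b></td>
 <td><b>66.65 CAD</b></td>
</tr>
"""

print(lowest_numeric_cell(SAMPLE_ROW))
```

On a real page you would first narrow the HTML down to the table you care about (for example the "orderbook_buy" div), since this sketch looks at every cell it is given.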

Here is the code for sending the previous output in an email from yourself to yourself:

import smtplib, ssl

def SendMail(username, password, contents):
    server = Connect(username)
    try:
        server.login(username, password)
        server.sendmail(username, username, contents)
    except smtplib.SMTPException as error:
        print(error)
    Disconnect(server)

def Connect(username):
    # Take the provider name from the domain, e.g. "gmail" in user@gmail.com.
    # Splitting on "@" first keeps addresses like first.last@gmail.com working.
    serverName = username.split("@")[1].split(".")[0]
    # Keep retrying until a connection is established.
    while True:
        try:
            server = smtplib.SMTP(serverDict[serverName])
        except smtplib.SMTPException as error:
            print(error)
            continue
        try:
            server.ehlo()
            # Upgrade to an encrypted connection when the server supports it.
            if server.has_extn("starttls"):
                server.starttls()
                server.ehlo()
        except (smtplib.SMTPException, ssl.SSLError) as error:
            print(error)
            Disconnect(server)
            continue
        break
    return server

def Disconnect(server):
    try:
        server.quit()
    except smtplib.SMTPException as error:
        print(error)

serverDict = {
    "gmail"  :"smtp.gmail.com",
    "hotmail":"smtp.live.com",
    "yahoo"  :"smtp.mail.yahoo.com"
}

SendMail("your_username@your_provider.com", "your_password", str(lowest_bid))

The above code should work if your email provider is Gmail, Hotmail or Yahoo.

Please note that depending on your firewall configuration, it may ask your permission upon the first time you try it...
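The question also mentions alerting on price differentials compared to other exchanges. A minimal Python 3 sketch of that check could look like the following. The function names, the 2% default threshold and the reference price of 800.0 are made up for illustration; only the 860.0 comes from the question.

```python
def price_differential_pct(local_price, reference_price):
    """Return how far local_price is from reference_price, in percent."""
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return (local_price - reference_price) / reference_price * 100.0


def should_alert(local_price, reference_price, threshold_pct=2.0):
    """True when the absolute differential reaches the threshold."""
    differential = price_differential_pct(local_price, reference_price)
    return abs(differential) >= threshold_pct


# 860.0 is the bid from the question; 800.0 is a hypothetical price
# from another exchange.
if should_alert(860.0, 800.0):
    print("differential: %.1f%%" % price_differential_pct(860.0, 800.0))
```

The print call is where the email could be sent instead, once the price from the other exchange has been scraped in the same way.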



Source: http://stackoverflow.com/questions/21217034/scrape-data-point-using-python

Saturday, 16 August 2014

Has It Been Done Before? Optimize Your Patent Search Using Patent Scraping Technology

Since the US patent office opened in 1790, inventors across the United States have been submitting all sorts of great products and half-baked ideas to its database. Nowadays, many individuals get ideas for great products only to have the patent office do a patent search and tell them that their ideas have already been patented by someone else! Herein lies a question: how do I perform a patent search to find out whether my invention has already been patented before I invest time and money into developing it?

The US patent office patent search database is available to anyone with internet access.

US Patent Search Homepage

Performing a patent search with the patent searching tools on the US Patent office webpage can prove to be a very time-consuming process. For example, searching the database for "dog" and "food" yields 5745 patent search results. The straightforward approach to investigating the patent search results for your particular idea is to go through all 5745 results one at a time looking for yours. Get some munchies and settle in, this could take a while! The patent search database sorts results by patent number instead of relevancy. This means that if your idea was recently patented, you will find it near the top, but if it wasn't, you could be searching for quite a while. Also, most patent search results have images associated with them. Downloading and displaying these images over the internet can be very time-consuming, depending on your internet connection and the availability of the patent search database servers.

Because patent searches take such a long time, many companies and organizations are looking for ways to improve the process. Some organizations and companies hire employees for the sole purpose of performing patent searches for them. Others contract out the job to small businesses that specialize in patent searches. The latest technology for performing patent searches is called patent scraping.

Patent scraping is the process of writing automated scripts that analyze a website and copy only the content you are interested in into easily accessible databases or spreadsheets on your computer. Because a script performs the patent search, you don't need a separate employee to get the data; you can let the scraper run while you perform other important tasks! Patent scraping technology can also extract text content from images. By saving the images and textual content to your computer, you can then search them very efficiently for content and relevancy, saving you lots of time that could be better spent actually inventing something!
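As a rough illustration of the last point, here is a minimal Python sketch of ranking saved records by keyword relevance and writing them out as spreadsheet-ready CSV. Everything in it is an assumption made for the example: the record fields, the function names, the simple word-count scoring, and the placeholder records, which are not real patents.

```python
import csv
import io


def relevance(text, keywords):
    """Count keyword occurrences in text (case-insensitive, whitespace-split)."""
    words = text.lower().split()
    return sum(words.count(keyword.lower()) for keyword in keywords)


def rank_records(records, keywords):
    """Sort records by descending keyword count, dropping records with no hits."""
    scored = [(relevance(r["title"] + " " + r["abstract"], keywords), r)
              for r in records]
    hits = [(score, r) for score, r in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits]


def to_csv(records):
    """Serialise records to CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "title", "abstract"])
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


# Placeholder records standing in for scraped results; not real patents.
RECORDS = [
    {"id": "EXAMPLE-1", "title": "Leash with reflective strip",
     "abstract": "A dog leash."},
    {"id": "EXAMPLE-2", "title": "Dog food dispenser",
     "abstract": "Dispenses dog food on a timer."},
    {"id": "EXAMPLE-3", "title": "Bicycle bell",
     "abstract": "A bell for bicycles."},
]

ranked = rank_records(RECORDS, ["dog", "food"])
print(to_csv(ranked))
```

Because the records are stored locally, re-ranking them for a different set of keywords costs nothing more than another call to `rank_records`.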

To put a real-world face on this, let us consider the pharmaceutical industry. Many different companies are competing for the patent on the next big drug. It has become an indispensable tactic of the industry for one company to search for the patents that other companies are applying for, and so learn which direction their research and development teams are taking. Using this information, the company can then choose either to pursue that direction heavily or to spin off in a different direction. It would quickly become very costly to maintain a team of researchers dedicated to performing patent searches all day. Patent scraping technology is the means of figuring out what ideas and technologies are coming about before they make headline news. It is by utilizing patent scraping technology that large companies stay up to date on the latest trends in technology.

While some companies choose to hire their own programming team to do their patent scraping scripts for them, it is much more cost effective to contract out the job to a qualified team of programmers dedicated to performing such services.

Source: http://ezinearticles.com/?Has-It-Been-Done-Before?-Optimize-Your-Patent-Search-Using-Patent-Scraping-Technology&id=171000

Tuesday, 5 August 2014

Scrape Online Selling - Estate Sales

You can find a number of real bargains and a few great items to sell at estate sales. True estate sales are exactly what the name denotes: someone sells an entire estate or a house full of items. The items are generally sold individually or in lots. An estate sale typically takes place when somebody has passed away and no one knows what to do with all their junk. Perhaps the relatives can't take all the stuff with them. They hire a company to run the estate sale or run it themselves.

At a typical sale you will see the house and everything attached to it will be for sale. These sales are publicized in your local newspaper's classified ads section. Estate sales are more modest than garage sales as there are lots of deals to be made.

Estate Sales Tips

• Get there early! Just as with garage sales, the most valuable items are going to be gone in the first few hours.

• Haggle. It is still a good idea to try to lower the sales price of items you are thinking of buying.

• Return to the sale the next day. Most of the time, these types of sales are two-day events. I have found that on the second day the operators usually lower their prices significantly. The incentive is also there for the operators to unload everything that is left over.

• Introduce yourself. Usually these types of sales are run by companies on behalf of the family of the estate. You can leave your business card with the operators and ask them to give you a call about future estate sales. In fact, most of the companies that run these sales let you sign up to be notified by e-mail of upcoming sales.

Used Bookstores

If you have any used bookstores in your area, you can use them to find used books to sell online. Most used bookstores are inundated with books but have limited space in which to store them all. Because of this, they may be willing to sell you some of their overstocked inventory at a deep discount.

It never hurts to speak to the owner or manager and try to get them to sell you some of their inventory. You can leave them with your business card and let them know that you are always in the market for volume sales. Most owners will always listen to a buyer who is willing to buy in volume since shelf space is a large problem for traditional brick and mortar used bookstores.

Source: http://ezinearticles.com/?Scrape-Online-Selling---Estate-Sales&id=7413912

Thursday, 31 July 2014

How to Trick Google With Your SEO Articles and Web Content

So you're spending time writing SEO articles and creating highly optimised web content, or you're using an article service to create articles for you? What made you click on the link that brought you to this article then?

Perhaps you're looking for a sneaky little trick that will power your articles to the top of the search results in no time at all? You're looking for an edge that no one else has got that will let your content rush to the top of the results like a flatulent cork in water wings? Well read on...

Even the most average internet marketer cannot help but be aware that keyword stuffing is no longer effective. Indeed, keyword stuffing is highly likely to see a website demoted or even blacklisted. Today there is a need for high quality content, and for content which is unique and original, as well as popular. The trouble is that this can make the job much harder. Having to spend time creating good, solid, readable content which is useful and interesting is time-consuming.

Having to spend time creating content which might be considered worthwhile by real people is a lengthy and involved process. It used to be so much easier when you could just fling any old rubbish online and let the search engines lap it all up like hungry dogs. Today it seems that those dogs have turned, and unless you want them to bite, you need to spend time actually thinking about your potential customers, rather than just those nice friendly bots and spiders you've been so used to.

This is clearly a difficult situation, and the only option seems to be to succumb to the will of the search engines and spend time creating well-written, highly optimised content that appeals to both the search engines and real people. Goodness - you might even write something people really find interesting, and may want to link to. You never do know these days.

But of course, you clicked the link for this article, because you're looking to change all that. Rather than spending time crafting you'd rather be churning; rather than writing readable content you'd prefer to be chucking out text that looks as though your word processor and your thesaurus have been having an affair!

What you really want is to be able to press a magic button and have your articles fly up the search results, and magically draw thousands of keen, enthusiastic customers flooding to your website, ripping open their purses and wallets with such feverish excitement that you'll hardly know what to do with all that easy cash you'll be wallowing in.

As someone who provides an article service to internet marketers and business owners, and who writes SEO articles for a living, I have a few words of advice for those of you who want to try to get your articles above mine, who want to see your articles power ahead of mine and take hold of the search results pages by the horns.

Whilst I may sit here taking time to research each and every article I write, plan every article so that it has something to say, write it in a way that makes it entertaining, enjoyable and informative for those real live people who exist out there on the other side of the web, craft articles in a way that takes full advantage of Google's algorithms, optimised for latent semantic indexing, yet making it almost entirely undetectable, you want to discover a secret formula that will launch your articles with barely more than a flick of your wrist.

You probably want to find out what this secret formula is so that you can spend less time hurling hundreds, perhaps thousands of articles out every week just to scrape by. Meanwhile, I'll write an article once every week or so. You'll notice them because they always end up boosting my website up to the very top of Google for all the major keywords and key phrases I have chosen, despite several billion other sites all appearing for the same searches.

Well, here it is. The magic formula, the button you want to press is coming right up. Forget those black hat techniques that simply blast meaningless content at thousands of identical directories. To really achieve success with your SEO articles and enjoy the same level of exposure as my article service, the magic formula is this: forget writing SEO articles. That's it. When you're writing your next article, forget that it's an SEO article.

Source: http://ezinearticles.com/?How-to-Trick-Google-With-Your-SEO-Articles-and-Web-Content&id=4078570

Thursday, 10 July 2014

Restaurant Internet Marketing - It's Not Your Father's Yellow Pages

As a restaurant owner or manager, are you taking advantage of all the opportunities offered by the Internet, or are you still counting on word of mouth and a traditional print marketing program to get diners filling your place? Are you even aware of what Internet marketing can do for you? If you're not, you're giving up business to your competition.

Take a moment to Google your type of restaurant (Italian, seafood, deli, etc.) and the name of your town and see what comes up on the first page. Do you see your name? How many competitors do you see? Now ask yourself, "How many potential diners are going to come to my restaurant based on these results?" The answer of course is nada. Can you afford to give that business to your competition?

Location, location, location

Many local small and medium sized businesses have discovered the value of having an internet presence but it wasn't always that way. Most local businesses thought the internet best served big companies. On top of that, most small local businesses simply didn't have a clue how to participate effectively and they were too busy running their businesses to learn how.

The fact of the matter is that 72% of all searches are related to a search for local content. Nearly a year ago the monster search engine Google changed their algorithms so that local results show up on a search inquiry whenever appropriate. That means your shop can be just as competitive as a national restaurant chain when it comes to being found in search.

The difference between search and traditional advertising

So why is search such a big deal? The reason it is so much more effective than traditional marketing is that it responds to a specific need at a time when the potential diner is interested in the information. Unlike a weekly flier that's dropped in the home mailbox every Thursday, search doesn't clutter up a prospective diner's life but rather provides relevant information when the diner wants it, not when the USPS delivers it.

When that potential customer is thinking about where to go for lunch he or she is going to search "pizza san pedro, ca" to find what's available in the area. In other words search delivers the information when the searcher is most receptive to it.

But wait there's more

Search by itself is reason to have a presence on the internet but the benefits certainly don't end there.

How would you like to be able to:

- Notify all of your customers what your daily specials are every day automatically.
- Invite your customers to place their order in advance so they don't have to wait for their food and do it without spending any time on the phone.
- Accept and confirm reservations electronically.
- Run promotional campaigns without spending a dime on advertising.
- Build solid customer relationships that will strengthen your word of mouth referrals.

And that's just the beginning. A professional internet presence will cost you significantly less than you are currently paying today for marketing. What's more, professionally managed internet presence keeps on giving even when you stop paying. Can you say that about your Yellow Page listing or print ads?

Online Reputation Management

There can be a potential dark side to this whole internet thing. You live or die on your reputation and it is important that you manage that reputation online. Never has the consumer had such an easy way to post anonymous comments for all the world to see. If you have a customer who has a bad experience don't be surprised if he or she posts it to a blog or a forum or gives you a bad review in the business directories.

Speaking of business directories, you already are on the internet even if you didn't know it. These directories scrape information out of Yellow Pages and other sources and add it to a business profile. You have a right to claim those listings and they are actually a great way to promote the business. But registered visitors also have the right to post reviews and if the only review your store has is from that disgruntled customer...your rep is shot.

There are just too many reasons to get involved with internet marketing. To not do so is to pass up an incredible opportunity to win new business and to cement loyalty with regular customers. It's effective, inexpensive and essential if you want to remain competitive.

WebFrootz is an internet marketing agency and website design and web development company in Baltimore, MD, specializing in comprehensive marketing solutions for small and medium businesses, including branding, website design and development, graphic design, copywriting, and SEO.

Source: http://ezinearticles.com/?Restaurant-Internet-Marketing---Its-Not-Your-Fathers-Yellow-Pages&id=6498349

Wednesday, 9 July 2014

Infinity Logo Design Launches Content Writing Service

Infinity Logo Design recently added a Content Writing Service to its pool of services, available at a staggering discount of 70%. Following the huge success of its Animated Video Service, Infinity Logo Design has now officially launched its Content Writing Service as well. With a clientele of nearly a million, Infinity Logo Design is one of the fastest growing design companies in the world. Its recent diversification underscores how determined the company is to achieve its goals and objectives, one of them being "to become the number 1 design & development company of the world". Providing a complete range of Content Writing services, which includes Web Content Writing, SEO Content Writing, Creative Writing, Press Release Writing, Article Writing and Brand/Product Name Suggestions, Infinity Logo Design is dedicated to providing a superior experience to readers.

According to the company's Creative Manager, "We have been offering Content Writing service as a secondary service for quite some time, but following its augmented demand and some great feedback from our clients, we decided to launch it officially. We consider Content Writing as an art that truly fascinates and inspires the readers, and makes them believe in the message that the writer is trying to convey. Our copy writers are handpicked, and have a proven track record. We provide our customers with nothing but the very best!"

Infinity Logo Design believes that Content Writing is an effective tool for customer retention and generating a profound user experience. Infinity Logo Design is offering this service at quite a substantial price, and its exceptional discount of 70% is a bonus for all those who are in pursuit of quality content at an affordable price. Judging from the company's recent endeavors, there's no denying that we'll be seeing some great things from them in the years to come.

About Infinity Logo Design:

Infinity Logo Design is one of the world's leading design companies. The company offers a wide range of services including custom logo design, website design, brochure design, banner design, stationery design and promotional items. ILD provides high-quality designs to small- and medium-sized businesses at the most competitive prices. In addition, the company engages the services of skilled and talented designers who can create your design exactly according to your specifications. The Company also offers 100% satisfaction and 100% money back guarantee, making it one of the most sought-after design companies in the world.

Source: http://www.digitaljournal.com/pr/1985193