

Monday, March 21, 2011

API catalog (semantic)

The API catalog has gained several more interesting semantic offerings:

1. Diffbot Article API
: The Diffbot Article API is used to extract clean article text from article web pages. It’s powerful when combined with the permalinks that are extracted by the above APIs. The Diffbot Article API takes as input any news story page. Statistical machine learning algorithms are run over all of the visual elements on the page to extract the article text and associated metadata, such as its images, videos, and tags. If the article spans multiple pages, Diffbot will follow the next pages to get the whole article. There is also experimental support for extracting reader comments.

In short, this API extracts the "clean" text from a web page. A check on Ukrainian-language and Russian-language sites gave very good results. A very useful and interesting API.
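
As a rough illustration of calling such a service, here is a minimal Python sketch that only builds the request URL for one article page. The endpoint and the `token` and `url` parameter names are assumptions, not taken from the description above, so check them against Diffbot's own documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against Diffbot's current documentation.
ARTICLE_ENDPOINT = "https://api.diffbot.com/v3/article"


def build_article_request(token: str, page_url: str) -> str:
    """Return the GET URL that asks the service to extract one article page.

    `token` and `url` are assumed parameter names for this sketch.
    """
    query = urlencode({"token": token, "url": page_url})
    return f"{ARTICLE_ENDPOINT}?{query}"


if __name__ == "__main__":
    # Hypothetical token and page, for illustration only.
    print(build_article_request("YOUR_TOKEN", "http://example.com/news/story.html"))
```

The resulting URL can be fetched with any HTTP client. The page address is percent-encoded, so query strings inside it do not get mixed up with the API's own parameters.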

More generally, Diffbot offers its own unique technology for real-time indexing.


  Diffbot has developed a real-time search and content aggregation technology. Our algorithm can take in any document and automatically and reliably determine the structural organization of the page, independent of the layout of the page or language of the text. Our algorithm uses statistical signal processing techniques to analyze visual properties, much like a human does, to determine how the parts of the page fit together.

High-resolution indexing enables new relevance algorithms

Current search engines treat URLs (and the webpage corresponding to the URL) as the indexing atom. Typically, results pages comprise a listing of URLs. However, modern webpages no longer represent a monolithic article. Instead, they are commonly generated from a template and database backend, pulling together multiple pieces of content to render a screen, making the actual URL increasingly irrelevant.
Diffbot technology allows for indexing of the microdocuments, the components a webpage is made from. Diffbot can recognize the internal structure of the page and tell whether a component is a blog post, article, navigation bar, spam, etc. This can be used to improve the terms used in indexing (e.g. navigation text, spam, and non-relevant posts on the page can be disregarded) or enable new classes of relevance algorithms (such as microdocument-focused, PageRank-like algorithms).
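
To make the idea of microdocument indexing concrete, here is a small Python sketch of an inverted index whose atom is a page component rather than a URL. It is not Diffbot's algorithm: the `Microdocument` type, the component labels, and the choice of which labels to skip are all assumptions made for illustration, and the hard part (classifying components) is taken as already done.

```python
from collections import defaultdict
from dataclasses import dataclass

# Component types this sketch chooses not to index (an assumption).
IGNORED_KINDS = {"navigation", "spam"}


@dataclass(frozen=True)
class Microdocument:
    url: str   # page the component was found on
    part: int  # position of the component within the page
    kind: str  # e.g. "article", "navigation", "spam"
    text: str


def build_index(components):
    """Map each lowercase term to the set of (url, part) pairs containing it."""
    index = defaultdict(set)
    for doc in components:
        if doc.kind in IGNORED_KINDS:
            continue  # navigation bars and spam contribute no terms
        for term in doc.text.lower().split():
            index[term].add((doc.url, doc.part))
    return index


if __name__ == "__main__":
    page = [
        Microdocument("http://example.com/", 0, "navigation", "Home News Contact"),
        Microdocument("http://example.com/", 1, "article", "Semantic news search"),
    ]
    index = build_index(page)
    # Only the article component matches, although "News" is also in the menu.
    print(sorted(index["news"]))
```

Because postings point at a component, a relevance function can score the article block on its own instead of the whole page.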

Improved Summary Generation

Two common approaches used to generate the text snippet in a search result are to generate a text snippet for the entire page or to generate a context-specific snippet based on the text neighborhood of the query terms. Text summary generation can be improved if precise boundaries of the relevant part of the page are known. New interfaces for showing the search results are also possible, such as more magazine-like interfaces that show the microdocument’s title, image, and context.
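
The second approach, a snippet built from the text neighborhood of the query terms, can be sketched in a few lines of Python. This is a simplified illustration, not any search engine's implementation; the function name, the `radius` parameter, and the fallback behaviour are assumptions. Passing the text of a single microdocument instead of the whole page is what keeps the snippet inside the relevant part.

```python
def context_snippet(text: str, query_terms: list[str], radius: int = 5) -> str:
    """Return the words within `radius` words of the first query-term hit."""
    words = text.split()
    wanted = {term.lower() for term in query_terms}
    for i, word in enumerate(words):
        # Strip surrounding punctuation so "search," still matches "search".
        if word.strip(".,;:!?\"'()").lower() in wanted:
            start = max(0, i - radius)
            end = min(len(words), i + radius + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    # No hit: fall back to the opening words, like a whole-page snippet.
    return " ".join(words[: 2 * radius + 1])


if __name__ == "__main__":
    article = ("Current search engines treat URLs as the indexing atom, "
               "but modern webpages are assembled from many components.")
    print(context_snippet(article, ["atom"], radius=3))
```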

2. DocRaptor API
: DocRaptor is a RESTful API that allows users to generate PDF and Excel documents using simple HTML. Users can use styles to style the HTML and it will translate to some XLS/PDF formatting. The API also has the ability to run any JavaScript in the HTML document before converting it. The API works in any programming language and even supports Java.
An API for generating PDF and Excel files from an HTML document.
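
A request to a service of this kind could look roughly like the Python sketch below, which only assembles the JSON body and sends nothing. The endpoint and every field name (`user_credentials`, `document_type`, `document_content`, `javascript`, `test`) are assumptions for illustration and should be checked against DocRaptor's documentation.

```python
import json

# Assumed endpoint and field names; verify against DocRaptor's documentation.
DOCS_ENDPOINT = "https://docraptor.com/docs"


def build_doc_request(api_key: str, html: str, doc_type: str = "pdf",
                      name: str = "document", run_javascript: bool = False) -> str:
    """Return a JSON body asking for `html` to be converted to PDF or XLS."""
    if doc_type not in ("pdf", "xls"):
        raise ValueError("doc_type must be 'pdf' or 'xls'")
    payload = {
        "user_credentials": api_key,
        "doc": {
            "name": f"{name}.{doc_type}",
            "document_type": doc_type,
            "document_content": html,
            "javascript": run_javascript,  # run scripts before converting
            "test": True,  # request a test document while experimenting
        },
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical key and content, for illustration only.
    print(build_doc_request("YOUR_API_KEY", "<h1>Report</h1>", "pdf", "report"))
```

The body would then be sent as an HTTP POST with a JSON content type, and the response saved as the generated file.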

3. ICanLocalize API
: ICanLocalize is a translation management tool that provides website translation, software/app localization, and other text translation services. With the ICanLocalize API, you can set up a project, send documents to translate, and return the translation when it’s complete. With the API, all interaction between clients and translators can be embedded in the host application.

4. Sugestio API
: Sugestio is a scalable and fault-tolerant service that now brings the power of web personalization to all developers. The service provides an easy-to-use service interface and a set of development libraries that enable you to enrich your content portals, e-commerce sites and other content-based websites. The service is currently in beta, with several pilot projects and the API.

A small step toward trusted data, that is, toward trust (the top layer of Tim's layer cake)?

5. tru.ly Verification API
: tru.ly is a free verification platform based on government and private data. It allows users to link various social accounts including Facebook, Twitter, and LinkedIn, while protecting personal information as they wish; generate a QR code that is unique to the user’s identity, making it easy to share, without divulging details; utilize a browser plugin to see what profiles are verified on social networks and request someone be tru.ly verified in order to authenticate their online identity.

Currently tru.ly offers an API for dating sites. Meant as an online dating profile enhancer, the API allows attributes like age, sex and location to be verified. The API is in beta and public documentation is not available.

Sorry, this post was put together as a memo for myself, and the quoted descriptions were left untranslated. If you are interested, visit the vendors' sites and dig into the details.


